[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user CodingCat commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-148867896 sure, I will close this temporarily --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user CodingCat closed the pull request at: https://github.com/apache/spark/pull/2851 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-148821576 Hey @CodingCat, it looks like this PR is significantly out of date. Do you mind closing it for now? You can always re-open / re-submit once we're ready to work on this feature again. Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user CodingCat commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-81021347 thanks @squito I updated that again --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-81020599 [Test build #28626 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28626/consoleFull) for PR 2851 at commit [`1aec3a8`](https://github.com/apache/spark/commit/1aec3a8073a241d16953a0adad7a17cf3ac2bba4). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-81020626 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28626/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-80998056 [Test build #28626 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28626/consoleFull) for PR 2851 at commit [`1aec3a8`](https://github.com/apache/spark/commit/1aec3a8073a241d16953a0adad7a17cf3ac2bba4). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user squito commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-80476703 thanks for the updates! just one small style comment. FWIW, after thinking about it more, I am in favor of your approach of sending the block updates, just want to get some confirmation. lgtm (with the minor style update) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user CodingCat commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-80383022 Hi, @squito , thanks for the comments, I made some revision on the patch --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-79705758 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28601/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-79705666 [Test build #28601 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28601/consoleFull) for PR 2851 at commit [`9b1bccf`](https://github.com/apache/spark/commit/9b1bccf89c59e95ba7132034149b0d0e657b35b7). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-79608399 [Test build #28601 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28601/consoleFull) for PR 2851 at commit [`9b1bccf`](https://github.com/apache/spark/commit/9b1bccf89c59e95ba7132034149b0d0e657b35b7). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-79466606 [Test build #28585 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28585/consoleFull) for PR 2851 at commit [`49ee118`](https://github.com/apache/spark/commit/49ee1188989422673203eece83d67b590b05bb27). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-79466620 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28585/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-79377540 [Test build #28585 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28585/consoleFull) for PR 2851 at commit [`49ee118`](https://github.com/apache/spark/commit/49ee1188989422673203eece83d67b590b05bb27). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-79320222 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28577/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-79320145 [Test build #28577 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28577/consoleFull) for PR 2851 at commit [`26977c9`](https://github.com/apache/spark/commit/26977c97add7d47d798985753411645e6cdcac29). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-79262603 [Test build #28577 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28577/consoleFull) for PR 2851 at commit [`26977c9`](https://github.com/apache/spark/commit/26977c97add7d47d798985753411645e6cdcac29). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/2851#discussion_r26395613 --- Diff: core/src/main/scala/org/apache/spark/ui/storage/BroadcastPage.scala --- @@ -0,0 +1,86 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.ui.storage + +import org.apache.spark.storage.StorageUtils +import org.apache.spark.ui.UIUtils +import org.apache.spark.util.Utils + +import scala.xml.Node + +private[ui] class BroadcastPage(parent: StorageTab) extends StorageDetailPage("broadcast", parent){ + + protected override val workerTableID: String = "broadcast-storage-by-block-table" + + protected override def objectList = listener.broadcastInfoList + + protected override def getBlockTableAndSize(objectId: Any): (Seq[Node], Int) = { +val blockLocations = StorageUtils.getBroadcastBlockLocation(objectId.asInstanceOf[Long], + storageStatusList) +val blocks = listener.storageStatusList + .flatMap(_.broadcastBlocksById(objectId.asInstanceOf[Long])) + .sortWith(_._1.name < _._1.name) + .map { case (blockId, status) => + (blockId, status, blockLocations.get(blockId).getOrElse(Seq[String]("Unknown"))) +} +(UIUtils.listingTable(blockHeader, blockRow, blocks, id = Some("rdd-storage-by-block-table")), --- End diff -- broadcast, not rdd --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/2851#discussion_r26395579 --- Diff: core/src/main/scala/org/apache/spark/ui/storage/BroadcastPage.scala --- @@ -0,0 +1,86 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.ui.storage + +import org.apache.spark.storage.StorageUtils +import org.apache.spark.ui.UIUtils +import org.apache.spark.util.Utils + +import scala.xml.Node + +private[ui] class BroadcastPage(parent: StorageTab) extends StorageDetailPage("broadcast", parent){ + + protected override val workerTableID: String = "broadcast-storage-by-block-table" --- End diff -- by *worker*, not by *block* --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/2851#discussion_r26395589 --- Diff: core/src/main/scala/org/apache/spark/ui/storage/RDDPage.scala --- @@ -17,134 +17,84 @@ package org.apache.spark.ui.storage -import javax.servlet.http.HttpServletRequest - import scala.xml.Node -import org.apache.spark.storage.{BlockId, BlockStatus, StorageStatus, StorageUtils} -import org.apache.spark.ui.{WebUIPage, UIUtils} +import org.apache.spark.storage.StorageUtils +import org.apache.spark.ui.UIUtils import org.apache.spark.util.Utils /** Page showing storage details for a given RDD */ -private[ui] class RDDPage(parent: StorageTab) extends WebUIPage("rdd") { - private val listener = parent.listener - - def render(request: HttpServletRequest): Seq[Node] = { -val parameterId = request.getParameter("id") -require(parameterId != null && parameterId.nonEmpty, "Missing id parameter") - -val rddId = parameterId.toInt -val storageStatusList = listener.storageStatusList -val rddInfo = listener.rddInfoList.find(_.id == rddId).getOrElse { - // Rather than crashing, render an "RDD Not Found" page - return UIUtils.headerSparkPage("RDD Not Found", Seq[Node](), parent) -} +private[ui] class RDDPage(parent: StorageTab) extends StorageDetailPage("rdd", parent) { -// Worker table -val workers = storageStatusList.map((rddId, _)) -val workerTable = UIUtils.listingTable(workerHeader, workerRow, workers, - id = Some("rdd-storage-by-worker-table")) + protected override val workerTableID: String = "rdd-storage-by-block-table" --- End diff -- by *worker*, not by *block* --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user squito commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-79050227 Hi @CodingCat , I took another look through everything. My comments are all very minor. But I think I'd still like to get some more thoughts from others on how the broadcast block info is passed along. As I mentioned earlier, it is also possible to get it into `TaskEnd` events, like the RDD blocks. I'm torn, so I'd like to hear what others think. Sorry it is taking so long to get feedback ... --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/2851#discussion_r26394719 --- Diff: core/src/main/scala/org/apache/spark/ui/storage/StorageTab.scala --- @@ -79,4 +87,19 @@ class StorageListener(storageStatusListener: StorageStatusListener) extends Spar override def onUnpersistRDD(unpersistRDD: SparkListenerUnpersistRDD) = synchronized { _rddInfoMap.remove(unpersistRDD.rddId) } + + override def onBlockUpdate(blockUpdateEvent: SparkListenerBlockUpdate) = synchronized { +// only update broadcast for now, need to be modified if want to track other blocks --- End diff -- this comment is slightly misleading -- as in other places, I would instead point out that RDD blocks are handled in `onStageCompleted`, but for now we are ignoring all other blocks. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/2851#discussion_r26393684 --- Diff: core/src/main/scala/org/apache/spark/storage/StorageUtils.scala --- @@ -166,28 +195,69 @@ class StorageStatus(val blockManagerId: BlockManagerId, val maxMem: Long) { * Note that this is much faster than `this.rddBlocksById(rddId).size`, which is * O(blocks in this RDD) time. */ - def numRddBlocksById(rddId: Int): Int = _rddBlocks.get(rddId).map(_.size).getOrElse(0) + def numRddBlocksById(rddId: Long): Int = _rddBlocks.get(rddId).map(_.size).getOrElse(0) + + + /** + * Return the number of Broadcast blocks stored in this block manager in O(RDDs) time. + * Note that this is much faster than `this.rddBlocks.size`, which is O(RDD blocks) time. --- End diff -- RDD should be Broadcast --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/2851#discussion_r26394598 --- Diff: core/src/main/scala/org/apache/spark/ui/storage/StorageTab.scala --- @@ -79,4 +87,19 @@ class StorageListener(storageStatusListener: StorageStatusListener) extends Spar override def onUnpersistRDD(unpersistRDD: SparkListenerUnpersistRDD) = synchronized { _rddInfoMap.remove(unpersistRDD.rddId) } + + override def onBlockUpdate(blockUpdateEvent: SparkListenerBlockUpdate) = synchronized { +// only update broadcast for now, need to be modified if want to track other blocks +val broadcastIdOpt = blockUpdateEvent.blockId.asBroadcastId +if (broadcastIdOpt.isDefined) { --- End diff -- I don't think it particularly matters here, but fyi a more "scala" way of doing this is: ``` blockUpdateEvent.blockId.asBroadcastId.foreach{broadcastBlock => val broadcastId = broadcastBlock.broadcastId ... } ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/2851#discussion_r26393960 --- Diff: core/src/main/scala/org/apache/spark/storage/StorageUtils.scala --- @@ -224,8 +299,14 @@ class StorageStatus(val blockManagerId: BlockManagerId, val maxMem: Long) { } else { _rddStorageInfo(rddId) = (newMem, newDisk, newTachyon, level) } + case BroadcastBlockId(broadcastId, _) => +if (newMem + newDisk + newTachyon == 0) { + _broadcastStorageInfo.remove(broadcastId) +} else { + _broadcastStorageInfo(broadcastId) = (newMem, newDisk, newTachyon) +} --- End diff -- I think you could use the same `case rddOrBroadcastBlockId @ (_: BroadcastBlockId | _: RDDBlockId)` pattern here to combine both matches in this method ... its subjective whether its worth it or not, but I think it would be cleaner. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/2851#discussion_r26394350 --- Diff: core/src/main/scala/org/apache/spark/ui/storage/StorageTab.scala --- @@ -39,11 +40,18 @@ private[ui] class StorageTab(parent: SparkUI) extends SparkUITab(parent, "storag @DeveloperApi class StorageListener(storageStatusListener: StorageStatusListener) extends SparkListener { private[ui] val _rddInfoMap = mutable.Map[Int, RDDInfo]() // exposed for testing + private[ui] val _broadcastInfoMap = mutable.Map[Long, BroadcastInfo]() def storageStatusList = storageStatusListener.storageStatusList /** Filter RDD info to include only those with cached partitions */ - def rddInfoList = _rddInfoMap.values.filter(_.numCachedPartitions > 0).toSeq + def rddInfoList = { +_rddInfoMap.values.filter(_.numCachedPartitions > 0).toSeq + } + + + /** Filter broadcast info to include only those with cached partitions */ --- End diff -- delete this comment --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/2851#discussion_r26393712 --- Diff: core/src/main/scala/org/apache/spark/storage/StorageUtils.scala --- @@ -166,28 +195,69 @@ class StorageStatus(val blockManagerId: BlockManagerId, val maxMem: Long) { * Note that this is much faster than `this.rddBlocksById(rddId).size`, which is * O(blocks in this RDD) time. */ - def numRddBlocksById(rddId: Int): Int = _rddBlocks.get(rddId).map(_.size).getOrElse(0) + def numRddBlocksById(rddId: Long): Int = _rddBlocks.get(rddId).map(_.size).getOrElse(0) + + + /** + * Return the number of Broadcast blocks stored in this block manager in O(RDDs) time. + * Note that this is much faster than `this.rddBlocks.size`, which is O(RDD blocks) time. + */ + def numBroadcastBlocks: Int = _broadcastBlocks.values.map(_.size).sum + + /** + * Return the number of blocks that belong to the given Broadcast variable in O(1) time. + * Note that this is much faster than `this.rddBlocksById(rddId).size`, which is + * O(blocks in this RDD) time. --- End diff -- same here --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/2851#discussion_r26393526 --- Diff: core/src/main/scala/org/apache/spark/storage/StorageUtils.scala --- @@ -101,25 +122,30 @@ class StorageStatus(val blockManagerId: BlockManagerId, val maxMem: Long) { private[spark] def updateBlock(blockId: BlockId, blockStatus: BlockStatus): Unit = { addBlock(blockId, blockStatus) } + + private def getIdAndBlockMap(blockId: BlockId) = blockId match { +case RDDBlockId(rddId, _) => (rddId.toLong, _rddBlocks) +case BroadcastBlockId(broadcastId, _) => (broadcastId, _broadcastBlocks) + } /** Remove the given block from this storage status. */ private[spark] def removeBlock(blockId: BlockId): Option[BlockStatus] = { updateStorageInfo(blockId, BlockStatus.empty) blockId match { - case RDDBlockId(rddId, _) => -// Actually remove the block, if it exists -if (_rddBlocks.contains(rddId)) { - val removed = _rddBlocks(rddId).remove(blockId) - // If the given RDD has no more blocks left, remove the RDD - if (_rddBlocks(rddId).isEmpty) { -_rddBlocks.remove(rddId) + case rddOrBroadcastBlockId @ (_: BroadcastBlockId | _: RDDBlockId) => +val (id, blockMap) = getIdAndBlockMap(rddOrBroadcastBlockId) +var removed: Option[BlockStatus] = None +if (blockMap.contains(id)) { + removed = blockMap(id).remove(blockId) --- End diff -- you don't need the `var` here, you can keep the `val` like before: ``` if (blockMap.contains(id)) { val removed = blockMap(id).remove(blockId) ... ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/2851#discussion_r26393202 --- Diff: core/src/main/scala/org/apache/spark/storage/StorageStatusListener.scala --- @@ -88,4 +88,19 @@ class StorageStatusListener extends SparkListener { } } + override def onBlockUpdate(blockUpdateEvent: SparkListenerBlockUpdate) = synchronized { +val executorId = blockUpdateEvent.blockManagerId.executorId +updateStorageStatus(executorId, Seq((blockUpdateEvent.blockId, --- End diff -- If I'm following things correctly, this will also get called for RDD blocks, which are already getting updated by `TaskEnd`. I suppose its not a problem for them to get updated again with the same info, but you could probably skip the update for non-broadcast blocks, and include a comment as in `EventLoggingListener`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/2851#discussion_r26393024 --- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala --- @@ -181,6 +181,12 @@ private[spark] class EventLoggingListener( logEvent(event, flushLogger = true) override def onExecutorRemoved(event: SparkListenerExecutorRemoved) = logEvent(event, flushLogger = true) + override def onBlockUpdate(event: SparkListenerBlockUpdate) = { +// we only log Broadcast block update for now --- End diff -- er, I guess its also happening in `TaskEnd` events --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/2851#discussion_r26392543 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -357,7 +357,13 @@ private[spark] class BlockManager( info: BlockInfo, status: BlockStatus, droppedMemorySize: Long = 0L): Unit = { -val needReregister = !tryToReportBlockStatus(blockId, info, status, droppedMemorySize) +def ifAsync: Boolean = { + blockId.isBroadcast +} +// asynchronously report broadcast variables, in future we only need to change the +// implementation of ifAsync --- End diff -- hmmm...I mean, for reporting broadcast, we do it asynchronously, but in future, if we want to report other blocks, we can make ifAsync logic more complicate, e.g. ``` if (blockId.isBroadcast) true else false ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/2851#discussion_r26392169 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -357,7 +357,13 @@ private[spark] class BlockManager( info: BlockInfo, status: BlockStatus, droppedMemorySize: Long = 0L): Unit = { -val needReregister = !tryToReportBlockStatus(blockId, info, status, droppedMemorySize) +def ifAsync: Boolean = { + blockId.isBroadcast +} +// asynchronously report broadcast variables, in future we only need to change the +// implementation of ifAsync --- End diff -- I find this comment a little bit confusing -- it sounds like you are saying there is future work still to do. Do you mean something like "Broadcast variables are currently reported asynchronously, but if we later decide they should be reported synchronously, we just need to update ifAsync`" --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/2851#discussion_r26391837 --- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala --- @@ -181,6 +181,12 @@ private[spark] class EventLoggingListener( logEvent(event, flushLogger = true) override def onExecutorRemoved(event: SparkListenerExecutorRemoved) = logEvent(event, flushLogger = true) + override def onBlockUpdate(event: SparkListenerBlockUpdate) = { +// we only log Broadcast block update for now --- End diff -- can you add to this comment that we don't need to log RDD blocks, b/c they are already logged as part of `StageCompleted` events --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/2851#discussion_r26391448 --- Diff: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala --- @@ -67,10 +67,8 @@ private[spark] class HeartbeatReceiver(sc: SparkContext, scheduler: TaskSchedule override def receiveWithLogging = { case Heartbeat(executorId, taskMetrics, blockManagerId) => - val unknownExecutor = !scheduler.executorHeartbeatReceived( -executorId, taskMetrics, blockManagerId) - val response = HeartbeatResponse(reregisterBlockManager = unknownExecutor) - executorLastSeen(executorId) = System.currentTimeMillis() + val response = HeartbeatResponse( +!scheduler.executorHeartbeatReceived(executorId, taskMetrics, blockManagerId)) --- End diff -- did you mean to remove the update of `executorLastSeen`? seems like maybe a mistake in the merge? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user CodingCat commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-78615124 Hi, @squito , would you mind taking further review when you have time? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-78607932 [Test build #28534 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28534/consoleFull) for PR 2851 at commit [`223cdfb`](https://github.com/apache/spark/commit/223cdfb6c02412f6bafbbbe32e160f08a96e984f). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-78607951 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28534/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-78582270 [Test build #28534 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28534/consoleFull) for PR 2851 at commit [`223cdfb`](https://github.com/apache/spark/commit/223cdfb6c02412f6bafbbbe32e160f08a96e984f). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-78577401 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28529/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-78577370 [Test build #28529 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28529/consoleFull) for PR 2851 at commit [`2ea8ee3`](https://github.com/apache/spark/commit/2ea8ee341a40ae543284875b3e6ba333fc07ca9d). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-78560997 [Test build #28529 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28529/consoleFull) for PR 2851 at commit [`2ea8ee3`](https://github.com/apache/spark/commit/2ea8ee341a40ae543284875b3e6ba333fc07ca9d). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user CodingCat commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-74798631 Hi, @squito , thanks for the further comments I have addressed them and also replied under some of them The snapshots was uploaded, I will not be here tomorrow but will be back after that --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-74746516 [Test build #27638 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27638/consoleFull) for PR 2851 at commit [`564f7d2`](https://github.com/apache/spark/commit/564f7d26de6337bfd1fb6453c6344d73c9dd0ed0). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-74746530 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27638/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-74731158 [Test build #27638 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27638/consoleFull) for PR 2851 at commit [`564f7d2`](https://github.com/apache/spark/commit/564f7d26de6337bfd1fb6453c6344d73c9dd0ed0). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user CodingCat commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-74728276 ![image](https://cloud.githubusercontent.com/assets/678008/6235037/622218b4-b6ad-11e4-8cd6-bd291fd4e58a.png) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/2851#discussion_r24840983 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerMasterActor.scala --- @@ -522,7 +523,9 @@ private[spark] class BlockManagerInfo( logInfo("Removed %s on %s on tachyon (size: %s)".format( blockId, blockManagerId.hostPort, Utils.bytesToString(blockStatus.tachyonSize))) } + return BlockStatus(storageLevel, 0, 0, 0) } +null --- End diff -- actually the code should not enter into this line...unless something wrong happened in updating https://github.com/CodingCat/spark/blob/SPARK-3957/core/src/main/scala/org/apache/spark/storage/BlockManagerMasterActor.scala#L457 maybe I can change https://github.com/CodingCat/spark/blob/SPARK-3957/core/src/main/scala/org/apache/spark/storage/BlockManagerMasterActor.scala#L373 to an Option, or add a if guard in https://github.com/CodingCat/spark/blob/SPARK-3957/core/src/main/scala/org/apache/spark/storage/BlockManagerMasterActor.scala#L373 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/2851#discussion_r24840220 --- Diff: core/src/main/scala/org/apache/spark/storage/RDDInfo.scala --- @@ -21,13 +21,14 @@ import org.apache.spark.annotation.DeveloperApi import org.apache.spark.rdd.RDD import org.apache.spark.util.Utils +trait InMemoryObjectInfo --- End diff -- Actually I used it in https://github.com/apache/spark/pull/2851/files#diff-dc63d4ab5b2ed5a2135c054be78820eeR47 I need a common base class to save RDD and Broadcast Info ... --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user squito commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-74713997 can you also post a screenshot of the detailed page for a broadcast var? Ideally involving a broadcast var that gets turned into multiple blocks by `TorrentBroadcast`, I think just needs to be over 4MB by default. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/2851#discussion_r24835658 --- Diff: core/src/main/scala/org/apache/spark/ui/storage/BroadcastPage.scala --- @@ -0,0 +1,90 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.ui.storage + +import org.apache.spark.storage.StorageUtils +import org.apache.spark.ui.UIUtils +import org.apache.spark.util.Utils + +import scala.xml.Node + +private[ui] class BroadcastPage(parent: StorageTab) extends InMemoryObjectPage("broadcast", parent){ + + protected override val workerTableID: String = "broadcast-storage-by-block-table" + + protected override def objectList = listener.broadcastInfoList + + protected override def getBlockTableAndSize(objectId: Any): (Seq[Node], Int) = { +val blockLocations = StorageUtils.getBroadcastBlockLocation(objectId.asInstanceOf[Long], + storageStatusList) +val blocks = listener.storageStatusList + .flatMap(_.broadcastBlocksById(objectId.asInstanceOf[Long])) + .sortWith(_._1.name < _._1.name) + .map { case (blockId, status) => + (blockId, status, blockLocations.get(blockId).getOrElse(Seq[String]("Unknown"))) +} +(UIUtils.listingTable(blockHeader, blockRow, blocks, id = Some("rdd-storage-by-block-table")), + blocks.size) + } + + protected override def generateContent(objectId: Long): (String, Seq[Node]) = { +val objectInfo = objectList.find(_.id == objectId).getOrElse { + // Rather than crashing, render an "Not Found" page + return (objectId.toString, nonFoundErrorInfo) +} +val (workerTable, workerCount) = getWorkerTableAndSize(objectId) + +val (blockTable, blockCount) = getBlockTableAndSize(objectId) + +val content = + + + + + Total Partitions: + {objectInfo.numPartitions} --- End diff -- `numPartitions` doesn't mean anything for a Broadcast var. I think its always `0` in your code, unless I've missed something. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/2851#discussion_r24833690 --- Diff: core/src/main/scala/org/apache/spark/ui/storage/InMemoryObjectPage.scala --- @@ -0,0 +1,123 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.ui.storage + +import javax.servlet.http.HttpServletRequest + +import org.apache.spark.storage._ +import org.apache.spark.ui.{UIUtils, WebUIPage} +import org.apache.spark.util.Utils + +import scala.xml.Node + +private[ui] abstract class InMemoryObjectPage(pageName: String, parent: StorageTab) --- End diff -- RDD aren't necessarily in memory ... maybe a better name would be `StorageDetailPage`? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/2851#discussion_r24833573 --- Diff: core/src/main/scala/org/apache/spark/storage/RDDInfo.scala --- @@ -49,9 +50,40 @@ class RDDInfo( } } + private[spark] object RDDInfo { def fromRdd(rdd: RDD[_]): RDDInfo = { val rddName = Option(rdd.name).getOrElse(rdd.id.toString) new RDDInfo(rdd.id, rddName, rdd.partitions.size, rdd.getStorageLevel) } } + +@DeveloperApi +class BroadcastInfo( +val id: Long, +val name: String, +val numPartitions: Int) extends Ordered[BroadcastInfo] with InMemoryObjectInfo { + + var memSize = 0L + var diskSize = 0L + var tachyonSize = 0L + + override def toString = { +import Utils.bytesToString +("%s\" (%d) ; " + + "MemorySize: %s; TachyonSize: %s; DiskSize: %s").format( +name, id, bytesToString(memSize), bytesToString(tachyonSize), bytesToString(diskSize)) + } + + override def compare(that: BroadcastInfo): Int = { +if (this.id > that.id) { + 1 +} else { + if (this.id == that.id) { +return 0 + } + -1 --- End diff -- If you really want a suggestion here :-): com.google.common.primitives.Longs.compare(this.id, that.id) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/2851#discussion_r24833566 --- Diff: core/src/main/scala/org/apache/spark/storage/StorageUtils.scala --- @@ -271,4 +368,19 @@ private[spark] object StorageUtils { blockLocations } + + /** + * Return a mapping from block ID to its locations for each block that belongs to the given RDD. --- End diff -- broadcast block, not RDD --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/2851#discussion_r24833405 --- Diff: core/src/main/scala/org/apache/spark/storage/RDDInfo.scala --- @@ -49,9 +50,40 @@ class RDDInfo( } } + private[spark] object RDDInfo { def fromRdd(rdd: RDD[_]): RDDInfo = { val rddName = Option(rdd.name).getOrElse(rdd.id.toString) new RDDInfo(rdd.id, rddName, rdd.partitions.size, rdd.getStorageLevel) } } + +@DeveloperApi +class BroadcastInfo( +val id: Long, +val name: String, +val numPartitions: Int) extends Ordered[BroadcastInfo] with InMemoryObjectInfo { + + var memSize = 0L + var diskSize = 0L + var tachyonSize = 0L + + override def toString = { +import Utils.bytesToString +("%s\" (%d) ; " + + "MemorySize: %s; TachyonSize: %s; DiskSize: %s").format( +name, id, bytesToString(memSize), bytesToString(tachyonSize), bytesToString(diskSize)) + } + + override def compare(that: BroadcastInfo): Int = { +if (this.id > that.id) { + 1 +} else { + if (this.id == that.id) { +return 0 + } + -1 --- End diff -- super minor: ``` if (this.id > that.id) { 1 } else if (this.is == that.id) { 0 } else { -1 } ``` (sorry I was wrong w/ the earlier suggestion) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/2851#discussion_r24831712 --- Diff: core/src/main/scala/org/apache/spark/storage/RDDInfo.scala --- @@ -21,13 +21,14 @@ import org.apache.spark.annotation.DeveloperApi import org.apache.spark.rdd.RDD import org.apache.spark.util.Utils +trait InMemoryObjectInfo --- End diff -- you don't really need this trait at all --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/2851#discussion_r24831483 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerMasterActor.scala --- @@ -522,7 +523,9 @@ private[spark] class BlockManagerInfo( logInfo("Removed %s on %s on tachyon (size: %s)".format( blockId, blockManagerId.hostPort, Utils.bytesToString(blockStatus.tachyonSize))) } + return BlockStatus(storageLevel, 0, 0, 0) } +null --- End diff -- actually, can you explain the `null` case? how does that happen, and won't that result in an NPE when it gets to your code in `StorageStatusListener` and in `JsonProtocol`? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/2851#discussion_r24831053 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerMasterActor.scala --- @@ -522,7 +523,9 @@ private[spark] class BlockManagerInfo( logInfo("Removed %s on %s on tachyon (size: %s)".format( blockId, blockManagerId.hostPort, Utils.bytesToString(blockStatus.tachyonSize))) } + return BlockStatus(storageLevel, 0, 0, 0) } +null --- End diff -- you don't need `return` here -- the last value of each block is its return value. so this could be: ``` if (storageLevel.isValid) { ... _blocks.get(blockId) } else if (_blocks.containsKey(blockId)) { ... BlockStatus(storageLevel, 0, 0, 0) } else { null } ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/2851#discussion_r24830276 --- Diff: core/src/main/scala/org/apache/spark/scheduler/SparkListenerBus.scala --- @@ -24,7 +24,13 @@ import org.apache.spark.util.ListenerBus */ private[spark] trait SparkListenerBus extends ListenerBus[SparkListener, SparkListenerEvent] { + private[spark] var filter: DefaultSparkListenerEventFilter = new DefaultSparkListenerEventFilter + override def onPostEvent(listener: SparkListener, event: SparkListenerEvent): Unit = { +if (!filter.validate(event)) { + return --- End diff -- also, I think there is still one missing piece to get the json into the event logging for the history server. You need to implement to put in the implementation for `onBlockUpdate` in `EventLoggingListener`. I am suggesting that you just put this filter into that implementation, and get rid of the `Filter` abstraction. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/2851#discussion_r24829369 --- Diff: core/src/main/scala/org/apache/spark/scheduler/SparkListenerBus.scala --- @@ -24,7 +24,13 @@ import org.apache.spark.util.ListenerBus */ private[spark] trait SparkListenerBus extends ListenerBus[SparkListener, SparkListenerEvent] { + private[spark] var filter: DefaultSparkListenerEventFilter = new DefaultSparkListenerEventFilter + override def onPostEvent(listener: SparkListener, event: SparkListenerEvent): Unit = { +if (!filter.validate(event)) { + return --- End diff -- I think the `DefaultSparkListenerEventFilter` is probably adding an abstraction without any really good reason. If we do stick w/ the new `SparkListenerBlockUpdate` events, I think its better if you just put the filter into the right method on the event logging listener and remove `DefaultSparkListenerEventFilter`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user squito commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-74698831 Hi @CodingCat thanks for making all the updates. Sorry I hadn't realized the subtlety w/ `Int` vs `Long` on the `RDDBlockId` and `BroadcastBlockId`. Still, I think its still a good change for simplifying the code, glad you figured a way to make it work. And on issue (2), the memory usage of a worker, I think what you have is correct, its supposed to be memory usage of the particular object on the worker. This is on a UI page in the context of a particular object -- there is a separate page to summarize the memory usage of the executor overall. (though I agree the UI is a little confusing ...) I'll make a few more minor comments on the code, but mostly I just want to get another opinion on the right events to pass the block added events around, as I have mentioned above. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-74602430 [Test build #27595 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27595/consoleFull) for PR 2851 at commit [`0d854b8`](https://github.com/apache/spark/commit/0d854b835db225baa3b2ee2271d03afa87d9a3e8). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class SparkListenerBlockUpdate(blockManagerId: BlockManagerId, blockId: BlockId,` * `class DefaultSparkListenerEventFilter ` * `class BroadcastInfo(` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-74602432 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27595/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user CodingCat commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-74599888 oh...the failed one is the previous submission. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user CodingCat commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-74599587 weird...I can pass the test cases in my laptop without any problem --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-74595231 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27588/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-74595226 **[Test build #27588 timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27588/consoleFull)** for PR 2851 at commit [`04726c4`](https://github.com/apache/spark/commit/04726c410909144581a308651837c2039b841276) after a configured wait of `120m`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-74593574 [Test build #27595 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27595/consoleFull) for PR 2851 at commit [`0d854b8`](https://github.com/apache/spark/commit/0d854b835db225baa3b2ee2271d03afa87d9a3e8). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-74587814 **[Test build #27571 timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27571/consoleFull)** for PR 2851 at commit [`34aed25`](https://github.com/apache/spark/commit/34aed25737ef896356505e4ca2f50dd2d788b98c) after a configured wait of `120m`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-74587826 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27571/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user CodingCat commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-74583601 Potential problems in the current patch 1. to de-duplicate the code in StorageUtils, I save RDD ID with **Int** type in a **Long** Field (https://github.com/apache/spark/pull/2851/files#diff-21027f5c826cd378daaae5f7c3eea2b5R41), otherwise it's hard to get a BlockMap according to the blockId (https://github.com/apache/spark/pull/2851/files#diff-21027f5c826cd378daaae5f7c3eea2b5R126) (you will see a lot of type imcompatibility from the compiler) 2. InMemoryObjectPage.scala (https://github.com/apache/spark/pull/2851/files#diff-dc63d4ab5b2ed5a2135c054be78820eeR97), I'm not sure if the semantics of (memory usage of a worker) is correct...it should be the memory used by the current InMemoryObject or the total memory usage? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user CodingCat commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-74582634 Hi, @squito, thank you very much for your patient review Here is the summary on what I have done on addressing your comments 1. regarding the duplication in RDDPage and BroadcastPage Now, both of them inherit from InMemoryObjectPage, which defines the common parts of these two classes, e.g. Worker Header, etc. 2. Serialization and Deserialization of SparkListenerBlockUpdate event I implemented the serialization and deserialization methods (https://github.com/apache/spark/pull/2851/files#diff-4f6ba18259eb4c31ac930e18f1ba6f88R99) and (https://github.com/apache/spark/pull/2851/files#diff-4f6ba18259eb4c31ac930e18f1ba6f88R480) To prevent the large volume of events being logged, there is an event filter existing in SparkListenerBus (https://github.com/apache/spark/pull/2851/files#diff-01185688f339238fc6689baa6106df63R27). Now this filter ensure that only the broadcast block update event is logged (https://github.com/apache/spark/pull/2851/files#diff-fbe8f967070627c8dc155237e77c7314R130) (actually this was implemented long time ago) 3. get rid of the mutable variable in BlockManagerMasterActor Done (https://github.com/apache/spark/pull/2851/files#diff-6da9360efd5e37dcd0edcc651db1e9cbR508) 4. shorten the compare() in BroadcastInfo (https://github.com/apache/spark/pull/2851/files#diff-6da9360efd5e37dcd0edcc651db1e9cbR508) Unfortunately, I probably cannot do that, since compare() returns Int, but broadcastId is Long 5. tighten the methods in StorageUtils Done (https://github.com/apache/spark/pull/2851/files#diff-6da9360efd5e37dcd0edcc651db1e9cbR508) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-74581075 [Test build #27588 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27588/consoleFull) for PR 2851 at commit [`04726c4`](https://github.com/apache/spark/commit/04726c410909144581a308651837c2039b841276). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-74575563 [Test build #27570 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27570/consoleFull) for PR 2851 at commit [`b05196a`](https://github.com/apache/spark/commit/b05196af57e2c9ad8504b7ab12f1d3b2ea454067). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class SparkListenerBlockUpdate(blockManagerId: BlockManagerId, blockId: BlockId,` * `class DefaultSparkListenerEventFilter ` * `class BroadcastInfo(` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-74575571 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27570/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-74573745 [Test build #27571 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27571/consoleFull) for PR 2851 at commit [`34aed25`](https://github.com/apache/spark/commit/34aed25737ef896356505e4ca2f50dd2d788b98c). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-74568860 [Test build #27570 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27570/consoleFull) for PR 2851 at commit [`b05196a`](https://github.com/apache/spark/commit/b05196af57e2c9ad8504b7ab12f1d3b2ea454067). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-74568272 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27569/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-74568258 [Test build #27569 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27569/consoleFull) for PR 2851 at commit [`57d3f46`](https://github.com/apache/spark/commit/57d3f4609d069c8891cda3f0658d1eb1bb93e9f1). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-74568269 [Test build #27569 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27569/consoleFull) for PR 2851 at commit [`57d3f46`](https://github.com/apache/spark/commit/57d3f4609d069c8891cda3f0658d1eb1bb93e9f1). * This patch **fails RAT tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class SparkListenerBlockUpdate(blockManagerId: BlockManagerId, blockId: BlockId,` * `class DefaultSparkListenerEventFilter ` * `class BroadcastInfo(` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user squito commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-74510287 well I was leaning towards using a `ThreadLocal` to get the broadcast blocks into the task end event ... but I forgot that broadcast blocks are also created by the driver (eg. [here](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala#L98) or [here](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/broadcast/HttpBroadcast.scala#L52)). It doesn't make any sense to stick those into a task end event. I suppose they could go into a task **start** event, but that is still a little weird, maybe we really do need a new event. @rxin @andrewor14 thoughts? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user CodingCat commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-74447987 working on this now --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/2851#discussion_r24696753 --- Diff: core/src/main/scala/org/apache/spark/storage/BroadcastInfo.scala --- @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.storage + +import org.apache.spark.annotation.DeveloperApi +import org.apache.spark.broadcast.{TorrentBroadcast, Broadcast} +import org.apache.spark.util.Utils +import org.apache.spark.util.Utils._ + +@DeveloperApi +class BroadcastInfo( +val id: Long, +val name: String, +val numPartitions: Int) extends Ordered[BroadcastInfo] { + + var memSize = 0L + var diskSize = 0L + var tachyonSize = 0L + + override def toString = { +import Utils.bytesToString +("%s\" (%d) ; " + + "MemorySize: %s; TachyonSize: %s; DiskSize: %s").format( +name, id, bytesToString(memSize), bytesToString(tachyonSize), bytesToString(diskSize)) + } + + override def compare(that: BroadcastInfo) = { +if (this.id > that.id) { + 1 +} else { + if (this.id == that.id) { +0 + } + -1 --- End diff -- what type does `compare` return? It's generally an `int`, and `id` is a long. (BTW, *hint*hint*, make the method's return type explicit.) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/2851#discussion_r24696584 --- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala --- @@ -91,6 +91,7 @@ private[spark] object JsonProtocol { executorRemovedToJson(executorRemoved) // These aren't used, but keeps compiler happy case SparkListenerExecutorMetricsUpdate(_, _) => JNothing + case SparkListenerBlockUpdate(_, _, _) => JNothing --- End diff -- how often is this event posted, is it once per cached partition? If so there might be too many such events to log them. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/2851#discussion_r24683430 --- Diff: core/src/main/scala/org/apache/spark/storage/StorageUtils.scala --- @@ -118,8 +140,20 @@ class StorageStatus(val blockManagerId: BlockManagerId, val maxMem: Long) { } else { None } + case BroadcastBlockId(broadcastId, _) => +// Actually remove the block, if it exists +if (_broadcastBlocks.contains(broadcastId)) { + val removed = _broadcastBlocks(broadcastId).remove(blockId) + // If the given RDD has no more blocks left, remove the RDD --- End diff -- Please, don't do that. It is very hard to understand these kinds of extractors, especially when RddOrBlockId itself doesn't have a real class. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user squito commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-74298037 hmm, ok now I see that it is hard to get the BroadcastBlockIds into the task end metrics, since when broadcast blocks get created, its a side effect of task-deserialization, so the broadcast var doesn't really know which task its being created for. The only other option I can think of is to create a `ThreadLocal` with the current task id, before deserializing the task [here](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L177). Then when the broadcast var is actually put in the block manager, ([eg. for `TorrentBroadcast`](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala#L181)) and finally when the task is finished you'd [update the metrics](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L214). Honestly I am somewhat torn between (a) leaving it as you have it, (b) what I've outlined in this comment, and (c) having RDD block info also tracked by your new mechansim (option 2 above) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user squito commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-74293360 Hi @CodingCat , thanks for working on this, sorry the review has been dragging out. This will be a nice addition, but I have some concerns similar to the other reviewers -- but lemme explain in a little more detail. I don't necessarily see anything wrong with your approach for getting the Broadcast block info by itself, but it seems like it ought to share a similar code path to tracking the RDDBlocks. RDDBlocks get updated in the `SparkListenerTaskEnd` event, through `event.taskMetrics.updatedBlocks`. this makes its way to the current `StorageTab` in [`StorageListener`](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ui/storage/StorageTab.scala#L59), which then [filters down to the RDDBlocks](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ui/storage/StorageTab.scala#L50) via `BlockId.asRDDId`. So, this makes me wonder about two possible alternatives for unifying the code paths: 1) Can you get the Broadcast block ids into the same `SparkListenerTaskEnd` event, which would eliminate the need to create a new event? (In fact, maybe those block ids are already there? I am still looking into the code for how those events get created ...) 2) If we can't get Broadcast block ids into the same `SparkListenerTaskEnd`, does it make sense to change the RDD block tracking logic to do the same thing? Yes this will be a bigger change, but it seems strange to have these two blocks tracked by such different mechanisms so it will lead to simpler code overall. (But really that pushes me to option #1) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/2851#discussion_r24673735 --- Diff: core/src/main/scala/org/apache/spark/storage/BroadcastInfo.scala --- @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.storage + +import org.apache.spark.annotation.DeveloperApi +import org.apache.spark.broadcast.{TorrentBroadcast, Broadcast} +import org.apache.spark.util.Utils +import org.apache.spark.util.Utils._ + +@DeveloperApi +class BroadcastInfo( +val id: Long, +val name: String, +val numPartitions: Int) extends Ordered[BroadcastInfo] { + + var memSize = 0L + var diskSize = 0L + var tachyonSize = 0L + + override def toString = { +import Utils.bytesToString +("%s\" (%d) ; " + + "MemorySize: %s; TachyonSize: %s; DiskSize: %s").format( +name, id, bytesToString(memSize), bytesToString(tachyonSize), bytesToString(diskSize)) + } + + override def compare(that: BroadcastInfo) = { +if (this.id > that.id) { + 1 +} else { + if (this.id == that.id) { +0 + } + -1 --- End diff -- this can be just `this.id - that.id` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/2851#discussion_r24673367 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerMasterActor.scala --- @@ -466,10 +467,11 @@ private[spark] class BlockManagerInfo( storageLevel: StorageLevel, memSize: Long, diskSize: Long, - tachyonSize: Long) { + tachyonSize: Long): BlockStatus = { -updateLastSeenMs() +var blockStatus: BlockStatus = null --- End diff -- we can get rid of this var -- just have each branch of the if/else finish w/ the block status you want to return. ``` if (_blocks.containsKey(blockId)) { ... if (storageLevel.isValid) { ... _blocks.get(blockId) } else if (_blocks.containsKey(blockId)) {//I realize you aren't modifying this part, but isn't this condition redundant? val blockStatus: BlockStatus = _blocks.get(blockId) ... BlockStatus(storageLevel, 0, 0, 0) } } else { null } ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/2851#discussion_r24672255 --- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala --- @@ -91,6 +91,7 @@ private[spark] object JsonProtocol { executorRemovedToJson(executorRemoved) // These aren't used, but keeps compiler happy case SparkListenerExecutorMetricsUpdate(_, _) => JNothing + case SparkListenerBlockUpdate(_, _, _) => JNothing --- End diff -- This event should be converted to json so it can be seen in the UI for completed apps. You need to implement serialization here, and also deserialization in `sparkEventFromJson`, as that is used by `ReplayListenerBus` to show the UI for completed apps. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/2851#discussion_r24671936 --- Diff: core/src/main/scala/org/apache/spark/ui/storage/BroadcastPage.scala --- @@ -0,0 +1,152 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.ui.storage + +import javax.servlet.http.HttpServletRequest + +import org.apache.spark.storage.{BlockStatus, BlockId, StorageStatus, StorageUtils} +import org.apache.spark.ui.{UIUtils, WebUIPage} +import org.apache.spark.util.Utils + +import scala.xml.Node + +private[ui] class BroadcastPage(parent: StorageTab) extends WebUIPage("broadcast"){ --- End diff -- this is just a minor variation of `RDDPage`, right? better to make a common superclass with most of the code, then `RDDPage` and `BroadcastPage` would just need a couple of tweaks (eg., labels in the UI, which block map to look at etc.) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/2851#discussion_r24668891 --- Diff: core/src/main/scala/org/apache/spark/storage/StorageUtils.scala --- @@ -118,8 +140,20 @@ class StorageStatus(val blockManagerId: BlockManagerId, val maxMem: Long) { } else { None } + case BroadcastBlockId(broadcastId, _) => +// Actually remove the block, if it exists +if (_broadcastBlocks.contains(broadcastId)) { + val removed = _broadcastBlocks(broadcastId).remove(blockId) + // If the given RDD has no more blocks left, remove the RDD --- End diff -- you could tighten it up even more if you moved that helper method to an extractor: ``` object RddOrBlockId { def unapply(blockId: BlockId): Option[(Int, Map[BlockId,BlockStatus])] = { case RDDBlockId(rddId, _) => Some((rddId, _rddBlocks)) case BroadcastBlockId(broadcastId, _) => Some((broadcastId, _broadcastBlocks)) case _ => None } } ``` then the actual matching in each of these blocks would simplify to: ``` case RddOrBlockId(id, blockMap) => val removed = blockMap(id).remove(blockId) ... ``` (the point is, the helper function to get the id & blockMap goes right into the pattern match.) But maybe this is a bit of scala that is too esoteric for our code base for saving one line in each of these cases. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user CodingCat commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-74255969 Hi, @squito , thanks for the review, I will handle it in the weekend --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/2851#discussion_r24642955 --- Diff: core/src/main/scala/org/apache/spark/storage/StorageUtils.scala --- @@ -118,8 +140,20 @@ class StorageStatus(val blockManagerId: BlockManagerId, val maxMem: Long) { } else { None } + case BroadcastBlockId(broadcastId, _) => +// Actually remove the block, if it exists +if (_broadcastBlocks.contains(broadcastId)) { + val removed = _broadcastBlocks(broadcastId).remove(blockId) + // If the given RDD has no more blocks left, remove the RDD --- End diff -- actually now I am thinking there is a lot of copy-pasting that could be cleaned up. you could mostly merge this w/ the block above if you did something like: ``` case rddOrBlockId @ (_: BroadcastBlockId | _: RDDBlockId) => val (id, blockMap) = getIdAndBlockMap(rddOrBlockId) val removed = blockMap(id).remove(blockId) ... ``` where the helper function `getIdAndBlockMap` is something like: ``` def getIdAndBlockMap(blockId: BlockId): (Int, Map[BlockId, BlockStatus]) = blockId match { case RDDBlockId(rddId, _) => (rddId, _rddBlocks) case BroadcastBlockId(broadcastId, _) => (broadcastId, _broadcastBlocks) } ``` and then you could do a similar thing in a few other places. You could also take this a step further, and even merge `_rddBlocks` and `_broadcastBlocks` into a `EnumMap[BlockType, Map[BlockId, BlockStatus]]` if you made a new `public enum BlockType{ RDD,Broadcast}`, but that might not really help much since at the end of the day you do want separate getter methods for the RDD and Broadcast stuff --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/2851#discussion_r24642582 --- Diff: core/src/main/scala/org/apache/spark/storage/StorageUtils.scala --- @@ -118,8 +140,20 @@ class StorageStatus(val blockManagerId: BlockManagerId, val maxMem: Long) { } else { None } + case BroadcastBlockId(broadcastId, _) => +// Actually remove the block, if it exists +if (_broadcastBlocks.contains(broadcastId)) { + val removed = _broadcastBlocks(broadcastId).remove(blockId) + // If the given RDD has no more blocks left, remove the RDD --- End diff -- again, broadcast variable instead of RDD --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/2851#discussion_r24642522 --- Diff: core/src/main/scala/org/apache/spark/storage/StorageUtils.scala --- @@ -81,19 +84,38 @@ class StorageStatus(val blockManagerId: BlockManagerId, val maxMem: Long) { */ def rddBlocks: Map[BlockId, BlockStatus] = _rddBlocks.flatMap { case (_, blocks) => blocks } + + /** + * Return the broadcast blocks stored in this block manager. + * + * Note that this is somewhat expensive, as it involves cloning the underlying maps and then + * concatenating them together. Much faster alternatives exist for common operations such as + * getting the memory, disk, and off-heap memory sizes occupied by this RDD. --- End diff -- should be "broadcast variable" instead of "RDD" --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-74190415 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27405/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-74190412 [Test build #27405 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27405/consoleFull) for PR 2851 at commit [`08ad98d`](https://github.com/apache/spark/commit/08ad98d5a544225c733411fb895de5fa32431c24). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class SparkListenerBlockUpdate(blockManagerId: BlockManagerId, blockId: BlockId,` * `class DefaultSparkListenerEventFilter ` * `class BroadcastInfo(` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-74189213 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27404/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-74189206 [Test build #27404 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27404/consoleFull) for PR 2851 at commit [`5c60a34`](https://github.com/apache/spark/commit/5c60a345e9b6e34f51403df1bf721d00fac7bffa). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class SparkListenerBlockUpdate(blockManagerId: BlockManagerId, blockId: BlockId,` * `class DefaultSparkListenerEventFilter ` * `class BroadcastInfo(` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org