[jira] [Commented] (SPARK-3957) Broadcast variable memory usage not reflected in UI
[ https://issues.apache.org/jira/browse/SPARK-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14176567#comment-14176567 ]

Apache Spark commented on SPARK-3957:

User 'CodingCat' has created a pull request for this issue:
https://github.com/apache/spark/pull/2851

> Broadcast variable memory usage not reflected in UI
> ---
>
> Key: SPARK-3957
> URL: https://issues.apache.org/jira/browse/SPARK-3957
> Project: Spark
> Issue Type: Bug
> Components: Block Manager, Web UI
> Affects Versions: 1.0.2, 1.1.0
> Reporter: Shivaram Venkataraman
> Assignee: Nan Zhu
>
> Memory used by broadcast variables are not reflected in the memory usage
> reported in the WebUI. For example, the executors tab shows memory used in
> each executor but this number doesn't include memory used by broadcast
> variables. Similarly the storage tab only shows list of rdds cached and how
> much memory they use.
> We should add a separate column / tab for broadcast variables to make it
> easier to debug.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3957) Broadcast variable memory usage not reflected in UI
[ https://issues.apache.org/jira/browse/SPARK-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174747#comment-14174747 ]

Nan Zhu commented on SPARK-3957:

OK, while working on the executors tab I realized that we eventually need a per-executor record of broadcast usage, so I will still follow the heartbeat-based strategy.
[jira] [Commented] (SPARK-3957) Broadcast variable memory usage not reflected in UI
[ https://issues.apache.org/jira/browse/SPARK-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174684#comment-14174684 ]

Nan Zhu commented on SPARK-3957:

[~andrewor14], why don't we report broadcast variable resource usage to BlockManagerMaster in the current implementation?
[jira] [Commented] (SPARK-3957) Broadcast variable memory usage not reflected in UI
[ https://issues.apache.org/jira/browse/SPARK-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174675#comment-14174675 ]

Nan Zhu commented on SPARK-3957:

BlockId can directly tell us whether the corresponding block is a broadcast variable.
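To make the point about BlockId concrete, here is a minimal Python sketch (not Spark code) of classifying blocks by their ID string and summing memory per kind. The name patterns `rdd_<id>_<split>` and `broadcast_<id>[_<field>]` and the helper names `block_kind` and `memory_by_kind` are assumptions made for illustration; they are not taken from this thread.

```python
import re

# Name patterns assumed for this sketch (not quoted from the thread):
# RDD blocks look like "rdd_<rddId>_<split>", broadcast blocks like
# "broadcast_<id>" with an optional "_<field>" suffix.
RDD_NAME = re.compile(r"^rdd_(\d+)_(\d+)$")
BROADCAST_NAME = re.compile(r"^broadcast_(\d+)(_.+)?$")


def block_kind(block_name: str) -> str:
    """Return 'rdd', 'broadcast', or 'other' for a block ID string."""
    if RDD_NAME.match(block_name):
        return "rdd"
    if BROADCAST_NAME.match(block_name):
        return "broadcast"
    return "other"


def memory_by_kind(blocks: dict) -> dict:
    """Sum in-memory sizes (bytes) per block kind for one executor."""
    totals = {"rdd": 0, "broadcast": 0, "other": 0}
    for name, mem_size in blocks.items():
        totals[block_kind(name)] += mem_size
    return totals
```

With a split like this, a UI column for broadcast memory would only need the per-executor block list, with no extra flag stored alongside each block.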
[jira] [Commented] (SPARK-3957) Broadcast variable memory usage not reflected in UI
[ https://issues.apache.org/jira/browse/SPARK-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174664#comment-14174664 ]

Nan Zhu commented on SPARK-3957:

After looking at the problem more closely, I think we might just set the tellMaster flag to true to get this information (after a put, the executor will report to BlockManagerMaster), instead of introducing a fat heartbeat message or opening a new channel.

The only thing we need to add is a way to distinguish RDD blocks from broadcast variable blocks in BlockStatus.

What do you guys think?
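A toy model of this proposal, in Python rather than Spark's own code, may help when weighing it against the heartbeat approach. Everything here is hypothetical: the `Master` and `ExecutorBlockManager` classes, the `is_broadcast` field on `BlockStatus`, and the method names only imitate the idea of reporting to the master right after a put when `tell_master` is true.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class BlockStatus:
    mem_size: int       # bytes held in memory
    is_broadcast: bool  # hypothetical field separating broadcast from RDD blocks


class Master:
    """Hypothetical stand-in for the driver-side block bookkeeping."""

    def __init__(self):
        # executor id -> {block id -> BlockStatus}
        self.blocks = defaultdict(dict)

    def update_block_info(self, executor_id, block_id, status):
        self.blocks[executor_id][block_id] = status

    def broadcast_memory(self, executor_id):
        statuses = self.blocks[executor_id].values()
        return sum(s.mem_size for s in statuses if s.is_broadcast)


class ExecutorBlockManager:
    """Hypothetical executor-side store that can report each put."""

    def __init__(self, executor_id, master):
        self.executor_id = executor_id
        self.master = master
        self.local = {}

    def put(self, block_id, mem_size, is_broadcast, tell_master=True):
        status = BlockStatus(mem_size, is_broadcast)
        self.local[block_id] = status
        if tell_master:
            # Report right after the put; no separate channel is involved.
            self.master.update_block_info(self.executor_id, block_id, status)
```

In this model, a block stored with `tell_master=False` stays invisible to the master, which is the situation the issue describes for broadcast blocks.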
[jira] [Commented] (SPARK-3957) Broadcast variable memory usage not reflected in UI
[ https://issues.apache.org/jira/browse/SPARK-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174107#comment-14174107 ]

Dev Lakhani commented on SPARK-3957:

Hi,

For now I am happy for [~CodingCat] to take this on. Maybe once there are some commits I can help with the UI side, but for now I'll hold back.
[jira] [Commented] (SPARK-3957) Broadcast variable memory usage not reflected in UI
[ https://issues.apache.org/jira/browse/SPARK-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174053#comment-14174053 ]

Nan Zhu commented on SPARK-3957:

I agree with [~andrewor14]; I was also thinking about piggybacking the information on the heartbeat between heartbeatReceiver and the executor. I am not sure about the current Hadoop implementation, but in the 1.x versions TaskStatus was piggybacked on the heartbeat between TaskTracker and JobTracker. To me, it's a very natural way to do this.

I accepted the issue this morning and have started some work, so, [~devlakhani], please let me finish this. Thanks.
[jira] [Commented] (SPARK-3957) Broadcast variable memory usage not reflected in UI
[ https://issues.apache.org/jira/browse/SPARK-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174024#comment-14174024 ]

Andrew Or commented on SPARK-3957:

Hey [~devl.development], are you planning to work on this? Or is [~CodingCat]? The latter is currently assigned, but maybe you guys should work it out.
[jira] [Commented] (SPARK-3957) Broadcast variable memory usage not reflected in UI
[ https://issues.apache.org/jira/browse/SPARK-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174019#comment-14174019 ]

Andrew Or commented on SPARK-3957:

Yeah, my understanding is that broadcast blocks aren't reported to the driver (and it makes sense not to report them, because the driver is the one that initiated the broadcast in the first place). The source of the broadcast info we want to display is the BlockManager of each executor, and we need to get this to the driver somehow. We could add some periodic reporting, but that opens another channel between the driver and the executors. There is an ongoing effort to do something similar for task metrics (https://github.com/apache/spark/pull/2087), so maybe we can piggyback this information on the heartbeats there.

Also, I believe this is a duplicate of an old issue, SPARK-1761, though this one contains more information, so let's keep this one open. I will close the other one in favor of this.
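As a rough illustration of the piggybacking idea, the Python sketch below attaches a snapshot of broadcast block sizes to a heartbeat that already carries task metrics. It is a simplified model, not the design in the linked pull request: the `Heartbeat` fields, the `DriverHeartbeatReceiver` class, and the choice to send a full snapshot on every heartbeat are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Heartbeat:
    executor_id: str
    task_metrics: dict = field(default_factory=dict)
    # Hypothetical extra payload: broadcast block id -> bytes in memory
    broadcast_blocks: dict = field(default_factory=dict)


class DriverHeartbeatReceiver:
    """Hypothetical driver-side receiver keeping a per-executor record."""

    def __init__(self):
        self.broadcast_by_executor = {}

    def receive(self, hb: Heartbeat) -> None:
        # Each heartbeat carries a full snapshot, so replace the old record.
        self.broadcast_by_executor[hb.executor_id] = dict(hb.broadcast_blocks)

    def broadcast_memory(self, executor_id: str) -> int:
        return sum(self.broadcast_by_executor.get(executor_id, {}).values())
```

Sending a full snapshot keeps the receiver simple and tolerates lost heartbeats, at the cost of a larger message; sending only deltas would shrink the message but would need ordering and recovery logic.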
[jira] [Commented] (SPARK-3957) Broadcast variable memory usage not reflected in UI
[ https://issues.apache.org/jira/browse/SPARK-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173954#comment-14173954 ]

Shivaram Venkataraman commented on SPARK-3957:

I think it needs to be tracked in the Block Manager. However, we also need to track this on a per-executor basis and not just at the driver. Right now, AFAIK, executors do not report new broadcast blocks to the master, in order to reduce communication. However, we could add broadcast blocks to some periodic report. [~andrewor] might know more.
[jira] [Commented] (SPARK-3957) Broadcast variable memory usage not reflected in UI
[ https://issues.apache.org/jira/browse/SPARK-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173926#comment-14173926 ]

Dev Lakhani commented on SPARK-3957:

Hi all,

Here are my thoughts on a possible approach.

The broadcast goes from the SparkContext to the BroadcastManager and its newBroadcast method. In the first instance, the broadcast data is stored in the Block Manager of the executor (see HttpBroadcast). Any tracking of broadcast variables must be referenced by the BlockManagerSlaveActor and BlockManagerMasterActor. In particular, UpdateBlockInfo and RemoveBroadcast should update the total memory used in blocks when blocks are added and removed.

These can then be hooked up to the UI using a new page like ExecutorsPage and by defining new methods in the relevant listener, such as StorageStatusListener.

These are my initial thoughts as someone new to these components. Any other ideas or approaches?
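The bookkeeping described above can be sketched as a small event-driven tracker. This Python model is only an illustration under stated assumptions: the class `BroadcastMemoryTracker` and its methods are hypothetical, with `on_block_updated` standing in for an UpdateBlockInfo-style message and `on_broadcast_removed` for a RemoveBroadcast-style message.

```python
from collections import defaultdict


class BroadcastMemoryTracker:
    """Hypothetical listener-style tracker fed by add/remove events."""

    def __init__(self):
        # executor id -> {broadcast id -> bytes in memory}
        self._blocks = defaultdict(dict)

    def on_block_updated(self, executor_id, broadcast_id, mem_size):
        # A size of zero is treated as "block dropped from memory".
        if mem_size > 0:
            self._blocks[executor_id][broadcast_id] = mem_size
        else:
            self._blocks[executor_id].pop(broadcast_id, None)

    def on_broadcast_removed(self, broadcast_id):
        # Removing a broadcast clears it on every executor.
        for blocks in self._blocks.values():
            blocks.pop(broadcast_id, None)

    def rows(self):
        """Rows a UI page could render: (executor id, broadcast bytes)."""
        return sorted((e, sum(b.values())) for e, b in self._blocks.items())
```

A page similar to the executors tab could then render `rows()` as an extra column or table, which is the kind of view the issue asks for.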