[ https://issues.apache.org/jira/browse/SPARK-33906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun reassigned SPARK-33906:
-------------------------------------

    Assignee: Baohe Zhang

> SPARK UI Executors page stuck when ExecutorSummary.peakMemoryMetrics is unset
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-33906
>                 URL: https://issues.apache.org/jira/browse/SPARK-33906
>             Project: Spark
>          Issue Type: Bug
>          Components: Web UI
>    Affects Versions: 3.2.0
>            Reporter: Baohe Zhang
>            Assignee: Baohe Zhang
>            Priority: Blocker
>         Attachments: executor-page.png
>
>
> How to reproduce it?
> In standalone mode on macOS, open a spark-shell with
> $SPARK_HOME/bin/spark-shell --master spark://localhost:7077
> and run:
> {code:scala}
> val x = sc.makeRDD(1 to 100000, 5)
> x.count()
> {code}
> Then open the application UI in a browser and click the Executors tab; the
> page gets stuck:
>  !executor-page.png! 
> The JSON returned by the REST API endpoint
> http://localhost:4040/api/v1/applications/app-20201224134418-0003/executors
> is also missing "peakMemoryMetrics" for the executors (only the driver entry
> has it):
> {noformat}
> [ {
>   "id" : "driver",
>   "hostPort" : "192.168.1.241:50042",
>   "isActive" : true,
>   "rddBlocks" : 0,
>   "memoryUsed" : 0,
>   "diskUsed" : 0,
>   "totalCores" : 0,
>   "maxTasks" : 0,
>   "activeTasks" : 0,
>   "failedTasks" : 0,
>   "completedTasks" : 0,
>   "totalTasks" : 0,
>   "totalDuration" : 0,
>   "totalGCTime" : 0,
>   "totalInputBytes" : 0,
>   "totalShuffleRead" : 0,
>   "totalShuffleWrite" : 0,
>   "isBlacklisted" : false,
>   "maxMemory" : 455501414,
>   "addTime" : "2020-12-24T19:44:18.033GMT",
>   "executorLogs" : { },
>   "memoryMetrics" : {
>     "usedOnHeapStorageMemory" : 0,
>     "usedOffHeapStorageMemory" : 0,
>     "totalOnHeapStorageMemory" : 455501414,
>     "totalOffHeapStorageMemory" : 0
>   },
>   "blacklistedInStages" : [ ],
>   "peakMemoryMetrics" : {
>     "JVMHeapMemory" : 135021152,
>     "JVMOffHeapMemory" : 149558576,
>     "OnHeapExecutionMemory" : 0,
>     "OffHeapExecutionMemory" : 0,
>     "OnHeapStorageMemory" : 3301,
>     "OffHeapStorageMemory" : 0,
>     "OnHeapUnifiedMemory" : 3301,
>     "OffHeapUnifiedMemory" : 0,
>     "DirectPoolMemory" : 67963178,
>     "MappedPoolMemory" : 0,
>     "ProcessTreeJVMVMemory" : 0,
>     "ProcessTreeJVMRSSMemory" : 0,
>     "ProcessTreePythonVMemory" : 0,
>     "ProcessTreePythonRSSMemory" : 0,
>     "ProcessTreeOtherVMemory" : 0,
>     "ProcessTreeOtherRSSMemory" : 0,
>     "MinorGCCount" : 15,
>     "MinorGCTime" : 101,
>     "MajorGCCount" : 0,
>     "MajorGCTime" : 0
>   },
>   "attributes" : { },
>   "resources" : { },
>   "resourceProfileId" : 0,
>   "isExcluded" : false,
>   "excludedInStages" : [ ]
> }, {
>   "id" : "0",
>   "hostPort" : "192.168.1.241:50054",
>   "isActive" : true,
>   "rddBlocks" : 0,
>   "memoryUsed" : 0,
>   "diskUsed" : 0,
>   "totalCores" : 12,
>   "maxTasks" : 12,
>   "activeTasks" : 0,
>   "failedTasks" : 0,
>   "completedTasks" : 5,
>   "totalTasks" : 5,
>   "totalDuration" : 2107,
>   "totalGCTime" : 25,
>   "totalInputBytes" : 0,
>   "totalShuffleRead" : 0,
>   "totalShuffleWrite" : 0,
>   "isBlacklisted" : false,
>   "maxMemory" : 455501414,
>   "addTime" : "2020-12-24T19:44:20.335GMT",
>   "executorLogs" : {
>     "stdout" : "http://192.168.1.241:8081/logPage/?appId=app-20201224134418-0003&executorId=0&logType=stdout",
>     "stderr" : "http://192.168.1.241:8081/logPage/?appId=app-20201224134418-0003&executorId=0&logType=stderr"
>   },
>   "memoryMetrics" : {
>     "usedOnHeapStorageMemory" : 0,
>     "usedOffHeapStorageMemory" : 0,
>     "totalOnHeapStorageMemory" : 455501414,
>     "totalOffHeapStorageMemory" : 0
>   },
>   "blacklistedInStages" : [ ],
>   "attributes" : { },
>   "resources" : { },
>   "resourceProfileId" : 0,
>   "isExcluded" : false,
>   "excludedInStages" : [ ]
> } ]
> {noformat}
> I debugged it and observed that ExecutorMetricsPoller.getExecutorUpdates
> returns an empty map, which leaves peakExecutorMetrics set to None in
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/status/LiveEntity.scala#L345.
> A possible reason for the empty map is that the stage completes in less time
> than the heartbeat interval, so the stage's entry in stageTCMP has already
> been removed by the time reportHeartbeat is called.
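> The suspected race can be illustrated with a toy model. Everything in this
> sketch (PollerModel, ToyPoller, StageState, peakFromUpdates) is hypothetical
> and much simpler than Spark's real ExecutorMetricsPoller. In particular, it
> assumes that a stage's entry is dropped as soon as the stage's last task
> finishes.
> {code:scala}
> import scala.collection.mutable
>
> // Toy model only: NOT Spark's real ExecutorMetricsPoller. All names are hypothetical.
> object PollerModel {
>   // Keyed by (stageId, stageAttemptId), loosely mirroring stageTCMP.
>   type StageKey = (Int, Int)
>
>   // Per-stage state: running task count and the peak metric value seen so far.
>   final class StageState {
>     var runningTasks: Int = 0
>     var peak: Long = 0L
>   }
>
>   final class ToyPoller {
>     private val stages = mutable.Map.empty[StageKey, StageState]
>
>     def onTaskStart(key: StageKey): Unit = {
>       val s = stages.getOrElseUpdate(key, new StageState)
>       s.runningTasks += 1
>     }
>
>     // Metric samples only update stages that still have an entry.
>     def recordMetric(key: StageKey, value: Long): Unit =
>       stages.get(key).foreach(s => s.peak = math.max(s.peak, value))
>
>     // Assumption: the entry is removed as soon as the stage's last task finishes.
>     def onTaskCompletion(key: StageKey): Unit =
>       stages.get(key).foreach { s =>
>         s.runningTasks -= 1
>         if (s.runningTasks == 0) stages.remove(key)
>       }
>
>     // Heartbeat-time snapshot: only stages that still have an entry are reported.
>     def getExecutorUpdates(): Map[StageKey, Long] =
>       stages.map { case (k, s) => k -> s.peak }.toMap
>   }
>
>   // Mirrors the None path: an empty update map yields no peak metrics at all.
>   def peakFromUpdates(updates: Map[StageKey, Long]): Option[Long] =
>     if (updates.isEmpty) None else Some(updates.values.max)
> }
> {code}
> In this model, a stage whose tasks all finish before the next heartbeat yields
> an empty update map and therefore None. A stage that is still running at
> heartbeat time yields Some(peak).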
> How to fix it?
> Check whether peakMemoryMetrics is undefined in executorspage.js before
> using it.
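> The real check belongs in executorspage.js (JavaScript). This Scala sketch
> only illustrates the same defensive pattern. ExecutorRow and PeakMetrics are
> hypothetical names. Falling back to 0 is also an assumption, since the UI
> could equally render a blank cell.
> {code:scala}
> // Hypothetical, simplified view of one entry from the /executors response.
> // peakMemoryMetrics is an Option because the field can be absent, as it is
> // for executor "0" in the JSON above.
> final case class ExecutorRow(id: String, peakMemoryMetrics: Option[Map[String, Long]])
>
> object PeakMetrics {
>   // Returns one peak metric, falling back to 0L when either the whole object
>   // or the individual key is missing, instead of failing on an absent value.
>   def peak(row: ExecutorRow, name: String): Long =
>     row.peakMemoryMetrics.flatMap(_.get(name)).getOrElse(0L)
> }
> {code}
> Modelling the field as an Option makes the missing case explicit, so the
> rendering code cannot read through an absent object by accident.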



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
