[ https://issues.apache.org/jira/browse/SPARK-20391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15974924#comment-15974924 ]
Imran Rashid commented on SPARK-20391:
--------------------------------------

{{memoryUsed}} and {{maxMemory}} exist in already released versions, so unfortunately I don't think we can rename those. There also isn't a great place to document the exact meaning of the fields -- these are the only docs we have now http://spark.apache.org/docs/latest/monitoring.html#rest-api , which don't go into detail on the fields returned. Do you have any suggestions on where we could document it? I don't think this is worth an api/v2 by itself.

I feel like the best thing for us to do would be to leave {{memoryUsed}} and {{maxMemory}} alone, and let's reorganize / rename the others.

Let's start by considering what we might want to eventually report in these metrics, so we can make sure we have unambiguous names for all of them.

1) total memory available for the executor -- onheap and offheap (eg., include the "overhead" memory on yarn)
2) heap size
3) heap used
4) total memory used by the process
5) amount of memory managed by spark
6) memory used by spark's memory manager
7) memory "designated" for caching rdds (eg. from {{spark.memory.storageFraction}} with unified memory manager)*
8) memory currently used for caching rdds
9) memory currently used for execution

All of the metrics related to spark's memory management have an onheap & offheap component.

All of the "memory used" metrics will vary over time, so it's not really clear what you want to report. I named the metrics as "current value". But that is strange if you're looking at a completed app, and anything other than storage memory. That is somewhat orthogonal to the discussion here, though -- for now it's just clearly distinguishing those metrics from the storage metrics.
With the current naming:

* {{maxMemory}} is (5): the amount of memory managed by spark
** {{maxOnHeapMemory}} & {{maxOffHeapMemory}} are (5) divided into onheap & offheap
* {{memoryUsed}} is (8): memory currently used for cached rdds
* {{onHeapMemoryUsed}} and {{offHeapMemoryUsed}} are (8) subdivided into onheap & offheap

right?

Given the number of different metrics already, with the list potentially growing, I think we should add an {{ExecutorMemoryMetrics}} inside {{ExecutorMetrics}}, with the following names for what we have so far:

* {{totalManagedMemory}}
* {{totalManagedOnHeapMemory}}
* {{totalManagedOffHeapMemory}}
* {{usedStorageMemory}}
* {{usedOnHeapStorageMemory}}
* {{usedOffHeapStorageMemory}}

I'm avoiding "max" and using "total" instead, since in the future I can see that we might want to report a "max used over time" (eg. over the entire lifetime of my application, what was the maximum execution memory?).

How does that sound?

> Properly rename the memory related fields in ExecutorSummary REST API
> ---------------------------------------------------------------------
>
>                 Key: SPARK-20391
>                 URL: https://issues.apache.org/jira/browse/SPARK-20391
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.2.0
>            Reporter: Saisai Shao
>            Priority: Minor
>
> Currently in Spark we can get the executor summary through the REST API
> {{/api/v1/applications/<app-id>/executors}}.
> The format of the executor summary is:
> {code}
> class ExecutorSummary private[spark](
>     val id: String,
>     val hostPort: String,
>     val isActive: Boolean,
>     val rddBlocks: Int,
>     val memoryUsed: Long,
>     val diskUsed: Long,
>     val totalCores: Int,
>     val maxTasks: Int,
>     val activeTasks: Int,
>     val failedTasks: Int,
>     val completedTasks: Int,
>     val totalTasks: Int,
>     val totalDuration: Long,
>     val totalGCTime: Long,
>     val totalInputBytes: Long,
>     val totalShuffleRead: Long,
>     val totalShuffleWrite: Long,
>     val isBlacklisted: Boolean,
>     val maxMemory: Long,
>     val executorLogs: Map[String, String],
>     val onHeapMemoryUsed: Option[Long],
>     val offHeapMemoryUsed: Option[Long],
>     val maxOnHeapMemory: Option[Long],
>     val maxOffHeapMemory: Option[Long])
> {code}
> Here are 6 memory related fields: {{memoryUsed}}, {{maxMemory}},
> {{onHeapMemoryUsed}}, {{offHeapMemoryUsed}}, {{maxOnHeapMemory}},
> {{maxOffHeapMemory}}.
> All 6 of these fields reflect the *storage* memory usage in Spark, but from the
> names of these 6 fields, the user doesn't really know whether they refer to
> *storage* memory or the total memory (storage memory + execution memory). This
> is misleading.
> So I think we should properly rename these fields to reflect their real
> meanings. Or we should document it.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
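
To make the grouping proposed in the comment concrete, here is a minimal sketch of what an {{ExecutorMemoryMetrics}} holder could look like. It is not Spark's actual API: the field names come from the proposed list, while the {{fromLegacy}} helper and the assumption that each total is simply the on-heap value plus the off-heap value are hypothetical and only for illustration.

```scala
// Hypothetical sketch, not Spark's real class. Field names follow the
// names proposed in the comment; everything else is an assumption.
final case class ExecutorMemoryMetrics(
    totalManagedOnHeapMemory: Long,
    totalManagedOffHeapMemory: Long,
    usedOnHeapStorageMemory: Long,
    usedOffHeapStorageMemory: Long) {

  // (5) amount of memory managed by spark.
  // Assumes the total is the sum of the on-heap and off-heap parts.
  def totalManagedMemory: Long =
    totalManagedOnHeapMemory + totalManagedOffHeapMemory

  // (8) memory currently used for caching rdds, under the same assumption.
  def usedStorageMemory: Long =
    usedOnHeapStorageMemory + usedOffHeapStorageMemory
}

object ExecutorMemoryMetrics {
  // Hypothetical helper: builds the grouped metrics from the four optional
  // fields of the existing ExecutorSummary. Yields None if any is missing.
  def fromLegacy(
      onHeapMemoryUsed: Option[Long],
      offHeapMemoryUsed: Option[Long],
      maxOnHeapMemory: Option[Long],
      maxOffHeapMemory: Option[Long]): Option[ExecutorMemoryMetrics] =
    for {
      onMax   <- maxOnHeapMemory
      offMax  <- maxOffHeapMemory
      onUsed  <- onHeapMemoryUsed
      offUsed <- offHeapMemoryUsed
    } yield ExecutorMemoryMetrics(onMax, offMax, onUsed, offUsed)
}
```

Keeping {{memoryUsed}} and {{maxMemory}} untouched on {{ExecutorSummary}} and exposing the new names only through a nested object like this would leave the released fields backward compatible, which is the constraint raised at the top of the comment.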