[GitHub] spark pull request #16714: [SPARK-16333][Core] Enable EventLoggingListener t...
Github user jisookim0513 closed the pull request at: https://github.com/apache/spark/pull/16714 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16714: [SPARK-16333][Core] Enable EventLoggingListener to log l...
Github user jisookim0513 commented on the issue: https://github.com/apache/spark/pull/16714 OK, not including the updated blocks in task metrics did reduce the size of our event logs. But I am closing this PR, as the current implementation doesn't seem to take the right approach. Thanks for the input.
[GitHub] spark issue #16714: [SPARK-16333][Core] Enable EventLoggingListener to log l...
Github user jisookim0513 commented on the issue: https://github.com/apache/spark/pull/16714 @vanzin @ajbozarth if you think that an option to skip logging internal accumulators (in my case I don't use the SQL UI) and completely removing updated block statuses are not needed, I can close this PR.
[GitHub] spark pull request #16714: [SPARK-16333][Core] Enable EventLoggingListener t...
Github user jisookim0513 commented on a diff in the pull request: https://github.com/apache/spark/pull/16714#discussion_r113776549

--- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala ---
@@ -343,10 +376,14 @@ private[spark] object JsonProtocol {
         ("Bytes Written" -> taskMetrics.outputMetrics.bytesWritten) ~
         ("Records Written" -> taskMetrics.outputMetrics.recordsWritten)
     val updatedBlocks =
-      JArray(taskMetrics.updatedBlockStatuses.toList.map { case (id, status) =>
-        ("Block ID" -> id.toString) ~
-        ("Status" -> blockStatusToJson(status))
-      })
+      if (omitUpdatedBlockStatuses) {
--- End diff --

@vanzin @ajbozarth #17412 gets rid of updated block statuses from the accumulable, but not from task metrics. If you think it's ok not to have an option to get rid of updated block statuses, then I can just remove them here.
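For context, the shape of the change under discussion can be sketched in plain Scala. This is only an illustration: JSON is modeled here as nested Maps rather than json4s values, and `BlockStatus` is a simplified stand-in for `org.apache.spark.storage.BlockStatus`; only the flag name `omitUpdatedBlockStatuses` comes from the diff above.

```scala
// Simplified stand-in for org.apache.spark.storage.BlockStatus.
case class BlockStatus(memSize: Long, diskSize: Long)

// When the omit flag is set, emit an empty list for "Updated Blocks" so
// consumers that expect the field still find it, just without the verbose
// per-block entries.
def updatedBlocksJson(
    blocks: Seq[(String, BlockStatus)],
    omitUpdatedBlockStatuses: Boolean): List[Map[String, Any]] =
  if (omitUpdatedBlockStatuses) {
    Nil // drop the per-block entries entirely
  } else {
    blocks.toList.map { case (id, status) =>
      Map(
        "Block ID" -> id,
        "Status" -> Map(
          "Memory Size" -> status.memSize,
          "Disk Size" -> status.diskSize))
    }
  }
```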
[GitHub] spark issue #16714: [SPARK-16333][Core] Enable EventLoggingListener to log l...
Github user jisookim0513 commented on the issue: https://github.com/apache/spark/pull/16714 I would still prefer not to have internal accumulators in the event logs, nor updated block statuses. @vanzin would you be ok with eliminating all internal accumulators and having an option to skip logging updated block statuses?
[GitHub] spark pull request #16714: [SPARK-16333][Core] Enable EventLoggingListener t...
Github user jisookim0513 commented on a diff in the pull request: https://github.com/apache/spark/pull/16714#discussion_r103569094

--- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala ---
@@ -62,18 +62,21 @@ private[spark] object JsonProtocol {
    * JSON serialization methods for SparkListenerEvents
    * -------------------------------------------------- */
-  def sparkEventToJson(event: SparkListenerEvent): JValue = {
+  def sparkEventToJson(
+      event: SparkListenerEvent,
+      omitInternalAccums: Boolean = false,
+      omitUpdatedBlockStatuses: Boolean = false): JValue = {
     event match {
--- End diff --

stageSubmitted/stageCompleted/jobStart should use `omitInternalAccums`, but not jobEnd; jobEnd's interface hasn't changed. `omitUpdatedBlockStatuses` is intended to be used only for taskEnd, because that's when updated block statuses are reported. Thanks for catching this; I will add `omitInternalAccums` to stageSubmitted and jobStart.
[GitHub] spark pull request #16714: [SPARK-16333][Core] Enable EventLoggingListener t...
Github user jisookim0513 commented on a diff in the pull request: https://github.com/apache/spark/pull/16714#discussion_r103565658

--- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala ---
@@ -97,61 +100,80 @@ private[spark] object JsonProtocol {
       case logStart: SparkListenerLogStart =>
         logStartToJson(logStart)
       case metricsUpdate: SparkListenerExecutorMetricsUpdate =>
-        executorMetricsUpdateToJson(metricsUpdate)
+        executorMetricsUpdateToJson(metricsUpdate, omitInternalAccums)
       case blockUpdated: SparkListenerBlockUpdated =>
         throw new MatchError(blockUpdated)  // TODO(ekl) implement this
       case _ => parse(mapper.writeValueAsString(event))
     }
   }

-  def stageSubmittedToJson(stageSubmitted: SparkListenerStageSubmitted): JValue = {
-    val stageInfo = stageInfoToJson(stageSubmitted.stageInfo)
+  def stageSubmittedToJson(
+      stageSubmitted: SparkListenerStageSubmitted,
+      omitInternalAccums: Boolean = false): JValue = {
+    val stageInfo = stageInfoToJson(stageSubmitted.stageInfo, omitInternalAccums)
     val properties = propertiesToJson(stageSubmitted.properties)
     ("Event" -> SPARK_LISTENER_EVENT_FORMATTED_CLASS_NAMES.stageSubmitted) ~
     ("Stage Info" -> stageInfo) ~
     ("Properties" -> properties)
   }

-  def stageCompletedToJson(stageCompleted: SparkListenerStageCompleted): JValue = {
+  def stageCompletedToJson(
+      stageCompleted: SparkListenerStageCompleted,
+      omitInternalAccums: Boolean = false): JValue = {
     val stageInfo = stageInfoToJson(stageCompleted.stageInfo)
--- End diff --

Yes, thank you for catching it. I think it got dropped while I was merging; will fix.
[GitHub] spark pull request #16714: [SPARK-16333][Core] Enable EventLoggingListener t...
Github user jisookim0513 commented on a diff in the pull request: https://github.com/apache/spark/pull/16714#discussion_r103564868

--- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
@@ -64,6 +64,12 @@ private[spark] class EventLoggingListener(
   private val shouldOverwrite = sparkConf.getBoolean("spark.eventLog.overwrite", false)
   private val testing = sparkConf.getBoolean("spark.eventLog.testing", false)
   private val outputBufferSize = sparkConf.getInt("spark.eventLog.buffer.kb", 100) * 1024
+  // To reduce the size of event logs, we can omit logging all internal accumulables for metrics.
+  private val omitInternalAccumulables =
--- End diff --

@vanzin I added CPU time because back then I was pulling stage metrics from the history server and needed CPU time; here's the PR for the change: https://github.com/apache/spark/pull/10212 . Looking at the code, CPU time should be there, so there must be something wrong on my end. That's a separate problem, though, and I don't think the CPU time metric increases the size of event logs much. I can't think of a use case for internal accumulables then, so I think it makes sense to delete this. If anyone wants to use internal accumulables for stage metrics, they should catch them after a stage finishes, not fetch them from the History Server.
[GitHub] spark issue #12436: [SPARK-14649][CORE] DagScheduler should not run duplicat...
Github user jisookim0513 commented on the issue: https://github.com/apache/spark/pull/12436 @sitalkedia have you had a chance to work on this issue and open a new PR?
[GitHub] spark pull request #16714: [SPARK-16333][Core] Enable EventLoggingListener t...
Github user jisookim0513 commented on a diff in the pull request: https://github.com/apache/spark/pull/16714#discussion_r101442539

--- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
@@ -64,6 +64,12 @@ private[spark] class EventLoggingListener(
   private val shouldOverwrite = sparkConf.getBoolean("spark.eventLog.overwrite", false)
   private val testing = sparkConf.getBoolean("spark.eventLog.testing", false)
   private val outputBufferSize = sparkConf.getInt("spark.eventLog.buffer.kb", 100) * 1024
+  // To reduce the size of event logs, we can omit logging all internal accumulables for metrics.
+  private val omitInternalAccumulables =
+    sparkConf.getBoolean("spark.eventLog.omitInternalAccumulables", false)
+  // To reduce the size of event logs, we can omit logging the "Updated Block Statuses" metric.
+  private val omitUpdatedBlockStatuses =
+    sparkConf.getBoolean("spark.eventLog.omitUpdatedBlockStatuses", false)
--- End diff --

I am not sure whether updated block statuses are used for the UI. At first I wondered whether the information was used to reconstruct the Storage page, but I checked the usage of `TaskMetrics.updatedBlockStatuses` and it doesn't seem to be used anywhere except when the task metrics are converted to a JSON object. Actually, I am not sure the Storage tab is working at all, unless I am missing something; I don't think `/applications/[app-id]/storage/rdd` returns any meaningful information.
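The configuration plumbing in the diff above is the standard `SparkConf.getBoolean` pattern. A self-contained sketch, in which `SparkConf` is a minimal Map-backed stand-in for Spark's real class and only the two property names come from the patch:

```scala
// Minimal stand-in for org.apache.spark.SparkConf, just enough for the sketch.
class SparkConf(settings: Map[String, String]) {
  def getBoolean(key: String, default: Boolean): Boolean =
    settings.get(key).map(_.toBoolean).getOrElse(default)
}

val sparkConf = new SparkConf(Map(
  "spark.eventLog.omitInternalAccumulables" -> "true"))

// Both flags default to false, so event logs are unchanged unless a user
// explicitly opts out of the verbose fields.
val omitInternalAccumulables =
  sparkConf.getBoolean("spark.eventLog.omitInternalAccumulables", false)
val omitUpdatedBlockStatuses =
  sparkConf.getBoolean("spark.eventLog.omitUpdatedBlockStatuses", false)
```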
[GitHub] spark pull request #16714: [SPARK-16333][Core] Enable EventLoggingListener t...
Github user jisookim0513 commented on a diff in the pull request: https://github.com/apache/spark/pull/16714#discussion_r101438216

--- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
@@ -64,6 +64,12 @@ private[spark] class EventLoggingListener(
   private val shouldOverwrite = sparkConf.getBoolean("spark.eventLog.overwrite", false)
   private val testing = sparkConf.getBoolean("spark.eventLog.testing", false)
   private val outputBufferSize = sparkConf.getInt("spark.eventLog.buffer.kb", 100) * 1024
+  // To reduce the size of event logs, we can omit logging all internal accumulables for metrics.
+  private val omitInternalAccumulables =
--- End diff --

I don't think this information is used to reconstruct the job UI. I am not sure how it got included in event logs, but some people might be using it to get internal metrics for a stage from the history server via its REST API. For example, the CPU time metric is not included in the stage metrics you get by querying the history server endpoint `/applications/[app-id]/stages/[stage-id]`.
[GitHub] spark issue #16714: [SPARK-16333][Core] Enable EventLoggingListener to log l...
Github user jisookim0513 commented on the issue: https://github.com/apache/spark/pull/16714 Not sure why the second test build failed on the PySpark unit tests; I only changed comments.
[GitHub] spark pull request #16714: [SPARK-16333][Core] Enable EventLoggingListener t...
GitHub user jisookim0513 opened a pull request: https://github.com/apache/spark/pull/16714 [SPARK-16333][Core] Enable EventLoggingListener to log less

## What changes were proposed in this pull request?

Starting with Spark 2.0, task metrics are reported via accumulators. This is good, but it also causes excessive event logs because the metrics are logged twice (once under "Accumulators" and once under "Task Metrics"). For applications with many tasks, event logs can grow to tens of GB, and it is not feasible for the Spark History Server to parse them and reconstruct the job UI. This PR adds an option for EventLoggingListener not to log the internal accumulators that back task metrics. It also adds an option not to log the "Updated Block Statuses" metric, which is quite verbose and might not be needed in some deployments. After updating to Spark 2.0, the event log size of one of our applications jumped from ~1 GB to over 40 GB. With this patch, event log sizes went back to roughly what they were on Spark 1.5.2.

## How was this patch tested?

Unit tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/metamx/spark enable-less-eventlogs

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16714.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #16714
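Assuming the patch were merged as proposed, opting out would be a matter of setting the new flags in `spark-defaults.conf` (the property names come from the patch; both default to false):

```
spark.eventLog.enabled                      true
spark.eventLog.omitInternalAccumulables     true
spark.eventLog.omitUpdatedBlockStatuses     true
```

Keeping the defaults at false means existing deployments and history-server tooling see no change unless they explicitly opt in.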
[GitHub] spark issue #10212: [SPARK-12221] add cpu time to metrics
Github user jisookim0513 commented on the issue: https://github.com/apache/spark/pull/10212 @vanzin thanks a lot!
[GitHub] spark issue #10212: [SPARK-12221] add cpu time to metrics
Github user jisookim0513 commented on the issue: https://github.com/apache/spark/pull/10212 @vanzin thanks, I was about to ask for a retest :)
[GitHub] spark pull request #10212: [SPARK-12221] add cpu time to metrics
Github user jisookim0513 commented on a diff in the pull request: https://github.com/apache/spark/pull/10212#discussion_r80184799

--- Diff: core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala ---
@@ -1097,7 +1100,9 @@ private[spark] object JsonProtocolSuite extends Assertions {
     |  },
     |  "Task Metrics": {
     |    "Executor Deserialize Time": 300,
+    |    "Executor Deserialize CPU Time": 0,
--- End diff --

Yeah, I tested it on my testing cluster, but this makes sense. I will add non-zero CPU times by setting the CPU times to the given wall times.
[GitHub] spark pull request #10212: [SPARK-12221] add cpu time to metrics
Github user jisookim0513 commented on a diff in the pull request: https://github.com/apache/spark/pull/10212#discussion_r80184744

--- Diff: core/src/test/resources/HistoryServerExpectations/complete_stage_list_json_expectation.json ---
@@ -6,6 +6,7 @@
   "numCompleteTasks" : 8,
   "numFailedTasks" : 0,
   "executorRunTime" : 162,
+  "executorCpuTime" : 0,
--- End diff --

Oh no, these are expected outputs. I think the inputs are stored under `src/test/resources/spark-events`.
[GitHub] spark pull request #10212: [SPARK-12221] add cpu time to metrics
Github user jisookim0513 commented on a diff in the pull request: https://github.com/apache/spark/pull/10212#discussion_r80156532

--- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala ---
@@ -759,7 +761,15 @@ private[spark] object JsonProtocol {
       return metrics
     }
     metrics.setExecutorDeserializeTime((json \ "Executor Deserialize Time").extract[Long])
+    metrics.setExecutorDeserializeCpuTime((json \ "Executor Deserialize CPU Time") match {
+      case JNothing => 0
+      case x => x.extract[Long]
+    })
     metrics.setExecutorRunTime((json \ "Executor Run Time").extract[Long])
+    metrics.setExecutorCpuTime((json \ "Executor CPU Time") match {
+      case JNothing => 0
+      case x => x.extract[Long]
+    })
--- End diff --

Will fix this.
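The backward-compatibility idiom in the diff above (a field absent from pre-patch event logs defaults to 0 instead of failing deserialization) can be shown with a pure-Scala analogue; here a `Map[String, Any]` stands in for the parsed json4s `JValue`, and the example log values are made up:

```scala
// Mirror of the `case JNothing => 0` pattern: a key missing from an old
// event log yields 0L rather than throwing during deserialization.
def extractLongOrZero(json: Map[String, Any], key: String): Long =
  json.get(key) match {
    case Some(v: Long) => v
    case Some(v)       => v.toString.toLong
    case None          => 0L // old logs: field absent, default to 0
  }

// A pre-patch log line has no CPU time field; a post-patch one does.
val oldLog = Map[String, Any]("Executor Run Time" -> 162L)
val newLog = oldLog + ("Executor CPU Time" -> 120L)
```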
[GitHub] spark pull request #10212: [SPARK-12221] add cpu time to metrics
Github user jisookim0513 commented on a diff in the pull request: https://github.com/apache/spark/pull/10212#discussion_r80156386

--- Diff: core/src/test/resources/HistoryServerExpectations/complete_stage_list_json_expectation.json ---
@@ -6,6 +6,7 @@
   "numCompleteTasks" : 8,
   "numFailedTasks" : 0,
   "executorRunTime" : 162,
+  "executorCpuTime" : 0,
--- End diff --

Hmm, I thought HistoryServerSuite runs with the included log files (which don't have CPU time), so this is an expected result, since those logs don't have CPU time fields.
[GitHub] spark pull request #10212: [SPARK-12221] add cpu time to metrics
Github user jisookim0513 commented on a diff in the pull request: https://github.com/apache/spark/pull/10212#discussion_r80155278

--- Diff: core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala ---
@@ -1097,7 +1100,9 @@ private[spark] object JsonProtocolSuite extends Assertions {
     |  },
     |  "Task Metrics": {
     |    "Executor Deserialize Time": 300,
+    |    "Executor Deserialize CPU Time": 0,
--- End diff --

AFAIK, JsonProtocolSuite creates a JSON string from the event created by `makeTaskMetrics(300L, 400L, 500L, 600L, 700, 800, hasHadoopInput = true, hasOutput = false)`. I tried changing `makeTaskMetrics()` to accept deserialize CPU time and CPU time as arguments, but that ended up violating the Scalastyle check by having more than 10 parameters.
[GitHub] spark issue #10212: [SPARK-12221] add cpu time to metrics
Github user jisookim0513 commented on the issue: https://github.com/apache/spark/pull/10212 @vanzin could you merge this? Thanks!
[GitHub] spark issue #10212: [SPARK-12221] add cpu time to metrics
Github user jisookim0513 commented on the issue: https://github.com/apache/spark/pull/10212 @vanzin this PR had passed all tests. Could you merge it if I fix the recently introduced conflicts?
[GitHub] spark issue #10212: [SPARK-12221] add cpu time to metrics
Github user jisookim0513 commented on the issue: https://github.com/apache/spark/pull/10212 @vanzin I updated the patch
[GitHub] spark issue #10212: [SPARK-12221] add cpu time to metrics
Github user jisookim0513 commented on the issue: https://github.com/apache/spark/pull/10212 @vanzin sure will do
[GitHub] spark pull request: [SPARK-12221] add cpu time to metrics
Github user jisookim0513 commented on a diff in the pull request: https://github.com/apache/spark/pull/10212#discussion_r54951079

--- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala ---
@@ -718,6 +719,7 @@ private[spark] object JsonProtocol {
     metrics.setHostname((json \ "Host Name").extract[String])
     metrics.setExecutorDeserializeTime((json \ "Executor Deserialize Time").extract[Long])
     metrics.setExecutorRunTime((json \ "Executor Run Time").extract[Long])
+    metrics.setExecutorCpuTime((json \ "Executor CPU Time").extract[Long])
--- End diff --

Yeah, it won't be able to deserialize a history from an earlier version. Would it be better to make this backward-compatible? (Sorry for the super-late response.)
[GitHub] spark pull request: add cpu time to metrics
GitHub user jisookim0513 opened a pull request: https://github.com/apache/spark/pull/10212 add cpu time to metrics

Currently task metrics don't support executor CPU time, so there is no way to calculate how much CPU time a stage/task took from History Server metrics. This PR enables reporting CPU time.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jisookim0513/spark add-cpu-time-metric

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/10212.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #10212

commit 30752cb9b3e91366fe2ac16ca769e8fc7e8dcf54
Author: jisookim
Date: 2015-12-08T23:44:39Z

    add cpu time to metrics