[jira] [Updated] (SPARK-3051) Support looking-up named accumulators in a registry
[ https://issues.apache.org/jira/browse/SPARK-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-3051:
-----------------------------
Target Version/s: (was: 1.2.0)

Support looking-up named accumulators in a registry
---------------------------------------------------

Key: SPARK-3051
URL: https://issues.apache.org/jira/browse/SPARK-3051
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 1.0.0
Reporter: Neil Ferguson

This is a proposed enhancement to Spark based on the following mailing list discussion: http://apache-spark-developers-list.1001551.n3.nabble.com/quot-Dynamic-variables-quot-in-Spark-td7450.html.

This proposal builds on SPARK-2380 (Support displaying accumulator values in the web UI) to allow named accumulables to be looked up in a registry, as opposed to having to be passed to every method that needs to access them.

The use case was described well by [~shivaram], as follows:

Let's say you have two functions you use in a map call and want to measure how much time each of them takes. For example, if you have a code block like the one below and you want to measure how much time f1 takes as a fraction of the task:

{noformat}
a.map { l =>
  val f = f1(l)
  ... some work here ...
}
{noformat}

It would be really cool if we could do something like:

{noformat}
a.map { l =>
  val start = System.nanoTime
  val f = f1(l)
  TaskMetrics.get("f1-time").add(System.nanoTime - start)
}
{noformat}

SPARK-2380 provides a partial solution to this problem -- however, the accumulables would still need to be passed to every function that needs them, which I think would be cumbersome in any application of reasonable complexity.

The proposal, as suggested by [~pwendell], is to have a registry of accumulables that can be looked up by name. Regarding the implementation details, I'd propose that we broadcast a serialized version of all named accumulables in the DAGScheduler (similar to what SPARK-2521 does for Tasks). These can then be deserialized in the Executor.
Accumulables are already stored in thread-local variables in the Accumulators object, so exposing them in the registry should simply be a matter of wrapping that object and keying the accumulables by name (they are currently keyed by ID).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
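To make the shape of the proposal concrete, here is a minimal Scala sketch of a name-keyed, thread-local registry of the kind described above. The names AccumulatorRegistry and SimpleAccumulator are hypothetical stand-ins, not actual Spark APIs; the real implementation would wrap the existing Accumulators object and the accumulables deserialized from the broadcast.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a named long-valued accumulable.
class SimpleAccumulator(val name: String) {
  private var value = 0L
  def add(v: Long): Unit = { value += v }
  def count: Long = value
}

// Hypothetical sketch of the proposed registry: accumulables keyed by
// name instead of ID, held per task thread to mirror the thread-local
// state the Accumulators object already keeps on executors.
object AccumulatorRegistry {
  private val registry =
    new ThreadLocal[mutable.Map[String, SimpleAccumulator]] {
      override def initialValue(): mutable.Map[String, SimpleAccumulator] =
        mutable.Map.empty
    }

  // Would be called on the executor after deserializing the
  // broadcast named accumulables, before running user code.
  def register(acc: SimpleAccumulator): Unit =
    registry.get()(acc.name) = acc

  // Looked up by name from anywhere in user code, with no need to
  // pass the accumulable through every function that uses it.
  def get(name: String): SimpleAccumulator =
    registry.get().getOrElse(
      name,
      throw new NoSuchElementException(s"No accumulable named '$name'"))
}
```

With this in place, user code like the {noformat} example above would only need `AccumulatorRegistry.get("f1-time").add(...)`, with registration handled once per task by the framework.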
[jira] [Updated] (SPARK-3051) Support looking-up named accumulators in a registry
[ https://issues.apache.org/jira/browse/SPARK-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-3051:
-----------------------------
Affects Version/s: 1.0.0