[jira] [Updated] (SPARK-3051) Support looking-up named accumulators in a registry

2015-05-06 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-3051:
-
Target Version/s:   (was: 1.2.0)

 Support looking-up named accumulators in a registry
 ---

 Key: SPARK-3051
 URL: https://issues.apache.org/jira/browse/SPARK-3051
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 1.0.0
Reporter: Neil Ferguson

 This is a proposed enhancement to Spark based on the following mailing list 
 discussion: 
 http://apache-spark-developers-list.1001551.n3.nabble.com/quot-Dynamic-variables-quot-in-Spark-td7450.html.
 This proposal builds on SPARK-2380 (Support displaying accumulator values in 
 the web UI) to allow named accumulables to be looked up in a registry, as 
 opposed to having to be passed to every method that needs to access them.
 The use case was described well by [~shivaram], as follows:
 Let's say you have two functions you use 
 in a map call and want to measure how much time each of them takes. For 
 example, if you have a code block like the one below and you want to 
 measure how much time f1 takes as a fraction of the task. 
 {noformat}
 a.map { l => 
val f = f1(l) 
... some work here ... 
 } 
 {noformat}
 It would be really cool if we could do something like 
 {noformat}
 a.map { l => 
val start = System.nanoTime 
val f = f1(l) 
TaskMetrics.get("f1-time").add(System.nanoTime - start) 
 } 
 {noformat}
 SPARK-2380 provides a partial solution to this problem -- however the 
 accumulables would still need to be passed to every function that needs them, 
 which I think would be cumbersome in any application of reasonable complexity.
 The proposal, as suggested by [~pwendell], is to have a registry of 
 accumulables that can be looked up by name. 
 Regarding the implementation details, I'd propose that we broadcast a 
 serialized version of all named accumulables in the DAGScheduler (similar to 
 what SPARK-2521 does for Tasks). These can then be deserialized in the 
 Executor. 
 Accumulables are already stored in thread-local variables in the Accumulators 
 object, so exposing them in the registry should simply be a matter of 
 wrapping this object and keying the accumulables by name (they are currently 
 keyed by ID).
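 To make the proposal concrete, here is a minimal sketch of what a name-keyed 
 registry might look like. This is not Spark API: AccumulatorRegistry and 
 NamedAccumulator are illustrative names, and the stand-in accumulator just 
 wraps an AtomicLong rather than Spark's Accumulable.
 {noformat}
 // Hypothetical sketch of a name-keyed accumulator registry (illustrative
 // names only, not Spark API).
 import scala.collection.concurrent.TrieMap
 import java.util.concurrent.atomic.AtomicLong

 object AccumulatorRegistry {
   // Stand-in for Spark's Accumulable: a thread-safe named counter.
   final class NamedAccumulator(val name: String) {
     private val underlying = new AtomicLong(0L)
     def add(v: Long): Unit = underlying.addAndGet(v)
     def value: Long = underlying.get
   }

   // Keyed by name, unlike the current Accumulators object (keyed by ID).
   private val byName = TrieMap.empty[String, NamedAccumulator]

   // Idempotent: registering the same name twice returns the same instance.
   def register(name: String): NamedAccumulator =
     byName.getOrElseUpdate(name, new NamedAccumulator(name))

   def get(name: String): Option[NamedAccumulator] = byName.get(name)
 }
 {noformat}
 With something like this in place, the timing example above would only need 
 the name "f1-time", not a reference threaded through every function:
 {noformat}
 AccumulatorRegistry.get("f1-time").foreach(_.add(System.nanoTime - start))
 {noformat}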



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3051) Support looking-up named accumulators in a registry

2015-02-19 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-3051:
-
Affects Version/s: 1.0.0
