RE: [sql]enable spark sql cli support spark sql
If so, we probably need to add SQL dialect switching support to the SparkSQLCLI, as Fei suggested. What do you think the priority for this is?

-----Original Message-----
From: Cheng Lian [mailto:lian.cs@gmail.com]
Sent: Friday, August 15, 2014 1:57 PM
To: Cheng, Hao
Cc: scwf; dev@spark.apache.org
Subject: Re: [sql]enable spark sql cli support spark sql

In the long run, as Michael suggested in his Spark Summit 14 talk, we'd like to implement SQL-92, maybe with the help of Optiq.

On Aug 15, 2014, at 1:13 PM, Cheng, Hao hao.ch...@intel.com wrote:

Actually the SQL parser (the other SQL dialect in Spark SQL) is quite weak and only supports some basic queries; I'm not sure what the plan is for its enhancement.

-----Original Message-----
From: scwf [mailto:wangf...@huawei.com]
Sent: Friday, August 15, 2014 11:22 AM
To: dev@spark.apache.org
Subject: [sql]enable spark sql cli support spark sql

hi all, right now the Spark SQL CLI only supports Spark HQL. I think we can enable this CLI to support Spark SQL as well. Do you think it's necessary?

--
Best Regards
Fei Wang
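(For concreteness, here is a rough Scala sketch of what dialect switching in a CLI loop could look like, using the two entry points that already exist, SQLContext.sql and HiveContext.hql. The "SET dialect=..." command, the dispatch loop, and the example statements are hypothetical; this is not how SparkSQLCLIDriver currently works.)

import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.hive.HiveContext

// Illustrative sketch only: route each statement to either the basic SQL
// parser or the HiveQL parser depending on a user-switchable dialect.
object DialectSwitchSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[2]", "dialect-sketch")
    val sqlContext = new SQLContext(sc)
    val hiveContext = new HiveContext(sc)

    var dialect = "hiveql" // the only dialect the CLI speaks today

    def run(stmt: String): Unit = {
      val s = stmt.trim
      if (s.toLowerCase.startsWith("set dialect=")) {
        // Hypothetical CLI command that flips the active dialect.
        dialect = s.substring("set dialect=".length).trim
      } else if (dialect == "sql") {
        sqlContext.sql(s).collect().foreach(println)  // simple SQL parser
      } else {
        hiveContext.hql(s).collect().foreach(println) // HiveQL parser
      }
    }

    // e.g. run("SET dialect=sql"); run("SELECT key FROM src") against a registered table
    sc.stop()
  }
}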
Re: -1s on pull requests?
On Sun, Aug 3, 2014 at 4:35 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote:

Include the commit hash in the "tests have started/completed" messages, so that it's clear exactly what code is/has been tested for each test cycle.

This is now captured in JIRA issue https://issues.apache.org/jira/browse/SPARK-2912 and completed in PR https://github.com/apache/spark/pull/1816, which has been merged into master.

Example of the old style: tests starting https://github.com/apache/spark/pull/1819#issuecomment-51416510 / tests finished https://github.com/apache/spark/pull/1819#issuecomment-51417477 (with new classes)

Example of the new style: tests starting https://github.com/apache/spark/pull/1816#issuecomment-51855254 / tests finished https://github.com/apache/spark/pull/1816#issuecomment-51855255 (with new classes)

Nick
Re: [SPARK-2878] Kryo serialisation with custom Kryo registrator failing
I am still a bit confused about why this issue did not show up in 0.9... at that time there was no spark-submit and the context was constructed with low-level calls... The Kryo registrator for ALS was always in my application code. Was this bug introduced in 1.0, or was it always there?

On Aug 14, 2014 5:48 PM, Reynold Xin r...@databricks.com wrote:

Here: https://github.com/apache/spark/pull/1948

On Thu, Aug 14, 2014 at 5:45 PM, Debasish Das debasish.da...@gmail.com wrote:

Is there a fix that I can test? I have the flows set up for both standalone and YARN runs... Thanks. Deb

On Thu, Aug 14, 2014 at 10:59 AM, Reynold Xin r...@databricks.com wrote:

Yes, I understand it might not work for a custom serializer, but that is a much less common path. Basically I want a quick fix for the 1.1 release (which is coming up soon). I would not be comfortable making big changes to class path handling late in the release cycle. We can do that for 1.2.

On Thu, Aug 14, 2014 at 2:35 AM, Graham Dennis graham.den...@gmail.com wrote:

That should work, but would you also make these changes to the JavaSerializer? The API of these is the same so that you can select one or the other (or in theory a custom serializer). This also wouldn't address the problem of shipping custom *serializers* (not Kryo registrators) in user jars.

On 14 August 2014 19:23, Reynold Xin r...@databricks.com wrote:

Graham, SparkEnv only creates a KryoSerializer, but as I understand it that serializer doesn't actually initialize the registrator, since that only happens when newKryo() is called as a KryoSerializerInstance is initialized. Basically I'm thinking of a quick fix for 1.2: 1. Add a classLoader field to KryoSerializer and initialize new KryoSerializerInstances with that class loader. 2. Set that classLoader to the executor's class loader when the Executor is initialized. Then all deserialization calls should use the executor's class loader.

On Thu, Aug 14, 2014 at 12:53 AM, Graham Dennis graham.den...@gmail.com wrote:

Hi Reynold, That would solve this specific issue, but you'd need to be careful that you never create a serialiser instance before the first task is received. Currently in Executor.TaskRunner.run a closure serialiser instance is created before any application jars are downloaded, but that could be moved. To me, this seems a little fragile. There is also a related issue where you can't ship a custom serialiser in an application jar, because the serialiser is instantiated when the SparkEnv object is created, which is before any tasks are received by the executor. The above approach wouldn't help with that problem. Additionally, the YARN scheduler currently uses this approach of adding the application jar to the Executor classpath, so it would make things a bit more uniform. Cheers, Graham

On 14 August 2014 17:37, Reynold Xin r...@databricks.com wrote:

Graham, Thanks for working on this. This is an important bug to fix. I don't have the whole context and obviously I haven't spent nearly as much time on this as you have, but I'm wondering: what if we always pass the executor's ClassLoader to the Kryo serializer? Would that solve this problem?

On Wed, Aug 13, 2014 at 11:59 PM, Graham Dennis graham.den...@gmail.com wrote:

Hi Deb, The only alternative serialiser is the JavaSerialiser (the default). Theoretically Spark supports custom serialisers, but due to a related issue, custom serialisers currently can't live in application jars and must be available to all executors at launch.
My PR fixes this issue as well, allowing custom serialisers to be shipped in application jars. Graham

On 14 August 2014 16:56, Debasish Das debasish.da...@gmail.com wrote:

Sorry, I just saw Graham's email after sending my previous email about this bug... I have been seeing this same issue on our ALS runs last week, but I thought it was due to my hacky way of running the mllib 1.1 snapshot on core 1.0... What's the status of this PR? Will this fix be back-ported to 1.0.1, as we are running a 1.0.1 stable standalone cluster? Until the PR merges, does it make sense to not use Kryo? What are the other recommended efficient serializers? Thanks. Deb

On Wed, Aug 13, 2014 at 2:47 PM, Graham Dennis graham.den...@gmail.com wrote:

I now have a complete pull request for this issue that I'd like to get reviewed and committed. The PR is available here: https://github.com/apache/spark/pull/1890 and includes a test case for the issue I described. I've also submitted a related PR (https://github.com/apache/spark/pull/1827) that causes exceptions raised while attempting to run the custom Kryo registrator not to be swallowed. Thanks, Graham

On 12 August 2014 18:44, Graham Dennis graham.den...@gmail.com wrote:

I've submitted a work-in-progress pull request for this issue that I'd like feedback on. See https://github.com/apache/spark/pull/1890 . I've
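(To make the class-loader idea discussed above concrete, here is a minimal Scala sketch using the plain Kryo API: resolve the user's registrator, and let Kryo resolve classes, through an explicitly supplied class loader, for example the executor's class loader that has the application jars on its classpath. The names below are illustrative stand-ins, not the actual code in the PRs linked above.)

import com.esotericsoftware.kryo.Kryo

// Stand-in for org.apache.spark.serializer.KryoRegistrator, kept local so the
// sketch is self-contained.
trait KryoRegistrator {
  def registerClasses(kryo: Kryo): Unit
}

// Sketch of the idea only: build each Kryo instance with an explicit class
// loader, so a registrator (or classes it registers) that live in a user jar
// can be found even though they are not on the default classpath.
class RegistratorAwareKryoFactory(registratorClassName: String,
                                  userClassLoader: ClassLoader) {
  def newKryo(): Kryo = {
    val kryo = new Kryo()
    // Make Kryo itself resolve class names via the user class loader,
    // e.g. during deserialization.
    kryo.setClassLoader(userClassLoader)
    // Load and invoke the registrator with the same loader, so a registrator
    // shipped in an application jar is found.
    val registrator = Class.forName(registratorClassName, true, userClassLoader)
      .newInstance()
      .asInstanceOf[KryoRegistrator]
    registrator.registerClasses(kryo)
    kryo
  }
}

(In Spark itself, userClassLoader would be the executor's class loader, the one that downloaded the application jars, which is the crux of the proposal above.)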
Tests failing
Hi All, I noticed that all PR tests run overnight had failed due to timeouts. The patch that updates the netty shuffle, I believe, somehow inflated the build time significantly. That patch had been tested, but one change made before it was merged was not tested. I've reverted the patch for now to see if it brings the build times back down. - Patrick
Re: [SPARK-3050] Spark program running with 1.0.2 jar cannot run against a 1.0.1 cluster
Thanks for your response. I think I misinterpreted the stability/compatibility guarantee of the 1.0 release. It seems the compatibility is only at the API level. This is interesting, because it means any system/product built on top of Spark that uses a long-running SparkContext connecting to the cluster over the network will need the exact same version of the Spark jar as the cluster, down to the patch version. This would be analogous to having to compile Spark against a very specific version of Hadoop, as opposed to currently being able to use the Spark package for CDH4 against most CDH4 Hadoop clusters. Is it correct that Spark is focusing on and prioritizing the spark-submit use cases over the aforementioned use cases? I just wanted to better understand the future direction/prioritization of Spark. Thanks, Mingyu

From: Patrick Wendell pwend...@gmail.com
Date: Thursday, August 14, 2014 at 6:32 PM
To: Gary Malouf malouf.g...@gmail.com
Cc: Mingyu Kim m...@palantir.com, dev@spark.apache.org
Subject: Re: [SPARK-3050] Spark program running with 1.0.2 jar cannot run against a 1.0.1 cluster

I commented on the bug. For driver mode, you'll need to get the corresponding version of spark-submit for Spark 1.0.2.

On Thu, Aug 14, 2014 at 3:43 PM, Gary Malouf malouf.g...@gmail.com wrote:

To be clear, is it 'compiled' against 1.0.2 or packaged with it?

On Thu, Aug 14, 2014 at 6:39 PM, Mingyu Kim m...@palantir.com wrote:

I ran a really simple program that runs with the Spark 1.0.2 jar and connects to a Spark 1.0.1 cluster, but it fails with java.io.InvalidClassException. I filed the bug at https://issues.apache.org/jira/browse/SPARK-3050. I assumed the minor and patch releases shouldn't break compatibility. Is that correct? Thanks, Mingyu
Re: spark.akka.frameSize stalls job in 1.1.0
Hi Xiangrui, I wasn't setting spark.driver.memory. I'll try that and report back.

In terms of this running on the cluster, my assumption was that calling foreach on an array (I converted samples using toArray) would mean counts is populated locally; the fully populated object would then be serialized to the executors. Is this correct? I'm actually trying to load a trie and used the HashMap as an example of loading data into an object that needs to be serialized. Is there a better way of doing this? - jerry

On Aug 15, 2014, at 8:36 AM, Xiangrui Meng [via Apache Spark Developers List] ml-node+s1001551n7866...@n3.nabble.com wrote:

Did you set driver memory? You can confirm it in the Executors tab of the WebUI. Btw, the code may only work in local mode. In cluster mode, counts will be serialized to remote workers and the result is not fetched by the driver after foreach. You can use RDD.countByValue instead. -Xiangrui

On Fri, Aug 15, 2014 at 8:18 AM, jerryye [hidden email] wrote:

Hi All, I'm not sure if I should file a JIRA or if I'm missing something obvious, since the test code I'm trying is so simple. I've isolated the problem I'm seeing to a memory issue, but I don't know what parameter I need to tweak; it does seem related to spark.akka.frameSize. If I sample my RDD with 35% of the data, everything runs to completion; with more than 35%, it fails. In standalone mode, I can run on the full RDD without any problems.

// works
val samples = sc.textFile("s3n://geonames").sample(false, 0.35) // 64MB, 2849439 lines

// fails
val samples = sc.textFile("s3n://geonames").sample(false, 0.4) // 64MB, 2849439 lines

Any ideas?

1) RDD size is causing the problem. The code below fails as is, but if I swap smallSample for samples, the code runs end to end in both cluster and standalone mode.

2) The error I get is:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 3.0:1 failed 4 times, most recent failure: TID 12 on host ip-10-251-14-74.us-west-2.compute.internal failed for unknown reason
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)

3) Using the 1.1.0 branch, the driver freezes instead of aborting with the error in #2.

4) In 1.1.0, changing spark.akka.frameSize also results in no progress in the driver.

Code:

val smallSample = sc.parallelize(Array("foo word", "bar word", "baz word"))
val samples = sc.textFile("s3n://geonames") // 64MB, 2849439 lines of short strings
val counts = new collection.mutable.HashMap[String, Int].withDefaultValue(0)
samples.toArray.foreach(counts(_) += 1)
val result = samples.map(l => (l, counts.get(l)))
result.count

Settings (with or without Kryo doesn't matter):

export SPARK_JAVA_OPTS="-Xms5g -Xmx10g -XX:MaxPermSize=10g"
export SPARK_MEM=10g

spark.akka.frameSize 40
#spark.serializer org.apache.spark.serializer.KryoSerializer
#spark.kryoserializer.buffer.mb 1000
spark.executor.memory 58315m
spark.executor.extraLibraryPath /root/ephemeral-hdfs/lib/native/
spark.executor.extraClassPath /root/ephemeral-hdfs/conf
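(A small sketch of the RDD.countByValue route Xiangrui suggests above, which keeps the counting distributed instead of collecting the RDD to the driver and closing over a large mutable HashMap. The s3n path is the one from the thread; broadcasting the resulting map is one possible way to reuse it in a subsequent map, not something prescribed in the thread.)

import org.apache.spark.{SparkConf, SparkContext}

// Sketch of the countByValue alternative: let Spark aggregate per-line counts
// and return only the final map to the driver.
object CountByValueSketch {
  def main(args: Array[String]): Unit = {
    // Master and AWS credentials are expected to come from spark-submit / the environment.
    val sc = new SparkContext(new SparkConf().setAppName("countByValue-sketch"))

    val samples = sc.textFile("s3n://geonames") // path taken from the thread

    // A local Map[String, Long] computed by a distributed reduce.
    val counts: scala.collection.Map[String, Long] = samples.countByValue()

    // If the counts are then needed next to each line (as in the original
    // samples.map(l => (l, counts.get(l)))), broadcast the map rather than
    // capturing a large driver-side object directly in the closure.
    val bcCounts = sc.broadcast(counts)
    val result = samples.map(l => (l, bcCounts.value.getOrElse(l, 0L)))
    println(result.count())

    sc.stop()
  }
}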
Re: Tests failing
Also, I think Jenkins doesn't post build timeouts to GitHub. Is there any way we can fix that?

On Aug 15, 2014 9:04 AM, Patrick Wendell pwend...@gmail.com wrote:

Hi All, I noticed that all PR tests run overnight had failed due to timeouts. The patch that updates the netty shuffle, I believe, somehow inflated the build time significantly. That patch had been tested, but one change made before it was merged was not tested. I've reverted the patch for now to see if it brings the build times back down. - Patrick
Re: Tests failing
Shivaram, can you point us to an example of that happening? The Jenkins console output, that is. Nick

On Fri, Aug 15, 2014 at 2:28 PM, Shivaram Venkataraman shiva...@eecs.berkeley.edu wrote:

Also, I think Jenkins doesn't post build timeouts to GitHub. Is there any way we can fix that?

On Aug 15, 2014 9:04 AM, Patrick Wendell pwend...@gmail.com wrote:

Hi All, I noticed that all PR tests run overnight had failed due to timeouts. The patch that updates the netty shuffle, I believe, somehow inflated the build time significantly. That patch had been tested, but one change made before it was merged was not tested. I've reverted the patch for now to see if it brings the build times back down. - Patrick
Re: spark.akka.frameSize stalls job in 1.1.0
Setting spark.driver.memory has no effect. It's still hanging trying to compute result.count when I'm sampling greater than 35%, regardless of the value of spark.driver.memory. Here are my settings:

export SPARK_JAVA_OPTS="-Xms5g -Xmx10g -XX:MaxPermSize=10g"
export SPARK_MEM=10g

in conf/spark-defaults:

spark.driver.memory 1500
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.kryoserializer.buffer.mb 500
spark.executor.memory 58315m
spark.executor.extraLibraryPath /root/ephemeral-hdfs/lib/native/
spark.executor.extraClassPath /root/ephemeral-hdfs/conf
Re: Tests failing
Hey Nicholas, Yeah, so Jenkins has its own timeout mechanism and will just kill the entire build after 120 minutes. But since run-tests is sitting in the middle of the tests, it can't actually post a failure message. I think run-tests-jenkins should just wrap the call to run-tests in its own timeout. It might be possible to just use this: http://linux.die.net/man/1/timeout - Patrick

On Fri, Aug 15, 2014 at 1:31 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote:

OK, I've captured this in SPARK-3076 https://issues.apache.org/jira/browse/SPARK-3076. Patrick, is the problem that this run-tests step https://github.com/apache/spark/blob/0afe5cb65a195d2f14e8dfcefdbec5dac023651f/dev/run-tests-jenkins#L151 times out, and that this is currently not handled gracefully? To be more specific: it hangs for 120 minutes, times out, and the parent script is for some reason also terminated. Does that sound right? Nick

On Fri, Aug 15, 2014 at 3:33 PM, Shivaram Venkataraman shiva...@eecs.berkeley.edu wrote:

Jenkins runs for this PR https://github.com/apache/spark/pull/1960 timed out without notification. The relevant Jenkins logs are at
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18588/consoleFull
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18592/consoleFull
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18597/consoleFull

On Fri, Aug 15, 2014 at 11:44 AM, Nicholas Chammas nicholas.cham...@gmail.com wrote:

Shivaram, can you point us to an example of that happening? The Jenkins console output, that is. Nick

On Fri, Aug 15, 2014 at 2:28 PM, Shivaram Venkataraman shiva...@eecs.berkeley.edu wrote:

Also, I think Jenkins doesn't post build timeouts to GitHub. Is there any way we can fix that?

On Aug 15, 2014 9:04 AM, Patrick Wendell pwend...@gmail.com wrote:

Hi All, I noticed that all PR tests run overnight had failed due to timeouts. The patch that updates the netty shuffle, I believe, somehow inflated the build time significantly. That patch had been tested, but one change made before it was merged was not tested. I've reverted the patch for now to see if it brings the build times back down. - Patrick
Re: Tests failing
So 2 hours is a hard cap on how long a build can run. Okie doke. Perhaps then I'll wrap the run-tests step as you suggest and limit it to 100 minutes or something, and cleanly report if it times out. Sound good?

On Fri, Aug 15, 2014 at 4:43 PM, Patrick Wendell pwend...@gmail.com wrote:

Hey Nicholas, Yeah, so Jenkins has its own timeout mechanism and will just kill the entire build after 120 minutes. But since run-tests is sitting in the middle of the tests, it can't actually post a failure message. I think run-tests-jenkins should just wrap the call to run-tests in its own timeout. It might be possible to just use this: http://linux.die.net/man/1/timeout - Patrick
Re: Tests failing
Yeah, I was thinking something like that. Basically we should just have a variable for the timeout, and I can make sure it's under the configured Jenkins time.

On Fri, Aug 15, 2014 at 1:55 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote:

So 2 hours is a hard cap on how long a build can run. Okie doke. Perhaps then I'll wrap the run-tests step as you suggest and limit it to 100 minutes or something, and cleanly report if it times out. Sound good?
Re: Dynamic variables in Spark
I've opened SPARK-3051 (https://issues.apache.org/jira/browse/SPARK-3051) based on this thread. Neil

On Thu, Jul 24, 2014 at 10:30 PM, Neil Ferguson nfergu...@gmail.com wrote:

That would work well for me! Do you think it would be necessary to specify which accumulators should be available in the registry, or would we just broadcast all named accumulators registered in SparkContext and make them available in the registry? Anyway, I'm happy to make the necessary changes (unless someone else wants to).

On Thu, Jul 24, 2014 at 10:17 PM, Patrick Wendell pwend...@gmail.com wrote:

What if we have a registry for accumulators, where you can access them statically by name? - Patrick

On Thu, Jul 24, 2014 at 1:51 PM, Neil Ferguson nfergu...@gmail.com wrote:

I realised that my last reply wasn't very clear, so let me try to clarify. The patch for named accumulators looks very useful; however, in Shivaram's example he was able to retrieve the named task metrics statically from a TaskMetrics object, as follows: TaskMetrics.get("f1-time"). I don't think this would be possible with the named accumulators; I believe they'd need to be passed to every function that needs them, which I think would be cumbersome in any application of reasonable complexity. This is what I was trying to solve with my proposal for dynamic variables in Spark. However, the ability to retrieve named accumulators from a thread-local would work just as well for my use case. I'd be happy to implement either solution if there's interest. Alternatively, if I'm missing some other way to accomplish this, please let me know.

On a (slight) aside, I now think it would be possible to implement dynamic variables by broadcasting them. I was looking at Reynold's PR [1] to broadcast the RDD object, and I think it would be possible to take a similar approach: broadcast the serialized form and deserialize it when executing each task.

[1] https://github.com/apache/spark/pull/1498

On Wed, Jul 23, 2014 at 8:30 AM, Neil Ferguson nfergu...@gmail.com wrote:

Hi Patrick. That looks very useful. The thing that seems to be missing from Shivaram's example is the ability to access TaskMetrics statically (this is the same problem that I am trying to solve with dynamic variables). You mention defining an accumulator on the RDD. Perhaps I am missing something here, but my understanding was that accumulators are defined in SparkContext and are not part of the RDD. Is that correct? Neil

On Tue, Jul 22, 2014 at 22:21, Patrick Wendell pwend...@gmail.com wrote:

Shivaram, you should take a look at this patch, which adds support for naming accumulators; it is likely to get merged in soon. I actually started this patch by supporting named TaskMetrics similar to what you have there, but then I realized there is too much semantic overlap with accumulators, so I just went that route. For instance, it would be nice if any user-defined metrics were accessible in the driver program. https://github.com/apache/spark/pull/1309 In your example, you could just define an accumulator here on the RDD and you'd see the incremental update in the web UI automatically. - Patrick

On Tue, Jul 22, 2014 at 2:09 PM, Shivaram Venkataraman shiva...@eecs.berkeley.edu wrote:

From reading Neil's first e-mail, I think the motivation is to get some metrics in ADAM. I've run into a similar use case with user-defined metrics in long-running tasks, and I think a nice way to solve this would be to have user-defined TaskMetrics.
To state my problem more clearly, let's say you have two functions you use in a map call and want to measure how much time each of them takes. For example, if you have a code block like the one below and you want to measure how much time f1 takes as a fraction of the task:

a.map { l =>
  val f = f1(l)
  // ... some work here ...
}

It would be really cool if we could do something like:

a.map { l =>
  val start = System.nanoTime
  val f = f1(l)
  TaskMetrics.get("f1-time").add(System.nanoTime - start)
}

These task metrics have a different purpose from accumulators in the sense that we don't need to track lineage, perform commutative operations, etc. Further, we also have a bunch of code in place to aggregate task metrics across a stage, so it would be great if we could also populate these in the UI and show median/max, etc. I think counters [1] in Hadoop served a similar purpose. Thanks, Shivaram

[1] https://www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/chapter-8/counters

On Tue, Jul 22, 2014 at 1:43 PM, Neil Ferguson nfergu...@gmail.com wrote:

Hi Reynold, Thanks for your reply. Accumulators are, of course, stored in