I wrote a Spark instrumentation tool that instruments RDDs to give more fine-grained details on what is going on within a task. It works today, but it uses volatiles and CAS to pass this state around, which slows the task down. We want to lower the overhead by keeping the main call path single-threaded and passing the result around when the task completes, which sounds exactly like AccumulatorV2.
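To make sure I have the contract right: my understanding of why AccumulatorV2 avoids the synchronization cost is sketched below in plain Java. This is a mock of the copy/add/merge lifecycle as I understand it, not Spark's actual class:

```java
// A mock of the AccumulatorV2 lifecycle (copyAndReset / add / merge / value) in
// plain Java -- not Spark's class, just the contract as I understand it. Inside
// a task only the single task thread calls add(), so the hot path needs no
// volatiles or CAS; the driver merges each task's copy when the task completes.
final class SimpleLongAccumulator {
    private long sum = 0L;                    // single-writer per task: plain field

    void add(long v) { sum += v; }            // hot path: no synchronization

    SimpleLongAccumulator copyAndReset() {    // a fresh copy shipped to each task
        return new SimpleLongAccumulator();
    }

    void merge(SimpleLongAccumulator other) { // driver-side, on task completion
        sum += other.sum;
    }

    long value() { return sum; }

    public static void main(String[] args) {
        SimpleLongAccumulator driverAcc = new SimpleLongAccumulator();
        SimpleLongAccumulator task1 = driverAcc.copyAndReset();
        SimpleLongAccumulator task2 = driverAcc.copyAndReset();
        for (int i = 0; i < 3; i++) task1.add(1); // "task 1" counts locally
        task2.add(10);                            // "task 2" counts locally
        driverAcc.merge(task1);                   // results stitched on completion
        driverAcc.merge(task2);
        System.out.println(driverAcc.value());    // prints 13
    }
}
```

This single-threaded add plus merge-on-completion is exactly the shape I want for the instrumentation hot path.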
I started rewriting the instrumentation logic on top of accumulators, but I'm having a hard time getting them to show up in the UI/API (I'm using that to check whether I'm linking things together properly). So my question is: when running in the executor and creating an accumulator (one that was not created from SparkContext), how do I stitch things together properly so it shows up alongside accumulators defined from the SparkContext? If this differs across versions that's fine; we can (hopefully) figure that out quickly and adjust the instrumentation.

Approaches taken so far:

1. Looked at SparkContext.register and copied the same logic, but at runtime:

```java
this.hasNextTotal = new LongAccumulator();
this.hasNextTotal.metadata_$eq(
    new AccumulatorMetadata(AccumulatorContext.newId(), createName("hasNextTotal"), false));
AccumulatorContext.register(hasNextTotal);
```

That didn't end up working.

2. Tried getting the context from SparkContext.getActive, but it's not defined at runtime:

```java
Option<SparkContext> opt = SparkContext$.MODULE$.getActive();
if (opt.isDefined()) {
    SparkContext sc = opt.get();
    hasNextTotal.register(sc, Option.apply("hasNext"), false);
    nextTotal.register(sc, Option.apply("next"), false);
}
```

Any help on this would be very welcome! I would really rather not reinvent the wheel if I can piggy-back on accumulators. Thanks for your help!

Target Spark version: 2.2.0
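For what it's worth, my mental model of the stitching (which may itself be wrong, and is part of what I'm asking) is a driver-side id registry, sketched here as a plain-Java mock with made-up names, not Spark's API:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical mock (made-up names, not Spark's API) of the driver-side id
// registry I believe AccumulatorContext maintains: the UI/API only sees
// accumulators whose id was minted and registered on the driver, so an id
// minted on an executor has no "original" to merge into and is dropped.
final class DriverRegistry {
    private static final AtomicLong nextId = new AtomicLong();
    private static final Map<Long, long[]> originals = new HashMap<>(); // id -> mutable sum

    static long register() {                  // driver side: mint id, keep the original
        long id = nextId.getAndIncrement();
        originals.put(id, new long[] {0L});
        return id;
    }

    static boolean applyTaskUpdate(long id, long delta) { // on task completion
        long[] original = originals.get(id);
        if (original == null) return false;   // unknown (executor-minted) id: dropped
        original[0] += delta;
        return true;
    }

    static long value(long id) { return originals.get(id)[0]; }

    public static void main(String[] args) {
        long id = register();                         // registered from the "driver"
        System.out.println(applyTaskUpdate(id, 4));   // prints true
        System.out.println(value(id));                // prints 4
        System.out.println(applyTaskUpdate(999L, 4)); // prints false: never stitched
    }
}
```

If that model is roughly right, then calling AccumulatorContext.register inside the executor JVM would only populate the executor's local registry, which would explain why nothing shows up in the driver's UI.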