I wrote a spark instrumentation tool that instruments RDDs to give more
fine-grain details on what is going on within a Task.  This is working
right now, but uses volatiles and CAS to pass around this state (which
slows down the task).  We want to lower the overhead of this and make the
main call path single-threaded and pass around the result when the task
competes; which sounds like AccumulatorV2.

I started rewriting the instrumented logic to be based off accumulators,
but having a hard time getting these to show up in the UI/API (using this
to see if I am linking things properly).

So my question is as follows.

When running in the executor and we create a accumulator (that was not
created from SparkContext), how would I stitch things properly so it shows
up with accumulators defined from the spark context?  If this is different
for different versions that is fine since we can figure that out quickly
(hopefully) and change the instrumentation.

Approches taken:

Looked at SparkContext.register and copied the same logic, but at runtime

this.hasNextTotal = new LongAccumulator();
this.hasNextTotal.metadata_$eq(new
AccumulatorMetadata(AccumulatorContext.newId(),
createName("hasNextTotal"), false));
AccumulatorContext.register(hasNextTotal);


That didn't end up working

tried getting the context from a SparkContext.getActive, but its not
defined at runtime


Option<SparkContext> opt = SparkContext$.MODULE$.getActive();
if (opt.isDefined()) {
    SparkContext sc = opt.get();
    hasNextTotal.register(sc, Option.apply("hasNext"), false);
    nextTotal.register(sc, Option.apply("next"), false);
}


Any help on this would be very helpful! would really rather not
re-implement the wheel if I can piggy-back off Accumulators.

Thanks for your help!

Target spark version: 2.2.0

Reply via email to