Thanks much for the suggestion. Reading over your mail though, I
realize that I may not have made something clear: I don't just have a
single external service object; the idea is that I have one per
executor, so that the code running on each executor accesses the
external service independently.
Thanks much for the suggestion. Reading over your mail though, I realize
that I may not have made something clear: I don't just have a single
external service object; the idea is that I have one per executor, so that
the code running on each executor accesses the external service
independently.
It looks like the clean up should go into the foreachRDD function:
stateUpdateStream.foreachRdd(...) { rdd =>
// do stuff with the rdd
stateUpdater.cleanupExternalService// should work in this position
}
Code within the foreachRDD(*) executes on the driver, so you can keep the
state of
We have some code we've written using stateful streaming (mapWithState)
which works well for the most part. The stateful streaming runs, processes
the RDD of input data, calls the state spec function for each input record,
and does all proper adding and removing from the state cache.
However, I