Hello people,

I'm working on a fix for SPARK-33000
<https://issues.apache.org/jira/browse/SPARK-33000>. Spark does not clean up
checkpointed RDDs/DataFrames on shutdown, even if the appropriate configs
are set.

In the course of developing a fix, another contributor pointed out
<https://github.com/apache/spark/pull/31742#issuecomment-790987483> that
checkpointed data may not be the only type of resource that needs a fix for
shutdown cleanup.

I'm looking for a committer who might have an opinion on how Spark should
clean up disk-based resources on shutdown. The last people who contributed
significantly to the ContextCleaner, where this cleanup happens, were @witgo
<https://github.com/witgo> and @andrewor14 <https://github.com/andrewor14>.
But that was ~6 years ago, and I don't think they are active on the project
anymore.

Would anyone be willing to take a look and share their thoughts? The PR
<https://github.com/apache/spark/pull/31742> is small: +39 / -2.

Nick
