> Is this something that would be exposed/relevant to the Python API? Or is this just for people implementing their own Spark data source?
It's the latter, and it also helps simplify built-in data sources (I found the need while working on https://github.com/apache/spark/pull/26845).

On Thu, Dec 12, 2019 at 3:53 AM Nicholas Chammas <nicholas.cham...@gmail.com> wrote:

> Is this something that would be exposed/relevant to the Python API? Or is
> this just for people implementing their own Spark data source?
>
> On Wed, Dec 11, 2019 at 12:35 AM Jungtaek Lim <
> kabhwan.opensou...@gmail.com> wrote:
>
>> Hi devs,
>>
>> I'd like to propose adding close() to DataWriter explicitly, as the
>> place for resource cleanup.
>>
>> The rationale for the proposal lies in the lifecycle of DataWriter. If
>> the scaladoc of DataWriter is correct, the lifecycle of a DataWriter
>> instance ends at either commit() or abort(). That leads data source
>> implementors to feel they can place resource cleanup in either one, but
>> abort() can be called when commit() fails, so they have to ensure they
>> don't clean up twice if cleanup is not idempotent.
>>
>> I've checked some callers to see whether they can apply
>> "try-catch-finally" to ensure close() is called at the end of the
>> DataWriter lifecycle, and it looks like they can, but I might be
>> missing something.
>>
>> What do you think? It would be a backward-incompatible change, but the
>> interface is marked as Evolving and we're already making
>> backward-incompatible changes in Spark 3.0, so I feel it may not matter.
>>
>> Would love to hear your thoughts.
>>
>> Thanks in advance,
>> Jungtaek Lim (HeartSaVioR)
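To make the proposal concrete, here is a minimal sketch (not Spark's actual API; the trait and helper names are illustrative) of the pattern being proposed: DataWriter gains an explicit close(), and the caller guarantees it runs exactly once via try-catch-finally, so neither commit() nor abort() needs to carry possibly non-idempotent cleanup logic.

```scala
// Hypothetical, simplified stand-in for Spark's DataWriter interface.
trait DataWriter[T] {
  def write(record: T): Unit
  def commit(): Unit
  def abort(): Unit
  def close(): Unit // proposed: the single resource-cleanup point
}

// Caller-side lifecycle: abort() on failure, close() in finally so
// cleanup happens exactly once regardless of how the task ends.
def runTask[T](writer: DataWriter[T], records: Iterator[T]): Unit = {
  try {
    records.foreach(writer.write)
    writer.commit()
  } catch {
    case t: Throwable =>
      writer.abort() // note: abort() may run after a failed commit()
      throw t
  } finally {
    writer.close() // cleanup is no longer split across commit()/abort()
  }
}

// Toy writer that counts close() calls, to show the "exactly once" guarantee.
class CountingWriter extends DataWriter[Int] {
  var written = 0
  var closed = 0
  def write(record: Int): Unit = written += 1
  def commit(): Unit = ()
  def abort(): Unit = ()
  def close(): Unit = closed += 1
}
```

With this shape, an implementor can keep commit() and abort() focused on protocol outcomes and put file handles, connections, etc. behind close() without worrying about double cleanup.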