Hey everyone,

As of recently, DoFn instances are permitted to be reused in Beam [1]. Beyond the efficiency gained from not having to deserialize a new DoFn instance for every bundle, reuse also makes it possible to avoid repeating expensive setup work for each bundle, which was not possible before. It has also introduced new failure cases, in which element-based state leaks improperly across bundles.
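To make the leak concrete, here is a minimal sketch in plain Java. It does not use the Beam API: BufferingFn, its startBundle/processElement/finishBundle methods, and runBundles are hypothetical stand-ins, and the loop only assumes that a runner hands the same instance to consecutive bundles.

```java
import java.util.ArrayList;
import java.util.List;

public class ReuseLeakSketch {
    // Hypothetical stand-in for a DoFn that buffers elements; not the Beam API.
    static class BufferingFn {
        private final List<String> buffer = new ArrayList<>();
        private final boolean resetPerBundle;

        BufferingFn(boolean resetPerBundle) {
            this.resetPerBundle = resetPerBundle;
        }

        void startBundle() {
            // Without this reset, elements buffered in one bundle survive into the next.
            if (resetPerBundle) {
                buffer.clear();
            }
        }

        void processElement(String element) {
            buffer.add(element);
        }

        List<String> finishBundle() {
            // Return a copy so later bundles cannot mutate earlier output.
            return new ArrayList<>(buffer);
        }
    }

    // Runs every bundle through the SAME instance, as a reusing runner might.
    static List<List<String>> runBundles(BufferingFn fn, List<List<String>> bundles) {
        List<List<String>> outputs = new ArrayList<>();
        for (List<String> bundle : bundles) {
            fn.startBundle();
            for (String element : bundle) {
                fn.processElement(element);
            }
            outputs.add(fn.finishBundle());
        }
        return outputs;
    }

    public static void main(String[] args) {
        List<List<String>> bundles = List.of(List.of("a", "b"), List.of("c"));
        // Leaky: the second bundle's output still contains the first bundle's elements.
        System.out.println(runBundles(new BufferingFn(false), bundles)); // [[a, b], [a, b, c]]
        // Safe: per-bundle state is reset at the start of each bundle.
        System.out.println(runBundles(new BufferingFn(true), bundles));  // [[a, b], [c]]
    }
}
```

When a fresh instance was deserialized for every bundle, the first variant would have behaved correctly by accident, which is why reuse turns it into a new failure case.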
I've written a document proposing that two methods, setup and teardown, be added to the DoFn API. They both provide hooks for users to write efficient DoFns and signal that DoFn instances will be reused. The document is located at https://docs.google.com/document/d/1LLQqggSePURt3XavKBGV7SZJYQ4NW8yCu63lBchzMRk/edit?ts=5771458f# and committers have edit access.

Thanks,
Thomas

[1] https://github.com/apache/incubator-beam/pull/419