Hey Everyone:

As of recently, we're permitted to reuse DoFn instances in Beam[1].
Beyond the efficiency gained by not deserializing a new DoFn instance
for every bundle, DoFn reuse also makes it possible to minimize the
expensive setup work done per bundle, which wasn't possible before.
It also introduces new failure cases, where element-based state leaks
improperly across bundles.

I've written a document proposing that two methods, setup and teardown,
be added to the DoFn API. They give users hooks for writing efficient
DoFns, and they also signal that DoFn instances will be reused.
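
To give a rough feel for the lifecycle, here is a minimal, self-contained
sketch. It does not use the Beam SDK: SketchDoFn, ExpensiveClientFn, and
SketchRunner are hypothetical names made up for illustration, and the
actual method names and signatures are whatever the linked document ends
up specifying. The assumption is that setup runs once per instance before
any bundle, and teardown runs once before the instance is discarded.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a DoFn with the proposed hooks.
// This is NOT the Beam API; names and signatures are assumptions.
abstract class SketchDoFn<I, O> {
    void setup() {}                      // once per instance, before any bundle
    void startBundle() {}                // once per bundle
    abstract void processElement(I in, List<O> out);
    void finishBundle(List<O> out) {}    // once per bundle
    void teardown() {}                   // once per instance, before discard
}

class ExpensiveClientFn extends SketchDoFn<String, String> {
    int setupCalls = 0;
    int teardownCalls = 0;
    private String client;               // pretend this is costly to create
    private int seenInBundle;            // element-based state

    @Override void setup() { setupCalls++; client = "client"; }

    // Reset element-based state here; without this reset, the count
    // would leak from one bundle into the next on a reused instance.
    @Override void startBundle() { seenInBundle = 0; }

    @Override void processElement(String in, List<String> out) {
        seenInBundle++;
        out.add(client + ":" + in + ":" + seenInBundle);
    }

    @Override void teardown() { teardownCalls++; client = null; }
}

// Hypothetical runner that reuses one instance across all bundles.
class SketchRunner {
    static <I, O> List<O> run(SketchDoFn<I, O> fn, List<List<I>> bundles) {
        List<O> out = new ArrayList<>();
        fn.setup();
        try {
            for (List<I> bundle : bundles) {
                fn.startBundle();
                for (I element : bundle) {
                    fn.processElement(element, out);
                }
                fn.finishBundle(out);
            }
        } finally {
            fn.teardown();               // runs even if a bundle fails
        }
        return out;
    }
}
```

In this sketch the expensive client is built once no matter how many
bundles the instance processes, while the per-bundle counter is reset in
startBundle so that it cannot carry over between bundles.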

The document is located at
https://docs.google.com/document/d/1LLQqggSePURt3XavKBGV7SZJYQ4NW8yCu63lBchzMRk/edit?ts=5771458f#
and committers have edit access.

Thanks,

Thomas

[1] https://github.com/apache/incubator-beam/pull/419
