[jira] [Commented] (BEAM-452) Implement DoFn per-instance setup and teardown methods

2016-08-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15421690#comment-15421690
 ] 

ASF GitHub Bot commented on BEAM-452:
-

GitHub user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/690


> Implement DoFn per-instance setup and teardown methods
> --
>
> Key: BEAM-452
> URL: https://issues.apache.org/jira/browse/BEAM-452
> Project: Beam
> Issue Type: Bug
> Components: runner-dataflow, runner-direct, runner-flink, runner-spark, sdk-java-core
> Reporter: Thomas Groh
> Assignee: Thomas Groh
>
> https://docs.google.com/document/d/1LLQqggSePURt3XavKBGV7SZJYQ4NW8yCu63lBchzMRk/edit
> BEAM-38 permits DoFns to be reused across bundles. DoFn instances may need to 
> do per-instance setup and teardown, and to avoid redoing the work per-bundle, 
> the system should provide hooks to call before a DoFn is first used and after 
> it will no longer be used.
> DoFn#setup is called before any other calls to DoFn methods. DoFn#teardown is 
> called after any method throws an exception, or when the runner will no 
> longer use a DoFn instance (e.g. when it evicts it from a cache).
> Runners must call these methods appropriately in all cases (including if a 
> DoFn is used exactly once, for a single bundle, and discarded).
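The contract described in the issue can be illustrated with a minimal sketch. This is not the Beam SDK: SketchFn, SketchRunner and RecordingFn are hypothetical names invented for illustration, and the per-bundle hooks are assumed to be startBundle/finishBundle. The sketch only shows the ordering a runner would need to honor: setup once before anything else, any number of bundles, and teardown both on normal disposal and after any method throws.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a DoFn; NOT the Beam SDK class.
abstract class SketchFn<I> {
  void setup() {}                       // once per instance, before any other call
  void startBundle() {}                 // once per bundle
  abstract void processElement(I element);
  void finishBundle() {}                // once per bundle
  void teardown() {}                    // once per instance, when it is discarded
}

// Hypothetical runner-side driver that reuses one instance across bundles.
final class SketchRunner {
  static <I> void runBundles(SketchFn<I> fn, List<List<I>> bundles) {
    try {
      fn.setup();
      for (List<I> bundle : bundles) {
        fn.startBundle();
        for (I element : bundle) {
          fn.processElement(element);
        }
        fn.finishBundle();
      }
    } finally {
      // Runs when the instance is no longer needed and also when any
      // method above throws, so the instance is never abandoned un-torn-down.
      fn.teardown();
    }
  }
}

// Records the order of lifecycle calls; fails on an empty element.
final class RecordingFn extends SketchFn<String> {
  final List<String> events = new ArrayList<>();

  @Override void setup() { events.add("setup"); }
  @Override void startBundle() { events.add("startBundle"); }
  @Override void processElement(String element) {
    if (element.isEmpty()) {
      throw new IllegalArgumentException("empty element");
    }
    events.add("process:" + element);
  }
  @Override void finishBundle() { events.add("finishBundle"); }
  @Override void teardown() { events.add("teardown"); }
}

final class LifecycleDemo {
  public static void main(String[] args) {
    RecordingFn fn = new RecordingFn();
    SketchRunner.runBundles(fn, Arrays.asList(Arrays.asList("a", "b"), Arrays.asList("c")));
    System.out.println(fn.events);
  }
}
```

In this sketch a DoFn used for exactly one bundle still gets both calls, since setup and teardown bracket the bundle loop regardless of how many bundles it contains.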



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-452) Implement DoFn per-instance setup and teardown methods

2016-07-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384666#comment-15384666
 ] 

ASF GitHub Bot commented on BEAM-452:
-

GitHub user tgroh opened a pull request:

https://github.com/apache/incubator-beam/pull/690

[BEAM-452] Add DoFn setup and teardown methods


---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgroh/incubator-beam dofn_setup_teardown

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/690.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #690


commit d7c4440d23278135c86b193c2d25ac512d5aa5d2
Author: Thomas Groh 
Date:   2016-06-28T22:44:49Z

Use the ParDo Application to Cache DoFns

A DoFn application is the scope of reuse.

Factor CloningThreadLocal as the top-level class instead of
SerializableCloningThreadLocalCacheLoader, and extract the Fn from the
AppliedPTransform when loading an absent element.

commit 6f7d10e303a0cb3d86ad0f2c60db5ed1918420d1
Author: Thomas Groh 
Date:   2016-07-15T17:51:24Z

Make TransformEvaluatorFactory reuse Explicit

Transform Evaluator Factories must be reused for the entire execution of
a Pipeline and must not be reused across pipelines.

Remove EvaluatorKey, and key explicitly by the transform application.

commit f2c0ba67920ba2e2772ddacc808c5adf38949bc7
Author: Thomas Groh 
Date:   2016-07-15T18:27:00Z

Add TransformEvaluatorFactory#cleanup

This cleans up any state stored within the Transform Evaluator Factory.

commit 1f35c4b64aae264d800326421db475be260de2c9
Author: Thomas Groh 
Date:   2016-07-14T21:51:02Z

Add DoFn#setup and DoFn#teardown

These methods are called to do expensive setup work, and to clean up a
DoFn before it is discarded.

commit 797633a2209a59736650e255be517ec73137e94d
Author: Thomas Groh 
Date:   2016-07-19T18:03:15Z

Replace CloningThreadLocal with DoFnLifecycleManager

This is a more focused interface that interacts with a DoFn before it
is available for use and after it has completed and the reference is
lost. It is required to properly support setup and teardown, as the
fields in a ThreadLocal cannot all be cleaned up without additional
tracking.

Part of BEAM-452.
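The motivation in this commit message (a plain ThreadLocal cannot clean up every instance it has handed out without additional tracking) can be sketched roughly as follows. This is an illustration under assumptions, not the DirectRunner's actual DoFnLifecycleManager: the names ManagedFn, LifecycleManagerSketch and CountingFn are hypothetical, and a Supplier stands in for whatever cloning mechanism the runner uses to obtain a fresh instance.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical minimal lifecycle interface; NOT the Beam SDK DoFn.
interface ManagedFn {
  void setup();
  void teardown();
}

// Hands out one instance per thread and remembers every outstanding
// instance, so that all of them can be torn down later.
final class LifecycleManagerSketch<F extends ManagedFn> {
  private final Supplier<F> factory;
  private final Map<Thread, F> outstanding = new ConcurrentHashMap<>();

  LifecycleManagerSketch(Supplier<F> factory) {
    this.factory = factory;
  }

  // Returns the calling thread's instance, creating it and calling
  // setup() the first time the thread asks for one.
  F get() {
    return outstanding.computeIfAbsent(Thread.currentThread(), t -> {
      F fn = factory.get();
      fn.setup();
      return fn;
    });
  }

  // Discards the calling thread's instance, e.g. after it threw.
  void remove() {
    F fn = outstanding.remove(Thread.currentThread());
    if (fn != null) {
      fn.teardown();
    }
  }

  // Tears down every instance still tracked, regardless of owning thread.
  void removeAll() {
    for (Thread t : outstanding.keySet()) {
      F fn = outstanding.remove(t);
      if (fn != null) {
        fn.teardown();
      }
    }
  }
}

// Counts lifecycle calls so the behavior can be observed.
final class CountingFn implements ManagedFn {
  int setups;
  int teardowns;

  @Override public void setup() { setups++; }
  @Override public void teardown() { teardowns++; }
}

final class ManagerDemo {
  public static void main(String[] args) {
    LifecycleManagerSketch<CountingFn> manager = new LifecycleManagerSketch<>(CountingFn::new);
    CountingFn fn = manager.get();
    manager.removeAll();
    System.out.println(fn.setups + " setup call(s), " + fn.teardowns + " teardown call(s)");
  }
}
```

The explicit map is the "additional tracking" in this sketch: because the manager holds a reference to every instance it created, removeAll() can reach instances owned by other threads, which a bare ThreadLocal does not allow.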

commit 7bf0b4185d8303b03d47fb99691fd63ae57ad887
Author: Thomas Groh 
Date:   2016-07-19T18:08:18Z

fixup! Add DoFn#setup and DoFn#teardown

Handle DoFn setup and teardown in DoFnLifecycleManager

This ensures that the DirectRunner properly interacts with DoFn setup
and teardown methods.

commit 9d1b2c142aff0cb638c027567dda18169b2f8795
Author: Thomas Groh 
Date:   2016-07-19T18:06:21Z

fixup! Add DoFn#setup and DoFn#teardown

Call DoFn#setup and #teardown in Flink and Spark



