Kip Kohn created GOBBLIN-2188:
---------------------------------
Summary: Allow use of stateful writer/converter `Initializer`s
with Gobblin-on-Temporal
Key: GOBBLIN-2188
URL: https://issues.apache.org/jira/browse/GOBBLIN-2188
Project: Apache Gobblin
Issue Type: New Feature
Components: gobblin-api
Reporter: Kip Kohn
Assignee: Hung Tran
Stateful writer/converter `Initializer`s, such as `JdbcWriterInitializer`, work
fine with Gobblin-on-MR (GoMR), but break under Gobblin-on-Temporal (GoT).
Although GoMR does launch an MR application, the remainder of the
`MRJobLauncher` execution stays within a single process. `Initializer`s must
execute at the end of Work Discovery, before `WorkUnit` processing may begin,
but are `.close()`d only after Job Commit completes. Crucially, with GoMR, the
same `Initializer` instances remain in memory throughout. With GoT, in
contrast, Work Discovery and Commit execute completely independently, each
creating new objects/instances, perhaps even on a new host/container.
Problem: Some `Initializer`s, such as the `JdbcWriter`'s
`JdbcWriterInitializer`, are stateful. (In its case, the state is the
temp/staging table's name, kept so that the table may be dropped upon
successful Commit.) This state originates during Work Discovery (the
`GenerateWorkUnitsImpl` activity in GoT) yet must be available during Commit
(the `CommitActivityImpl` activity in GoT).
Solution: Use the Memento (GoF) Pattern to enable `Initializer`s to convey
arbitrary state from one concrete `Initializer` instance to another of the same
type. Leverage the `JobState` to tunnel mementos, since it is serialized at
the end of Work Discovery, to be loaded later as the Commit activity begins.
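
A minimal sketch of how such a memento hand-off might look, not the actual
Gobblin change. Every name here is hypothetical: `SketchJobState` stands in for
`JobState` as a plain string property bag, `StagingTableInitializer` stands in
for a stateful `Initializer`, and `saveMemento()`/`restoreMemento()` and the
property key are assumed for illustration only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;

/** Demo driver: simulates Work Discovery and Commit as two independent steps. */
class MementoSketch {
    public static void main(String[] args) {
        // Work Discovery: initialize, then stash the memento in the job state.
        SketchJobState discoveryState = new SketchJobState();
        StagingTableInitializer first = new StagingTableInitializer();
        first.initialize();
        first.saveMemento()
             .ifPresent(m -> discoveryState.setProp(SketchJobState.MEMENTO_KEY, m));

        // Stand-in for the job state being serialized and later reloaded,
        // possibly on a different host/container.
        SketchJobState commitState = discoveryState.copyAsIfReloaded();

        // Commit: a brand-new instance recovers the state before close().
        StagingTableInitializer second = new StagingTableInitializer();
        commitState.getProp(SketchJobState.MEMENTO_KEY)
                   .ifPresent(second::restoreMemento);
        second.close();
        System.out.println("would drop: " + second.getDroppedTable().orElse("<nothing>"));
    }
}

/** Hypothetical stand-in for JobState: only a string property bag. */
class SketchJobState {
    static final String MEMENTO_KEY = "sketch.initializer.memento"; // assumed key

    private final Map<String, String> props = new HashMap<>();

    void setProp(String key, String value) { props.put(key, value); }

    Optional<String> getProp(String key) { return Optional.ofNullable(props.get(key)); }

    /** Copies the properties, mimicking a serialize/deserialize round trip. */
    SketchJobState copyAsIfReloaded() {
        SketchJobState copy = new SketchJobState();
        copy.props.putAll(props);
        return copy;
    }
}

/** Hypothetical stateful initializer that must remember its staging table. */
class StagingTableInitializer {
    private String stagingTable; // null until initialize() or restoreMemento()
    private String droppedTable; // records what close() would have dropped

    /** Picks a staging table name; this is the state that must survive. */
    void initialize() {
        stagingTable = "stage_" + UUID.randomUUID().toString().replace("-", "");
    }

    /** Exports the internal state as an opaque string (the memento). */
    Optional<String> saveMemento() { return Optional.ofNullable(stagingTable); }

    /** Rebuilds the internal state from a previously saved memento. */
    void restoreMemento(String memento) { stagingTable = memento; }

    /** A real initializer would drop the staging table here. */
    void close() { droppedTable = stagingTable; }

    Optional<String> getDroppedTable() { return Optional.ofNullable(droppedTable); }
}
```

In this sketch, a fresh instance that never receives the memento has nothing to
drop at `close()`, which mirrors the failure described above. The memento is
kept as an opaque string so that only the `Initializer` itself needs to
understand its contents.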