Expecting runners to populate, or override, SDK-level pipeline options isn't a great approach, particularly in a scenario where correctness is at stake.
The main issue is discoverability of a subtle API like this -- there's little chance somebody writing a new runner would stumble across it and do the right thing. It would be much better to make the expectations on a runner explicit, say, via a runner-provided "context" API. I'd stay away from a pipeline option with a default value.

The other contentious topic here is the use of a job-level or execution-level identifier. This easily becomes ambiguous in the presence of Flink's savepoints, Dataflow's update, fast re-execution, canary vs. production pipelines, cross-job optimizations, etc. I think we'd be better off with a transform-level nonce than a job-level one.

Finally, the real solution is to enhance the model and make such functionality available to everyone, e.g., roughly "init" + "checkpoint" + "side-input to source / SplittableDoFn / composable IO".

Practically, to solve the problem at hand quickly, I'd be in favor of a context-based approach.

On Thu, Jan 19, 2017 at 10:22 AM, Sam McVeety <[email protected]> wrote:

> Hi folks, I'm looking for feedback on whether the following is a reasonable
> approach to handling ValueProviders that are intended to be populated at
> runtime by a given Runner (e.g. a Dataflow job ID, which is a GUID from the
> service). Two potential pieces of a solution:
>
> 1. Annotate such parameters with @RunnerProvided, which results in an
> Exception if the user manually tries to set the parameter.
>
> 2. Allow for a DefaultValueFactory to be present for the set of Runners
> that do not override the parameter.
>
> Best,
> Sam
>
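
For concreteness, here is a minimal sketch of how proposals 1 and 2 from the quoted message could look on top of the existing PipelineOptions machinery. The @RunnerProvided annotation and the jobId option are hypothetical and illustrative only (the annotation would still need SDK support to actually reject user-supplied values); @Description, @Default.InstanceFactory, and DefaultValueFactory are existing Beam APIs.

    import java.lang.annotation.ElementType;
    import java.lang.annotation.Retention;
    import java.lang.annotation.RetentionPolicy;
    import java.lang.annotation.Target;
    import java.util.UUID;
    import org.apache.beam.sdk.options.Default;
    import org.apache.beam.sdk.options.DefaultValueFactory;
    import org.apache.beam.sdk.options.Description;
    import org.apache.beam.sdk.options.PipelineOptions;

    /** Hypothetical marker from proposal 1: only a runner may set the option. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface RunnerProvided {}

    public interface JobIdOptions extends PipelineOptions {

      @Description("Job identifier, expected to be populated by the runner at submission time.")
      @RunnerProvided
      @Default.InstanceFactory(FallbackJobIdFactory.class)
      String getRunnerJobId();
      void setRunnerJobId(String value);

      /** Proposal 2: fallback for runners that never override the option. */
      class FallbackJobIdFactory implements DefaultValueFactory<String> {
        @Override
        public String create(PipelineOptions options) {
          // A locally generated GUID; a runner such as Dataflow would
          // overwrite this with its own job ID.
          return "local-" + UUID.randomUUID();
        }
      }
    }

Note that this sketch still has the discoverability problem raised above: nothing forces the author of a new runner to know the option exists or to override it.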
