[
https://issues.apache.org/jira/browse/BEAM-14284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17549952#comment-17549952
]
Danny McCormick commented on BEAM-14284:
----------------------------------------
This issue has been migrated to https://github.com/apache/beam/issues/21522
> Server-side Dataflow job idempotence
> ------------------------------------
>
> Key: BEAM-14284
> URL: https://issues.apache.org/jira/browse/BEAM-14284
> Project: Beam
> Issue Type: Improvement
> Components: runner-dataflow
> Reporter: tol
> Priority: P2
>
> *Issue*: retrying a job submission may create duplicate Dataflow jobs.
> The Dataflow job {{name}} only guarantees uniqueness among _active_ jobs:
> if a job with the same name exists but has already completed, the same
> {{name}} is accepted again. What we would like is job uniqueness
> regardless of job status.
> The Dataflow API provides a way to ensure unique jobs through the use of
> {{clientRequestId}}:
> {quote}
> The client's unique identifier of the job, re-used
> across retried attempts. If this field is set, the service will ensure
> its uniqueness. The request to create a job will fail if the service has
> knowledge of a previously submitted job with the same client's ID and
> job name. The caller may use this field to ensure idempotence of job
> creation across retried attempts to create a job. By default, the field
> is empty and, in that case, the service ignores it.
> {quote}
> [https://cloud.google.com/dataflow/docs/reference/rest/v1b3/projects.locations.jobs]
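The semantics quoted above can be illustrated with a small in-memory model. This is only a sketch of the documented behavior, not the Dataflow service or its client library: the class `IdempotentJobServiceSketch`, the method `createJob`, and the `job-N` ID format are hypothetical names chosen for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical in-memory stand-in for a job service; not the Dataflow API. */
public class IdempotentJobServiceSketch {
    // Remembers every (clientRequestId, name) pair ever submitted,
    // regardless of whether the job is still active.
    private final Map<List<String>, String> jobIdByKey = new HashMap<>();
    private int nextId = 1;

    /** Creates a job, failing if the same client ID and job name were seen before. */
    public String createJob(String name, String clientRequestId) {
        Objects.requireNonNull(name, "name");
        if (clientRequestId == null || clientRequestId.isEmpty()) {
            // Empty field: the model ignores it, so no deduplication happens.
            return "job-" + nextId++;
        }
        List<String> key = List.of(clientRequestId, name);
        String existing = jobIdByKey.get(key);
        if (existing != null) {
            throw new IllegalStateException("Job already submitted: " + existing);
        }
        String id = "job-" + nextId++;
        jobIdByKey.put(key, id);
        return id;
    }

    public static void main(String[] args) {
        IdempotentJobServiceSketch service = new IdempotentJobServiceSketch();
        System.out.println(service.createJob("wordcount", "req-1"));
        try {
            // A retry with the same ID and name is rejected instead of duplicated.
            service.createJob("wordcount", "req-1");
        } catch (IllegalStateException e) {
            System.out.println("Rejected retry: " + e.getMessage());
        }
    }
}
```

In this model a caller that reuses one ID across retries can treat the rejection as "already created", whereas a caller that leaves the field empty gets a new job on every attempt.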
> In DataflowRunner.java, {{clientRequestId}} is currently set to [a randomized
> value|https://github.com/apache/beam/blob/v2.37.0/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java#L1125],
> so the caller cannot supply a stable ID to reuse across retried submissions.
> *Proposed solution*: provide the ability to pass in a {{clientRequestId}}
> through {{DataflowPipelineOptions}} and set it on the {{Job}} when available,
> otherwise default to the randomized value.
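A minimal sketch of the proposed fallback logic follows. It deliberately avoids Beam types so that it stands alone: `ClientRequestIdSketch` and `resolveClientRequestId` are hypothetical names, the `configured` parameter stands in for whatever a new `DataflowPipelineOptions` getter would return, and the use of a random UUID as the default is an assumption about what "randomized value" means here.

```java
import java.util.Optional;
import java.util.UUID;

/** Hypothetical helper illustrating the proposal; not part of Beam. */
public class ClientRequestIdSketch {

    /**
     * Returns the caller-supplied ID when one is set, otherwise a randomized value.
     *
     * @param configured value a user might pass through pipeline options; may be null or empty
     */
    public static String resolveClientRequestId(String configured) {
        return Optional.ofNullable(configured)
            .filter(id -> !id.isEmpty())
            // Assumption: the randomized default is modeled as a random UUID string.
            .orElseGet(() -> UUID.randomUUID().toString());
    }

    public static void main(String[] args) {
        // A stable, caller-chosen ID survives retries unchanged.
        System.out.println(resolveClientRequestId("nightly-export-2022-06-01"));
        // Without one, every call produces a fresh random ID.
        System.out.println(resolveClientRequestId(null));
    }
}
```

With this shape, callers that do not set the option keep the existing behavior, while callers that need idempotent retries can derive the ID from something stable, such as a scheduler run ID.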