[ 
https://issues.apache.org/jira/browse/BEAM-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Frances Perry reassigned BEAM-2450:
-----------------------------------

    Assignee:     (was: Frances Perry)

> Transform names and named applications should not be null or empty
> ------------------------------------------------------------------
>
>                 Key: BEAM-2450
>                 URL: https://issues.apache.org/jira/browse/BEAM-2450
>             Project: Beam
>          Issue Type: Bug
>          Components: beam-model, sdk-java-core, sdk-py
>            Reporter: Scott Wegner
>            Priority: Minor
>
> Beam SDK allows setting the name of a transform [1] and also naming the 
> transform application [2]. If no name is specified on application, the name 
> of the transform is used. If no name is specified for the transform, the 
> class name is used.
> The application name serves as metadata for the applied PTransforms in the 
> constructed graph. They are effectively extra display data (historically, 
> PTransform names predate display data). The names are used by runners for UI 
> and monitoring applications, such as the displayed pipeline graph in the 
> Dataflow Monitoring UI [3].
> Currently there is no explicit validation on the specified application name. 
> The current behavior seems to be:
> * null application names cause a NullPointerException at construction time.
> * Specifying the empty string compiles and succeeds in the DirectRunner, but 
> causes strange behavior in Dataflow when rendering the graph in the UI. I 
> have not tested the behavior of other runners.
> We should add explicit validation in the model on the specified transform 
> name and application name. I propose that we disallow null and empty names.
> This is technically a breaking change as the SDK currently allows the empty 
> string, but only because it is under-specified. The upgrade path for any 
> pipelines broken by this change is simple: specify a non-empty name or 
> fall back to the default class name.
> [1] 
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/PTransform.java#L236
> [2] 
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/values/PCollection.java#L295
> [3] 
> https://cloud.google.com/dataflow/pipelines/dataflow-monitoring-intf#viewing-a-pipeline
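
The validation proposed above could look roughly like the sketch below. It is only an illustration: TransformNames and validate are hypothetical names, not part of the Beam SDK, and the exception types are an assumption rather than anything the issue specifies.

```java
import java.util.Objects;

/** Hypothetical helper for illustration only; not part of the Beam SDK. */
final class TransformNames {
    private TransformNames() {}

    /**
     * Returns the given name unchanged if it is non-null and non-empty.
     * The "what" argument (e.g. "transform name") is only used in the error message.
     */
    static String validate(String name, String what) {
        // Fail early with a descriptive message instead of a bare NullPointerException later on.
        Objects.requireNonNull(name, what + " must not be null");
        if (name.isEmpty()) {
            // Assumption: the empty string is rejected at construction time for every runner.
            throw new IllegalArgumentException(what + " must not be empty");
        }
        return name;
    }
}
```

A call such as TransformNames.validate(name, "application name") at the point where a name is supplied would surface both cases at pipeline construction time rather than in a runner's UI.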



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
