[jira] [Commented] (BEAM-7850) Make Environment a top level attribute of PTransform
[ https://issues.apache.org/jira/browse/BEAM-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17320514#comment-17320514 ] Valentyn Tymofieiev commented on BEAM-7850: --- > Default implementation can be to use the hint values associated with values > defined on the composite. Actually using hints defined on subtransforms might make more sense as a default behavior when custom merging logic is not provided, because hints on subtransforms are defined in a more specific context. > Make Environment a top level attribute of PTransform > > > Key: BEAM-7850 > URL: https://issues.apache.org/jira/browse/BEAM-7850 > Project: Beam > Issue Type: Sub-task > Components: beam-model >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: P2 > Fix For: 2.19.0 > > Time Spent: 4h > Remaining Estimate: 0h > > Currently Environment is not a top level attribute of the PTransform (of > runner API proto). > [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99] > Instead it is hidden inside various payload objects. For example, for ParDo, > environment will be inside SdkFunctionSpec of ParDoPayload. > [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99] > > This makes tracking environment of different types of PTransforms harder and > we have to fork code (on the type of PTransform) to extract the Environment > where the PTransform should be executed. It will probably be simpler to just > make Environment a top level attribute of PTransform. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-7850) Make Environment a top level attribute of PTransform
[ https://issues.apache.org/jira/browse/BEAM-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17317508#comment-17317508 ] Valentyn Tymofieiev commented on BEAM-7850: --- I think Beam SDK should ensure that by the time a RunnerAPI representation of a pipeline is created, Environment instances associated with subtransforms include the hints defined for the composite. The merging logic may need to be defined individually per hint. Default implementation can be to use the hint values associated with values defined on the composite. For int hints like min_ram_per_vcpu, one could take a max of the two values. Note that we initially plan express hints on transform level ( in Python - via .with_resource_hints() builder methods). Environment does not appear on this stage, and only appears in the picture when we translate the pipeline into RunnerAPI. In the future we may let users create Environment objects and attach them to transforms when they instantiate the pipeline. Currently this is not possible in Python, and AppliedPTransform.environment_id is not used anywhere as far as I can tell. If/when we allow users to explicitly define PTransforms with Envirionments, we should also ensure that the resource hints on composite transforms are properly propagated to the subtransforms. > Make Environment a top level attribute of PTransform > > > Key: BEAM-7850 > URL: https://issues.apache.org/jira/browse/BEAM-7850 > Project: Beam > Issue Type: Sub-task > Components: beam-model >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: P2 > Fix For: 2.19.0 > > Time Spent: 4h > Remaining Estimate: 0h > > Currently Environment is not a top level attribute of the PTransform (of > runner API proto). > [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99] > Instead it is hidden inside various payload objects. For example, for ParDo, > environment will be inside SdkFunctionSpec of ParDoPayload. > [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99] > > This makes tracking environment of different types of PTransforms harder and > we have to fork code (on the type of PTransform) to extract the Environment > where the PTransform should be executed. It will probably be simpler to just > make Environment a top level attribute of PTransform. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-7850) Make Environment a top level attribute of PTransform
[ https://issues.apache.org/jira/browse/BEAM-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17268309#comment-17268309 ] Chamikara Madhusanka Jayalath commented on BEAM-7850: - I don't think all sub-transforms of a composite will have to have the same environment. For example, we might have a composite X that get expanded to sub-transforms Y and Z. If a runner choose to execute Y and Z they should be executed in corresponding environments. If a runner chooses to execute X, then the runner should override/specify the environment. I think, in general, for composites the environment_id field can be omitted. > Make Environment a top level attribute of PTransform > > > Key: BEAM-7850 > URL: https://issues.apache.org/jira/browse/BEAM-7850 > Project: Beam > Issue Type: Sub-task > Components: beam-model >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: P2 > Fix For: 2.19.0 > > Time Spent: 4h > Remaining Estimate: 0h > > Currently Environment is not a top level attribute of the PTransform (of > runner API proto). > [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99] > Instead it is hidden inside various payload objects. For example, for ParDo, > environment will be inside SdkFunctionSpec of ParDoPayload. > [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99] > > This makes tracking environment of different types of PTransforms harder and > we have to fork code (on the type of PTransform) to extract the Environment > where the PTransform should be executed. It will probably be simpler to just > make Environment a top level attribute of PTransform. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-7850) Make Environment a top level attribute of PTransform
[ https://issues.apache.org/jira/browse/BEAM-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17268287#comment-17268287 ] Kenneth Knowles commented on BEAM-7850: --- Specifically "Environment" is about the execution environment of UDFs which is not a property of a composite at all, so this is some sort of summary field for the whole subgraph? What if the subgraph calls back into the other environment? > Make Environment a top level attribute of PTransform > > > Key: BEAM-7850 > URL: https://issues.apache.org/jira/browse/BEAM-7850 > Project: Beam > Issue Type: Sub-task > Components: beam-model >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: P2 > Fix For: 2.19.0 > > Time Spent: 4h > Remaining Estimate: 0h > > Currently Environment is not a top level attribute of the PTransform (of > runner API proto). > [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99] > Instead it is hidden inside various payload objects. For example, for ParDo, > environment will be inside SdkFunctionSpec of ParDoPayload. > [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99] > > This makes tracking environment of different types of PTransforms harder and > we have to fork code (on the type of PTransform) to extract the Environment > where the PTransform should be executed. It will probably be simpler to just > make Environment a top level attribute of PTransform. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-7850) Make Environment a top level attribute of PTransform
[ https://issues.apache.org/jira/browse/BEAM-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17268286#comment-17268286 ] Kenneth Knowles commented on BEAM-7850: --- I just found this because I stumbled across the change. Sorry to bring up a really old Jira. Since almost all PTransforms are composites, what is the spec for how the environment on a composite relates to environment on subtransforms? I guess you expect them all to be the same? The reason it came up is the use of annotations for both privacy properties and resource hints. Resource hints don't really make sense on a composite. But we had been saying that they should go on environment instead. But then environments got moved to composites so the whole discussion is confusing. > Make Environment a top level attribute of PTransform > > > Key: BEAM-7850 > URL: https://issues.apache.org/jira/browse/BEAM-7850 > Project: Beam > Issue Type: Sub-task > Components: beam-model >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: P2 > Fix For: 2.19.0 > > Time Spent: 4h > Remaining Estimate: 0h > > Currently Environment is not a top level attribute of the PTransform (of > runner API proto). > [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99] > Instead it is hidden inside various payload objects. For example, for ParDo, > environment will be inside SdkFunctionSpec of ParDoPayload. > [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99] > > This makes tracking environment of different types of PTransforms harder and > we have to fork code (on the type of PTransform) to extract the Environment > where the PTransform should be executed. It will probably be simpler to just > make Environment a top level attribute of PTransform. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-7850) Make Environment a top level attribute of PTransform
[ https://issues.apache.org/jira/browse/BEAM-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900413#comment-16900413 ] Luke Cwik commented on BEAM-7850: - Yes, we should replace SdkFunctionSpec with FunctionSpec everywhere it exists today and add the environment id to the WindowingStrategy (or leave it as an SdkFunctionSpec for now). > Make Environment a top level attribute of PTransform > > > Key: BEAM-7850 > URL: https://issues.apache.org/jira/browse/BEAM-7850 > Project: Beam > Issue Type: Sub-task > Components: beam-model >Reporter: Chamikara Jayalath >Priority: Major > > Currently Environment is not a top level attribute of the PTransform (of > runner API proto). > [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99] > Instead it is hidden inside various payload objects. For example, for ParDo, > environment will be inside SdkFunctionSpec of ParDoPayload. > [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99] > > This makes tracking environment of different types of PTransforms harder and > we have to fork code (on the type of PTransform) to extract the Environment > where the PTransform should be executed. It will probably be simpler to just > make Environment a top level attribute of PTransform. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (BEAM-7850) Make Environment a top level attribute of PTransform
[ https://issues.apache.org/jira/browse/BEAM-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900399#comment-16900399 ] Chamikara Jayalath commented on BEAM-7850: -- Thanks Luke. That makes sense. I assume we should also just replace SdkFunctionSpecs view_fn and window_mapping_fn in SideInput with FuctionSpec ? [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L958] Seems like SideInput lives within PTransform payloads (for example, ParDoPayload) so I don't think SideInput will need to have an Environment defined separately (in addition to the PTransform that holds it). > Make Environment a top level attribute of PTransform > > > Key: BEAM-7850 > URL: https://issues.apache.org/jira/browse/BEAM-7850 > Project: Beam > Issue Type: Sub-task > Components: beam-model >Reporter: Chamikara Jayalath >Priority: Major > > Currently Environment is not a top level attribute of the PTransform (of > runner API proto). > [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99] > Instead it is hidden inside various payload objects. For example, for ParDo, > environment will be inside SdkFunctionSpec of ParDoPayload. > [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99] > > This makes tracking environment of different types of PTransforms harder and > we have to fork code (on the type of PTransform) to extract the Environment > where the PTransform should be executed. It will probably be simpler to just > make Environment a top level attribute of PTransform. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (BEAM-7850) Make Environment a top level attribute of PTransform
[ https://issues.apache.org/jira/browse/BEAM-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900366#comment-16900366 ] Luke Cwik commented on BEAM-7850: - SdkFunctionSpec = FuctionSpec + environment I would suggest getting rid of SdkFunctionSpec completely in favor of FunctionSpec and add the environment id explicitly into the WindowingStrategy I migrated Coder to use FunctionSpec instead of SdkFunctionSpec since coders are implicitly understood by all neighboring PTransforms in BEAM-3204. Consider making this a subtask under BEAM-3221. Eventually in the future we may want to have SdkFunctionSpec appear again if we ever support first class functions outside of the context of a PTransform. > Make Environment a top level attribute of PTransform > > > Key: BEAM-7850 > URL: https://issues.apache.org/jira/browse/BEAM-7850 > Project: Beam > Issue Type: Improvement > Components: beam-model >Reporter: Chamikara Jayalath >Priority: Major > > Currently Environment is not a top level attribute of the PTransform (of > runner API proto). > [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99] > Instead it is hidden inside various payload objects. For example, for ParDo, > environment will be inside SdkFunctionSpec of ParDoPayload. > [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99] > > This makes tracking environment of different types of PTransforms harder and > we have to fork code (on the type of PTransform) to extract the Environment > where the PTransform should be executed. It will probably be simpler to just > make Environment a top level attribute of PTransform. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (BEAM-7850) Make Environment a top level attribute of PTransform
[ https://issues.apache.org/jira/browse/BEAM-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16898260#comment-16898260 ] Maximilian Michels commented on BEAM-7850: -- Thanks for expanding on this. After taking a closer look, it makes perfect sense to move Environment to the top-level PTransform message because all other transforms are derived from this base message. You mentioned WindowStrategy and SideInput still using SdkFunctionSpec with Environment. There is also the Coder message. If I'm not mistaken, we could remove the environment from SdkFunctionSpec entirely because all these messages should be implicitly bound to an environment from the PTransform. Perhaps somebody else could comment on whether this would be feasible? I can't think of a situation where this is not the case. +1 Definitely seems like a sensible change. > Make Environment a top level attribute of PTransform > > > Key: BEAM-7850 > URL: https://issues.apache.org/jira/browse/BEAM-7850 > Project: Beam > Issue Type: Improvement > Components: beam-model >Reporter: Chamikara Jayalath >Priority: Major > > Currently Environment is not a top level attribute of the PTransform (of > runner API proto). > [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99] > Instead it is hidden inside various payload objects. For example, for ParDo, > environment will be inside SdkFunctionSpec of ParDoPayload. > [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99] > > This makes tracking environment of different types of PTransforms harder and > we have to fork code (on the type of PTransform) to extract the Environment > where the PTransform should be executed. It will probably be simpler to just > make Environment a top level attribute of PTransform. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (BEAM-7850) Make Environment a top level attribute of PTransform
[ https://issues.apache.org/jira/browse/BEAM-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897400#comment-16897400 ] Chamikara Jayalath commented on BEAM-7850: -- Thanks Max. I understand that this will be a bit invasive change. But I think it will simplify implementations quite a bit. So the main motivation is implementation simplicity. Currently to determine the environment where a given PTransform has to be executed, we have to fork for all possible payload types. Just to clarify, if we make Environment a top level attribute of Transform that will be enough for WindowInto transform as well, right ? Looking at Beam runner API proto, seems like only two messages that use SdkFunctionSpec and are not a payload of a certain type of PTransform are WindowingStrategy and SideInput. SideInput already seems to be within transform payload messages (ParDoPayload and WriteFilesPayload). We can keep environment around in SdkFunctionSpec if there are non-transform messages that need it. If we consider the cross-language expansion scenario, what we get from a remote environment is an expanded PTransform (along with coders, inputs, outputs, dependencies needed to execute it). So I think it makes sense to associate a PTransform with an Environment directly. > Make Environment a top level attribute of PTransform > > > Key: BEAM-7850 > URL: https://issues.apache.org/jira/browse/BEAM-7850 > Project: Beam > Issue Type: Improvement > Components: beam-model >Reporter: Chamikara Jayalath >Priority: Major > > Currently Environment is not a top level attribute of the PTransform (of > runner API proto). > [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99] > Instead it is hidden inside various payload objects. For example, for ParDo, > environment will be inside SdkFunctionSpec of ParDoPayload. > [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99] > > This makes tracking environment of different types of PTransforms harder and > we have to fork code (on the type of PTransform) to extract the Environment > where the PTransform should be executed. It will probably be simpler to just > make Environment a top level attribute of PTransform. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (BEAM-7850) Make Environment a top level attribute of PTransform
[ https://issues.apache.org/jira/browse/BEAM-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897119#comment-16897119 ] Maximilian Michels commented on BEAM-7850: -- If we make Environment a top-level attribute of PTransform, do we also remove Environment from SdkFunctionSpec? There are other transforms like WindowInto which make use of SdkFunctionSpec and Environment. It seems nice to have the environment as a first-class citizen of PTransform, but it could be an invasive change considering it is used by multiple components of the Proto. Could you expand on the motivation for such a change? > Make Environment a top level attribute of PTransform > > > Key: BEAM-7850 > URL: https://issues.apache.org/jira/browse/BEAM-7850 > Project: Beam > Issue Type: Improvement > Components: beam-model >Reporter: Chamikara Jayalath >Priority: Major > > Currently Environment is not a top level attribute of the PTransform (of > runner API proto). > [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99] > Instead it is hidden inside various payload objects. For example, for ParDo, > environment will be inside SdkFunctionSpec of ParDoPayload. > [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99] > > This makes tracking environment of different types of PTransforms harder and > we have to fork code (on the type of PTransform) to extract the Environment > where the PTransform should be executed. It will probably be simpler to just > make Environment a top level attribute of PTransform. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (BEAM-7850) Make Environment a top level attribute of PTransform
[ https://issues.apache.org/jira/browse/BEAM-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896289#comment-16896289 ] Chamikara Jayalath commented on BEAM-7850: -- cc: [~robertwb] [~lcwik] [~mxm] [~angoenka] > Make Environment a top level attribute of PTransform > > > Key: BEAM-7850 > URL: https://issues.apache.org/jira/browse/BEAM-7850 > Project: Beam > Issue Type: Improvement > Components: beam-model >Reporter: Chamikara Jayalath >Priority: Major > > Currently Environment is not a top level attribute of the PTransform (of > runner API proto). > [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99] > Instead it is hidden inside various payload objects. For example, for ParDo, > environment will be inside SdkFunctionSpec of ParDoPayload. > [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99] > > This makes tracking environment of different types of PTransforms harder and > we have to fork code (on the type of PTransform) to extract the Environment > where the PTransform should be executed. It will probably be simpler to just > make Environment a top level attribute of PTransform. -- This message was sent by Atlassian JIRA (v7.6.14#76016)