[jira] [Commented] (BEAM-7850) Make Environment a top level attribute of PTransform

2021-04-13 Thread Valentyn Tymofieiev (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17320514#comment-17320514
 ] 

Valentyn Tymofieiev commented on BEAM-7850:
---

>  Default implementation can be to use the hint values associated with values 
> defined on the composite.

Actually using hints defined on subtransforms might make more sense as a 
default behavior when custom merging logic is not provided, because hints on 
subtransforms are defined in a more specific context.

> Make Environment a top level attribute of PTransform
> 
>
> Key: BEAM-7850
> URL: https://issues.apache.org/jira/browse/BEAM-7850
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: P2
> Fix For: 2.19.0
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> Currently Environment is not a top level attribute of the PTransform (of 
> runner API proto).
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99]
> Instead it is hidden inside various payload objects. For example, for ParDo, 
> environment will be inside SdkFunctionSpec of ParDoPayload.
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99]
>  
> This makes tracking environment of different types of PTransforms harder and 
> we have to fork code (on the type of PTransform) to extract the Environment 
> where the PTransform should be executed. It will probably be simpler to just 
> make Environment a top level attribute of PTransform.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-7850) Make Environment a top level attribute of PTransform

2021-04-08 Thread Valentyn Tymofieiev (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17317508#comment-17317508
 ] 

Valentyn Tymofieiev commented on BEAM-7850:
---

I think Beam SDK should ensure that by the time a RunnerAPI representation of a 
pipeline is created,  Environment instances associated with subtransforms 
include the hints defined for the composite. The merging logic may need to be 
defined individually per hint. Default implementation can be to use the hint 
values associated with values defined on the composite. For int hints like 
min_ram_per_vcpu, one could take a max of the two values. 

Note that we initially plan express hints on transform level ( in Python  - via 
.with_resource_hints() builder methods). Environment does not appear on this 
stage, and only appears in the picture when we translate the pipeline into 
RunnerAPI. In the future we may let users create Environment objects and attach 
them to transforms when they instantiate the pipeline. Currently this is not 
possible in Python, and AppliedPTransform.environment_id is not used anywhere 
as far as I can tell. If/when we allow users to explicitly define PTransforms 
with Envirionments, we should also ensure that the resource hints on composite 
transforms are properly propagated to the subtransforms. 

> Make Environment a top level attribute of PTransform
> 
>
> Key: BEAM-7850
> URL: https://issues.apache.org/jira/browse/BEAM-7850
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: P2
> Fix For: 2.19.0
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> Currently Environment is not a top level attribute of the PTransform (of 
> runner API proto).
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99]
> Instead it is hidden inside various payload objects. For example, for ParDo, 
> environment will be inside SdkFunctionSpec of ParDoPayload.
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99]
>  
> This makes tracking environment of different types of PTransforms harder and 
> we have to fork code (on the type of PTransform) to extract the Environment 
> where the PTransform should be executed. It will probably be simpler to just 
> make Environment a top level attribute of PTransform.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-7850) Make Environment a top level attribute of PTransform

2021-01-19 Thread Chamikara Madhusanka Jayalath (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17268309#comment-17268309
 ] 

Chamikara Madhusanka Jayalath commented on BEAM-7850:
-

I don't think all sub-transforms of a composite will have to have the same 
environment.

 

For example, we might have a composite X that get expanded to sub-transforms Y 
and Z.

If a runner choose to execute Y and Z they should be executed in corresponding 
environments. 

If a runner chooses to execute X, then the runner should override/specify the 
environment.

I think, in general, for composites the environment_id field can be omitted.

> Make Environment a top level attribute of PTransform
> 
>
> Key: BEAM-7850
> URL: https://issues.apache.org/jira/browse/BEAM-7850
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: P2
> Fix For: 2.19.0
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> Currently Environment is not a top level attribute of the PTransform (of 
> runner API proto).
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99]
> Instead it is hidden inside various payload objects. For example, for ParDo, 
> environment will be inside SdkFunctionSpec of ParDoPayload.
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99]
>  
> This makes tracking environment of different types of PTransforms harder and 
> we have to fork code (on the type of PTransform) to extract the Environment 
> where the PTransform should be executed. It will probably be simpler to just 
> make Environment a top level attribute of PTransform.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-7850) Make Environment a top level attribute of PTransform

2021-01-19 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17268287#comment-17268287
 ] 

Kenneth Knowles commented on BEAM-7850:
---

Specifically "Environment" is about the execution environment of UDFs which is 
not a property of a composite at all, so this is some sort of summary field for 
the whole subgraph? What if the subgraph calls back into the other environment?

> Make Environment a top level attribute of PTransform
> 
>
> Key: BEAM-7850
> URL: https://issues.apache.org/jira/browse/BEAM-7850
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: P2
> Fix For: 2.19.0
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> Currently Environment is not a top level attribute of the PTransform (of 
> runner API proto).
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99]
> Instead it is hidden inside various payload objects. For example, for ParDo, 
> environment will be inside SdkFunctionSpec of ParDoPayload.
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99]
>  
> This makes tracking environment of different types of PTransforms harder and 
> we have to fork code (on the type of PTransform) to extract the Environment 
> where the PTransform should be executed. It will probably be simpler to just 
> make Environment a top level attribute of PTransform.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-7850) Make Environment a top level attribute of PTransform

2021-01-19 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17268286#comment-17268286
 ] 

Kenneth Knowles commented on BEAM-7850:
---

I just found this because I stumbled across the change. Sorry to bring up a 
really old Jira.

Since almost all PTransforms are composites, what is the spec for how the 
environment on a composite relates to environment on subtransforms? I guess you 
expect them all to be the same?

The reason it came up is the use of annotations for both privacy properties and 
resource hints. Resource hints don't really make sense on a composite. But we 
had been saying that they should go on environment instead. But then 
environments got moved to composites so the whole discussion is confusing.

> Make Environment a top level attribute of PTransform
> 
>
> Key: BEAM-7850
> URL: https://issues.apache.org/jira/browse/BEAM-7850
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: P2
> Fix For: 2.19.0
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> Currently Environment is not a top level attribute of the PTransform (of 
> runner API proto).
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99]
> Instead it is hidden inside various payload objects. For example, for ParDo, 
> environment will be inside SdkFunctionSpec of ParDoPayload.
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99]
>  
> This makes tracking environment of different types of PTransforms harder and 
> we have to fork code (on the type of PTransform) to extract the Environment 
> where the PTransform should be executed. It will probably be simpler to just 
> make Environment a top level attribute of PTransform.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-7850) Make Environment a top level attribute of PTransform

2019-08-05 Thread Luke Cwik (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900413#comment-16900413
 ] 

Luke Cwik commented on BEAM-7850:
-

Yes, we should replace SdkFunctionSpec with FunctionSpec everywhere it exists 
today and add the environment id to the WindowingStrategy (or leave it as an 
SdkFunctionSpec for now).

> Make Environment a top level attribute of PTransform
> 
>
> Key: BEAM-7850
> URL: https://issues.apache.org/jira/browse/BEAM-7850
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Reporter: Chamikara Jayalath
>Priority: Major
>
> Currently Environment is not a top level attribute of the PTransform (of 
> runner API proto).
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99]
> Instead it is hidden inside various payload objects. For example, for ParDo, 
> environment will be inside SdkFunctionSpec of ParDoPayload.
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99]
>  
> This makes tracking environment of different types of PTransforms harder and 
> we have to fork code (on the type of PTransform) to extract the Environment 
> where the PTransform should be executed. It will probably be simpler to just 
> make Environment a top level attribute of PTransform.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (BEAM-7850) Make Environment a top level attribute of PTransform

2019-08-05 Thread Chamikara Jayalath (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900399#comment-16900399
 ] 

Chamikara Jayalath commented on BEAM-7850:
--

Thanks Luke. That makes sense.

I assume we should also just replace SdkFunctionSpecs view_fn and 
window_mapping_fn in SideInput  with FuctionSpec ? 

[https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L958]

Seems like SideInput lives within PTransform payloads (for example, 
ParDoPayload) so I don't think SideInput will need to have an Environment 
defined separately (in addition to the PTransform that holds it).

 

> Make Environment a top level attribute of PTransform
> 
>
> Key: BEAM-7850
> URL: https://issues.apache.org/jira/browse/BEAM-7850
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Reporter: Chamikara Jayalath
>Priority: Major
>
> Currently Environment is not a top level attribute of the PTransform (of 
> runner API proto).
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99]
> Instead it is hidden inside various payload objects. For example, for ParDo, 
> environment will be inside SdkFunctionSpec of ParDoPayload.
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99]
>  
> This makes tracking environment of different types of PTransforms harder and 
> we have to fork code (on the type of PTransform) to extract the Environment 
> where the PTransform should be executed. It will probably be simpler to just 
> make Environment a top level attribute of PTransform.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (BEAM-7850) Make Environment a top level attribute of PTransform

2019-08-05 Thread Luke Cwik (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900366#comment-16900366
 ] 

Luke Cwik commented on BEAM-7850:
-

SdkFunctionSpec = FuctionSpec + environment

I would suggest getting rid of SdkFunctionSpec completely in favor of 
FunctionSpec and add the environment id explicitly into the WindowingStrategy

I migrated Coder to use FunctionSpec instead of SdkFunctionSpec since coders 
are implicitly understood by all neighboring PTransforms in BEAM-3204.

Consider making this a subtask under BEAM-3221.

Eventually in the future we may want to have SdkFunctionSpec appear again if we 
ever support first class functions outside of the context of a PTransform.

> Make Environment a top level attribute of PTransform
> 
>
> Key: BEAM-7850
> URL: https://issues.apache.org/jira/browse/BEAM-7850
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Chamikara Jayalath
>Priority: Major
>
> Currently Environment is not a top level attribute of the PTransform (of 
> runner API proto).
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99]
> Instead it is hidden inside various payload objects. For example, for ParDo, 
> environment will be inside SdkFunctionSpec of ParDoPayload.
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99]
>  
> This makes tracking environment of different types of PTransforms harder and 
> we have to fork code (on the type of PTransform) to extract the Environment 
> where the PTransform should be executed. It will probably be simpler to just 
> make Environment a top level attribute of PTransform.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (BEAM-7850) Make Environment a top level attribute of PTransform

2019-08-01 Thread Maximilian Michels (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16898260#comment-16898260
 ] 

Maximilian Michels commented on BEAM-7850:
--

Thanks for expanding on this. After taking a closer look, it makes perfect 
sense to move Environment to the top-level PTransform message because all other 
transforms are derived from this base message.

You mentioned WindowStrategy and SideInput still using SdkFunctionSpec with 
Environment. There is also the Coder message. If I'm not mistaken, we could 
remove the environment from SdkFunctionSpec entirely because all these messages 
should be implicitly bound to an environment from the PTransform. Perhaps 
somebody else could comment on whether this would be feasible? I can't think of 
a situation where this is not the case.

+1 Definitely seems like a sensible change.

> Make Environment a top level attribute of PTransform
> 
>
> Key: BEAM-7850
> URL: https://issues.apache.org/jira/browse/BEAM-7850
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Chamikara Jayalath
>Priority: Major
>
> Currently Environment is not a top level attribute of the PTransform (of 
> runner API proto).
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99]
> Instead it is hidden inside various payload objects. For example, for ParDo, 
> environment will be inside SdkFunctionSpec of ParDoPayload.
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99]
>  
> This makes tracking environment of different types of PTransforms harder and 
> we have to fork code (on the type of PTransform) to extract the Environment 
> where the PTransform should be executed. It will probably be simpler to just 
> make Environment a top level attribute of PTransform.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (BEAM-7850) Make Environment a top level attribute of PTransform

2019-07-31 Thread Chamikara Jayalath (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897400#comment-16897400
 ] 

Chamikara Jayalath commented on BEAM-7850:
--

Thanks Max. I understand that this will be a bit invasive change. But I think 
it will simplify implementations quite a bit. So the main motivation is 
implementation simplicity. Currently to determine the environment where a given 
PTransform has to be executed, we have to fork for all possible payload types.

Just to clarify, if we make Environment a top level attribute of Transform that 
will be enough for WindowInto transform as well, right ? 

Looking at Beam runner API proto, seems like only two messages that use 
SdkFunctionSpec and are not a payload of a certain type of PTransform are 
WindowingStrategy and SideInput. SideInput already seems to be within transform 
payload messages (ParDoPayload and WriteFilesPayload). We can keep environment 
around in SdkFunctionSpec if there are non-transform messages that need it.

If we consider the cross-language expansion scenario, what we get from a remote 
environment is an expanded PTransform (along with coders, inputs, outputs, 
dependencies needed to execute it). So I think it makes sense to associate a 
PTransform with an Environment directly.

> Make Environment a top level attribute of PTransform
> 
>
> Key: BEAM-7850
> URL: https://issues.apache.org/jira/browse/BEAM-7850
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Chamikara Jayalath
>Priority: Major
>
> Currently Environment is not a top level attribute of the PTransform (of 
> runner API proto).
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99]
> Instead it is hidden inside various payload objects. For example, for ParDo, 
> environment will be inside SdkFunctionSpec of ParDoPayload.
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99]
>  
> This makes tracking environment of different types of PTransforms harder and 
> we have to fork code (on the type of PTransform) to extract the Environment 
> where the PTransform should be executed. It will probably be simpler to just 
> make Environment a top level attribute of PTransform.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (BEAM-7850) Make Environment a top level attribute of PTransform

2019-07-31 Thread Maximilian Michels (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897119#comment-16897119
 ] 

Maximilian Michels commented on BEAM-7850:
--

If we make Environment a top-level attribute of PTransform, do we also remove 
Environment from SdkFunctionSpec? There are other transforms like WindowInto 
which make use of SdkFunctionSpec and Environment.

It seems nice to have the environment as a first-class citizen of PTransform, 
but it could be an invasive change considering it is used by multiple 
components of the Proto. Could you expand on the motivation for such a change?

> Make Environment a top level attribute of PTransform
> 
>
> Key: BEAM-7850
> URL: https://issues.apache.org/jira/browse/BEAM-7850
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Chamikara Jayalath
>Priority: Major
>
> Currently Environment is not a top level attribute of the PTransform (of 
> runner API proto).
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99]
> Instead it is hidden inside various payload objects. For example, for ParDo, 
> environment will be inside SdkFunctionSpec of ParDoPayload.
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99]
>  
> This makes tracking environment of different types of PTransforms harder and 
> we have to fork code (on the type of PTransform) to extract the Environment 
> where the PTransform should be executed. It will probably be simpler to just 
> make Environment a top level attribute of PTransform.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (BEAM-7850) Make Environment a top level attribute of PTransform

2019-07-30 Thread Chamikara Jayalath (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896289#comment-16896289
 ] 

Chamikara Jayalath commented on BEAM-7850:
--

cc: [~robertwb] [~lcwik] [~mxm] [~angoenka]

> Make Environment a top level attribute of PTransform
> 
>
> Key: BEAM-7850
> URL: https://issues.apache.org/jira/browse/BEAM-7850
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Chamikara Jayalath
>Priority: Major
>
> Currently Environment is not a top level attribute of the PTransform (of 
> runner API proto).
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99]
> Instead it is hidden inside various payload objects. For example, for ParDo, 
> environment will be inside SdkFunctionSpec of ParDoPayload.
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99]
>  
> This makes tracking environment of different types of PTransforms harder and 
> we have to fork code (on the type of PTransform) to extract the Environment 
> where the PTransform should be executed. It will probably be simpler to just 
> make Environment a top level attribute of PTransform.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)