[
https://issues.apache.org/jira/browse/BEAM-7850?focusedWorklogId=347682&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347682
]
ASF GitHub Bot logged work on BEAM-7850:
----------------------------------------
Author: ASF GitHub Bot
Created on: 21/Nov/19 20:52
Start Date: 21/Nov/19 20:52
Worklog Time Spent: 10m
Work Description: lukecwik commented on pull request #10183: [BEAM-7850]
Makes environment ID a top level attribute of PTransform.
URL: https://github.com/apache/beam/pull/10183#discussion_r349311598
##########
File path: model/pipeline/src/main/proto/beam_runner_api.proto
##########
@@ -698,10 +700,10 @@ message StandardCoders {
// TODO: consider inlining field on PCollection
message WindowingStrategy {
- // (Required) The SdkFunctionSpec of the UDF that assigns windows,
+ // (Required) The FunctionSpec of the UDF that assigns windows,
// merges windows, and shifts timestamps before they are
// combined according to the OutputTime.
- SdkFunctionSpec window_fn = 1;
+ FunctionSpec window_fn = 1;
Review comment:
All other uses of SdkFunctionSpec are part of a transform, but finding the
environment for the windowing strategy that the window_fn belongs to is
difficult when looking at a PCollection, since one needs to find the environment
of the upstream assign windows transform. It is possible but annoying. This
also differs from coders, which don't have an environment because both the
upstream and downstream transforms need to understand the encoding, and runners
have the length prefixing technique to convert unknown coders into known ones.
For now I'm in favor of not adding an environment id here and requiring a graph
traversal to the assign windows transform to find the environment. In the
future, to truly support custom window fns, it's likely we will have to come up
with techniques like coder length prefixing so that they can be treated
opaquely. (Note that the rest of the change looks good.)
@chamikaramj @robertwb what do you think?
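The graph traversal mentioned above could look roughly like the following
sketch. It does not use the Beam SDK: the Transform class, the
find_window_fn_environment helper, and the ids are hypothetical stand-ins, and
the URN constant is only an illustrative marker for the assign windows
transform. It assumes each transform carries a top-level environment_id, as
this pull request proposes, and it simply returns the first assign windows
transform found upstream.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Illustrative marker for the assign windows transform (an assumption here).
ASSIGN_WINDOWS_URN = "beam:transform:window_into:v1"


@dataclass
class Transform:
    """Hypothetical, simplified stand-in for a PTransform proto."""
    urn: str
    environment_id: str
    inputs: Dict[str, str] = field(default_factory=dict)   # local name -> PCollection id
    outputs: Dict[str, str] = field(default_factory=dict)  # local name -> PCollection id


def find_window_fn_environment(
    pcoll_id: str, transforms: Dict[str, Transform]
) -> Optional[str]:
    """Walk upstream from a PCollection to the nearest assign windows transform."""
    # Index each PCollection id by the transform that produces it.
    producers = {
        out: t for t in transforms.values() for out in t.outputs.values()
    }
    seen = set()
    stack = [pcoll_id]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        producer = producers.get(current)
        if producer is None:
            continue  # no producer known for this PCollection
        if producer.urn == ASSIGN_WINDOWS_URN:
            return producer.environment_id
        # Keep walking upstream through the producer's inputs.
        stack.extend(producer.inputs.values())
    return None  # no assign windows transform upstream


transforms = {
    "read": Transform("example:read", "env_java", {}, {"out": "pc0"}),
    "window": Transform(ASSIGN_WINDOWS_URN, "env_py", {"in": "pc0"}, {"out": "pc1"}),
    "map": Transform("example:map", "env_java", {"in": "pc1"}, {"out": "pc2"}),
}
print(find_window_fn_environment("pc2", transforms))
```

A real implementation would also have to decide what to do when several
upstream branches carry different windowing strategies; this sketch ignores
that case.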
Issue Time Tracking
-------------------
Worklog Id: (was: 347682)
Time Spent: 1h (was: 50m)
> Make Environment a top level attribute of PTransform
> ----------------------------------------------------
>
> Key: BEAM-7850
> URL: https://issues.apache.org/jira/browse/BEAM-7850
> Project: Beam
> Issue Type: Sub-task
> Components: beam-model
> Reporter: Chamikara Madhusanka Jayalath
> Assignee: Chamikara Madhusanka Jayalath
> Priority: Major
> Time Spent: 1h
> Remaining Estimate: 0h
>
> Currently Environment is not a top level attribute of the PTransform (of
> runner API proto).
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99]
> Instead it is hidden inside various payload objects. For example, for ParDo,
> environment will be inside SdkFunctionSpec of ParDoPayload.
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99]
>
> This makes tracking the environment of different types of PTransforms harder,
> and we have to fork code (on the type of PTransform) to extract the Environment
> where the PTransform should be executed. It will probably be simpler to just
> make Environment a top level attribute of PTransform.
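To illustrate the forking described in the issue, here is a minimal sketch
that contrasts extracting the environment per payload type with reading a
top-level attribute. The classes below are simplified, hypothetical stand-ins
for the runner API protos, not the generated proto classes, and the two helper
functions are named only for this example.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class SdkFunctionSpec:
    """Simplified stand-in: a function spec that carries its environment."""
    urn: str
    environment_id: str


@dataclass
class ParDoPayload:
    do_fn: SdkFunctionSpec


@dataclass
class CombinePayload:
    combine_fn: SdkFunctionSpec


@dataclass
class PTransform:
    payload: Union[ParDoPayload, CombinePayload, None]
    # Proposed top-level attribute; None models the state before the change.
    environment_id: Optional[str] = None


def environment_from_payload(transform: PTransform) -> Optional[str]:
    """Before: fork on the payload type to dig out the environment."""
    payload = transform.payload
    if isinstance(payload, ParDoPayload):
        return payload.do_fn.environment_id
    if isinstance(payload, CombinePayload):
        return payload.combine_fn.environment_id
    return None  # every other payload type would need its own branch


def environment_top_level(transform: PTransform) -> Optional[str]:
    """After: one attribute lookup, regardless of the transform type."""
    return transform.environment_id


pardo = PTransform(ParDoPayload(SdkFunctionSpec("example:dofn", "env_py")), "env_py")
combine = PTransform(CombinePayload(SdkFunctionSpec("example:combinefn", "env_java")), "env_java")
for t in (pardo, combine):
    print(environment_from_payload(t), environment_top_level(t))
```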