[jira] [Work logged] (BEAM-4056) Identify Side Inputs by PTransform ID and local name
[ https://issues.apache.org/jira/browse/BEAM-4056?focusedWorklogId=92220=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-92220 ]

ASF GitHub Bot logged work on BEAM-4056:
    Author: ASF GitHub Bot
    Created on: 18/Apr/18 19:30
    Start Date: 18/Apr/18 19:30
    Worklog Time Spent: 10m
    Work Description: szewi commented on issue #5170: [BEAM-4056] Basic performance tests analysis added.
URL: https://github.com/apache/beam/pull/5170#issuecomment-382502384

You are right. It is supposed to be https://issues.apache.org/jira/browse/BEAM-4065; however, this build contains a new Jenkins job I would like to test. Let me wait for `Jenkins: Seed Job` to finish and run the tests to make sure the configuration here is valid, and then I will close this PR. For now I will update the description.

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 92220)
    Time Spent: 3h (was: 2h 50m)

> Identify Side Inputs by PTransform ID and local name
> ----------------------------------------------------
>
>                 Key: BEAM-4056
>                 URL: https://issues.apache.org/jira/browse/BEAM-4056
>             Project: Beam
>          Issue Type: New Feature
>          Components: runner-core
>            Reporter: Ben Sidhom
>            Assignee: Ben Sidhom
>            Priority: Minor
>             Fix For: Not applicable
>
>          Time Spent: 3h
>  Remaining Estimate: 0h
>
> This is necessary in order to correctly identify side inputs during all
> phases of portable pipeline execution (fusion, translation, and SDK
> execution).

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Work logged] (BEAM-4056) Identify Side Inputs by PTransform ID and local name
[ https://issues.apache.org/jira/browse/BEAM-4056?focusedWorklogId=92219=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-92219 ]

ASF GitHub Bot logged work on BEAM-4056:
    Author: ASF GitHub Bot
    Created on: 18/Apr/18 19:24
    Start Date: 18/Apr/18 19:24
    Worklog Time Spent: 10m
    Work Description: bsidhom commented on issue #5170: [BEAM-4056] Basic performance tests analysis added.
URL: https://github.com/apache/beam/pull/5170#issuecomment-382500871

I assume you tagged this with BEAM-4056 by mistake. I'm planning to mark that bug as closed now.

Issue Time Tracking
-------------------
    Worklog Id: (was: 92219)
    Time Spent: 2h 50m (was: 2h 40m)
[jira] [Work logged] (BEAM-4056) Identify Side Inputs by PTransform ID and local name
[ https://issues.apache.org/jira/browse/BEAM-4056?focusedWorklogId=92214=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-92214 ]

ASF GitHub Bot logged work on BEAM-4056:
    Author: ASF GitHub Bot
    Created on: 18/Apr/18 18:53
    Start Date: 18/Apr/18 18:53
    Worklog Time Spent: 10m
    Work Description: szewi opened a new pull request #5170: [BEAM-4056] Basic performance tests analysis added.
URL: https://github.com/apache/beam/pull/5170

DESCRIPTION HERE

Follow this checklist to help us incorporate your contribution quickly and easily:

 - [ ] Make sure there is a [JIRA issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the change (usually before you start working on it). Trivial changes like typos do not require a JIRA issue. Your pull request should address just this issue, without pulling in other changes.
 - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue.
 - [ ] Write a pull request description that is detailed enough to understand:
   - [ ] What the pull request does
   - [ ] Why it does it
   - [ ] How it does it
   - [ ] Why this approach
 - [ ] Each commit in the pull request should have a meaningful subject line and body.
 - [ ] Run `mvn clean verify` to make sure basic checks pass. A more thorough check will be performed on your pull request automatically.
 - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

Issue Time Tracking
-------------------
    Worklog Id: (was: 92214)
    Time Spent: 2.5h (was: 2h 20m)
[jira] [Work logged] (BEAM-4056) Identify Side Inputs by PTransform ID and local name
[ https://issues.apache.org/jira/browse/BEAM-4056?focusedWorklogId=92215=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-92215 ]

ASF GitHub Bot logged work on BEAM-4056:
    Author: ASF GitHub Bot
    Created on: 18/Apr/18 18:53
    Start Date: 18/Apr/18 18:53
    Worklog Time Spent: 10m
    Work Description: szewi commented on issue #5170: [BEAM-4056] Basic performance tests analysis added.
URL: https://github.com/apache/beam/pull/5170#issuecomment-382491998

Run Seed Job

Issue Time Tracking
-------------------
    Worklog Id: (was: 92215)
    Time Spent: 2h 40m (was: 2.5h)
[jira] [Work logged] (BEAM-4056) Identify Side Inputs by PTransform ID and local name
[ https://issues.apache.org/jira/browse/BEAM-4056?focusedWorklogId=91430=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-91430 ]

ASF GitHub Bot logged work on BEAM-4056:
    Author: ASF GitHub Bot
    Created on: 16/Apr/18 18:36
    Start Date: 16/Apr/18 18:36
    Worklog Time Spent: 10m
    Work Description: tgroh closed pull request #5118: [BEAM-4056] Identify side inputs by transform id and local name
URL: https://github.com/apache/beam/pull/5118

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/model/pipeline/src/main/proto/beam_runner_api.proto b/model/pipeline/src/main/proto/beam_runner_api.proto
index 83bb8a29f45..2bc2e34951c 100644
--- a/model/pipeline/src/main/proto/beam_runner_api.proto
+++ b/model/pipeline/src/main/proto/beam_runner_api.proto
@@ -217,12 +217,12 @@ message ExecutableStagePayload {
   // PTransform the ExecutableStagePayload is the payload of.
   string input = 2;
 
-  // Side Input PCollection ids. Each must be present as a value in the inputs of
-  // any PTransform the ExecutableStagePayload is the payload of.
-  repeated string side_inputs = 3;
+  // The side inputs required for this executable stage. Each Side Input of each PTransform within
+  // this ExecutableStagePayload must be represented within this field.
+  repeated SideInputId side_inputs = 3;
 
   // PTransform ids contained within this executable stage. This must contain at least one
-  // PTransform ID.
+  // PTransform id.
   repeated string transforms = 4;
 
   // Output PCollection ids. This must be equal to the values of the outputs of any
@@ -232,6 +232,16 @@ message ExecutableStagePayload {
   // (Required) The components for the Executable Stage. This must contain all of the Transforms
   // in transforms, and the closure of all of the components they recognize.
   Components components = 6;
+
+  // A reference to a side input. Side inputs are uniquely identified by PTransform id and
+  // local name.
+  message SideInputId {
+    // (Required) The id of the PTransform that references this side input.
+    string transform_id = 1;
+
+    // (Required) The local name of this side input from the PTransform that references it.
+    string local_name = 2;
+  }
 }
 
 // The payload for the primitive ParDo transform.
diff --git a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/ExecutableStage.java b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/ExecutableStage.java
index c41d0b8b587..50a1c9e1539 100644
--- a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/ExecutableStage.java
+++ b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/ExecutableStage.java
@@ -25,6 +25,7 @@
 import org.apache.beam.model.pipeline.v1.RunnerApi.Components;
 import org.apache.beam.model.pipeline.v1.RunnerApi.Environment;
 import org.apache.beam.model.pipeline.v1.RunnerApi.ExecutableStagePayload;
+import org.apache.beam.model.pipeline.v1.RunnerApi.ExecutableStagePayload.SideInputId;
 import org.apache.beam.model.pipeline.v1.RunnerApi.FunctionSpec;
 import org.apache.beam.model.pipeline.v1.RunnerApi.PCollection;
 import org.apache.beam.model.pipeline.v1.RunnerApi.PTransform;
@@ -77,7 +78,7 @@
    * Returns the set of {@link PCollectionNode PCollections} that will be accessed by this {@link
    * ExecutableStage} as side inputs.
    */
-  Collection<PCollectionNode> getSideInputPCollections();
+  Collection<SideInputReference> getSideInputs();
 
   /**
    * Returns the leaf {@link PCollectionNode PCollections} of this {@link ExecutableStage}.
@@ -122,11 +123,16 @@ default PTransform toPTransform() {
     pt.putInputs("input", getInputPCollection().getId());
     payload.setInput(input.getId());
 
-    int sideInputIndex = 0;
-    for (PCollectionNode sideInputNode : getSideInputPCollections()) {
-      pt.putInputs(String.format("side_input_%s", sideInputIndex), sideInputNode.getId());
-      payload.addSideInputs(sideInputNode.getId());
-      sideInputIndex++;
+    for (SideInputReference sideInput : getSideInputs()) {
+      // Side inputs of the ExecutableStage itself can be uniquely identified by inner PTransform
+      // name and local name.
+      String outerLocalName = String.format("%s:%s", sideInput.transform(), sideInput.localName());
+      pt.putInputs(outerLocalName, sideInput.collection().getId());
+      payload.addSideInputs(
+          SideInputId.newBuilder()
+              .setTransformId(sideInput.transform().getId())
+              .setLocalName(sideInput.localName())
+              .build());
     }
 
     int
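
For readers following the diff, here is a minimal plain-Java sketch of the naming scheme it introduces: each side input of the stage is keyed by the inner transform id plus the local name, joined in the same "%s:%s" form as the diff. The classes `SideInputNamingSketch` and `SideInputRef` and the ids `parDoA`, `parDoB`, `view`, and `pc1` are hypothetical stand-ins, not the Beam types; the duplicate check is also an assumption added for illustration and is not in the PR.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Illustrative only: plain-Java stand-ins, not the Beam classes touched by the PR. */
public class SideInputNamingSketch {

  /** Hypothetical stand-in for a side input reference. */
  static final class SideInputRef {
    final String transformId;   // id of the PTransform that reads the side input
    final String localName;     // name of the side input within that PTransform
    final String collectionId;  // id of the PCollection being read

    SideInputRef(String transformId, String localName, String collectionId) {
      this.transformId = Objects.requireNonNull(transformId);
      this.localName = Objects.requireNonNull(localName);
      this.collectionId = Objects.requireNonNull(collectionId);
    }
  }

  /** Builds the stage-level input name in the "%s:%s" form shown in the diff. */
  static String outerLocalName(SideInputRef ref) {
    return String.format("%s:%s", ref.transformId, ref.localName);
  }

  /** Maps each stage-level input name to the id of the PCollection it reads. */
  static Map<String, String> stageInputs(Iterable<SideInputRef> refs) {
    Map<String, String> inputs = new LinkedHashMap<>();
    for (SideInputRef ref : refs) {
      String name = outerLocalName(ref);
      // Assumption for this sketch: a repeated (transform id, local name) pair is an error.
      if (inputs.put(name, ref.collectionId) != null) {
        throw new IllegalArgumentException("Duplicate side input: " + name);
      }
    }
    return inputs;
  }

  public static void main(String[] args) {
    // Two transforms read the same PCollection under the same local name.
    List<SideInputRef> refs = Arrays.asList(
        new SideInputRef("parDoA", "view", "pc1"),
        new SideInputRef("parDoB", "view", "pc1"));
    System.out.println(stageInputs(refs));
  }
}
```

In this sketch both entries survive because the key includes the transform id; keyed by PCollection id alone, the two reads of `pc1` would be indistinguishable.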
[jira] [Work logged] (BEAM-4056) Identify Side Inputs by PTransform ID and local name
[ https://issues.apache.org/jira/browse/BEAM-4056?focusedWorklogId=91407=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-91407 ]

ASF GitHub Bot logged work on BEAM-4056:
    Author: ASF GitHub Bot
    Created on: 16/Apr/18 17:18
    Start Date: 16/Apr/18 17:18
    Worklog Time Spent: 10m
    Work Description: bsidhom commented on issue #5118: [BEAM-4056] Identify side inputs by transform id and local name
URL: https://github.com/apache/beam/pull/5118#issuecomment-381681942

Done.

Issue Time Tracking
-------------------
    Worklog Id: (was: 91407)
    Time Spent: 2h 10m (was: 2h)
[jira] [Work logged] (BEAM-4056) Identify Side Inputs by PTransform ID and local name
[ https://issues.apache.org/jira/browse/BEAM-4056?focusedWorklogId=91043=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-91043 ]

ASF GitHub Bot logged work on BEAM-4056:
    Author: ASF GitHub Bot
    Created on: 14/Apr/18 01:30
    Start Date: 14/Apr/18 01:30
    Worklog Time Spent: 10m
    Work Description: bsidhom commented on a change in pull request #5118: [BEAM-4056] Identify side inputs by transform id and local name
URL: https://github.com/apache/beam/pull/5118#discussion_r181537077

## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/SideInputReference.java ##

@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.core.construction.graph;
+
+import com.google.auto.value.AutoValue;
+import org.apache.beam.model.pipeline.v1.RunnerApi;
+import org.apache.beam.model.pipeline.v1.RunnerApi.ExecutableStagePayload.SideInputId;
+import org.apache.beam.model.pipeline.v1.RunnerApi.PCollection;
+import org.apache.beam.model.pipeline.v1.RunnerApi.PTransform;
+import org.apache.beam.runners.core.construction.graph.PipelineNode.PCollectionNode;
+import org.apache.beam.runners.core.construction.graph.PipelineNode.PTransformNode;
+
+/**
+ * A reference to a side input. This includes the PTransform that references the side input as well
+ * as the PCollection referenced. Both are necessary in order to fully resolve a view.
+ */
+@AutoValue
+public abstract class SideInputReference {
+
+  /** Create a side input reference. */
+  public static SideInputReference of(PTransformNode transform, String localName,

Review comment:
Changed. Let me know if this is any better. I'm sad that there's no Beam auto-formatter.

Issue Time Tracking
-------------------
    Worklog Id: (was: 91043)
    Time Spent: 2h (was: 1h 50m)
[jira] [Work logged] (BEAM-4056) Identify Side Inputs by PTransform ID and local name
[ https://issues.apache.org/jira/browse/BEAM-4056?focusedWorklogId=91042=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-91042 ]

ASF GitHub Bot logged work on BEAM-4056:
    Author: ASF GitHub Bot
    Created on: 14/Apr/18 01:30
    Start Date: 14/Apr/18 01:30
    Worklog Time Spent: 10m
    Work Description: bsidhom commented on a change in pull request #5118: [BEAM-4056] Identify side inputs by transform id and local name
URL: https://github.com/apache/beam/pull/5118#discussion_r181537025

## File path: model/pipeline/src/main/proto/beam_runner_api.proto ##

@@ -217,12 +217,12 @@ message ExecutableStagePayload {
   // PTransform the ExecutableStagePayload is the payload of.
   string input = 2;
 
-  // Side Input PCollection ids. Each must be present as a value in the inputs of
-  // any PTransform the ExecutableStagePayload is the payload of.
-  repeated string side_inputs = 3;
+  // The side inputs required for this executable stage. Each must be present as a side input of
+  // exactly one PTransform within this ExecutableStagePayload.

Review comment:
Done.

Issue Time Tracking
-------------------
    Worklog Id: (was: 91042)
    Time Spent: 1h 50m (was: 1h 40m)
[jira] [Work logged] (BEAM-4056) Identify Side Inputs by PTransform ID and local name
[ https://issues.apache.org/jira/browse/BEAM-4056?focusedWorklogId=90994=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-90994 ]

ASF GitHub Bot logged work on BEAM-4056:
    Author: ASF GitHub Bot
    Created on: 13/Apr/18 22:28
    Start Date: 13/Apr/18 22:28
    Worklog Time Spent: 10m
    Work Description: tgroh commented on a change in pull request #5118: [BEAM-4056] Identify side inputs by transform id and local name
URL: https://github.com/apache/beam/pull/5118#discussion_r181523237

## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/SideInputReference.java ##

@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.core.construction.graph;
+
+import com.google.auto.value.AutoValue;
+import org.apache.beam.model.pipeline.v1.RunnerApi;
+import org.apache.beam.model.pipeline.v1.RunnerApi.ExecutableStagePayload.SideInputId;
+import org.apache.beam.model.pipeline.v1.RunnerApi.PCollection;
+import org.apache.beam.model.pipeline.v1.RunnerApi.PTransform;
+import org.apache.beam.runners.core.construction.graph.PipelineNode.PCollectionNode;
+import org.apache.beam.runners.core.construction.graph.PipelineNode.PTransformNode;
+
+/**
+ * A reference to a side input. This includes the PTransform that references the side input as well
+ * as the PCollection referenced. Both are necessary in order to fully resolve a view.
+ */
+@AutoValue
+public abstract class SideInputReference {
+
+  /** Create a side input reference. */
+  public static SideInputReference of(PTransformNode transform, String localName,

Review comment:
Formatting?

Issue Time Tracking
-------------------
    Worklog Id: (was: 90994)
    Time Spent: 1h 40m (was: 1.5h)
[jira] [Work logged] (BEAM-4056) Identify Side Inputs by PTransform ID and local name
[ https://issues.apache.org/jira/browse/BEAM-4056?focusedWorklogId=90993=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-90993 ]

ASF GitHub Bot logged work on BEAM-4056:
    Author: ASF GitHub Bot
    Created on: 13/Apr/18 22:28
    Start Date: 13/Apr/18 22:28
    Worklog Time Spent: 10m
    Work Description: tgroh commented on a change in pull request #5118: [BEAM-4056] Identify side inputs by transform id and local name
URL: https://github.com/apache/beam/pull/5118#discussion_r181523199

## File path: model/pipeline/src/main/proto/beam_runner_api.proto ##

@@ -217,12 +217,12 @@ message ExecutableStagePayload {
   // PTransform the ExecutableStagePayload is the payload of.
   string input = 2;
 
-  // Side Input PCollection ids. Each must be present as a value in the inputs of
-  // any PTransform the ExecutableStagePayload is the payload of.
-  repeated string side_inputs = 3;
+  // The side inputs required for this executable stage. Each must be present as a side input of
+  // exactly one PTransform within this ExecutableStagePayload.

Review comment:
I would modify this spec a little - "Each Side Input of each PTransform within this ExecutableStagePayload must be represented within this field." or thereabouts.

That way it represents the minimum contents instead of the maximum contents; if we make more side inputs available than required, I don't expect either harness to break, just to be very confused.

Issue Time Tracking
-------------------
    Worklog Id: (was: 90993)
    Time Spent: 1.5h (was: 1h 20m)
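
The "minimum contents" reading suggested in the review comment above can be sketched as a superset check: every side input that a transform in the stage requires must be declared, while extra declared entries are tolerated. This is only an illustration in plain Java; the class `SideInputCoverageSketch`, the method `missing`, and the "transformId:localName" string keys are assumptions made for the example and are not part of Beam or of the PR.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative only: checks the "minimum contents" rule with plain string keys. */
public class SideInputCoverageSketch {

  /**
   * Returns the required side input keys that the stage payload does not declare.
   * An empty result means the declared set covers everything that is required;
   * declared entries that nothing requires are deliberately not reported.
   */
  static Set<String> missing(Set<String> required, Set<String> declared) {
    Set<String> result = new TreeSet<>(required);
    result.removeAll(declared);
    return result;
  }

  public static void main(String[] args) {
    // Hypothetical keys in "transformId:localName" form.
    Set<String> required = new HashSet<>(Arrays.asList("parDoA:view", "parDoB:view"));
    Set<String> declared = new HashSet<>(Arrays.asList("parDoA:view", "parDoB:view", "parDoC:extra"));

    // The extra declared entry is allowed under the minimum-contents rule.
    System.out.println("missing with superset: " + missing(required, declared));

    // Dropping a required entry is what the rule forbids.
    declared.remove("parDoB:view");
    System.out.println("missing after removal: " + missing(required, declared));
  }
}
```

A check written against the earlier "exactly one PTransform" wording would instead have to reject the extra `parDoC:extra` entry, which is the stricter behaviour the review comment argues against.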
[jira] [Work logged] (BEAM-4056) Identify Side Inputs by PTransform ID and local name
[ https://issues.apache.org/jira/browse/BEAM-4056?focusedWorklogId=90943=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-90943 ]

ASF GitHub Bot logged work on BEAM-4056:
    Author: ASF GitHub Bot
    Created on: 13/Apr/18 19:00
    Start Date: 13/Apr/18 19:00
    Worklog Time Spent: 10m
    Work Description: bsidhom commented on a change in pull request #5118: [BEAM-4056] Identify side inputs by transform id and local name
URL: https://github.com/apache/beam/pull/5118#discussion_r181474184

## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/ExecutableStage.java ##

@@ -122,11 +123,16 @@ default PTransform toPTransform() {
     pt.putInputs("input", getInputPCollection().getId());
     payload.setInput(input.getId());
 
-    int sideInputIndex = 0;
-    for (PCollectionNode sideInputNode : getSideInputPCollections()) {
-      pt.putInputs(String.format("side_input_%s", sideInputIndex), sideInputNode.getId());
-      payload.addSideInputs(sideInputNode.getId());
-      sideInputIndex++;
+    for (SideInputReference sideInput : getSideInputs()) {
+      // Side inputs of the ExecutableStage itself can be uniquely identified by inner PTransform
+      // name and local name.
+      String outerLocalName = String.format("%s:%s",
+          sideInput.transformId(), sideInput.localName());
+      pt.putInputs(outerLocalName, sideInput.getCollection().getId());
+      payload.addSideInputs(SideInputId.newBuilder()

Review comment:
Let me know if this is better.

Issue Time Tracking
-------------------
    Worklog Id: (was: 90943)
    Time Spent: 1h (was: 50m)
[jira] [Work logged] (BEAM-4056) Identify Side Inputs by PTransform ID and local name
[ https://issues.apache.org/jira/browse/BEAM-4056?focusedWorklogId=90945=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-90945 ]

ASF GitHub Bot logged work on BEAM-4056:
    Author: ASF GitHub Bot
    Created on: 13/Apr/18 19:00
    Start Date: 13/Apr/18 19:00
    Worklog Time Spent: 10m
    Work Description: bsidhom commented on a change in pull request #5118: [BEAM-4056] Identify side inputs by transform id and local name
URL: https://github.com/apache/beam/pull/5118#discussion_r181473382

## File path: model/pipeline/src/main/proto/beam_runner_api.proto ##

@@ -217,12 +217,12 @@ message ExecutableStagePayload {
   // PTransform the ExecutableStagePayload is the payload of.
   string input = 2;
 
-  // Side Input PCollection ids. Each must be present as a value in the inputs of
-  // any PTransform the ExecutableStagePayload is the payload of.
-  repeated string side_inputs = 3;
+  // The side inputs required for this executable stage. Each must be prsent as a side input of

Review comment:
Aren't you familiar with that common abbreviation??? It shaves off a whole character!

Issue Time Tracking
-------------------
    Worklog Id: (was: 90945)
    Time Spent: 1h 20m (was: 1h 10m)
[jira] [Work logged] (BEAM-4056) Identify Side Inputs by PTransform ID and local name
[ https://issues.apache.org/jira/browse/BEAM-4056?focusedWorklogId=90944=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-90944 ]

ASF GitHub Bot logged work on BEAM-4056:
    Author: ASF GitHub Bot
    Created on: 13/Apr/18 19:00
    Start Date: 13/Apr/18 19:00
    Worklog Time Spent: 10m
    Work Description: bsidhom commented on a change in pull request #5118: [BEAM-4056] Identify side inputs by transform id and local name
URL: https://github.com/apache/beam/pull/5118#discussion_r181474800

## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/SideInputReference.java ##

@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.core.construction.graph;
+
+import com.google.auto.value.AutoValue;
+import org.apache.beam.model.pipeline.v1.RunnerApi;
+import org.apache.beam.model.pipeline.v1.RunnerApi.ExecutableStagePayload.SideInputId;
+import org.apache.beam.runners.core.construction.graph.PipelineNode.PCollectionNode;
+
+/**
+ * A reference to a side input. This includes the PTransform that references the side input as well
+ * as the PCollection referenced. Both are necessary in order to fully resolve a view.
+ */
+@AutoValue
+public abstract class SideInputReference {
+
+  /** Create a side input reference. */
+  public static SideInputReference of(String transformId, String localName,

Review comment:
Yes, it's available. We already require components everywhere due to PCollectionNode. Done.

Issue Time Tracking
-------------------
    Worklog Id: (was: 90944)
    Time Spent: 1h 10m (was: 1h)
[jira] [Work logged] (BEAM-4056) Identify Side Inputs by PTransform ID and local name
[ https://issues.apache.org/jira/browse/BEAM-4056?focusedWorklogId=90626=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-90626 ]

ASF GitHub Bot logged work on BEAM-4056:

Author: ASF GitHub Bot
Created on: 12/Apr/18 22:58
Start Date: 12/Apr/18 22:58
Worklog Time Spent: 10m
Work Description: tgroh commented on a change in pull request #5118: [BEAM-4056] Identify side inputs by transform id and local name
URL: https://github.com/apache/beam/pull/5118#discussion_r181237746

## File path: model/pipeline/src/main/proto/beam_runner_api.proto ##

@@ -217,12 +217,12 @@ message ExecutableStagePayload {
   // PTransform the ExecutableStagePayload is the payload of.
   string input = 2;

-  // Side Input PCollection ids. Each must be present as a value in the inputs of
-  // any PTransform the ExecutableStagePayload is the payload of.
-  repeated string side_inputs = 3;
+  // The side inputs required for this executable stage. Each must be prsent as a side input of

Review comment:
spelling

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 90626)
Time Spent: 0.5h (was: 20m)

> Identify Side Inputs by PTransform ID and local name
>
>
> Key: BEAM-4056
> URL: https://issues.apache.org/jira/browse/BEAM-4056
> Project: Beam
> Issue Type: New Feature
> Components: runner-core
> Reporter: Ben Sidhom
> Assignee: Ben Sidhom
> Priority: Minor
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> This is necessary in order to correctly identify side inputs during all
> phases of portable pipeline execution (fusion, translation, and SDK
> execution).

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4056) Identify Side Inputs by PTransform ID and local name
[ https://issues.apache.org/jira/browse/BEAM-4056?focusedWorklogId=90627=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-90627 ]

ASF GitHub Bot logged work on BEAM-4056:

Author: ASF GitHub Bot
Created on: 12/Apr/18 22:58
Start Date: 12/Apr/18 22:58
Worklog Time Spent: 10m
Work Description: tgroh commented on a change in pull request #5118: [BEAM-4056] Identify side inputs by transform id and local name
URL: https://github.com/apache/beam/pull/5118#discussion_r181238037

## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/SideInputReference.java ##

@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.core.construction.graph;
+
+import com.google.auto.value.AutoValue;
+import org.apache.beam.model.pipeline.v1.RunnerApi;
+import org.apache.beam.model.pipeline.v1.RunnerApi.ExecutableStagePayload.SideInputId;
+import org.apache.beam.runners.core.construction.graph.PipelineNode.PCollectionNode;
+
+/**
+ * A reference to a side input. This includes the PTransform that references the side input as well
+ * as the PCollection referenced. Both are necessary in order to fully resolve a view.
+ */
+@AutoValue
+public abstract class SideInputReference {
+
+  /** Create a side input reference. */
+  public static SideInputReference of(String transformId, String localName,

Review comment:
Maybe a PTransformNode? Would that be available everywhere we're constructing this?

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 90627)
Time Spent: 40m (was: 0.5h)

> Identify Side Inputs by PTransform ID and local name
>
>
> Key: BEAM-4056
> URL: https://issues.apache.org/jira/browse/BEAM-4056
> Project: Beam
> Issue Type: New Feature
> Components: runner-core
> Reporter: Ben Sidhom
> Assignee: Ben Sidhom
> Priority: Minor
> Time Spent: 40m
> Remaining Estimate: 0h
>
> This is necessary in order to correctly identify side inputs during all
> phases of portable pipeline execution (fusion, translation, and SDK
> execution).

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
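The class comment in the diff above says that both the referencing PTransform and the referenced PCollection are needed to resolve a view. A minimal plain-Java sketch of such a value type follows. It is not the Beam class: `SideInputRef`, its string-typed fields, and `collectionId` are hypothetical stand-ins for illustration, whereas the real `SideInputReference` is generated with AutoValue and holds node types such as `PCollectionNode`.

```java
import java.util.Objects;

/** Hypothetical stand-in for the AutoValue class in the diff; plain strings replace Beam node types. */
final class SideInputRef {
  private final String transformId;   // id of the PTransform that reads the side input
  private final String localName;     // name of the side input within that transform
  private final String collectionId;  // id of the PCollection being read (assumed to be a string here)

  private SideInputRef(String transformId, String localName, String collectionId) {
    this.transformId = Objects.requireNonNull(transformId);
    this.localName = Objects.requireNonNull(localName);
    this.collectionId = Objects.requireNonNull(collectionId);
  }

  /** Create a side input reference. */
  public static SideInputRef of(String transformId, String localName, String collectionId) {
    return new SideInputRef(transformId, localName, collectionId);
  }

  public String transformId() { return transformId; }
  public String localName() { return localName; }
  public String collectionId() { return collectionId; }

  // Value semantics: two references are equal only if all three parts match.
  @Override
  public boolean equals(Object o) {
    if (!(o instanceof SideInputRef)) {
      return false;
    }
    SideInputRef that = (SideInputRef) o;
    return transformId.equals(that.transformId)
        && localName.equals(that.localName)
        && collectionId.equals(that.collectionId);
  }

  @Override
  public int hashCode() {
    return Objects.hash(transformId, localName, collectionId);
  }

  public static void main(String[] args) {
    SideInputRef a = SideInputRef.of("parDoA", "view", "pcollection_1");
    SideInputRef b = SideInputRef.of("parDoB", "view", "pcollection_1");
    // Same PCollection and local name, different transform: distinct references.
    System.out.println(a.equals(b));
  }
}
```

Keeping the transform id in the key means two transforms that read the same PCollection under the same local name still produce distinct references.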
[jira] [Work logged] (BEAM-4056) Identify Side Inputs by PTransform ID and local name
[ https://issues.apache.org/jira/browse/BEAM-4056?focusedWorklogId=90628=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-90628 ]

ASF GitHub Bot logged work on BEAM-4056:

Author: ASF GitHub Bot
Created on: 12/Apr/18 22:58
Start Date: 12/Apr/18 22:58
Worklog Time Spent: 10m
Work Description: tgroh commented on a change in pull request #5118: [BEAM-4056] Identify side inputs by transform id and local name
URL: https://github.com/apache/beam/pull/5118#discussion_r181237867

## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/ExecutableStage.java ##

@@ -122,11 +123,16 @@ default PTransform toPTransform() {
 pt.putInputs("input", getInputPCollection().getId());
 payload.setInput(input.getId());

-int sideInputIndex = 0;
-for (PCollectionNode sideInputNode : getSideInputPCollections()) {
-  pt.putInputs(String.format("side_input_%s", sideInputIndex), sideInputNode.getId());
-  payload.addSideInputs(sideInputNode.getId());
-  sideInputIndex++;
+for (SideInputReference sideInput : getSideInputs()) {
+  // Side inputs of the ExecutableStage itself can be uniquely identified by inner PTransform
+  // name and local name.
+  String outerLocalName = String.format("%s:%s",
+  sideInput.transformId(), sideInput.localName());
+  pt.putInputs(outerLocalName, sideInput.getCollection().getId());
+  payload.addSideInputs(SideInputId.newBuilder()

Review comment:
Your formatting looks funky here

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 90628)
Time Spent: 50m (was: 40m)

> Identify Side Inputs by PTransform ID and local name
>
>
> Key: BEAM-4056
> URL: https://issues.apache.org/jira/browse/BEAM-4056
> Project: Beam
> Issue Type: New Feature
> Components: runner-core
> Reporter: Ben Sidhom
> Assignee: Ben Sidhom
> Priority: Minor
> Time Spent: 50m
> Remaining Estimate: 0h
>
> This is necessary in order to correctly identify side inputs during all
> phases of portable pipeline execution (fusion, translation, and SDK
> execution).

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
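The `ExecutableStage.java` diff in the message above swaps index-based input names (`side_input_0`, `side_input_1`, ...) for names built from the inner transform id and the side input's local name. A minimal stand-alone sketch of that naming scheme follows. Only the `"%s:%s"` format comes from the diff; the class `SideInputNames`, the helper `outerLocalName`, and the sample ids are hypothetical and not part of Beam.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of the naming scheme; not the Beam ExecutableStage code. */
final class SideInputNames {

  /** Joins the inner transform id and the local side input name, using the diff's "%s:%s" format. */
  public static String outerLocalName(String transformId, String localName) {
    return String.format("%s:%s", transformId, localName);
  }

  public static void main(String[] args) {
    // Two inner transforms both call their side input "view" but read different PCollections.
    Map<String, String> stageInputs = new LinkedHashMap<>();
    stageInputs.put(outerLocalName("parDoA", "view"), "pcollection_1");
    stageInputs.put(outerLocalName("parDoB", "view"), "pcollection_2");

    // Both entries survive because the transform id disambiguates the shared local name.
    stageInputs.forEach((name, pcollectionId) -> System.out.println(name + " -> " + pcollectionId));
  }
}
```

Unlike a running index, the composed name still tells a later phase which inner transform and which of its side inputs an outer input belongs to.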
[jira] [Work logged] (BEAM-4056) Identify Side Inputs by PTransform ID and local name
[ https://issues.apache.org/jira/browse/BEAM-4056?focusedWorklogId=90607=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-90607 ]

ASF GitHub Bot logged work on BEAM-4056:

Author: ASF GitHub Bot
Created on: 12/Apr/18 22:12
Start Date: 12/Apr/18 22:12
Worklog Time Spent: 10m
Work Description: bsidhom commented on issue #5118: [BEAM-4056] Identify side inputs by transform id and local name
URL: https://github.com/apache/beam/pull/5118#issuecomment-380960266

R: @tgroh

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 90607)
Time Spent: 20m (was: 10m)

> Identify Side Inputs by PTransform ID and local name
>
>
> Key: BEAM-4056
> URL: https://issues.apache.org/jira/browse/BEAM-4056
> Project: Beam
> Issue Type: New Feature
> Components: runner-core
> Reporter: Ben Sidhom
> Assignee: Ben Sidhom
> Priority: Minor
> Time Spent: 20m
> Remaining Estimate: 0h
>
> This is necessary in order to correctly identify side inputs during all
> phases of portable pipeline execution (fusion, translation, and SDK
> execution).

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4056) Identify Side Inputs by PTransform ID and local name
[ https://issues.apache.org/jira/browse/BEAM-4056?focusedWorklogId=90606=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-90606 ]

ASF GitHub Bot logged work on BEAM-4056:

Author: ASF GitHub Bot
Created on: 12/Apr/18 22:12
Start Date: 12/Apr/18 22:12
Worklog Time Spent: 10m
Work Description: bsidhom opened a new pull request #5118: [BEAM-4056] Identify side inputs by transform id and local name
URL: https://github.com/apache/beam/pull/5118

This is necessary to identify side inputs during portable pipeline translation and execution.

Follow this checklist to help us incorporate your contribution quickly and easily:

- [ ] Make sure there is a [JIRA issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the change (usually before you start working on it). Trivial changes like typos do not require a JIRA issue. Your pull request should address just this issue, without pulling in other changes.
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue.
- [ ] Write a pull request description that is detailed enough to understand:
  - [ ] What the pull request does
  - [ ] Why it does it
  - [ ] How it does it
  - [ ] Why this approach
- [ ] Each commit in the pull request should have a meaningful subject line and body.
- [ ] Run `mvn clean verify` to make sure basic checks pass. A more thorough check will be performed on your pull request automatically.
- [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 90606)
Time Spent: 10m
Remaining Estimate: 0h

> Identify Side Inputs by PTransform ID and local name
>
>
> Key: BEAM-4056
> URL: https://issues.apache.org/jira/browse/BEAM-4056
> Project: Beam
> Issue Type: New Feature
> Components: runner-core
> Reporter: Ben Sidhom
> Assignee: Ben Sidhom
> Priority: Minor
> Time Spent: 10m
> Remaining Estimate: 0h
>
> This is necessary in order to correctly identify side inputs during all
> phases of portable pipeline execution (fusion, translation, and SDK
> execution).

--
This message was sent by Atlassian JIRA (v7.6.3#76005)