[jira] [Work logged] (BEAM-2915) Java SDK support for portable user state
[ https://issues.apache.org/jira/browse/BEAM-2915?focusedWorklogId=121943&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-121943 ]

ASF GitHub Bot logged work on BEAM-2915:

Author: ASF GitHub Bot
Created on: 11/Jul/18 16:37
Start Date: 11/Jul/18 16:37
Worklog Time Spent: 10m
Work Description: lukecwik closed pull request #5445: [BEAM-2915] Add support for handling bag user state to the java-fn-execution library to support runner integration.
URL: https://github.com/apache/beam/pull/5445

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/.gitignore b/.gitignore
index b37254a1f5e..204f22fde87 100644
--- a/.gitignore
+++ b/.gitignore
@@ -10,7 +10,7 @@
 **/.gogradle/**/*
 **/gogradle.lock
 **/build/**/*
-**/vendor/**/*
+sdks/go/**/vendor/**/*
 **/.gradletasknamecache
 
 # Ignore files generated by the Maven build process.
diff --git a/model/pipeline/src/main/proto/beam_runner_api.proto b/model/pipeline/src/main/proto/beam_runner_api.proto
index 16cb4e75dbb..3cec68d5842 100644
--- a/model/pipeline/src/main/proto/beam_runner_api.proto
+++ b/model/pipeline/src/main/proto/beam_runner_api.proto
@@ -339,7 +339,7 @@ message ExecutableStagePayload {
   // PTransform the ExecutableStagePayload is the payload of.
   string input = 2;
 
-  // The side inputs required for this executable stage. Each Side Input of each PTransform within
+  // The side inputs required for this executable stage. Each side input of each PTransform within
   // this ExecutableStagePayload must be represented within this field.
   repeated SideInputId side_inputs = 3;
 
@@ -355,6 +355,10 @@ message ExecutableStagePayload {
   // in transforms, and the closure of all of the components they recognize.
   Components components = 6;
 
+  // The user states required for this executable stage. Each user state of each PTransform within
+  // this ExecutableStagePayload must be represented within this field.
+  repeated UserStateId user_states = 7;
+
   // A reference to a side input. Side inputs are uniquely identified by PTransform id and
   // local name.
   message SideInputId {
@@ -364,6 +368,16 @@ message ExecutableStagePayload {
     // (Required) The local name of this side input from the PTransform that references it.
     string local_name = 2;
   }
+
+  // A reference to user state. User states are uniquely identified by PTransform id and
+  // local name.
+  message UserStateId {
+    // (Required) The id of the PTransform that references this user state.
+    string transform_id = 1;
+
+    // (Required) The local name of this user state for the PTransform that references it.
+    string local_name = 2;
+  }
 }
 
 // The payload for the primitive ParDo transform.
diff --git a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/ExecutableStage.java b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/ExecutableStage.java
index c08b841fd86..486b3b7a98a 100644
--- a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/ExecutableStage.java
+++ b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/ExecutableStage.java
@@ -26,6 +26,7 @@
 import org.apache.beam.model.pipeline.v1.RunnerApi.Environment;
 import org.apache.beam.model.pipeline.v1.RunnerApi.ExecutableStagePayload;
 import org.apache.beam.model.pipeline.v1.RunnerApi.ExecutableStagePayload.SideInputId;
+import org.apache.beam.model.pipeline.v1.RunnerApi.ExecutableStagePayload.UserStateId;
 import org.apache.beam.model.pipeline.v1.RunnerApi.FunctionSpec;
 import org.apache.beam.model.pipeline.v1.RunnerApi.PCollection;
 import org.apache.beam.model.pipeline.v1.RunnerApi.PTransform;
@@ -80,6 +81,9 @@
    */
   Collection<SideInputReference> getSideInputs();
 
+  /** Returns the set of {@link PTransformNode PTransforms} that contain user state. */
+  Collection<UserStateReference> getUserStates();
+
   /**
    * Returns the leaf {@link PCollectionNode PCollections} of this {@link ExecutableStage}.
    *
@@ -132,8 +136,14 @@ default PTransform toPTransform(String uniqueName) {
       payload.addSideInputs(
           SideInputId.newBuilder()
               .setTransformId(sideInput.transform().getId())
-              .setLocalName(sideInput.localName())
-              .build());
+              .setLocalName(sideInput.localName()));
+    }
+
+    for (UserStateReference userState : getUserStates()) {
+      payload.addUserStates(
+          UserStateId.newBuilder()
+              .setTransformId(userState.transform().getId())
+              .setLocalName(userState.localName()));
     }
[jira] [Work logged] (BEAM-2915) Java SDK support for portable user state
[ https://issues.apache.org/jira/browse/BEAM-2915?focusedWorklogId=121933&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-121933 ]

ASF GitHub Bot logged work on BEAM-2915:

Author: ASF GitHub Bot
Created on: 11/Jul/18 15:58
Start Date: 11/Jul/18 15:58
Worklog Time Spent: 10m
Work Description: lukecwik commented on issue #5445: [BEAM-2915] Add support for handling bag user state to the java-fn-execution library to support runner integration.
URL: https://github.com/apache/beam/pull/5445#issuecomment-404222099

Run Java Precommit

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 121933)
Time Spent: 3.5h (was: 3h 20m)

> Java SDK support for portable user state
>
>
> Key: BEAM-2915
> URL: https://issues.apache.org/jira/browse/BEAM-2915
> Project: Beam
> Issue Type: Sub-task
> Components: sdk-java-core
> Reporter: Henning Rohde
> Assignee: Luke Cwik
> Priority: Minor
> Labels: portability
> Time Spent: 3.5h
> Remaining Estimate: 0h
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Work logged] (BEAM-2915) Java SDK support for portable user state
[ https://issues.apache.org/jira/browse/BEAM-2915?focusedWorklogId=121922&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-121922 ]

ASF GitHub Bot logged work on BEAM-2915:

Author: ASF GitHub Bot
Created on: 11/Jul/18 15:03
Start Date: 11/Jul/18 15:03
Worklog Time Spent: 10m
Work Description: lukecwik commented on issue #5445: [BEAM-2915] Add support for handling bag user state to the java-fn-execution library to support runner integration.
URL: https://github.com/apache/beam/pull/5445#issuecomment-404202558

Run Java Precommit

Issue Time Tracking
---
Worklog Id: (was: 121922)
Time Spent: 3h 20m (was: 3h 10m)
[jira] [Work logged] (BEAM-2915) Java SDK support for portable user state
[ https://issues.apache.org/jira/browse/BEAM-2915?focusedWorklogId=116870&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-116870 ]

ASF GitHub Bot logged work on BEAM-2915:

Author: ASF GitHub Bot
Created on: 28/Jun/18 16:24
Start Date: 28/Jun/18 16:24
Worklog Time Spent: 10m
Work Description: kennknowles commented on issue #5445: [BEAM-2915, BEAM-4484] Add support for handling bag user state to the java-fn-execution library to support runner integration.
URL: https://github.com/apache/beam/pull/5445#issuecomment-401093184

We have turned on autoformatting of the codebase, which causes small conflicts across the board. You can probably safely rebase and just keep your changes. Like this:

```
$ git rebase
... see some conflicts
$ git diff
... confirmed that the conflicts are just autoformatting
... so we can just keep our changes and do our own autoformat
$ git checkout --theirs --
$ git add -u
$ git rebase --continue
$ ./gradlew spotlessJavaApply
```

Please ping me if you run into any difficulty.

Issue Time Tracking
---
Worklog Id: (was: 116870)
Time Spent: 3h 10m (was: 3h)
[jira] [Work logged] (BEAM-2915) Java SDK support for portable user state
[ https://issues.apache.org/jira/browse/BEAM-2915?focusedWorklogId=110004&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110004 ]

ASF GitHub Bot logged work on BEAM-2915:

Author: ASF GitHub Bot
Created on: 08/Jun/18 05:18
Start Date: 08/Jun/18 05:18
Worklog Time Spent: 10m
Work Description: tweise commented on issue #5445: [BEAM-2915, BEAM-4844] Add support for handling bag user state to the java-fn-execution library to support runner integration.
URL: https://github.com/apache/beam/pull/5445#issuecomment-395650277

test this please

Issue Time Tracking
---
Worklog Id: (was: 110004)
Time Spent: 3h (was: 2h 50m)
[jira] [Work logged] (BEAM-2915) Java SDK support for portable user state
[ https://issues.apache.org/jira/browse/BEAM-2915?focusedWorklogId=109978&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-109978 ]

ASF GitHub Bot logged work on BEAM-2915:

Author: ASF GitHub Bot
Created on: 08/Jun/18 02:50
Start Date: 08/Jun/18 02:50
Worklog Time Spent: 10m
Work Description: lukecwik commented on issue #5445: [BEAM-2915, BEAM-4844] Add support for handling bag user state to the java-fn-execution library to support runner integration.
URL: https://github.com/apache/beam/pull/5445#issuecomment-395629785

It could be flaky; cut bugs for the failing tests and assign them to their authors. Disable a test if it is consistently failing and update the bug to say that it has been disabled.

Issue Time Tracking
---
Worklog Id: (was: 109978)
Time Spent: 2h 50m (was: 2h 40m)
[jira] [Work logged] (BEAM-2915) Java SDK support for portable user state
[ https://issues.apache.org/jira/browse/BEAM-2915?focusedWorklogId=109973&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-109973 ]

ASF GitHub Bot logged work on BEAM-2915:

Author: ASF GitHub Bot
Created on: 08/Jun/18 02:31
Start Date: 08/Jun/18 02:31
Worklog Time Spent: 10m
Work Description: tweise commented on issue #5445: [BEAM-2915] Add support for handling bag user state to the java-fn-execution library to support runner integration.
URL: https://github.com/apache/beam/pull/5445#issuecomment-395626847

It is not clear why some PR pre-commit builds pass and others do not. I wasn't able to build master locally. Flaky build?

Issue Time Tracking
---
Worklog Id: (was: 109973)
Time Spent: 2h 40m (was: 2.5h)
[jira] [Work logged] (BEAM-2915) Java SDK support for portable user state
[ https://issues.apache.org/jira/browse/BEAM-2915?focusedWorklogId=109971&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-109971 ]

ASF GitHub Bot logged work on BEAM-2915:

Author: ASF GitHub Bot
Created on: 08/Jun/18 02:20
Start Date: 08/Jun/18 02:20
Worklog Time Spent: 10m
Work Description: lukecwik commented on a change in pull request #5445: [BEAM-2915] Add support for handling bag user state to the java-fn-execution library to support runner integration.
URL: https://github.com/apache/beam/pull/5445#discussion_r19393

## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/GreedyPCollectionFusers.java ##

@@ -181,6 +181,10 @@ private static boolean parDoCompatibility(
     // side inputs can be fused with other transforms in the same environment which are not
     // upstream of any of the side inputs.
     return pipeline.getSideInputs(parDo).isEmpty()
+        // Since we lack the ability to mark upstream transforms as key preserving, we
+        // purposefully break fusion here to provide runners the opportunity to insert a
+        // grouping operation
+        && pipeline.getUserStates(parDo).isEmpty()

Review comment:
Runners need to look at the state spec to see whether it exists; its presence signals that they must insert something that partitions by key (e.g. a reshuffle). Most runners (maybe all) only support user state via key partitioning.

Issue Time Tracking
---
Worklog Id: (was: 109971)
Time Spent: 2.5h (was: 2h 20m)
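The fusion rule in the hunk above can be sketched in isolation. This is a minimal illustration, not the Beam implementation: `ParDoInfo`, `FusionSketch`, and `canFuseWithProducer` are hypothetical names, and the real `parDoCompatibility` check presumably considers more than these two conditions (the hunk shows only part of the expression).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical summary of a ParDo: only the names of its side inputs and user states. */
final class ParDoInfo {
  final String id;
  final List<String> sideInputs;
  final List<String> userStates;

  ParDoInfo(String id, List<String> sideInputs, List<String> userStates) {
    this.id = id;
    this.sideInputs = sideInputs;
    this.userStates = userStates;
  }
}

final class FusionSketch {
  /**
   * Assumed rule, modeled on the hunk above: a ParDo is a fusion candidate only when it has
   * neither side inputs nor user state. Refusing to fuse a stateful ParDo leaves a stage
   * boundary in front of it, where a runner can insert a partition-by-key step.
   */
  static boolean canFuseWithProducer(ParDoInfo parDo) {
    return parDo.sideInputs.isEmpty() && parDo.userStates.isEmpty();
  }

  public static void main(String[] args) {
    ParDoInfo stateless =
        new ParDoInfo("format", Collections.emptyList(), Collections.emptyList());
    ParDoInfo stateful =
        new ParDoInfo("dedupe", Collections.emptyList(), Arrays.asList("seen"));

    System.out.println(stateless.id + " fusible: " + canFuseWithProducer(stateless)); // true
    System.out.println(stateful.id + " fusible: " + canFuseWithProducer(stateful)); // false
  }
}
```

Treating user state the same way as side inputs keeps the rule conservative: the fuser never has to prove that upstream transforms preserve keys.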
[jira] [Work logged] (BEAM-2915) Java SDK support for portable user state
[ https://issues.apache.org/jira/browse/BEAM-2915?focusedWorklogId=109928&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-109928 ]

ASF GitHub Bot logged work on BEAM-2915:

Author: ASF GitHub Bot
Created on: 07/Jun/18 23:44
Start Date: 07/Jun/18 23:44
Worklog Time Spent: 10m
Work Description: lukecwik commented on issue #5445: [BEAM-2915] Add support for handling bag user state to the java-fn-execution library to support runner integration.
URL: https://github.com/apache/beam/pull/5445#issuecomment-395600081

It turns out that the hcatalog tests always fail and are already set to be ignored, and the elastic search io tests are flaky. The DirectRunner issue is pre-existing and I'm not making the problem worse here. Kicked off another test run.

Issue Time Tracking
---
Worklog Id: (was: 109928)
Time Spent: 2h 20m (was: 2h 10m)
[jira] [Work logged] (BEAM-2915) Java SDK support for portable user state
[ https://issues.apache.org/jira/browse/BEAM-2915?focusedWorklogId=109175&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-109175 ]

ASF GitHub Bot logged work on BEAM-2915:

Author: ASF GitHub Bot
Created on: 05/Jun/18 19:07
Start Date: 05/Jun/18 19:07
Worklog Time Spent: 10m
Work Description: lukecwik commented on issue #5445: [BEAM-2915] Add support for handling bag user state to the java-fn-execution library to support runner integration.
URL: https://github.com/apache/beam/pull/5445#issuecomment-394825200

I thought I had already rebased to fix the `Components` issue. I can't merge because the tests will fail consistently until I solve or work around BEAM-4484.

Issue Time Tracking
---
Worklog Id: (was: 109175)
Time Spent: 2h 10m (was: 2h)
[jira] [Work logged] (BEAM-2915) Java SDK support for portable user state
[ https://issues.apache.org/jira/browse/BEAM-2915?focusedWorklogId=109152&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-109152 ]

ASF GitHub Bot logged work on BEAM-2915:

Author: ASF GitHub Bot
Created on: 05/Jun/18 18:11
Start Date: 05/Jun/18 18:11
Worklog Time Spent: 10m
Work Description: tweise commented on issue #5445: [BEAM-2915] Add support for handling bag user state to the java-fn-execution library to support runner integration.
URL: https://github.com/apache/beam/pull/5445#issuecomment-394807534

Thanks, can you rebase this PR to fix the compile error? Since the shading issue is pre-existing in master, I think we can merge this.

Issue Time Tracking
---
Worklog Id: (was: 109152)
Time Spent: 2h (was: 1h 50m)
[jira] [Work logged] (BEAM-2915) Java SDK support for portable user state
[ https://issues.apache.org/jira/browse/BEAM-2915?focusedWorklogId=109149&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-109149 ]

ASF GitHub Bot logged work on BEAM-2915:

Author: ASF GitHub Bot
Created on: 05/Jun/18 18:02
Start Date: 05/Jun/18 18:02
Worklog Time Spent: 10m
Work Description: lukecwik commented on issue #5445: [BEAM-2915] Add support for handling bag user state to the java-fn-execution library to support runner integration.
URL: https://github.com/apache/beam/pull/5445#issuecomment-394804785

Filed BEAM-4484 for the shading issue. Will investigate to see if there is a way to keep shading beam-model-pipeline in the direct runner.

Issue Time Tracking
---
Worklog Id: (was: 109149)
Time Spent: 1h 50m (was: 1h 40m)
[jira] [Work logged] (BEAM-2915) Java SDK support for portable user state
[ https://issues.apache.org/jira/browse/BEAM-2915?focusedWorklogId=109148&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-109148 ]

ASF GitHub Bot logged work on BEAM-2915:

Author: ASF GitHub Bot
Created on: 05/Jun/18 17:50
Start Date: 05/Jun/18 17:50
Worklog Time Spent: 10m
Work Description: lukecwik commented on issue #5445: [BEAM-2915] Add support for handling bag user state to the java-fn-execution library to support runner integration.
URL: https://github.com/apache/beam/pull/5445#issuecomment-394801030

I looked at the post-shading beam-model-pipeline proto file descriptors, and it turns out that during the shading process an overeager string replacement inside a class is corrupting an internal field. In this case it modifies the RunnerApi file descriptor, which stores
`org.apache.beam.model.pipeline.v1.AccumulationMode.Enum`
changing it to
`org.apache.beam.repackaged.beam_runners_direct_java.model.pipeline.v1.AccumulationMode.Enum`

This problem exists because the proto package name and the Java package name collide: both use `org.apache.beam.model.pipeline.v1`.

Issue Time Tracking
---
Worklog Id: (was: 109148)
Time Spent: 1h 40m (was: 1.5h)
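The collision described in this comment can be reproduced with plain strings. The sketch below is only an illustration under stated assumptions: `RelocationSketch` and `relocate` are hypothetical, the prefix rule is a simplification of what a shading tool does to string constants, and the non-colliding proto package in `main` is invented for contrast. The two fully qualified names are the ones quoted in the comment.

```java
final class RelocationSketch {
  /** Prefix being relocated and its shaded replacement (simplified, assumed rule). */
  static final String ORIGINAL_PREFIX = "org.apache.beam.";
  static final String SHADED_PREFIX = "org.apache.beam.repackaged.beam_runners_direct_java.";

  /**
   * Naive relocation of a string constant: anything that looks like a class name under the
   * original prefix is rewritten. The relocator cannot know whether the constant is a Java
   * class name or a protobuf full name stored inside a serialized descriptor.
   */
  static String relocate(String constant) {
    return constant.startsWith(ORIGINAL_PREFIX)
        ? SHADED_PREFIX + constant.substring(ORIGINAL_PREFIX.length())
        : constant;
  }

  public static void main(String[] args) {
    // The proto full name is identical to the Java class name, so it gets rewritten too.
    String protoFullName = "org.apache.beam.model.pipeline.v1.AccumulationMode.Enum";
    System.out.println(relocate(protoFullName));

    // Hypothetical proto package that does not share the Java prefix: left untouched.
    String distinctProtoName = "beam.model.pipeline.v1.AccumulationMode.Enum";
    System.out.println(relocate(distinctProtoName));
  }
}
```

Once the name inside the descriptor has been rewritten, it no longer matches the proto package the rest of the descriptor declares, which is consistent with the descriptor being reported as corrupted.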
[jira] [Work logged] (BEAM-2915) Java SDK support for portable user state
[ https://issues.apache.org/jira/browse/BEAM-2915?focusedWorklogId=109136&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-109136 ]

ASF GitHub Bot logged work on BEAM-2915:

Author: ASF GitHub Bot
Created on: 05/Jun/18 17:19
Start Date: 05/Jun/18 17:19
Worklog Time Spent: 10m
Work Description: lukecwik commented on issue #5445: [BEAM-2915] Add support for handling bag user state to the java-fn-execution library to support runner integration.
URL: https://github.com/apache/beam/pull/5445#issuecomment-394790606

I have been working to figure out what it is that is breaking this and it seems to be related to shading beam-model-pipeline with the direct runner. Somehow the proto is considered corrupted if protobuf java is added to the classpath.

Issue Time Tracking
---
Worklog Id: (was: 109136)
Time Spent: 1.5h (was: 1h 20m)
[jira] [Work logged] (BEAM-2915) Java SDK support for portable user state
[ https://issues.apache.org/jira/browse/BEAM-2915?focusedWorklogId=109135&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-109135 ]

ASF GitHub Bot logged work on BEAM-2915:

Author: ASF GitHub Bot
Created on: 05/Jun/18 17:17
Start Date: 05/Jun/18 17:17
Worklog Time Spent: 10m
Work Description: lukecwik commented on issue #5445: [BEAM-2915] Add support for handling bag user state to the java-fn-execution library to support runner integration.
URL: https://github.com/apache/beam/pull/5445#issuecomment-394790606

I have been working to figure out what it is that is breaking this and it seems to be related to shading beam-model-pipeline with the direct runner.

Issue Time Tracking
---
Worklog Id: (was: 109135)
Time Spent: 1h 20m (was: 1h 10m)
[jira] [Work logged] (BEAM-2915) Java SDK support for portable user state
[ https://issues.apache.org/jira/browse/BEAM-2915?focusedWorklogId=109127&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-109127 ]

ASF GitHub Bot logged work on BEAM-2915:

Author: ASF GitHub Bot
Created on: 05/Jun/18 17:00
Start Date: 05/Jun/18 17:00
Worklog Time Spent: 10m
Work Description: tweise commented on issue #5445: [BEAM-2915] Add support for handling bag user state to the java-fn-execution library to support runner integration.
URL: https://github.com/apache/beam/pull/5445#issuecomment-394785023

I tried to build locally and had to make the following change:

```
diff --git a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/control/RemoteExecutionTest.java b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/control/RemoteExecutionTest.java
index ca93d5ecc2..67b2f735c6 100644
--- a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/control/RemoteExecutionTest.java
+++ b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/control/RemoteExecutionTest.java
@@ -45,6 +45,7 @@ import java.util.concurrent.ThreadFactory;
 import org.apache.beam.fn.harness.FnHarness;
 import org.apache.beam.model.fnexecution.v1.BeamFnApi.Target;
 import org.apache.beam.model.pipeline.v1.RunnerApi;
+import org.apache.beam.model.pipeline.v1.RunnerApi.Components;
 import org.apache.beam.runners.core.construction.PipelineTranslation;
 import org.apache.beam.runners.core.construction.graph.ExecutableStage;
 import org.apache.beam.runners.core.construction.graph.FusedPipeline;
```

Unfortunately, there also seem to be unrelated test failures.

Issue Time Tracking
---
Worklog Id: (was: 109127)
Time Spent: 1h 10m (was: 1h)
[jira] [Work logged] (BEAM-2915) Java SDK support for portable user state
[ https://issues.apache.org/jira/browse/BEAM-2915?focusedWorklogId=108916&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-108916 ]

ASF GitHub Bot logged work on BEAM-2915:

Author: ASF GitHub Bot
Created on: 05/Jun/18 01:40
Start Date: 05/Jun/18 01:40
Worklog Time Spent: 10m
Work Description: tweise commented on issue #5445: [BEAM-2915] Add support for handling bag user state to the java-fn-execution library to support runner integration.
URL: https://github.com/apache/beam/pull/5445#issuecomment-394553108

test this please

Issue Time Tracking
---
Worklog Id: (was: 108916)
Time Spent: 1h (was: 50m)
[jira] [Work logged] (BEAM-2915) Java SDK support for portable user state
[ https://issues.apache.org/jira/browse/BEAM-2915?focusedWorklogId=108718&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-108718 ]

ASF GitHub Bot logged work on BEAM-2915:

Author: ASF GitHub Bot
Created on: 04/Jun/18 20:06
Start Date: 04/Jun/18 20:06
Worklog Time Spent: 10m
Work Description: lukecwik commented on a change in pull request #5445: [BEAM-2915] Add support for handling bag user state to the java-fn-execution library to support runner integration.
URL: https://github.com/apache/beam/pull/5445#discussion_r192864202

## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/state/StateRequestHandlers.java ##

@@ -214,17 +258,22 @@ public static StateRequestHandler forMultimapSideInputHandlerFactory(
 TypeCase.MULTIMAP_SIDE_INPUT);
 StateKey.MultimapSideInput stateKey = request.getStateKey().getMultimapSideInput();
-MultimapSideInputSpec sideInputReferenceSpec =
+MultimapSideInputSpec referenceSpec =
 processBundleDescriptor.getMultimapSideInputSpecs()
 .get(stateKey.getPtransformId())
 .get(stateKey.getSideInputId());
-MultimapSideInputHandler handler = cache.computeIfAbsent(
-sideInputReferenceSpec,
+MultimapSideInputHandler handler = cache.computeIfAbsent(
+referenceSpec,
 this::createHandler);
+// TODO: Migrate to using the ByteStringCoder everywhere for user types and values
+// to reduce the amount of byte array copying.
+Object key = referenceSpec.keyCoder().decode(stateKey.getKey().newInput());

Review comment:
No, all types are supported. We are hiding the fact that we are using the ByteStringCoder internally (hardcoded in ProcessBundleDescriptors#forBagUserState) as an optimization. This allows us to pass an Object and a Coder without the Runner needing to care how to turn that into bytes giving us the freedom to change which types we pass around without the Runner being the wiser.

The TODO is about changing the interface to be tailored towards using ByteString directly on the interface and would be a minor performance improvement for some runners.

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---

Worklog Id: (was: 108718)
Time Spent: 50m (was: 40m)

> Java SDK support for portable user state
>
>
> Key: BEAM-2915
> URL: https://issues.apache.org/jira/browse/BEAM-2915
> Project: Beam
> Issue Type: Sub-task
> Components: sdk-java-core
> Reporter: Henning Rohde
> Assignee: Luke Cwik
> Priority: Minor
> Labels: portability
> Time Spent: 50m
> Remaining Estimate: 0h
>

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
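Editorial sketch, not part of the pull request: the review comment above says the runner is handed an Object key together with a Coder, so it never needs to know the user's key type. The example below illustrates that idea in isolation. SimpleCoder, RawBytesCoder and InMemoryBagState are hypothetical names invented here; they are not Beam classes, and RawBytesCoder only plays the pass-through role that the comment attributes to ByteStringCoder.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: these types are hypothetical stand-ins, not Beam classes. */
public class OpaqueKeySketch {

  /** Minimal coder abstraction: converts a value to bytes and back. */
  interface SimpleCoder<T> {
    byte[] encode(T value);

    T decode(byte[] bytes);
  }

  /** Pass-through coder over raw bytes (assumed analogue of a byte-string coder). */
  static final class RawBytesCoder implements SimpleCoder<byte[]> {
    @Override
    public byte[] encode(byte[] value) {
      return value.clone();
    }

    @Override
    public byte[] decode(byte[] bytes) {
      return bytes.clone();
    }
  }

  /** Runner-side bag state that only ever sees a key plus the coder for that key. */
  static final class InMemoryBagState {
    private final Map<String, List<String>> bags = new HashMap<>();

    <K> void append(K key, SimpleCoder<K> keyCoder, String value) {
      bags.computeIfAbsent(storageKey(key, keyCoder), unused -> new ArrayList<>()).add(value);
    }

    <K> List<String> read(K key, SimpleCoder<K> keyCoder) {
      return bags.getOrDefault(storageKey(key, keyCoder), new ArrayList<>());
    }

    // The runner never inspects the key's type; it only asks the coder for bytes.
    private static <K> String storageKey(K key, SimpleCoder<K> keyCoder) {
      return Arrays.toString(keyCoder.encode(key));
    }
  }

  public static void main(String[] args) {
    // Pretend the SDK harness already encoded a user key with some user-defined coder.
    byte[] encodedUserKey = "user-42".getBytes(StandardCharsets.UTF_8);
    SimpleCoder<byte[]> coder = new RawBytesCoder();

    InMemoryBagState state = new InMemoryBagState();
    state.append(coder.decode(encodedUserKey), coder, "a");
    state.append(coder.decode(encodedUserKey), coder, "b");
    System.out.println(state.read(encodedUserKey, coder));
  }
}
```

Because the state store depends only on the coder interface, the key representation passed across it could change later without the store being rewritten, which is the freedom the comment describes.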
[jira] [Work logged] (BEAM-2915) Java SDK support for portable user state
[ https://issues.apache.org/jira/browse/BEAM-2915?focusedWorklogId=107983&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-107983 ]

ASF GitHub Bot logged work on BEAM-2915:

Author: ASF GitHub Bot
Created on: 01/Jun/18 05:19
Start Date: 01/Jun/18 05:19
Worklog Time Spent: 10m
Work Description: tweise commented on a change in pull request #5445: [BEAM-2915] Add support for handling bag user state to the java-fn-execution library to support runner integration.
URL: https://github.com/apache/beam/pull/5445#discussion_r192299132

## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/state/StateRequestHandlers.java ##

@@ -214,17 +258,22 @@ public static StateRequestHandler forMultimapSideInputHandlerFactory(
 TypeCase.MULTIMAP_SIDE_INPUT);
 StateKey.MultimapSideInput stateKey = request.getStateKey().getMultimapSideInput();
-MultimapSideInputSpec sideInputReferenceSpec =
+MultimapSideInputSpec referenceSpec =
 processBundleDescriptor.getMultimapSideInputSpecs()
 .get(stateKey.getPtransformId())
 .get(stateKey.getSideInputId());
-MultimapSideInputHandler handler = cache.computeIfAbsent(
-sideInputReferenceSpec,
+MultimapSideInputHandler handler = cache.computeIfAbsent(
+referenceSpec,
 this::createHandler);
+// TODO: Migrate to using the ByteStringCoder everywhere for user types and values
+// to reduce the amount of byte array copying.
+Object key = referenceSpec.keyCoder().decode(stateKey.getKey().newInput());

Review comment:
Does this mean only key types with built-in coders are supported? Why do we need to decode the key here?

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---

Worklog Id: (was: 107983)
Time Spent: 40m (was: 0.5h)

> Java SDK support for portable user state
>
>
> Key: BEAM-2915
> URL: https://issues.apache.org/jira/browse/BEAM-2915
> Project: Beam
> Issue Type: Sub-task
> Components: sdk-java-core
> Reporter: Henning Rohde
> Assignee: Luke Cwik
> Priority: Minor
> Labels: portability
> Time Spent: 40m
> Remaining Estimate: 0h
>

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-2915) Java SDK support for portable user state
[ https://issues.apache.org/jira/browse/BEAM-2915?focusedWorklogId=107980&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-107980 ]

ASF GitHub Bot logged work on BEAM-2915:

Author: ASF GitHub Bot
Created on: 01/Jun/18 04:54
Start Date: 01/Jun/18 04:54
Worklog Time Spent: 10m
Work Description: tweise commented on a change in pull request #5445: [BEAM-2915] Add support for handling bag user state to the java-fn-execution library to support runner integration.
URL: https://github.com/apache/beam/pull/5445#discussion_r192297971

## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/GreedyPCollectionFusers.java ##

@@ -181,6 +181,10 @@ private static boolean parDoCompatibility(
 // side inputs can be fused with other transforms in the same environment which are not
 // upstream of any of the side inputs.
 return pipeline.getSideInputs(parDo).isEmpty()
+// Since we lack the ability to mark upstream transforms as key preserving, we
+// purposefully break fusion here to provide runners the opportunity to insert a
+// grouping operation
+&& pipeline.getUserStates(parDo).isEmpty()

Review comment:
Based on what information would a runner insert a shuffle though? Unless that is available, there is also no benefit breaking the chain here?

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---

Worklog Id: (was: 107980)
Time Spent: 0.5h (was: 20m)

> Java SDK support for portable user state
>
>
> Key: BEAM-2915
> URL: https://issues.apache.org/jira/browse/BEAM-2915
> Project: Beam
> Issue Type: Sub-task
> Components: sdk-java-core
> Reporter: Henning Rohde
> Assignee: Luke Cwik
> Priority: Minor
> Labels: portability
> Time Spent: 0.5h
> Remaining Estimate: 0h
>

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
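Editorial sketch, not part of the pull request: the hunk quoted above adds a second condition to parDoCompatibility so that a ParDo using user state is not fused with its producer. The example below models only the two conditions visible in the hunk (no side inputs, no user state); the real method may check more. ParDoInfo and canFuseWithProducer are hypothetical names invented for this illustration and are not Beam types.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Illustrative only: ParDoInfo is a hypothetical stand-in, not a Beam type. */
public class FusionBreakSketch {

  /** Simplified description of a ParDo: its side inputs and user state cells by local name. */
  static final class ParDoInfo {
    final String name;
    final List<String> sideInputs;
    final List<String> userStates;

    ParDoInfo(String name, List<String> sideInputs, List<String> userStates) {
      this.name = name;
      this.sideInputs = sideInputs;
      this.userStates = userStates;
    }
  }

  /**
   * Returns true only when the ParDo has neither side inputs nor user state. A stateful
   * ParDo therefore starts a new stage, leaving room for a grouping step in front of it.
   */
  static boolean canFuseWithProducer(ParDoInfo parDo) {
    return parDo.sideInputs.isEmpty() && parDo.userStates.isEmpty();
  }

  public static void main(String[] args) {
    ParDoInfo stateless =
        new ParDoInfo("Format", Collections.emptyList(), Collections.emptyList());
    ParDoInfo stateful =
        new ParDoInfo("Dedup", Collections.emptyList(), Arrays.asList("seen"));

    System.out.println(stateless.name + " fusable: " + canFuseWithProducer(stateless));
    System.out.println(stateful.name + " fusable: " + canFuseWithProducer(stateful));
  }
}
```

The sketch does not answer the reviewer's question about how a runner decides to insert the shuffle; it only shows where the stage boundary would fall under the quoted rule.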
[jira] [Work logged] (BEAM-2915) Java SDK support for portable user state
[ https://issues.apache.org/jira/browse/BEAM-2915?focusedWorklogId=105957&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-105957 ]

ASF GitHub Bot logged work on BEAM-2915:

Author: ASF GitHub Bot
Created on: 25/May/18 17:13
Start Date: 25/May/18 17:13
Worklog Time Spent: 10m
Work Description: lukecwik commented on issue #5445: [BEAM-2915] Add support for handling bag user state to the java-fn-execution library to support runner integration.
URL: https://github.com/apache/beam/pull/5445#issuecomment-392123227

Run Java PreCommit

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---

Worklog Id: (was: 105957)
Time Spent: 20m (was: 10m)

> Java SDK support for portable user state
>
>
> Key: BEAM-2915
> URL: https://issues.apache.org/jira/browse/BEAM-2915
> Project: Beam
> Issue Type: Sub-task
> Components: sdk-java-core
> Reporter: Henning Rohde
> Assignee: Luke Cwik
> Priority: Minor
> Labels: portability
> Time Spent: 20m
> Remaining Estimate: 0h
>

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-2915) Java SDK support for portable user state
[ https://issues.apache.org/jira/browse/BEAM-2915?focusedWorklogId=105776&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-105776 ]

ASF GitHub Bot logged work on BEAM-2915:

Author: ASF GitHub Bot
Created on: 24/May/18 23:24
Start Date: 24/May/18 23:24
Worklog Time Spent: 10m
Work Description: lukecwik commented on issue #5445: [BEAM-2915] Add support for handling bag user state to the java-fn-execution library to support runner integration.
URL: https://github.com/apache/beam/pull/5445#issuecomment-391895687

I was able to fix a few minor issues with the payload representation and tests. PTAL

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---

Worklog Id: (was: 105776)
Time Spent: 10m
Remaining Estimate: 0h

> Java SDK support for portable user state
>
>
> Key: BEAM-2915
> URL: https://issues.apache.org/jira/browse/BEAM-2915
> Project: Beam
> Issue Type: Sub-task
> Components: sdk-java-core
> Reporter: Henning Rohde
> Assignee: Luke Cwik
> Priority: Minor
> Labels: portability
> Time Spent: 10m
> Remaining Estimate: 0h
>

--
This message was sent by Atlassian JIRA (v7.6.3#76005)