[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation
[ https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=172650&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172650 ] ASF GitHub Bot logged work on BEAM-5859: Author: ASF GitHub Bot Created on: 06/Dec/18 14:17 Start Date: 06/Dec/18 14:17 Worklog Time Spent: 10m Work Description: mxm closed pull request #7208: [BEAM-5859] Better handle fused composite stage names. URL: https://github.com/apache/beam/pull/7208 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ExecutableStageTranslation.java b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ExecutableStageTranslation.java index 09d1a991c279..4fe4a0f3669e 100644 --- a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ExecutableStageTranslation.java +++ b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ExecutableStageTranslation.java @@ -19,7 +19,15 @@ import static com.google.common.base.Preconditions.checkArgument; +import com.google.common.base.Joiner; +import com.google.common.collect.Iterables; +import com.google.common.collect.LinkedHashMultimap; +import com.google.common.collect.Multimap; import java.io.IOException; +import java.util.ArrayList; +import java.util.Collection; +import java.util.Map; +import java.util.stream.Collectors; import org.apache.beam.model.pipeline.v1.RunnerApi; import org.apache.beam.model.pipeline.v1.RunnerApi.ExecutableStagePayload; import org.apache.beam.runners.core.construction.graph.ExecutableStage; @@ -45,19 +53,80 @@ public static String generateNameFromStagePayload(ExecutableStagePayload stagePa RunnerApi.Components components = stagePayload.getComponents(); final int transformsCount = stagePayload.getTransformsCount(); sb.append("[").append(transformsCount).append("]"); -sb.append("{"); +Collection names = new ArrayList<>(); for (int i = 0; i < transformsCount; i++) { String name = components.getTransformsOrThrow(stagePayload.getTransforms(i)).getUniqueName(); - // Python: Remove the 'ref_AppliedPTransform_' prefix which just makes the name longer - name = name.replaceFirst("^ref_AppliedPTransform_", ""); // Java: Remove the 'ParMultiDo(Anonymous)' suffix which just makes the name longer name = name.replaceFirst("/ParMultiDo\\(Anonymous\\)$", ""); - sb.append(name); - if (i + 1 < transformsCount) { -sb.append(", "); - } + names.add(name); } -sb.append("}"); +sb.append(generateNameFromTransformNames(names, true)); return sb.toString(); } + + /** + * Creates a human-readable name for a set of stage names that occur in a single stage. + * + * This name reflects the nested structure of the stages, as inferred by slashes in the stage + * names. Sibling stages will be listed as {A, B}, nested stages as A/B, and according to the + * value of truncateSiblingComposites the nesting stops at the first level that siblings are + * encountered. + * + * This is best understood via examples, of which there are several in the tests for this + * class. + * + * @param names a list of full stage names in this fused operation + * @param truncateSiblingComposites whether to recursively descent into composite operations that + * have simblings, or stop the recursion at that level. + * @return a single string representation of all the stages in this fused operation + */ + public static String generateNameFromTransformNames( + Collection names, boolean truncateSiblingComposites) { +Multimap groupByOuter = LinkedHashMultimap.create(); +for (String name : names) { + int index = name.indexOf('/'); + if (index == -1) { +groupByOuter.put(name, ""); + } else { +groupByOuter.put(name.substring(0, index), name.substring(index + 1)); + } +} +if (groupByOuter.keySet().size() == 1) { + Map.Entry> outer = + Iterables.getOnlyElement(groupByOuter.asMap().entrySet()); + if (outer.getValue().size() == 1 && outer.getValue().contains("")) { +// Names consisted of a single name without any slashes. +return outer.getKey(); + } else { +// Everything is in the same outer stage, enumerate at one level down. +return String.format( +"%s/%s", +outer.getKey(), +generateNameFromTransformNames(outer.getValue(), truncateSiblingComposites)); + } +} else { + Collection part
[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation
[ https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=172643&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172643 ] ASF GitHub Bot logged work on BEAM-5859: Author: ASF GitHub Bot Created on: 06/Dec/18 13:40 Start Date: 06/Dec/18 13:40 Worklog Time Spent: 10m Work Description: robertwb commented on issue #7208: [BEAM-5859] Better handle fused composite stage names. URL: https://github.com/apache/beam/pull/7208#issuecomment-444873795 Ah, looks like I can't actually link due to the main files not depending on tests... Yes, we want to recurse because there are times when a single outer composite spans many stages (and otherwise they would all have the same name). I suppose another way to explain the algorithm is that we recurse until we have multiple children. Squashed and pushed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172643) Time Spent: 3h 50m (was: 3h 40m) > Improve Traceability of Pipeline translation > > > Key: BEAM-5859 > URL: https://issues.apache.org/jira/browse/BEAM-5859 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Maximilian Michels >Priority: Major > Labels: portability, portability-flink > Attachments: tfx.png, wordcount.png > > Time Spent: 3h 50m > Remaining Estimate: 0h > > Users often ask how they can reason about the pipeline translation. The Flink > UI display a confusingly large graph without any trace of the original Beam > pipeline: > WordCount: > !wordcount.png! > TFX: > !tfx.png! > Some aspects which make understanding these graphs hard: > * Users don't know how the Runner maps Beam to Flink concepts > * The UI is awfully slow / hangs when the pipeline is reasonable complex > * The operator names seem to use {{transform.getUniqueName()}} which doesn't > generate readable name > * So called Chaining combines operators into a single operator which makes > understanding which Beam concept belongs to which Flink concept even harder > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation
[ https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=172638&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172638 ] ASF GitHub Bot logged work on BEAM-5859: Author: ASF GitHub Bot Created on: 06/Dec/18 13:20 Start Date: 06/Dec/18 13:20 Worklog Time Spent: 10m Work Description: mxm commented on a change in pull request #7208: [BEAM-5859] Better handle fused composite stage names. URL: https://github.com/apache/beam/pull/7208#discussion_r239446348 ## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ExecutableStageTranslation.java ## @@ -45,19 +53,60 @@ public static String generateNameFromStagePayload(ExecutableStagePayload stagePa RunnerApi.Components components = stagePayload.getComponents(); final int transformsCount = stagePayload.getTransformsCount(); sb.append("[").append(transformsCount).append("]"); -sb.append("{"); +Collection names = new ArrayList<>(); for (int i = 0; i < transformsCount; i++) { String name = components.getTransformsOrThrow(stagePayload.getTransforms(i)).getUniqueName(); - // Python: Remove the 'ref_AppliedPTransform_' prefix which just makes the name longer - name = name.replaceFirst("^ref_AppliedPTransform_", ""); // Java: Remove the 'ParMultiDo(Anonymous)' suffix which just makes the name longer name = name.replaceFirst("/ParMultiDo\\(Anonymous\\)$", ""); - sb.append(name); - if (i + 1 < transformsCount) { -sb.append(", "); - } + names.add(name); } -sb.append("}"); +sb.append(generateNameFromTransformNames(names, true)); return sb.toString(); } + + public static String generateNameFromTransformNames(Collection names, boolean truncate) { +Multimap groupByOuter = LinkedHashMultimap.create(); +for (String name : names) { + int index = name.indexOf('/'); Review comment: Right. I made this comment at first look when I hadn't seen the recursive nature yet. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172638) Time Spent: 3h 40m (was: 3.5h) > Improve Traceability of Pipeline translation > > > Key: BEAM-5859 > URL: https://issues.apache.org/jira/browse/BEAM-5859 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Maximilian Michels >Priority: Major > Labels: portability, portability-flink > Attachments: tfx.png, wordcount.png > > Time Spent: 3h 40m > Remaining Estimate: 0h > > Users often ask how they can reason about the pipeline translation. The Flink > UI display a confusingly large graph without any trace of the original Beam > pipeline: > WordCount: > !wordcount.png! > TFX: > !tfx.png! > Some aspects which make understanding these graphs hard: > * Users don't know how the Runner maps Beam to Flink concepts > * The UI is awfully slow / hangs when the pipeline is reasonable complex > * The operator names seem to use {{transform.getUniqueName()}} which doesn't > generate readable name > * So called Chaining combines operators into a single operator which makes > understanding which Beam concept belongs to which Flink concept even harder > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation
[ https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=172637&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172637 ] ASF GitHub Bot logged work on BEAM-5859: Author: ASF GitHub Bot Created on: 06/Dec/18 13:20 Start Date: 06/Dec/18 13:20 Worklog Time Spent: 10m Work Description: mxm commented on a change in pull request #7208: [BEAM-5859] Better handle fused composite stage names. URL: https://github.com/apache/beam/pull/7208#discussion_r239447810 ## File path: runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ExecutableStageTranslationTest.java ## @@ -78,6 +81,40 @@ public void processElement( String executableStageName = ExecutableStageTranslation.generateNameFromStagePayload(basePayload); -assertThat(executableStageName, is("[3]{ParDo(Anonymous), MyName, count}")); +assertThat(executableStageName, is("[3]{ParDo(Anonymous), MyName, Composite}")); + } + + @Test + public void testOperatorNameGenerationFromNames() { +assertGeneratedNames("A", "A", Arrays.asList("A")); +assertGeneratedNames("A/a1", "A/a1", Arrays.asList("A/a1")); +assertGeneratedNames("A/{a1, a2}", "A/{a1, a2}", Arrays.asList("A/a1", "A/a2")); +assertGeneratedNames( +"A/{a1, a2}", "A/{a1, a2/{a2.1, a2.2}}", Arrays.asList("A/a1", "A/a2/a2.1", "A/a2/a2.2")); +assertGeneratedNames("{A, B}", "{A/{a1, a2}, B}", Arrays.asList("A/a1", "A/a2", "B")); +assertGeneratedNames( +"{A, B, C}", "{A/{a1, a2}, B, C/c/cc}", Arrays.asList("A/a1", "A/a2", "B", "C/c/cc")); Review comment: I think that could be unexpected to users because they could assume it either always or never recursively lists the composite transforms. On the other hand, it's good to get a bit more insight for single composite transforms. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172637) Time Spent: 3h 40m (was: 3.5h) > Improve Traceability of Pipeline translation > > > Key: BEAM-5859 > URL: https://issues.apache.org/jira/browse/BEAM-5859 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Maximilian Michels >Priority: Major > Labels: portability, portability-flink > Attachments: tfx.png, wordcount.png > > Time Spent: 3h 40m > Remaining Estimate: 0h > > Users often ask how they can reason about the pipeline translation. The Flink > UI display a confusingly large graph without any trace of the original Beam > pipeline: > WordCount: > !wordcount.png! > TFX: > !tfx.png! > Some aspects which make understanding these graphs hard: > * Users don't know how the Runner maps Beam to Flink concepts > * The UI is awfully slow / hangs when the pipeline is reasonable complex > * The operator names seem to use {{transform.getUniqueName()}} which doesn't > generate readable name > * So called Chaining combines operators into a single operator which makes > understanding which Beam concept belongs to which Flink concept even harder > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation
[ https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=172605&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172605 ] ASF GitHub Bot logged work on BEAM-5859: Author: ASF GitHub Bot Created on: 06/Dec/18 09:34 Start Date: 06/Dec/18 09:34 Worklog Time Spent: 10m Work Description: robertwb commented on issue #7208: [BEAM-5859] Better handle fused composite stage names. URL: https://github.com/apache/beam/pull/7208#issuecomment-444808403 The last example is a real-world example of what happens with and without shortening. (Actually, the Write is even worse.) If there are many siblings, I think it's a better use of real estate to list them all, than list the full nesting of the first one. (Aside: I originally had everything in a single expression that had a pair of ternary operations...but that turned out to be truly inscrutable.) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172605) Time Spent: 3.5h (was: 3h 20m) > Improve Traceability of Pipeline translation > > > Key: BEAM-5859 > URL: https://issues.apache.org/jira/browse/BEAM-5859 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Maximilian Michels >Priority: Major > Labels: portability, portability-flink > Attachments: tfx.png, wordcount.png > > Time Spent: 3.5h > Remaining Estimate: 0h > > Users often ask how they can reason about the pipeline translation. The Flink > UI display a confusingly large graph without any trace of the original Beam > pipeline: > WordCount: > !wordcount.png! > TFX: > !tfx.png! > Some aspects which make understanding these graphs hard: > * Users don't know how the Runner maps Beam to Flink concepts > * The UI is awfully slow / hangs when the pipeline is reasonable complex > * The operator names seem to use {{transform.getUniqueName()}} which doesn't > generate readable name > * So called Chaining combines operators into a single operator which makes > understanding which Beam concept belongs to which Flink concept even harder > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation
[ https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=172600&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172600 ] ASF GitHub Bot logged work on BEAM-5859: Author: ASF GitHub Bot Created on: 06/Dec/18 09:28 Start Date: 06/Dec/18 09:28 Worklog Time Spent: 10m Work Description: robertwb commented on a change in pull request #7208: [BEAM-5859] Better handle fused composite stage names. URL: https://github.com/apache/beam/pull/7208#discussion_r239379478 ## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ExecutableStageTranslation.java ## @@ -45,19 +53,60 @@ public static String generateNameFromStagePayload(ExecutableStagePayload stagePa RunnerApi.Components components = stagePayload.getComponents(); final int transformsCount = stagePayload.getTransformsCount(); sb.append("[").append(transformsCount).append("]"); -sb.append("{"); +Collection names = new ArrayList<>(); for (int i = 0; i < transformsCount; i++) { String name = components.getTransformsOrThrow(stagePayload.getTransforms(i)).getUniqueName(); - // Python: Remove the 'ref_AppliedPTransform_' prefix which just makes the name longer - name = name.replaceFirst("^ref_AppliedPTransform_", ""); // Java: Remove the 'ParMultiDo(Anonymous)' suffix which just makes the name longer name = name.replaceFirst("/ParMultiDo\\(Anonymous\\)$", ""); - sb.append(name); - if (i + 1 < transformsCount) { -sb.append(", "); - } + names.add(name); } -sb.append("}"); +sb.append(generateNameFromTransformNames(names, true)); return sb.toString(); } + + public static String generateNameFromTransformNames(Collection names, boolean truncate) { +Multimap groupByOuter = LinkedHashMultimap.create(); +for (String name : names) { + int index = name.indexOf('/'); Review comment: This is all we have to go on here. It's up to the SDK to choose how to follow this convention. If an extra slash is present, it'll just look more nested than it really is. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172600) > Improve Traceability of Pipeline translation > > > Key: BEAM-5859 > URL: https://issues.apache.org/jira/browse/BEAM-5859 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Maximilian Michels >Priority: Major > Labels: portability, portability-flink > Attachments: tfx.png, wordcount.png > > Time Spent: 3h 10m > Remaining Estimate: 0h > > Users often ask how they can reason about the pipeline translation. The Flink > UI display a confusingly large graph without any trace of the original Beam > pipeline: > WordCount: > !wordcount.png! > TFX: > !tfx.png! > Some aspects which make understanding these graphs hard: > * Users don't know how the Runner maps Beam to Flink concepts > * The UI is awfully slow / hangs when the pipeline is reasonable complex > * The operator names seem to use {{transform.getUniqueName()}} which doesn't > generate readable name > * So called Chaining combines operators into a single operator which makes > understanding which Beam concept belongs to which Flink concept even harder > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation
[ https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=172601&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172601 ] ASF GitHub Bot logged work on BEAM-5859: Author: ASF GitHub Bot Created on: 06/Dec/18 09:28 Start Date: 06/Dec/18 09:28 Worklog Time Spent: 10m Work Description: robertwb commented on a change in pull request #7208: [BEAM-5859] Better handle fused composite stage names. URL: https://github.com/apache/beam/pull/7208#discussion_r239375960 ## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ExecutableStageTranslation.java ## @@ -45,19 +53,60 @@ public static String generateNameFromStagePayload(ExecutableStagePayload stagePa RunnerApi.Components components = stagePayload.getComponents(); final int transformsCount = stagePayload.getTransformsCount(); sb.append("[").append(transformsCount).append("]"); -sb.append("{"); +Collection names = new ArrayList<>(); for (int i = 0; i < transformsCount; i++) { String name = components.getTransformsOrThrow(stagePayload.getTransforms(i)).getUniqueName(); - // Python: Remove the 'ref_AppliedPTransform_' prefix which just makes the name longer - name = name.replaceFirst("^ref_AppliedPTransform_", ""); // Java: Remove the 'ParMultiDo(Anonymous)' suffix which just makes the name longer name = name.replaceFirst("/ParMultiDo\\(Anonymous\\)$", ""); - sb.append(name); - if (i + 1 < transformsCount) { -sb.append(", "); - } + names.add(name); } -sb.append("}"); +sb.append(generateNameFromTransformNames(names, true)); return sb.toString(); } + + public static String generateNameFromTransformNames(Collection names, boolean truncate) { Review comment: I renamed this to truncateSiblingComposites and added a javadoc. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172601) Time Spent: 3h 20m (was: 3h 10m) > Improve Traceability of Pipeline translation > > > Key: BEAM-5859 > URL: https://issues.apache.org/jira/browse/BEAM-5859 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Maximilian Michels >Priority: Major > Labels: portability, portability-flink > Attachments: tfx.png, wordcount.png > > Time Spent: 3h 20m > Remaining Estimate: 0h > > Users often ask how they can reason about the pipeline translation. The Flink > UI display a confusingly large graph without any trace of the original Beam > pipeline: > WordCount: > !wordcount.png! > TFX: > !tfx.png! > Some aspects which make understanding these graphs hard: > * Users don't know how the Runner maps Beam to Flink concepts > * The UI is awfully slow / hangs when the pipeline is reasonable complex > * The operator names seem to use {{transform.getUniqueName()}} which doesn't > generate readable name > * So called Chaining combines operators into a single operator which makes > understanding which Beam concept belongs to which Flink concept even harder > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation
[ https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=172598&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172598 ] ASF GitHub Bot logged work on BEAM-5859: Author: ASF GitHub Bot Created on: 06/Dec/18 09:28 Start Date: 06/Dec/18 09:28 Worklog Time Spent: 10m Work Description: robertwb commented on a change in pull request #7208: [BEAM-5859] Better handle fused composite stage names. URL: https://github.com/apache/beam/pull/7208#discussion_r239379684 ## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ExecutableStageTranslation.java ## @@ -45,19 +53,60 @@ public static String generateNameFromStagePayload(ExecutableStagePayload stagePa RunnerApi.Components components = stagePayload.getComponents(); final int transformsCount = stagePayload.getTransformsCount(); sb.append("[").append(transformsCount).append("]"); -sb.append("{"); +Collection names = new ArrayList<>(); for (int i = 0; i < transformsCount; i++) { String name = components.getTransformsOrThrow(stagePayload.getTransforms(i)).getUniqueName(); - // Python: Remove the 'ref_AppliedPTransform_' prefix which just makes the name longer - name = name.replaceFirst("^ref_AppliedPTransform_", ""); // Java: Remove the 'ParMultiDo(Anonymous)' suffix which just makes the name longer name = name.replaceFirst("/ParMultiDo\\(Anonymous\\)$", ""); - sb.append(name); - if (i + 1 < transformsCount) { -sb.append(", "); - } + names.add(name); } -sb.append("}"); +sb.append(generateNameFromTransformNames(names, true)); return sb.toString(); } + + public static String generateNameFromTransformNames(Collection names, boolean truncate) { +Multimap groupByOuter = LinkedHashMultimap.create(); +for (String name : names) { + int index = name.indexOf('/'); + if (index == -1) { +groupByOuter.put(name, ""); + } else { +groupByOuter.put(name.substring(0, index), name.substring(index + 1)); + } +} +if (groupByOuter.keySet().size() == 1) { + Map.Entry> outer = + Iterables.getOnlyElement(groupByOuter.asMap().entrySet()); + if (outer.getValue().size() == 1 && outer.getValue().contains("")) { +// Names consisted of a single name without any slashes. +return outer.getKey(); + } else { +// Everything is in the same outer stage, enumerate at one level down. +return String.format( +"%s/%s", outer.getKey(), generateNameFromTransformNames(outer.getValue(), truncate)); Review comment: We descent even if we had a single (non-empty) element. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172598) Time Spent: 3h 10m (was: 3h) > Improve Traceability of Pipeline translation > > > Key: BEAM-5859 > URL: https://issues.apache.org/jira/browse/BEAM-5859 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Maximilian Michels >Priority: Major > Labels: portability, portability-flink > Attachments: tfx.png, wordcount.png > > Time Spent: 3h 10m > Remaining Estimate: 0h > > Users often ask how they can reason about the pipeline translation. The Flink > UI display a confusingly large graph without any trace of the original Beam > pipeline: > WordCount: > !wordcount.png! > TFX: > !tfx.png! > Some aspects which make understanding these graphs hard: > * Users don't know how the Runner maps Beam to Flink concepts > * The UI is awfully slow / hangs when the pipeline is reasonable complex > * The operator names seem to use {{transform.getUniqueName()}} which doesn't > generate readable name > * So called Chaining combines operators into a single operator which makes > understanding which Beam concept belongs to which Flink concept even harder > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation
[ https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=172599&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172599 ] ASF GitHub Bot logged work on BEAM-5859: Author: ASF GitHub Bot Created on: 06/Dec/18 09:28 Start Date: 06/Dec/18 09:28 Worklog Time Spent: 10m Work Description: robertwb commented on a change in pull request #7208: [BEAM-5859] Better handle fused composite stage names. URL: https://github.com/apache/beam/pull/7208#discussion_r239379773 ## File path: runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ExecutableStageTranslationTest.java ## @@ -78,6 +81,40 @@ public void processElement( String executableStageName = ExecutableStageTranslation.generateNameFromStagePayload(basePayload); -assertThat(executableStageName, is("[3]{ParDo(Anonymous), MyName, count}")); +assertThat(executableStageName, is("[3]{ParDo(Anonymous), MyName, Composite}")); + } + + @Test + public void testOperatorNameGenerationFromNames() { +assertGeneratedNames("A", "A", Arrays.asList("A")); +assertGeneratedNames("A/a1", "A/a1", Arrays.asList("A/a1")); +assertGeneratedNames("A/{a1, a2}", "A/{a1, a2}", Arrays.asList("A/a1", "A/a2")); +assertGeneratedNames( +"A/{a1, a2}", "A/{a1, a2/{a2.1, a2.2}}", Arrays.asList("A/a1", "A/a2/a2.1", "A/a2/a2.2")); +assertGeneratedNames("{A, B}", "{A/{a1, a2}, B}", Arrays.asList("A/a1", "A/a2", "B")); +assertGeneratedNames( +"{A, B, C}", "{A/{a1, a2}, B, C/c/cc}", Arrays.asList("A/a1", "A/a2", "B", "C/c/cc")); Review comment: Yes, exactly. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172599) Time Spent: 3h 10m (was: 3h) > Improve Traceability of Pipeline translation > > > Key: BEAM-5859 > URL: https://issues.apache.org/jira/browse/BEAM-5859 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Maximilian Michels >Priority: Major > Labels: portability, portability-flink > Attachments: tfx.png, wordcount.png > > Time Spent: 3h 10m > Remaining Estimate: 0h > > Users often ask how they can reason about the pipeline translation. The Flink > UI display a confusingly large graph without any trace of the original Beam > pipeline: > WordCount: > !wordcount.png! > TFX: > !tfx.png! > Some aspects which make understanding these graphs hard: > * Users don't know how the Runner maps Beam to Flink concepts > * The UI is awfully slow / hangs when the pipeline is reasonable complex > * The operator names seem to use {{transform.getUniqueName()}} which doesn't > generate readable name > * So called Chaining combines operators into a single operator which makes > understanding which Beam concept belongs to which Flink concept even harder > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation
[ https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=172349&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172349 ] ASF GitHub Bot logged work on BEAM-5859: Author: ASF GitHub Bot Created on: 05/Dec/18 16:10 Start Date: 05/Dec/18 16:10 Worklog Time Spent: 10m Work Description: mxm commented on a change in pull request #7208: [BEAM-5859] Better handle fused composite stage names. URL: https://github.com/apache/beam/pull/7208#discussion_r239129246 ## File path: runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ExecutableStageTranslationTest.java ## @@ -78,6 +81,40 @@ public void processElement( String executableStageName = ExecutableStageTranslation.generateNameFromStagePayload(basePayload); -assertThat(executableStageName, is("[3]{ParDo(Anonymous), MyName, count}")); +assertThat(executableStageName, is("[3]{ParDo(Anonymous), MyName, Composite}")); + } + + @Test + public void testOperatorNameGenerationFromNames() { +assertGeneratedNames("A", "A", Arrays.asList("A")); +assertGeneratedNames("A/a1", "A/a1", Arrays.asList("A/a1")); +assertGeneratedNames("A/{a1, a2}", "A/{a1, a2}", Arrays.asList("A/a1", "A/a2")); +assertGeneratedNames( +"A/{a1, a2}", "A/{a1, a2/{a2.1, a2.2}}", Arrays.asList("A/a1", "A/a2/a2.1", "A/a2/a2.2")); +assertGeneratedNames("{A, B}", "{A/{a1, a2}, B}", Arrays.asList("A/a1", "A/a2", "B")); +assertGeneratedNames( +"{A, B, C}", "{A/{a1, a2}, B, C/c/cc}", Arrays.asList("A/a1", "A/a2", "B", "C/c/cc")); Review comment: The logic is that we only recurse into the composites if we have only one of it? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172349) Time Spent: 2.5h (was: 2h 20m) > Improve Traceability of Pipeline translation > > > Key: BEAM-5859 > URL: https://issues.apache.org/jira/browse/BEAM-5859 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Maximilian Michels >Priority: Major > Labels: portability, portability-flink > Attachments: tfx.png, wordcount.png > > Time Spent: 2.5h > Remaining Estimate: 0h > > Users often ask how they can reason about the pipeline translation. The Flink > UI display a confusingly large graph without any trace of the original Beam > pipeline: > WordCount: > !wordcount.png! > TFX: > !tfx.png! > Some aspects which make understanding these graphs hard: > * Users don't know how the Runner maps Beam to Flink concepts > * The UI is awfully slow / hangs when the pipeline is reasonable complex > * The operator names seem to use {{transform.getUniqueName()}} which doesn't > generate readable name > * So called Chaining combines operators into a single operator which makes > understanding which Beam concept belongs to which Flink concept even harder > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation
[ https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=172354&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172354 ] ASF GitHub Bot logged work on BEAM-5859: Author: ASF GitHub Bot Created on: 05/Dec/18 16:10 Start Date: 05/Dec/18 16:10 Worklog Time Spent: 10m Work Description: mxm commented on a change in pull request #7208: [BEAM-5859] Better handle fused composite stage names. URL: https://github.com/apache/beam/pull/7208#discussion_r239122914 ## File path: runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ExecutableStageTranslationTest.java ## @@ -78,6 +81,40 @@ public void processElement( String executableStageName = ExecutableStageTranslation.generateNameFromStagePayload(basePayload); -assertThat(executableStageName, is("[3]{ParDo(Anonymous), MyName, count}")); +assertThat(executableStageName, is("[3]{ParDo(Anonymous), MyName, Composite}")); + } + + @Test + public void testOperatorNameGenerationFromNames() { +assertGeneratedNames("A", "A", Arrays.asList("A")); +assertGeneratedNames("A/a1", "A/a1", Arrays.asList("A/a1")); +assertGeneratedNames("A/{a1, a2}", "A/{a1, a2}", Arrays.asList("A/a1", "A/a2")); +assertGeneratedNames( +"A/{a1, a2}", "A/{a1, a2/{a2.1, a2.2}}", Arrays.asList("A/a1", "A/a2/a2.1", "A/a2/a2.2")); +assertGeneratedNames("{A, B}", "{A/{a1, a2}, B}", Arrays.asList("A/a1", "A/a2", "B")); +assertGeneratedNames( +"{A, B, C}", "{A/{a1, a2}, B, C/c/cc}", Arrays.asList("A/a1", "A/a2", "B", "C/c/cc")); Review comment: It seems inconsistent that we don't get `A{a1/a2}, B, C` here. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172354) > Improve Traceability of Pipeline translation > > > Key: BEAM-5859 > URL: https://issues.apache.org/jira/browse/BEAM-5859 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Maximilian Michels >Priority: Major > Labels: portability, portability-flink > Attachments: tfx.png, wordcount.png > > Time Spent: 2h 50m > Remaining Estimate: 0h > > Users often ask how they can reason about the pipeline translation. The Flink > UI display a confusingly large graph without any trace of the original Beam > pipeline: > WordCount: > !wordcount.png! > TFX: > !tfx.png! > Some aspects which make understanding these graphs hard: > * Users don't know how the Runner maps Beam to Flink concepts > * The UI is awfully slow / hangs when the pipeline is reasonable complex > * The operator names seem to use {{transform.getUniqueName()}} which doesn't > generate readable name > * So called Chaining combines operators into a single operator which makes > understanding which Beam concept belongs to which Flink concept even harder > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation
[ https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=172353&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172353 ] ASF GitHub Bot logged work on BEAM-5859: Author: ASF GitHub Bot Created on: 05/Dec/18 16:10 Start Date: 05/Dec/18 16:10 Worklog Time Spent: 10m Work Description: mxm commented on a change in pull request #7208: [BEAM-5859] Better handle fused composite stage names. URL: https://github.com/apache/beam/pull/7208#discussion_r239130429 ## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ExecutableStageTranslation.java ## @@ -45,19 +53,60 @@ public static String generateNameFromStagePayload(ExecutableStagePayload stagePa RunnerApi.Components components = stagePayload.getComponents(); final int transformsCount = stagePayload.getTransformsCount(); sb.append("[").append(transformsCount).append("]"); -sb.append("{"); +Collection names = new ArrayList<>(); for (int i = 0; i < transformsCount; i++) { String name = components.getTransformsOrThrow(stagePayload.getTransforms(i)).getUniqueName(); - // Python: Remove the 'ref_AppliedPTransform_' prefix which just makes the name longer - name = name.replaceFirst("^ref_AppliedPTransform_", ""); // Java: Remove the 'ParMultiDo(Anonymous)' suffix which just makes the name longer name = name.replaceFirst("/ParMultiDo\\(Anonymous\\)$", ""); - sb.append(name); - if (i + 1 < transformsCount) { -sb.append(", "); - } + names.add(name); } -sb.append("}"); +sb.append(generateNameFromTransformNames(names, true)); return sb.toString(); } + + public static String generateNameFromTransformNames(Collection names, boolean truncate) { +Multimap groupByOuter = LinkedHashMultimap.create(); +for (String name : names) { + int index = name.indexOf('/'); + if (index == -1) { +groupByOuter.put(name, ""); + } else { +groupByOuter.put(name.substring(0, index), name.substring(index + 1)); + } +} +if (groupByOuter.keySet().size() == 1) { + Map.Entry> outer = + Iterables.getOnlyElement(groupByOuter.asMap().entrySet()); + if (outer.getValue().size() == 1 && outer.getValue().contains("")) { +// Names consisted of a single name without any slashes. +return outer.getKey(); + } else { +// Everything is in the same outer stage, enumerate at one level down. +return String.format( +"%s/%s", outer.getKey(), generateNameFromTransformNames(outer.getValue(), truncate)); + } +} else { + Collection parts; + if (truncate) { +// Enumerate the outer stages without their composite structure, if any. +parts = groupByOuter.keySet(); Review comment: Note to self: Recursion anchor. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172353) > Improve Traceability of Pipeline translation > > > Key: BEAM-5859 > URL: https://issues.apache.org/jira/browse/BEAM-5859 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Maximilian Michels >Priority: Major > Labels: portability, portability-flink > Attachments: tfx.png, wordcount.png > > Time Spent: 2h 50m > Remaining Estimate: 0h > > Users often ask how they can reason about the pipeline translation. The Flink > UI display a confusingly large graph without any trace of the original Beam > pipeline: > WordCount: > !wordcount.png! > TFX: > !tfx.png! > Some aspects which make understanding these graphs hard: > * Users don't know how the Runner maps Beam to Flink concepts > * The UI is awfully slow / hangs when the pipeline is reasonable complex > * The operator names seem to use {{transform.getUniqueName()}} which doesn't > generate readable name > * So called Chaining combines operators into a single operator which makes > understanding which Beam concept belongs to which Flink concept even harder > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation
[ https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=172350&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172350 ] ASF GitHub Bot logged work on BEAM-5859: Author: ASF GitHub Bot Created on: 05/Dec/18 16:10 Start Date: 05/Dec/18 16:10 Worklog Time Spent: 10m Work Description: mxm commented on a change in pull request #7208: [BEAM-5859] Better handle fused composite stage names. URL: https://github.com/apache/beam/pull/7208#discussion_r239105166 ## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ExecutableStageTranslation.java ## @@ -45,19 +53,60 @@ public static String generateNameFromStagePayload(ExecutableStagePayload stagePa RunnerApi.Components components = stagePayload.getComponents(); final int transformsCount = stagePayload.getTransformsCount(); sb.append("[").append(transformsCount).append("]"); -sb.append("{"); +Collection names = new ArrayList<>(); for (int i = 0; i < transformsCount; i++) { String name = components.getTransformsOrThrow(stagePayload.getTransforms(i)).getUniqueName(); - // Python: Remove the 'ref_AppliedPTransform_' prefix which just makes the name longer - name = name.replaceFirst("^ref_AppliedPTransform_", ""); // Java: Remove the 'ParMultiDo(Anonymous)' suffix which just makes the name longer name = name.replaceFirst("/ParMultiDo\\(Anonymous\\)$", ""); - sb.append(name); - if (i + 1 < transformsCount) { -sb.append(", "); - } + names.add(name); } -sb.append("}"); +sb.append(generateNameFromTransformNames(names, true)); return sb.toString(); } + + public static String generateNameFromTransformNames(Collection names, boolean truncate) { Review comment: Would it make sense to add a short JavaDoc comment, explaining what kind of modes this method offers for generating a name from the transforms? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172350) Time Spent: 2h 40m (was: 2.5h) > Improve Traceability of Pipeline translation > > > Key: BEAM-5859 > URL: https://issues.apache.org/jira/browse/BEAM-5859 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Maximilian Michels >Priority: Major > Labels: portability, portability-flink > Attachments: tfx.png, wordcount.png > > Time Spent: 2h 40m > Remaining Estimate: 0h > > Users often ask how they can reason about the pipeline translation. The Flink > UI display a confusingly large graph without any trace of the original Beam > pipeline: > WordCount: > !wordcount.png! > TFX: > !tfx.png! > Some aspects which make understanding these graphs hard: > * Users don't know how the Runner maps Beam to Flink concepts > * The UI is awfully slow / hangs when the pipeline is reasonable complex > * The operator names seem to use {{transform.getUniqueName()}} which doesn't > generate readable name > * So called Chaining combines operators into a single operator which makes > understanding which Beam concept belongs to which Flink concept even harder > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation
[ https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=172351&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172351 ] ASF GitHub Bot logged work on BEAM-5859: Author: ASF GitHub Bot Created on: 05/Dec/18 16:10 Start Date: 05/Dec/18 16:10 Worklog Time Spent: 10m Work Description: mxm commented on a change in pull request #7208: [BEAM-5859] Better handle fused composite stage names. URL: https://github.com/apache/beam/pull/7208#discussion_r239130260 ## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ExecutableStageTranslation.java ## @@ -45,19 +53,60 @@ public static String generateNameFromStagePayload(ExecutableStagePayload stagePa RunnerApi.Components components = stagePayload.getComponents(); final int transformsCount = stagePayload.getTransformsCount(); sb.append("[").append(transformsCount).append("]"); -sb.append("{"); +Collection names = new ArrayList<>(); for (int i = 0; i < transformsCount; i++) { String name = components.getTransformsOrThrow(stagePayload.getTransforms(i)).getUniqueName(); - // Python: Remove the 'ref_AppliedPTransform_' prefix which just makes the name longer - name = name.replaceFirst("^ref_AppliedPTransform_", ""); // Java: Remove the 'ParMultiDo(Anonymous)' suffix which just makes the name longer name = name.replaceFirst("/ParMultiDo\\(Anonymous\\)$", ""); - sb.append(name); - if (i + 1 < transformsCount) { -sb.append(", "); - } + names.add(name); } -sb.append("}"); +sb.append(generateNameFromTransformNames(names, true)); return sb.toString(); } + + public static String generateNameFromTransformNames(Collection names, boolean truncate) { +Multimap groupByOuter = LinkedHashMultimap.create(); +for (String name : names) { + int index = name.indexOf('/'); + if (index == -1) { +groupByOuter.put(name, ""); + } else { +groupByOuter.put(name.substring(0, index), name.substring(index + 1)); + } +} +if (groupByOuter.keySet().size() == 1) { + Map.Entry> outer = + Iterables.getOnlyElement(groupByOuter.asMap().entrySet()); + if (outer.getValue().size() == 1 && outer.getValue().contains("")) { +// Names consisted of a single name without any slashes. +return outer.getKey(); Review comment: Note to self: Recursion anchor. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172351) Time Spent: 2h 40m (was: 2.5h) > Improve Traceability of Pipeline translation > > > Key: BEAM-5859 > URL: https://issues.apache.org/jira/browse/BEAM-5859 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Maximilian Michels >Priority: Major > Labels: portability, portability-flink > Attachments: tfx.png, wordcount.png > > Time Spent: 2h 40m > Remaining Estimate: 0h > > Users often ask how they can reason about the pipeline translation. The Flink > UI display a confusingly large graph without any trace of the original Beam > pipeline: > WordCount: > !wordcount.png! > TFX: > !tfx.png! > Some aspects which make understanding these graphs hard: > * Users don't know how the Runner maps Beam to Flink concepts > * The UI is awfully slow / hangs when the pipeline is reasonable complex > * The operator names seem to use {{transform.getUniqueName()}} which doesn't > generate readable name > * So called Chaining combines operators into a single operator which makes > understanding which Beam concept belongs to which Flink concept even harder > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation
[ https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=172348&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172348 ] ASF GitHub Bot logged work on BEAM-5859: Author: ASF GitHub Bot Created on: 05/Dec/18 16:10 Start Date: 05/Dec/18 16:10 Worklog Time Spent: 10m Work Description: mxm commented on a change in pull request #7208: [BEAM-5859] Better handle fused composite stage names. URL: https://github.com/apache/beam/pull/7208#discussion_r239114817 ## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ExecutableStageTranslation.java ## @@ -45,19 +53,60 @@ public static String generateNameFromStagePayload(ExecutableStagePayload stagePa RunnerApi.Components components = stagePayload.getComponents(); final int transformsCount = stagePayload.getTransformsCount(); sb.append("[").append(transformsCount).append("]"); -sb.append("{"); +Collection names = new ArrayList<>(); for (int i = 0; i < transformsCount; i++) { String name = components.getTransformsOrThrow(stagePayload.getTransforms(i)).getUniqueName(); - // Python: Remove the 'ref_AppliedPTransform_' prefix which just makes the name longer - name = name.replaceFirst("^ref_AppliedPTransform_", ""); // Java: Remove the 'ParMultiDo(Anonymous)' suffix which just makes the name longer name = name.replaceFirst("/ParMultiDo\\(Anonymous\\)$", ""); - sb.append(name); - if (i + 1 < transformsCount) { -sb.append(", "); - } + names.add(name); } -sb.append("}"); +sb.append(generateNameFromTransformNames(names, true)); return sb.toString(); } + + public static String generateNameFromTransformNames(Collection names, boolean truncate) { Review comment: Would `collapseComposites` be a better name than `truncate`? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172348) Time Spent: 2.5h (was: 2h 20m) > Improve Traceability of Pipeline translation > > > Key: BEAM-5859 > URL: https://issues.apache.org/jira/browse/BEAM-5859 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Maximilian Michels >Priority: Major > Labels: portability, portability-flink > Attachments: tfx.png, wordcount.png > > Time Spent: 2.5h > Remaining Estimate: 0h > > Users often ask how they can reason about the pipeline translation. The Flink > UI display a confusingly large graph without any trace of the original Beam > pipeline: > WordCount: > !wordcount.png! > TFX: > !tfx.png! > Some aspects which make understanding these graphs hard: > * Users don't know how the Runner maps Beam to Flink concepts > * The UI is awfully slow / hangs when the pipeline is reasonable complex > * The operator names seem to use {{transform.getUniqueName()}} which doesn't > generate readable name > * So called Chaining combines operators into a single operator which makes > understanding which Beam concept belongs to which Flink concept even harder > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation
[ https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=172352&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172352 ] ASF GitHub Bot logged work on BEAM-5859: Author: ASF GitHub Bot Created on: 05/Dec/18 16:10 Start Date: 05/Dec/18 16:10 Worklog Time Spent: 10m Work Description: mxm commented on a change in pull request #7208: [BEAM-5859] Better handle fused composite stage names. URL: https://github.com/apache/beam/pull/7208#discussion_r239104348 ## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ExecutableStageTranslation.java ## @@ -45,19 +53,60 @@ public static String generateNameFromStagePayload(ExecutableStagePayload stagePa RunnerApi.Components components = stagePayload.getComponents(); final int transformsCount = stagePayload.getTransformsCount(); sb.append("[").append(transformsCount).append("]"); -sb.append("{"); +Collection names = new ArrayList<>(); for (int i = 0; i < transformsCount; i++) { String name = components.getTransformsOrThrow(stagePayload.getTransforms(i)).getUniqueName(); - // Python: Remove the 'ref_AppliedPTransform_' prefix which just makes the name longer - name = name.replaceFirst("^ref_AppliedPTransform_", ""); // Java: Remove the 'ParMultiDo(Anonymous)' suffix which just makes the name longer name = name.replaceFirst("/ParMultiDo\\(Anonymous\\)$", ""); - sb.append(name); - if (i + 1 < transformsCount) { -sb.append(", "); - } + names.add(name); } -sb.append("}"); +sb.append(generateNameFromTransformNames(names, true)); return sb.toString(); } + + public static String generateNameFromTransformNames(Collection names, boolean truncate) { +Multimap groupByOuter = LinkedHashMultimap.create(); +for (String name : names) { + int index = name.indexOf('/'); Review comment: I suppose names can never contain slashes or they are escaped? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172352) Time Spent: 2h 50m (was: 2h 40m) > Improve Traceability of Pipeline translation > > > Key: BEAM-5859 > URL: https://issues.apache.org/jira/browse/BEAM-5859 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Maximilian Michels >Priority: Major > Labels: portability, portability-flink > Attachments: tfx.png, wordcount.png > > Time Spent: 2h 50m > Remaining Estimate: 0h > > Users often ask how they can reason about the pipeline translation. The Flink > UI display a confusingly large graph without any trace of the original Beam > pipeline: > WordCount: > !wordcount.png! > TFX: > !tfx.png! > Some aspects which make understanding these graphs hard: > * Users don't know how the Runner maps Beam to Flink concepts > * The UI is awfully slow / hangs when the pipeline is reasonable complex > * The operator names seem to use {{transform.getUniqueName()}} which doesn't > generate readable name > * So called Chaining combines operators into a single operator which makes > understanding which Beam concept belongs to which Flink concept even harder > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation
[ https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=172355&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172355 ] ASF GitHub Bot logged work on BEAM-5859: Author: ASF GitHub Bot Created on: 05/Dec/18 16:10 Start Date: 05/Dec/18 16:10 Worklog Time Spent: 10m Work Description: mxm commented on a change in pull request #7208: [BEAM-5859] Better handle fused composite stage names. URL: https://github.com/apache/beam/pull/7208#discussion_r239128952 ## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ExecutableStageTranslation.java ## @@ -45,19 +53,60 @@ public static String generateNameFromStagePayload(ExecutableStagePayload stagePa RunnerApi.Components components = stagePayload.getComponents(); final int transformsCount = stagePayload.getTransformsCount(); sb.append("[").append(transformsCount).append("]"); -sb.append("{"); +Collection names = new ArrayList<>(); for (int i = 0; i < transformsCount; i++) { String name = components.getTransformsOrThrow(stagePayload.getTransforms(i)).getUniqueName(); - // Python: Remove the 'ref_AppliedPTransform_' prefix which just makes the name longer - name = name.replaceFirst("^ref_AppliedPTransform_", ""); // Java: Remove the 'ParMultiDo(Anonymous)' suffix which just makes the name longer name = name.replaceFirst("/ParMultiDo\\(Anonymous\\)$", ""); - sb.append(name); - if (i + 1 < transformsCount) { -sb.append(", "); - } + names.add(name); } -sb.append("}"); +sb.append(generateNameFromTransformNames(names, true)); return sb.toString(); } + + public static String generateNameFromTransformNames(Collection names, boolean truncate) { +Multimap groupByOuter = LinkedHashMultimap.create(); +for (String name : names) { + int index = name.indexOf('/'); + if (index == -1) { +groupByOuter.put(name, ""); + } else { +groupByOuter.put(name.substring(0, index), name.substring(index + 1)); + } +} +if (groupByOuter.keySet().size() == 1) { + Map.Entry> outer = + Iterables.getOnlyElement(groupByOuter.asMap().entrySet()); + if (outer.getValue().size() == 1 && outer.getValue().contains("")) { +// Names consisted of a single name without any slashes. +return outer.getKey(); + } else { +// Everything is in the same outer stage, enumerate at one level down. +return String.format( +"%s/%s", outer.getKey(), generateNameFromTransformNames(outer.getValue(), truncate)); Review comment: We go down a level here but only if we had multiple elements inside the same composite transform and no other composite transforms. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172355) Time Spent: 3h (was: 2h 50m) > Improve Traceability of Pipeline translation > > > Key: BEAM-5859 > URL: https://issues.apache.org/jira/browse/BEAM-5859 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Maximilian Michels >Priority: Major > Labels: portability, portability-flink > Attachments: tfx.png, wordcount.png > > Time Spent: 3h > Remaining Estimate: 0h > > Users often ask how they can reason about the pipeline translation. The Flink > UI display a confusingly large graph without any trace of the original Beam > pipeline: > WordCount: > !wordcount.png! > TFX: > !tfx.png! > Some aspects which make understanding these graphs hard: > * Users don't know how the Runner maps Beam to Flink concepts > * The UI is awfully slow / hangs when the pipeline is reasonable complex > * The operator names seem to use {{transform.getUniqueName()}} which doesn't > generate readable name > * So called Chaining combines operators into a single operator which makes > understanding which Beam concept belongs to which Flink concept even harder > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation
[ https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=172279&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172279 ] ASF GitHub Bot logged work on BEAM-5859: Author: ASF GitHub Bot Created on: 05/Dec/18 13:53 Start Date: 05/Dec/18 13:53 Worklog Time Spent: 10m Work Description: robertwb opened a new pull request #7208: [BEAM-5859] Better handle fused composite stage names. URL: https://github.com/apache/beam/pull/7208 I decided to default to truncated names because they both look nicer and they have more information if the UI is going to (implicitly or explicitly) truncate them on length anyways. Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). It will help us expedite review of your Pull Request if you tag someone (e.g. `@username`) to look at it. Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/) | --- | --- | --- | --- | --- | --- Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/) | --- | --- | --- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172279) Time Spent: 2h 20m (was: 2h 10m) > Improve Traceability o
[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation
[ https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=171679&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171679 ] ASF GitHub Bot logged work on BEAM-5859: Author: ASF GitHub Bot Created on: 03/Dec/18 19:10 Start Date: 03/Dec/18 19:10 Worklog Time Spent: 10m Work Description: tweise closed pull request #7150: [BEAM-5859] Improve operator names for portable pipelines URL: https://github.com/apache/beam/pull/7150 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ExecutableStageTranslation.java b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ExecutableStageTranslation.java index c9bb4f6ea030..09d1a991c279 100644 --- a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ExecutableStageTranslation.java +++ b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ExecutableStageTranslation.java @@ -39,4 +39,25 @@ public static ExecutableStagePayload getExecutableStagePayload( checkArgument(ExecutableStage.URN.equals(transform.getSpec().getUrn())); return ExecutableStagePayload.parseFrom(transform.getSpec().getPayload()); } + + public static String generateNameFromStagePayload(ExecutableStagePayload stagePayload) { +StringBuilder sb = new StringBuilder(); +RunnerApi.Components components = stagePayload.getComponents(); +final int transformsCount = stagePayload.getTransformsCount(); +sb.append("[").append(transformsCount).append("]"); +sb.append("{"); +for (int i = 0; i < transformsCount; i++) { + String name = components.getTransformsOrThrow(stagePayload.getTransforms(i)).getUniqueName(); + // Python: Remove the 'ref_AppliedPTransform_' prefix which just makes the name longer + name = name.replaceFirst("^ref_AppliedPTransform_", ""); + // Java: Remove the 'ParMultiDo(Anonymous)' suffix which just makes the name longer + name = name.replaceFirst("/ParMultiDo\\(Anonymous\\)$", ""); + sb.append(name); + if (i + 1 < transformsCount) { +sb.append(", "); + } +} +sb.append("}"); +return sb.toString(); + } } diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ExecutableStageTranslationTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ExecutableStageTranslationTest.java new file mode 100644 index ..c5b06f76db04 --- /dev/null +++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ExecutableStageTranslationTest.java @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.core.construction; + +import static org.hamcrest.core.Is.is; +import static org.junit.Assert.assertThat; + +import java.io.Serializable; +import org.apache.beam.model.pipeline.v1.RunnerApi; +import org.apache.beam.runners.core.construction.graph.ExecutableStage; +import org.apache.beam.runners.core.construction.graph.GreedyPipelineFuser; +import org.apache.beam.sdk.Pipeline; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.Impulse; +import org.apache.beam.sdk.transforms.ParDo; +import org.junit.Test; + +/** Tests for {@link ExecutableStageTranslation}. */ +public class ExecutableStageTranslationTest implements Serializable { + + @Test + /* Test for generating readable operator names during translation. */ + public void testOperatorNameGeneration() throws Exception { +Pipeline p = Pipeline.create(); +p.apply(Impulse.create()) +// Anonymous ParDo +.apply( +ParDo.of( +new DoFn() { + @ProcessElement + public
[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation
[ https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=171619&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171619 ] ASF GitHub Bot logged work on BEAM-5859: Author: ASF GitHub Bot Created on: 03/Dec/18 15:40 Start Date: 03/Dec/18 15:40 Worklog Time Spent: 10m Work Description: mxm commented on a change in pull request #7150: [BEAM-5859] Improve operator names for portable pipelines URL: https://github.com/apache/beam/pull/7150#discussion_r238318772 ## File path: runners/flink/src/main/java/org/apache/beam/runners/flink/translation/utils/FlinkPipelineTranslatorUtils.java ## @@ -77,4 +77,25 @@ public static WindowingStrategy getWindowingStrategy( e); } } + + public static String genOperatorNameFromStagePayload( Review comment: Perhaps `ExecutableStageTranslation` would be a good place to put this. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 171619) Time Spent: 2h (was: 1h 50m) > Improve Traceability of Pipeline translation > > > Key: BEAM-5859 > URL: https://issues.apache.org/jira/browse/BEAM-5859 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Maximilian Michels >Priority: Major > Labels: portability, portability-flink > Attachments: tfx.png, wordcount.png > > Time Spent: 2h > Remaining Estimate: 0h > > Users often ask how they can reason about the pipeline translation. The Flink > UI display a confusingly large graph without any trace of the original Beam > pipeline: > WordCount: > !wordcount.png! > TFX: > !tfx.png! > Some aspects which make understanding these graphs hard: > * Users don't know how the Runner maps Beam to Flink concepts > * The UI is awfully slow / hangs when the pipeline is reasonable complex > * The operator names seem to use {{transform.getUniqueName()}} which doesn't > generate readable name > * So called Chaining combines operators into a single operator which makes > understanding which Beam concept belongs to which Flink concept even harder > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation
[ https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=171618&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171618 ] ASF GitHub Bot logged work on BEAM-5859: Author: ASF GitHub Bot Created on: 03/Dec/18 15:38 Start Date: 03/Dec/18 15:38 Worklog Time Spent: 10m Work Description: mxm commented on a change in pull request #7150: [BEAM-5859] Improve operator names for portable pipelines URL: https://github.com/apache/beam/pull/7150#discussion_r238318117 ## File path: runners/flink/src/main/java/org/apache/beam/runners/flink/translation/utils/FlinkPipelineTranslatorUtils.java ## @@ -77,4 +77,25 @@ public static WindowingStrategy getWindowingStrategy( e); } } + + public static String genOperatorNameFromStagePayload( + RunnerApi.ExecutableStagePayload stagePayload) { +StringBuilder sb = new StringBuilder(); +final int transformsCount = stagePayload.getTransformsCount(); +sb.append("[").append(transformsCount).append("]"); +sb.append("{"); +for (int i = 0; i < transformsCount; i++) { + String name = stagePayload.getTransforms(i); + // Python: Remove the 'ref_AppliedPTransform_' prefix which just makes the name longer Review comment: Good suggestion. The unique name doesn't include the id but makes it more readable: ``` // Master beam:env:docker:v1:1 // Old PR version [5]{read/Read/Reshuffle/ReshufflePerKey/FlatMap(restore_timestamps)_14, read/Read/Reshuffle/RemoveRandomKeys_15, read/Read/ReadSplits_16, split_17, pair_with_one_18} // Using unique_name [5]{read/Read/Reshuffle/ReshufflePerKey/FlatMap(restore_timestamps), read/Read/Reshuffle/RemoveRandomKeys, read/Read/ReadSplits, split, pair_with_one} ``` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 171618) Time Spent: 1h 50m (was: 1h 40m) > Improve Traceability of Pipeline translation > > > Key: BEAM-5859 > URL: https://issues.apache.org/jira/browse/BEAM-5859 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Maximilian Michels >Priority: Major > Labels: portability, portability-flink > Attachments: tfx.png, wordcount.png > > Time Spent: 1h 50m > Remaining Estimate: 0h > > Users often ask how they can reason about the pipeline translation. The Flink > UI display a confusingly large graph without any trace of the original Beam > pipeline: > WordCount: > !wordcount.png! > TFX: > !tfx.png! > Some aspects which make understanding these graphs hard: > * Users don't know how the Runner maps Beam to Flink concepts > * The UI is awfully slow / hangs when the pipeline is reasonable complex > * The operator names seem to use {{transform.getUniqueName()}} which doesn't > generate readable name > * So called Chaining combines operators into a single operator which makes > understanding which Beam concept belongs to which Flink concept even harder > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation
[ https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=171552&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171552 ] ASF GitHub Bot logged work on BEAM-5859: Author: ASF GitHub Bot Created on: 03/Dec/18 12:42 Start Date: 03/Dec/18 12:42 Worklog Time Spent: 10m Work Description: robertwb commented on a change in pull request #7150: [BEAM-5859] Improve operator names for portable pipelines URL: https://github.com/apache/beam/pull/7150#discussion_r238253133 ## File path: runners/flink/src/main/java/org/apache/beam/runners/flink/translation/utils/FlinkPipelineTranslatorUtils.java ## @@ -77,4 +77,25 @@ public static WindowingStrategy getWindowingStrategy( e); } } + + public static String genOperatorNameFromStagePayload( + RunnerApi.ExecutableStagePayload stagePayload) { +StringBuilder sb = new StringBuilder(); +final int transformsCount = stagePayload.getTransformsCount(); +sb.append("[").append(transformsCount).append("]"); +sb.append("{"); +for (int i = 0; i < transformsCount; i++) { + String name = stagePayload.getTransforms(i); + // Python: Remove the 'ref_AppliedPTransform_' prefix which just makes the name longer Review comment: It would probably be better to use stagePayload.components.transforms[name].unique_name rather than its id here. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 171552) Time Spent: 1h 40m (was: 1.5h) > Improve Traceability of Pipeline translation > > > Key: BEAM-5859 > URL: https://issues.apache.org/jira/browse/BEAM-5859 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Maximilian Michels >Priority: Major > Labels: portability, portability-flink > Attachments: tfx.png, wordcount.png > > Time Spent: 1h 40m > Remaining Estimate: 0h > > Users often ask how they can reason about the pipeline translation. The Flink > UI display a confusingly large graph without any trace of the original Beam > pipeline: > WordCount: > !wordcount.png! > TFX: > !tfx.png! > Some aspects which make understanding these graphs hard: > * Users don't know how the Runner maps Beam to Flink concepts > * The UI is awfully slow / hangs when the pipeline is reasonable complex > * The operator names seem to use {{transform.getUniqueName()}} which doesn't > generate readable name > * So called Chaining combines operators into a single operator which makes > understanding which Beam concept belongs to which Flink concept even harder > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation
[ https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=171551&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171551 ] ASF GitHub Bot logged work on BEAM-5859: Author: ASF GitHub Bot Created on: 03/Dec/18 12:42 Start Date: 03/Dec/18 12:42 Worklog Time Spent: 10m Work Description: robertwb commented on a change in pull request #7150: [BEAM-5859] Improve operator names for portable pipelines URL: https://github.com/apache/beam/pull/7150#discussion_r238252456 ## File path: runners/flink/src/main/java/org/apache/beam/runners/flink/translation/utils/FlinkPipelineTranslatorUtils.java ## @@ -77,4 +77,25 @@ public static WindowingStrategy getWindowingStrategy( e); } } + + public static String genOperatorNameFromStagePayload( Review comment: This isn't really Flink-specific. I wonder if it'd be better placed somewhere in core-construction. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 171551) Time Spent: 1.5h (was: 1h 20m) > Improve Traceability of Pipeline translation > > > Key: BEAM-5859 > URL: https://issues.apache.org/jira/browse/BEAM-5859 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Maximilian Michels >Priority: Major > Labels: portability, portability-flink > Attachments: tfx.png, wordcount.png > > Time Spent: 1.5h > Remaining Estimate: 0h > > Users often ask how they can reason about the pipeline translation. The Flink > UI display a confusingly large graph without any trace of the original Beam > pipeline: > WordCount: > !wordcount.png! > TFX: > !tfx.png! > Some aspects which make understanding these graphs hard: > * Users don't know how the Runner maps Beam to Flink concepts > * The UI is awfully slow / hangs when the pipeline is reasonable complex > * The operator names seem to use {{transform.getUniqueName()}} which doesn't > generate readable name > * So called Chaining combines operators into a single operator which makes > understanding which Beam concept belongs to which Flink concept even harder > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation
[ https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=171533&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171533 ] ASF GitHub Bot logged work on BEAM-5859: Author: ASF GitHub Bot Created on: 03/Dec/18 10:59 Start Date: 03/Dec/18 10:59 Worklog Time Spent: 10m Work Description: mxm commented on issue #7150: [BEAM-5859] Improve operator names for portable pipelines URL: https://github.com/apache/beam/pull/7150#issuecomment-443670703 >New naming seems more intuitive albeit slightly increased real estate: Besides the increased screen real estate, they actually list more information about the stages in the harness. Those were hidden before. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 171533) Time Spent: 1h 10m (was: 1h) > Improve Traceability of Pipeline translation > > > Key: BEAM-5859 > URL: https://issues.apache.org/jira/browse/BEAM-5859 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Maximilian Michels >Priority: Major > Labels: portability, portability-flink > Attachments: tfx.png, wordcount.png > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Users often ask how they can reason about the pipeline translation. The Flink > UI display a confusingly large graph without any trace of the original Beam > pipeline: > WordCount: > !wordcount.png! > TFX: > !tfx.png! > Some aspects which make understanding these graphs hard: > * Users don't know how the Runner maps Beam to Flink concepts > * The UI is awfully slow / hangs when the pipeline is reasonable complex > * The operator names seem to use {{transform.getUniqueName()}} which doesn't > generate readable name > * So called Chaining combines operators into a single operator which makes > understanding which Beam concept belongs to which Flink concept even harder > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation
[ https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=171536&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171536 ] ASF GitHub Bot logged work on BEAM-5859: Author: ASF GitHub Bot Created on: 03/Dec/18 11:17 Start Date: 03/Dec/18 11:17 Worklog Time Spent: 10m Work Description: mxm commented on issue #7150: [BEAM-5859] Improve operator names for portable pipelines URL: https://github.com/apache/beam/pull/7150#issuecomment-443675464 We can make the Python pipelines even more readable by removing `ref_AppliedPTransform_` prefix. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 171536) Time Spent: 1h 20m (was: 1h 10m) > Improve Traceability of Pipeline translation > > > Key: BEAM-5859 > URL: https://issues.apache.org/jira/browse/BEAM-5859 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Maximilian Michels >Priority: Major > Labels: portability, portability-flink > Attachments: tfx.png, wordcount.png > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Users often ask how they can reason about the pipeline translation. The Flink > UI display a confusingly large graph without any trace of the original Beam > pipeline: > WordCount: > !wordcount.png! > TFX: > !tfx.png! > Some aspects which make understanding these graphs hard: > * Users don't know how the Runner maps Beam to Flink concepts > * The UI is awfully slow / hangs when the pipeline is reasonable complex > * The operator names seem to use {{transform.getUniqueName()}} which doesn't > generate readable name > * So called Chaining combines operators into a single operator which makes > understanding which Beam concept belongs to which Flink concept even harder > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation
[ https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=171530&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171530 ] ASF GitHub Bot logged work on BEAM-5859: Author: ASF GitHub Bot Created on: 03/Dec/18 10:56 Start Date: 03/Dec/18 10:56 Worklog Time Spent: 10m Work Description: mxm commented on a change in pull request #7150: [BEAM-5859] Improve operator names for portable pipelines URL: https://github.com/apache/beam/pull/7150#discussion_r238223008 ## File path: runners/flink/src/test/java/org/apache/beam/runners/flink/translation/FlinkPipelineTranslatorUtilsTest.java ## @@ -0,0 +1,78 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.flink.translation; + +import static org.hamcrest.core.Is.is; +import static org.junit.Assert.assertThat; + +import java.io.Serializable; +import org.apache.beam.model.pipeline.v1.RunnerApi; +import org.apache.beam.runners.core.construction.PipelineTranslation; +import org.apache.beam.runners.core.construction.graph.ExecutableStage; +import org.apache.beam.runners.core.construction.graph.GreedyPipelineFuser; +import org.apache.beam.runners.flink.translation.utils.FlinkPipelineTranslatorUtils; +import org.apache.beam.sdk.Pipeline; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.Impulse; +import org.apache.beam.sdk.transforms.ParDo; +import org.junit.Test; + +/** + * Tests for {@link org.apache.beam.runners.flink.translation.utils.FlinkPipelineTranslatorUtils}. + */ +public class FlinkPipelineTranslatorUtilsTest implements Serializable { + + @Test + /* Test for generating readable operator names during translation. */ + public void testOperatorNameGeneration() throws Exception { +Pipeline p = Pipeline.create(); +p.apply(Impulse.create()) +// Anonymous ParDo +.apply( +ParDo.of( +new DoFn() { + @ProcessElement + public void processElement( + ProcessContext processContext, OutputReceiver outputReceiver) {} +})) +// Name ParDo +.apply( +"MyName", +ParDo.of( +new DoFn() { + @ProcessElement + public void processElement( + ProcessContext processContext, OutputReceiver outputReceiver) {} +})); + +ExecutableStage firstEnvStage = +GreedyPipelineFuser.fuse(PipelineTranslation.toProto(p)) +.getFusedStages() +.stream() +.findFirst() +.get(); +RunnerApi.ExecutableStagePayload basePayload = +RunnerApi.ExecutableStagePayload.parseFrom( +firstEnvStage.toPTransform("foo").getSpec().getPayload()); + +String executableStageName = + FlinkPipelineTranslatorUtils.genOperatorNameFromStagePayload(basePayload); + +assertThat(executableStageName, is("ExecutableStage(2)[ParDo(Anonymous)][MyName]")); Review comment: I think we could skip the word `ExecutableStage` entirely. Maybe just enclosing it in `{}` would suffice to indicate that a harness is involved. I realized the names are less readable for Python. In Java we would just get the user-defined name (if set). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 171530) Time Spent: 1h (was: 50m) > Improve Traceability of Pipeline translation > > > Key: BEAM-5859 > URL: https://issues.apache.org/jira/browse/BEAM-5859 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Maximilian Michels >Priority: Maj
[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation
[ https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=171456&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171456 ] ASF GitHub Bot logged work on BEAM-5859: Author: ASF GitHub Bot Created on: 03/Dec/18 04:18 Start Date: 03/Dec/18 04:18 Worklog Time Spent: 10m Work Description: tweise commented on a change in pull request #7150: [BEAM-5859] Improve operator names for portable pipelines URL: https://github.com/apache/beam/pull/7150#discussion_r238141279 ## File path: runners/flink/src/test/java/org/apache/beam/runners/flink/translation/FlinkPipelineTranslatorUtilsTest.java ## @@ -0,0 +1,78 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.flink.translation; + +import static org.hamcrest.core.Is.is; +import static org.junit.Assert.assertThat; + +import java.io.Serializable; +import org.apache.beam.model.pipeline.v1.RunnerApi; +import org.apache.beam.runners.core.construction.PipelineTranslation; +import org.apache.beam.runners.core.construction.graph.ExecutableStage; +import org.apache.beam.runners.core.construction.graph.GreedyPipelineFuser; +import org.apache.beam.runners.flink.translation.utils.FlinkPipelineTranslatorUtils; +import org.apache.beam.sdk.Pipeline; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.Impulse; +import org.apache.beam.sdk.transforms.ParDo; +import org.junit.Test; + +/** + * Tests for {@link org.apache.beam.runners.flink.translation.utils.FlinkPipelineTranslatorUtils}. + */ +public class FlinkPipelineTranslatorUtilsTest implements Serializable { + + @Test + /* Test for generating readable operator names during translation. */ + public void testOperatorNameGeneration() throws Exception { +Pipeline p = Pipeline.create(); +p.apply(Impulse.create()) +// Anonymous ParDo +.apply( +ParDo.of( +new DoFn() { + @ProcessElement + public void processElement( + ProcessContext processContext, OutputReceiver outputReceiver) {} +})) +// Name ParDo +.apply( +"MyName", +ParDo.of( +new DoFn() { + @ProcessElement + public void processElement( + ProcessContext processContext, OutputReceiver outputReceiver) {} +})); + +ExecutableStage firstEnvStage = +GreedyPipelineFuser.fuse(PipelineTranslation.toProto(p)) +.getFusedStages() +.stream() +.findFirst() +.get(); +RunnerApi.ExecutableStagePayload basePayload = +RunnerApi.ExecutableStagePayload.parseFrom( +firstEnvStage.toPTransform("foo").getSpec().getPayload()); + +String executableStageName = + FlinkPipelineTranslatorUtils.genOperatorNameFromStagePayload(basePayload); + +assertThat(executableStageName, is("ExecutableStage(2)[ParDo(Anonymous)][MyName]")); Review comment: Can we skip or shorten `ExecutableStage`? In the metric backend system characters usually come at a premium and it does not really add anything valuable for the user. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 171456) Time Spent: 50m (was: 40m) > Improve Traceability of Pipeline translation > > > Key: BEAM-5859 > URL: https://issues.apache.org/jira/browse/BEAM-5859 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Maximilian Michels >Priority: Major > Labels: portability, portability-flink > Attachments: tfx.png, w
[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation
[ https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=171447&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171447 ] ASF GitHub Bot logged work on BEAM-5859: Author: ASF GitHub Bot Created on: 03/Dec/18 03:48 Start Date: 03/Dec/18 03:48 Worklog Time Spent: 10m Work Description: tweise commented on issue #7150: [BEAM-5859] Improve operator names for portable pipelines URL: https://github.com/apache/beam/pull/7150#issuecomment-443580831 Here is an example before and after: ![image](https://user-images.githubusercontent.com/263695/49351978-a0d3c700-f66a-11e8-9644-8bcc5bf3d64a.png) New naming seems more intuitive albeit slightly increased real estate: ![image](https://user-images.githubusercontent.com/263695/49352025-d11b6580-f66a-11e8-9ee2-bd5f0b1395e9.png) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 171447) Time Spent: 40m (was: 0.5h) > Improve Traceability of Pipeline translation > > > Key: BEAM-5859 > URL: https://issues.apache.org/jira/browse/BEAM-5859 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Maximilian Michels >Priority: Major > Labels: portability, portability-flink > Attachments: tfx.png, wordcount.png > > Time Spent: 40m > Remaining Estimate: 0h > > Users often ask how they can reason about the pipeline translation. The Flink > UI display a confusingly large graph without any trace of the original Beam > pipeline: > WordCount: > !wordcount.png! > TFX: > !tfx.png! > Some aspects which make understanding these graphs hard: > * Users don't know how the Runner maps Beam to Flink concepts > * The UI is awfully slow / hangs when the pipeline is reasonable complex > * The operator names seem to use {{transform.getUniqueName()}} which doesn't > generate readable name > * So called Chaining combines operators into a single operator which makes > understanding which Beam concept belongs to which Flink concept even harder > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation
[ https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=170358&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-170358 ] ASF GitHub Bot logged work on BEAM-5859: Author: ASF GitHub Bot Created on: 28/Nov/18 18:37 Start Date: 28/Nov/18 18:37 Worklog Time Spent: 10m Work Description: mxm commented on issue #7150: [BEAM-5859] Improve operator names for portable pipelines URL: https://github.com/apache/beam/pull/7150#issuecomment-442556971 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 170358) Time Spent: 0.5h (was: 20m) > Improve Traceability of Pipeline translation > > > Key: BEAM-5859 > URL: https://issues.apache.org/jira/browse/BEAM-5859 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Maximilian Michels >Priority: Major > Labels: portability, portability-flink > Attachments: tfx.png, wordcount.png > > Time Spent: 0.5h > Remaining Estimate: 0h > > Users often ask how they can reason about the pipeline translation. The Flink > UI display a confusingly large graph without any trace of the original Beam > pipeline: > WordCount: > !wordcount.png! > TFX: > !tfx.png! > Some aspects which make understanding these graphs hard: > * Users don't know how the Runner maps Beam to Flink concepts > * The UI is awfully slow / hangs when the pipeline is reasonable complex > * The operator names seem to use {{transform.getUniqueName()}} which doesn't > generate readable name > * So called Chaining combines operators into a single operator which makes > understanding which Beam concept belongs to which Flink concept even harder > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation
[ https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=170334&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-170334 ] ASF GitHub Bot logged work on BEAM-5859: Author: ASF GitHub Bot Created on: 28/Nov/18 17:32 Start Date: 28/Nov/18 17:32 Worklog Time Spent: 10m Work Description: mxm commented on issue #7150: [BEAM-5859] Improve operator names for portable pipelines URL: https://github.com/apache/beam/pull/7150#issuecomment-442534708 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 170334) Time Spent: 20m (was: 10m) > Improve Traceability of Pipeline translation > > > Key: BEAM-5859 > URL: https://issues.apache.org/jira/browse/BEAM-5859 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Maximilian Michels >Priority: Major > Labels: portability, portability-flink > Attachments: tfx.png, wordcount.png > > Time Spent: 20m > Remaining Estimate: 0h > > Users often ask how they can reason about the pipeline translation. The Flink > UI display a confusingly large graph without any trace of the original Beam > pipeline: > WordCount: > !wordcount.png! > TFX: > !tfx.png! > Some aspects which make understanding these graphs hard: > * Users don't know how the Runner maps Beam to Flink concepts > * The UI is awfully slow / hangs when the pipeline is reasonable complex > * The operator names seem to use {{transform.getUniqueName()}} which doesn't > generate readable name > * So called Chaining combines operators into a single operator which makes > understanding which Beam concept belongs to which Flink concept even harder > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation
[ https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=170295&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-170295 ] ASF GitHub Bot logged work on BEAM-5859: Author: ASF GitHub Bot Created on: 28/Nov/18 16:33 Start Date: 28/Nov/18 16:33 Worklog Time Spent: 10m Work Description: mxm opened a new pull request #7150: [BEAM-5859] Improve operator names for portable pipelines URL: https://github.com/apache/beam/pull/7150 This adds a more readable operator name for executable stages. It is of the form: ExecutableStage(numTransforms)[transformName][transformName2].. Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/) | --- | --- | --- | --- | --- | --- Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/) | --- | --- | --- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 170295) Time Spent: 10m Remaining Estimate: 0h > Improve Traceability of Pipeline translation > > > Key: BEAM-5859 > URL: https://issues.apache.org/jira/browse/BEAM-5859 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Maximilian Michels >Priority: Major > Labels: portability, portability-flink > Attachments: tfx.png, wordcount.png > > Time Spent: 10m > Remaining Estimate: 0h > > Users often ask how they can reason about the pipeline translation. The Flink > UI display a confusingly large graph withou