[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=172650&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172650
 ]

ASF GitHub Bot logged work on BEAM-5859:


Author: ASF GitHub Bot
Created on: 06/Dec/18 14:17
Start Date: 06/Dec/18 14:17
Worklog Time Spent: 10m 
  Work Description: mxm closed pull request #7208: [BEAM-5859] Better 
handle fused composite stage names.
URL: https://github.com/apache/beam/pull/7208
 
 
   

This PR was merged from a forked repository.
Because GitHub hides the original diff of a fork once it is merged, the
diff is reproduced below for the sake of provenance:

diff --git a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ExecutableStageTranslation.java b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ExecutableStageTranslation.java
index 09d1a991c279..4fe4a0f3669e 100644
--- a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ExecutableStageTranslation.java
+++ b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ExecutableStageTranslation.java
@@ -19,7 +19,15 @@
 
 import static com.google.common.base.Preconditions.checkArgument;
 
+import com.google.common.base.Joiner;
+import com.google.common.collect.Iterables;
+import com.google.common.collect.LinkedHashMultimap;
+import com.google.common.collect.Multimap;
 import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Map;
+import java.util.stream.Collectors;
 import org.apache.beam.model.pipeline.v1.RunnerApi;
 import org.apache.beam.model.pipeline.v1.RunnerApi.ExecutableStagePayload;
 import org.apache.beam.runners.core.construction.graph.ExecutableStage;
@@ -45,19 +53,80 @@ public static String generateNameFromStagePayload(ExecutableStagePayload stagePa
     RunnerApi.Components components = stagePayload.getComponents();
     final int transformsCount = stagePayload.getTransformsCount();
     sb.append("[").append(transformsCount).append("]");
-    sb.append("{");
+    Collection<String> names = new ArrayList<>();
     for (int i = 0; i < transformsCount; i++) {
       String name = components.getTransformsOrThrow(stagePayload.getTransforms(i)).getUniqueName();
-      // Python: Remove the 'ref_AppliedPTransform_' prefix which just makes the name longer
-      name = name.replaceFirst("^ref_AppliedPTransform_", "");
       // Java: Remove the 'ParMultiDo(Anonymous)' suffix which just makes the name longer
       name = name.replaceFirst("/ParMultiDo\\(Anonymous\\)$", "");
-      sb.append(name);
-      if (i + 1 < transformsCount) {
-        sb.append(", ");
-      }
+      names.add(name);
     }
-    sb.append("}");
+    sb.append(generateNameFromTransformNames(names, true));
     return sb.toString();
   }
+
+  /**
+   * Creates a human-readable name for a set of stage names that occur in a single stage.
+   *
+   * This name reflects the nested structure of the stages, as inferred by slashes in the stage
+   * names. Sibling stages will be listed as {A, B}, nested stages as A/B, and according to the
+   * value of truncateSiblingComposites the nesting stops at the first level that siblings are
+   * encountered.
+   *
+   * This is best understood via examples, of which there are several in the tests for this
+   * class.
+   *
+   * @param names a list of full stage names in this fused operation
+   * @param truncateSiblingComposites whether to recursively descend into composite operations that
+   * have siblings, or stop the recursion at that level.
+   * @return a single string representation of all the stages in this fused operation
+   */
+  public static String generateNameFromTransformNames(
+      Collection<String> names, boolean truncateSiblingComposites) {
+    Multimap<String, String> groupByOuter = LinkedHashMultimap.create();
+    for (String name : names) {
+      int index = name.indexOf('/');
+      if (index == -1) {
+        groupByOuter.put(name, "");
+      } else {
+        groupByOuter.put(name.substring(0, index), name.substring(index + 1));
+      }
+    }
+    if (groupByOuter.keySet().size() == 1) {
+      Map.Entry<String, Collection<String>> outer =
+          Iterables.getOnlyElement(groupByOuter.asMap().entrySet());
+      if (outer.getValue().size() == 1 && outer.getValue().contains("")) {
+        // Names consisted of a single name without any slashes.
+        return outer.getKey();
+      } else {
+        // Everything is in the same outer stage, enumerate at one level down.
+        return String.format(
+            "%s/%s",
+            outer.getKey(),
+            generateNameFromTransformNames(outer.getValue(), truncateSiblingComposites));
+      }
+    } else {
+  Collection part

[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=172643&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172643
 ]

ASF GitHub Bot logged work on BEAM-5859:


Author: ASF GitHub Bot
Created on: 06/Dec/18 13:40
Start Date: 06/Dec/18 13:40
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #7208: [BEAM-5859] Better 
handle fused composite stage names.
URL: https://github.com/apache/beam/pull/7208#issuecomment-444873795
 
 
   Ah, looks like I can't actually link due to the main files not depending on 
tests... Yes, we want to recurse because there are times when a single outer 
composite spans many stages (and otherwise they would all have the same name). 
I suppose another way to explain the algorithm is that we recurse until we have 
multiple children. 
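
A minimal sketch of the "recurse until we have multiple children" rule, assuming only the JDK (the pull request itself uses Guava's LinkedHashMultimap). The class RecurseUntilSiblings, the method shortName, the rejection of empty input, and the Write/... transform names are all hypothetical; they only illustrate why descending keeps two stages of the same outer composite distinguishable:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, stdlib-only sketch; not the code from the pull request.
class RecurseUntilSiblings {

  /** Descends through shared outer composites and stops at the first level with siblings. */
  static String shortName(Collection<String> names) {
    // Assumption: an empty stage is rejected here; the thread does not say what should happen.
    if (names.isEmpty()) {
      throw new IllegalArgumentException("names must not be empty");
    }
    // Map each outer segment to the remainders that follow its first slash.
    Map<String, Set<String>> groupByOuter = new LinkedHashMap<>();
    for (String name : names) {
      int index = name.indexOf('/');
      String outer = index == -1 ? name : name.substring(0, index);
      String rest = index == -1 ? "" : name.substring(index + 1);
      groupByOuter.computeIfAbsent(outer, k -> new LinkedHashSet<>()).add(rest);
    }
    if (groupByOuter.size() > 1) {
      // Multiple children: list the siblings and stop recursing.
      return "{" + String.join(", ", groupByOuter.keySet()) + "}";
    }
    Map.Entry<String, Set<String>> only = groupByOuter.entrySet().iterator().next();
    if (only.getValue().size() == 1 && only.getValue().contains("")) {
      // A single name without any further slashes.
      return only.getKey();
    }
    // One shared outer composite: keep its name and describe the level below.
    return only.getKey() + "/" + shortName(only.getValue());
  }

  public static void main(String[] args) {
    System.out.println(shortName(Arrays.asList("Write/Prepare/Map", "Write/Prepare/Filter")));
    System.out.println(shortName(Arrays.asList("Write/Commit/Group", "Write/Commit/Finalize")));
  }
}
```

Stopping at the outermost level would name both stages Write; descending yields Write/Prepare/{Map, Filter} and Write/Commit/{Group, Finalize}.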
   
   Squashed and pushed. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 172643)
Time Spent: 3h 50m  (was: 3h 40m)

> Improve Traceability of Pipeline translation
> 
>
> Key: BEAM-5859
> URL: https://issues.apache.org/jira/browse/BEAM-5859
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Maximilian Michels
>Priority: Major
>  Labels: portability, portability-flink
> Attachments: tfx.png, wordcount.png
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Users often ask how they can reason about the pipeline translation. The Flink 
> UI displays a confusingly large graph without any trace of the original Beam 
> pipeline:
> WordCount:
>  !wordcount.png! 
> TFX:
>  !tfx.png! 
> Some aspects which make understanding these graphs hard:
>  * Users don't know how the Runner maps Beam to Flink concepts
>  * The UI is awfully slow / hangs when the pipeline is reasonably complex
>  * The operator names seem to use {{transform.getUniqueName()}} which doesn't 
> generate readable names
>  * So-called chaining combines operators into a single operator, which makes 
> understanding which Beam concept belongs to which Flink concept even harder
>  
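
One readability step visible in the diff of pull request #7208 is stripping the Java SDK's trailing ParMultiDo(Anonymous) segment from a unique name. The snippet below is only a sketch of that single step: the regular expression is taken from the diff, while the class UniqueNameCleanup and the method stripAnonymousParMultiDo are hypothetical names introduced for illustration:

```java
// Hypothetical sketch: only the regular expression comes from the pull request's diff.
class UniqueNameCleanup {

  /** Drops a trailing "/ParMultiDo(Anonymous)" segment, if present. */
  static String stripAnonymousParMultiDo(String uniqueName) {
    return uniqueName.replaceFirst("/ParMultiDo\\(Anonymous\\)$", "");
  }

  public static void main(String[] args) {
    // The suffix is removed only when it ends the name.
    System.out.println(stripAnonymousParMultiDo("ParDo(Anonymous)/ParMultiDo(Anonymous)"));
    // Names without the suffix pass through unchanged.
    System.out.println(stripAnonymousParMultiDo("MyName"));
  }
}
```

The first call prints ParDo(Anonymous) and the second prints MyName.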



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=172638&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172638
 ]

ASF GitHub Bot logged work on BEAM-5859:


Author: ASF GitHub Bot
Created on: 06/Dec/18 13:20
Start Date: 06/Dec/18 13:20
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #7208: 
[BEAM-5859] Better handle fused composite stage names.
URL: https://github.com/apache/beam/pull/7208#discussion_r239446348
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ExecutableStageTranslation.java
 ##
 @@ -45,19 +53,60 @@ public static String generateNameFromStagePayload(ExecutableStagePayload stagePa
     RunnerApi.Components components = stagePayload.getComponents();
     final int transformsCount = stagePayload.getTransformsCount();
     sb.append("[").append(transformsCount).append("]");
-    sb.append("{");
+    Collection<String> names = new ArrayList<>();
     for (int i = 0; i < transformsCount; i++) {
       String name = components.getTransformsOrThrow(stagePayload.getTransforms(i)).getUniqueName();
-      // Python: Remove the 'ref_AppliedPTransform_' prefix which just makes the name longer
-      name = name.replaceFirst("^ref_AppliedPTransform_", "");
       // Java: Remove the 'ParMultiDo(Anonymous)' suffix which just makes the name longer
       name = name.replaceFirst("/ParMultiDo\\(Anonymous\\)$", "");
-      sb.append(name);
-      if (i + 1 < transformsCount) {
-        sb.append(", ");
-      }
+      names.add(name);
     }
-    sb.append("}");
+    sb.append(generateNameFromTransformNames(names, true));
     return sb.toString();
   }
+
+  public static String generateNameFromTransformNames(Collection<String> names, boolean truncate) {
+    Multimap<String, String> groupByOuter = LinkedHashMultimap.create();
+    for (String name : names) {
+      int index = name.indexOf('/');
 
 Review comment:
   Right. I made this comment at first look when I hadn't seen the recursive 
nature yet.


Issue Time Tracking
---

Worklog Id: (was: 172638)
Time Spent: 3h 40m  (was: 3.5h)



[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=172637&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172637
 ]

ASF GitHub Bot logged work on BEAM-5859:


Author: ASF GitHub Bot
Created on: 06/Dec/18 13:20
Start Date: 06/Dec/18 13:20
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #7208: 
[BEAM-5859] Better handle fused composite stage names.
URL: https://github.com/apache/beam/pull/7208#discussion_r239447810
 
 

 ##
 File path: 
runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ExecutableStageTranslationTest.java
 ##
 @@ -78,6 +81,40 @@ public void processElement(
     String executableStageName =
         ExecutableStageTranslation.generateNameFromStagePayload(basePayload);
 
-    assertThat(executableStageName, is("[3]{ParDo(Anonymous), MyName, count}"));
+    assertThat(executableStageName, is("[3]{ParDo(Anonymous), MyName, Composite}"));
+  }
+
+  @Test
+  public void testOperatorNameGenerationFromNames() {
+    assertGeneratedNames("A", "A", Arrays.asList("A"));
+    assertGeneratedNames("A/a1", "A/a1", Arrays.asList("A/a1"));
+    assertGeneratedNames("A/{a1, a2}", "A/{a1, a2}", Arrays.asList("A/a1", "A/a2"));
+    assertGeneratedNames(
+        "A/{a1, a2}", "A/{a1, a2/{a2.1, a2.2}}", Arrays.asList("A/a1", "A/a2/a2.1", "A/a2/a2.2"));
+    assertGeneratedNames("{A, B}", "{A/{a1, a2}, B}", Arrays.asList("A/a1", "A/a2", "B"));
+    assertGeneratedNames(
+        "{A, B, C}", "{A/{a1, a2}, B, C/c/cc}", Arrays.asList("A/a1", "A/a2", "B", "C/c/cc"));
 
 Review comment:
   I think that could be unexpected to users because they could assume it 
either always or never recursively lists the composite transforms. On the other 
hand, it's good to get a bit more insight for single composite transforms.


Issue Time Tracking
---

Worklog Id: (was: 172637)
Time Spent: 3h 40m  (was: 3.5h)



[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=172605&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172605
 ]

ASF GitHub Bot logged work on BEAM-5859:


Author: ASF GitHub Bot
Created on: 06/Dec/18 09:34
Start Date: 06/Dec/18 09:34
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #7208: [BEAM-5859] Better 
handle fused composite stage names.
URL: https://github.com/apache/beam/pull/7208#issuecomment-444808403
 
 
   The last example is a real-world example of what happens with and without 
shortening. (Actually, the Write is even worse.) If there are many siblings, I 
think it's a better use of real estate to list them all than to list the full 
nesting of the first one. 
   
   (Aside: I originally had everything in a single expression that had a pair 
of ternary operations...but that turned out to be truly inscrutable.)


Issue Time Tracking
---

Worklog Id: (was: 172605)
Time Spent: 3.5h  (was: 3h 20m)



[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=172600&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172600
 ]

ASF GitHub Bot logged work on BEAM-5859:


Author: ASF GitHub Bot
Created on: 06/Dec/18 09:28
Start Date: 06/Dec/18 09:28
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #7208: 
[BEAM-5859] Better handle fused composite stage names.
URL: https://github.com/apache/beam/pull/7208#discussion_r239379478
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ExecutableStageTranslation.java
 ##
 @@ -45,19 +53,60 @@ public static String generateNameFromStagePayload(ExecutableStagePayload stagePa
     RunnerApi.Components components = stagePayload.getComponents();
     final int transformsCount = stagePayload.getTransformsCount();
     sb.append("[").append(transformsCount).append("]");
-    sb.append("{");
+    Collection<String> names = new ArrayList<>();
     for (int i = 0; i < transformsCount; i++) {
       String name = components.getTransformsOrThrow(stagePayload.getTransforms(i)).getUniqueName();
-      // Python: Remove the 'ref_AppliedPTransform_' prefix which just makes the name longer
-      name = name.replaceFirst("^ref_AppliedPTransform_", "");
       // Java: Remove the 'ParMultiDo(Anonymous)' suffix which just makes the name longer
       name = name.replaceFirst("/ParMultiDo\\(Anonymous\\)$", "");
-      sb.append(name);
-      if (i + 1 < transformsCount) {
-        sb.append(", ");
-      }
+      names.add(name);
     }
-    sb.append("}");
+    sb.append(generateNameFromTransformNames(names, true));
     return sb.toString();
   }
+
+  public static String generateNameFromTransformNames(Collection<String> names, boolean truncate) {
+    Multimap<String, String> groupByOuter = LinkedHashMultimap.create();
+    for (String name : names) {
+      int index = name.indexOf('/');
 
 Review comment:
   This is all we have to go on here. It's up to the SDK to choose how to 
follow this convention. If an extra slash is present, it'll just look more 
nested than it really is. 
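
A small sketch of what splitting on the first slash implies, assuming only the JDK. The class FirstSlashSplit, the method groupByOuter, and both transform names are made up for illustration; the point is that a label which merely contains a slash is indistinguishable from genuine nesting:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the first-slash split; the names below are invented.
class FirstSlashSplit {

  /** Groups each name's remainder under the text before its first '/'. */
  static Map<String, List<String>> groupByOuter(List<String> names) {
    Map<String, List<String>> groups = new LinkedHashMap<>();
    for (String name : names) {
      int index = name.indexOf('/');
      String outer = index == -1 ? name : name.substring(0, index);
      String rest = index == -1 ? "" : name.substring(index + 1);
      groups.computeIfAbsent(outer, k -> new ArrayList<>()).add(rest);
    }
    return groups;
  }

  public static void main(String[] args) {
    // "Read/Parse" is meant as a composite "Read" containing "Parse".
    // "Convert km/h" is meant as one transform whose label contains a slash,
    // yet it is split the same way and looks like "h" nested inside "Convert km".
    System.out.println(groupByOuter(Arrays.asList("Read/Parse", "Convert km/h")));
  }
}
```

The example prints {Read=[Parse], Convert km=[h]}: nothing in the name itself tells the two cases apart, so the result is only as good as the SDK's adherence to the convention.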


Issue Time Tracking
---

Worklog Id: (was: 172600)



[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=172601&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172601
 ]

ASF GitHub Bot logged work on BEAM-5859:


Author: ASF GitHub Bot
Created on: 06/Dec/18 09:28
Start Date: 06/Dec/18 09:28
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #7208: 
[BEAM-5859] Better handle fused composite stage names.
URL: https://github.com/apache/beam/pull/7208#discussion_r239375960
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ExecutableStageTranslation.java
 ##
 @@ -45,19 +53,60 @@ public static String generateNameFromStagePayload(ExecutableStagePayload stagePa
     RunnerApi.Components components = stagePayload.getComponents();
     final int transformsCount = stagePayload.getTransformsCount();
     sb.append("[").append(transformsCount).append("]");
-    sb.append("{");
+    Collection<String> names = new ArrayList<>();
     for (int i = 0; i < transformsCount; i++) {
       String name = components.getTransformsOrThrow(stagePayload.getTransforms(i)).getUniqueName();
-      // Python: Remove the 'ref_AppliedPTransform_' prefix which just makes the name longer
-      name = name.replaceFirst("^ref_AppliedPTransform_", "");
       // Java: Remove the 'ParMultiDo(Anonymous)' suffix which just makes the name longer
       name = name.replaceFirst("/ParMultiDo\\(Anonymous\\)$", "");
-      sb.append(name);
-      if (i + 1 < transformsCount) {
-        sb.append(", ");
-      }
+      names.add(name);
     }
-    sb.append("}");
+    sb.append(generateNameFromTransformNames(names, true));
     return sb.toString();
   }
+
+  public static String generateNameFromTransformNames(Collection<String> names, boolean truncate) {
 
 Review comment:
   I renamed this to truncateSiblingComposites and added a javadoc.


Issue Time Tracking
---

Worklog Id: (was: 172601)
Time Spent: 3h 20m  (was: 3h 10m)


[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=172598&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172598
 ]

ASF GitHub Bot logged work on BEAM-5859:


Author: ASF GitHub Bot
Created on: 06/Dec/18 09:28
Start Date: 06/Dec/18 09:28
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #7208: 
[BEAM-5859] Better handle fused composite stage names.
URL: https://github.com/apache/beam/pull/7208#discussion_r239379684
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ExecutableStageTranslation.java
 ##
 @@ -45,19 +53,60 @@ public static String generateNameFromStagePayload(ExecutableStagePayload stagePa
     RunnerApi.Components components = stagePayload.getComponents();
     final int transformsCount = stagePayload.getTransformsCount();
     sb.append("[").append(transformsCount).append("]");
-    sb.append("{");
+    Collection<String> names = new ArrayList<>();
     for (int i = 0; i < transformsCount; i++) {
       String name = components.getTransformsOrThrow(stagePayload.getTransforms(i)).getUniqueName();
-      // Python: Remove the 'ref_AppliedPTransform_' prefix which just makes the name longer
-      name = name.replaceFirst("^ref_AppliedPTransform_", "");
       // Java: Remove the 'ParMultiDo(Anonymous)' suffix which just makes the name longer
       name = name.replaceFirst("/ParMultiDo\\(Anonymous\\)$", "");
-      sb.append(name);
-      if (i + 1 < transformsCount) {
-        sb.append(", ");
-      }
+      names.add(name);
     }
-    sb.append("}");
+    sb.append(generateNameFromTransformNames(names, true));
     return sb.toString();
   }
+
+  public static String generateNameFromTransformNames(Collection<String> names, boolean truncate) {
+    Multimap<String, String> groupByOuter = LinkedHashMultimap.create();
+    for (String name : names) {
+      int index = name.indexOf('/');
+      if (index == -1) {
+        groupByOuter.put(name, "");
+      } else {
+        groupByOuter.put(name.substring(0, index), name.substring(index + 1));
+      }
+    }
+    if (groupByOuter.keySet().size() == 1) {
+      Map.Entry<String, Collection<String>> outer =
+          Iterables.getOnlyElement(groupByOuter.asMap().entrySet());
+      if (outer.getValue().size() == 1 && outer.getValue().contains("")) {
+        // Names consisted of a single name without any slashes.
+        return outer.getKey();
+      } else {
+        // Everything is in the same outer stage, enumerate at one level down.
+        return String.format(
+            "%s/%s", outer.getKey(), generateNameFromTransformNames(outer.getValue(), truncate));
 
 Review comment:
   We descend even if we had a single (non-empty) element.


Issue Time Tracking
---

Worklog Id: (was: 172598)
Time Spent: 3h 10m  (was: 3h)


[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=172599&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172599
 ]

ASF GitHub Bot logged work on BEAM-5859:


Author: ASF GitHub Bot
Created on: 06/Dec/18 09:28
Start Date: 06/Dec/18 09:28
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #7208: 
[BEAM-5859] Better handle fused composite stage names.
URL: https://github.com/apache/beam/pull/7208#discussion_r239379773
 
 

 ##
 File path: 
runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ExecutableStageTranslationTest.java
 ##
 @@ -78,6 +81,40 @@ public void processElement(
     String executableStageName =
         ExecutableStageTranslation.generateNameFromStagePayload(basePayload);
 
-    assertThat(executableStageName, is("[3]{ParDo(Anonymous), MyName, count}"));
+    assertThat(executableStageName, is("[3]{ParDo(Anonymous), MyName, Composite}"));
+  }
+
+  @Test
+  public void testOperatorNameGenerationFromNames() {
+    assertGeneratedNames("A", "A", Arrays.asList("A"));
+    assertGeneratedNames("A/a1", "A/a1", Arrays.asList("A/a1"));
+    assertGeneratedNames("A/{a1, a2}", "A/{a1, a2}", Arrays.asList("A/a1", "A/a2"));
+    assertGeneratedNames(
+        "A/{a1, a2}", "A/{a1, a2/{a2.1, a2.2}}", Arrays.asList("A/a1", "A/a2/a2.1", "A/a2/a2.2"));
+    assertGeneratedNames("{A, B}", "{A/{a1, a2}, B}", Arrays.asList("A/a1", "A/a2", "B"));
+    assertGeneratedNames(
+        "{A, B, C}", "{A/{a1, a2}, B, C/c/cc}", Arrays.asList("A/a1", "A/a2", "B", "C/c/cc"));
 
 Review comment:
   Yes, exactly. 


Issue Time Tracking
---

Worklog Id: (was: 172599)
Time Spent: 3h 10m  (was: 3h)



[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation

2018-12-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=172349&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172349
 ]

ASF GitHub Bot logged work on BEAM-5859:


Author: ASF GitHub Bot
Created on: 05/Dec/18 16:10
Start Date: 05/Dec/18 16:10
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #7208: 
[BEAM-5859] Better handle fused composite stage names.
URL: https://github.com/apache/beam/pull/7208#discussion_r239129246
 
 

 ##
 File path: 
runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ExecutableStageTranslationTest.java
 ##
 @@ -78,6 +81,40 @@ public void processElement(
     String executableStageName =
         ExecutableStageTranslation.generateNameFromStagePayload(basePayload);
 
-    assertThat(executableStageName, is("[3]{ParDo(Anonymous), MyName, count}"));
+    assertThat(executableStageName, is("[3]{ParDo(Anonymous), MyName, Composite}"));
+  }
+
+  @Test
+  public void testOperatorNameGenerationFromNames() {
+    assertGeneratedNames("A", "A", Arrays.asList("A"));
+    assertGeneratedNames("A/a1", "A/a1", Arrays.asList("A/a1"));
+    assertGeneratedNames("A/{a1, a2}", "A/{a1, a2}", Arrays.asList("A/a1", "A/a2"));
+    assertGeneratedNames(
+        "A/{a1, a2}", "A/{a1, a2/{a2.1, a2.2}}", Arrays.asList("A/a1", "A/a2/a2.1", "A/a2/a2.2"));
+    assertGeneratedNames("{A, B}", "{A/{a1, a2}, B}", Arrays.asList("A/a1", "A/a2", "B"));
+    assertGeneratedNames(
+        "{A, B, C}", "{A/{a1, a2}, B, C/c/cc}", Arrays.asList("A/a1", "A/a2", "B", "C/c/cc"));
 
 Review comment:
   So the logic is that we only recurse into a composite if it is the only 
one?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 172349)
Time Spent: 2.5h  (was: 2h 20m)



[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation

2018-12-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=172354&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172354
 ]

ASF GitHub Bot logged work on BEAM-5859:


Author: ASF GitHub Bot
Created on: 05/Dec/18 16:10
Start Date: 05/Dec/18 16:10
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #7208: 
[BEAM-5859] Better handle fused composite stage names.
URL: https://github.com/apache/beam/pull/7208#discussion_r239122914
 
 

 ##
 File path: 
runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ExecutableStageTranslationTest.java
 ##
 @@ -78,6 +81,40 @@ public void processElement(
 String executableStageName =
 ExecutableStageTranslation.generateNameFromStagePayload(basePayload);
 
-assertThat(executableStageName, is("[3]{ParDo(Anonymous), MyName, 
count}"));
+assertThat(executableStageName, is("[3]{ParDo(Anonymous), MyName, 
Composite}"));
+  }
+
+  @Test
+  public void testOperatorNameGenerationFromNames() {
+assertGeneratedNames("A", "A", Arrays.asList("A"));
+assertGeneratedNames("A/a1", "A/a1", Arrays.asList("A/a1"));
+assertGeneratedNames("A/{a1, a2}", "A/{a1, a2}", Arrays.asList("A/a1", 
"A/a2"));
+assertGeneratedNames(
+"A/{a1, a2}", "A/{a1, a2/{a2.1, a2.2}}", Arrays.asList("A/a1", 
"A/a2/a2.1", "A/a2/a2.2"));
+assertGeneratedNames("{A, B}", "{A/{a1, a2}, B}", Arrays.asList("A/a1", 
"A/a2", "B"));
+assertGeneratedNames(
+"{A, B, C}", "{A/{a1, a2}, B, C/c/cc}", Arrays.asList("A/a1", "A/a2", 
"B", "C/c/cc"));
 
 Review comment:
   It seems inconsistent that we don't get `A{a1/a2}, B, C` here.




Issue Time Tracking
---

Worklog Id: (was: 172354)

> Improve Traceability of Pipeline translation
> 
>
> Key: BEAM-5859
> URL: https://issues.apache.org/jira/browse/BEAM-5859
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Maximilian Michels
>Priority: Major
>  Labels: portability, portability-flink
> Attachments: tfx.png, wordcount.png
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h


[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation

2018-12-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=172353&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172353
 ]

ASF GitHub Bot logged work on BEAM-5859:


Author: ASF GitHub Bot
Created on: 05/Dec/18 16:10
Start Date: 05/Dec/18 16:10
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #7208: 
[BEAM-5859] Better handle fused composite stage names.
URL: https://github.com/apache/beam/pull/7208#discussion_r239130429
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ExecutableStageTranslation.java
 ##
 @@ -45,19 +53,60 @@ public static String 
generateNameFromStagePayload(ExecutableStagePayload stagePa
 RunnerApi.Components components = stagePayload.getComponents();
 final int transformsCount = stagePayload.getTransformsCount();
 sb.append("[").append(transformsCount).append("]");
-sb.append("{");
+Collection<String> names = new ArrayList<>();
 for (int i = 0; i < transformsCount; i++) {
   String name = 
components.getTransformsOrThrow(stagePayload.getTransforms(i)).getUniqueName();
-  // Python: Remove the 'ref_AppliedPTransform_' prefix which just makes 
the name longer
-  name = name.replaceFirst("^ref_AppliedPTransform_", "");
   // Java: Remove the 'ParMultiDo(Anonymous)' suffix which just makes the 
name longer
   name = name.replaceFirst("/ParMultiDo\\(Anonymous\\)$", "");
-  sb.append(name);
-  if (i + 1 < transformsCount) {
-sb.append(", ");
-  }
+  names.add(name);
 }
-sb.append("}");
+sb.append(generateNameFromTransformNames(names, true));
 return sb.toString();
   }
+
+  public static String generateNameFromTransformNames(Collection<String> 
names, boolean truncate) {
+Multimap<String, String> groupByOuter = LinkedHashMultimap.create();
+for (String name : names) {
+  int index = name.indexOf('/');
+  if (index == -1) {
+groupByOuter.put(name, "");
+  } else {
+groupByOuter.put(name.substring(0, index), name.substring(index + 1));
+  }
+}
+if (groupByOuter.keySet().size() == 1) {
+  Map.Entry<String, Collection<String>> outer =
+  Iterables.getOnlyElement(groupByOuter.asMap().entrySet());
+  if (outer.getValue().size() == 1 && outer.getValue().contains("")) {
+// Names consisted of a single name without any slashes.
+return outer.getKey();
+  } else {
+// Everything is in the same outer stage, enumerate at one level down.
+return String.format(
+"%s/%s", outer.getKey(), 
generateNameFromTransformNames(outer.getValue(), truncate));
+  }
+} else {
+  Collection<String> parts;
+  if (truncate) {
+// Enumerate the outer stages without their composite structure, if 
any.
+parts = groupByOuter.keySet();
 
 Review comment:
   Note to self: Recursion anchor.
   




Issue Time Tracking
---

Worklog Id: (was: 172353)

> Improve Traceability of Pipeline translation
> 
>
> Key: BEAM-5859
> URL: https://issues.apache.org/jira/browse/BEAM-5859
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Maximilian Michels
>Priority: Major
>  Labels: portability, portability-flink
> Attachments: tfx.png, wordcount.png
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h


[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation

2018-12-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=172350&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172350
 ]

ASF GitHub Bot logged work on BEAM-5859:


Author: ASF GitHub Bot
Created on: 05/Dec/18 16:10
Start Date: 05/Dec/18 16:10
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #7208: 
[BEAM-5859] Better handle fused composite stage names.
URL: https://github.com/apache/beam/pull/7208#discussion_r239105166
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ExecutableStageTranslation.java
 ##
 @@ -45,19 +53,60 @@ public static String 
generateNameFromStagePayload(ExecutableStagePayload stagePa
 RunnerApi.Components components = stagePayload.getComponents();
 final int transformsCount = stagePayload.getTransformsCount();
 sb.append("[").append(transformsCount).append("]");
-sb.append("{");
+Collection<String> names = new ArrayList<>();
 for (int i = 0; i < transformsCount; i++) {
   String name = 
components.getTransformsOrThrow(stagePayload.getTransforms(i)).getUniqueName();
-  // Python: Remove the 'ref_AppliedPTransform_' prefix which just makes 
the name longer
-  name = name.replaceFirst("^ref_AppliedPTransform_", "");
   // Java: Remove the 'ParMultiDo(Anonymous)' suffix which just makes the 
name longer
   name = name.replaceFirst("/ParMultiDo\\(Anonymous\\)$", "");
-  sb.append(name);
-  if (i + 1 < transformsCount) {
-sb.append(", ");
-  }
+  names.add(name);
 }
-sb.append("}");
+sb.append(generateNameFromTransformNames(names, true));
 return sb.toString();
   }
+
+  public static String generateNameFromTransformNames(Collection<String> 
names, boolean truncate) {
 
 Review comment:
   Would it make sense to add a short JavaDoc comment, explaining what kind of 
modes this method offers for generating a name from the transforms?




Issue Time Tracking
---

Worklog Id: (was: 172350)
Time Spent: 2h 40m  (was: 2.5h)



[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation

2018-12-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=172351&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172351
 ]

ASF GitHub Bot logged work on BEAM-5859:


Author: ASF GitHub Bot
Created on: 05/Dec/18 16:10
Start Date: 05/Dec/18 16:10
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #7208: 
[BEAM-5859] Better handle fused composite stage names.
URL: https://github.com/apache/beam/pull/7208#discussion_r239130260
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ExecutableStageTranslation.java
 ##
 @@ -45,19 +53,60 @@ public static String 
generateNameFromStagePayload(ExecutableStagePayload stagePa
 RunnerApi.Components components = stagePayload.getComponents();
 final int transformsCount = stagePayload.getTransformsCount();
 sb.append("[").append(transformsCount).append("]");
-sb.append("{");
+Collection<String> names = new ArrayList<>();
 for (int i = 0; i < transformsCount; i++) {
   String name = 
components.getTransformsOrThrow(stagePayload.getTransforms(i)).getUniqueName();
-  // Python: Remove the 'ref_AppliedPTransform_' prefix which just makes 
the name longer
-  name = name.replaceFirst("^ref_AppliedPTransform_", "");
   // Java: Remove the 'ParMultiDo(Anonymous)' suffix which just makes the 
name longer
   name = name.replaceFirst("/ParMultiDo\\(Anonymous\\)$", "");
-  sb.append(name);
-  if (i + 1 < transformsCount) {
-sb.append(", ");
-  }
+  names.add(name);
 }
-sb.append("}");
+sb.append(generateNameFromTransformNames(names, true));
 return sb.toString();
   }
+
+  public static String generateNameFromTransformNames(Collection<String> 
names, boolean truncate) {
+Multimap<String, String> groupByOuter = LinkedHashMultimap.create();
+for (String name : names) {
+  int index = name.indexOf('/');
+  if (index == -1) {
+groupByOuter.put(name, "");
+  } else {
+groupByOuter.put(name.substring(0, index), name.substring(index + 1));
+  }
+}
+if (groupByOuter.keySet().size() == 1) {
+  Map.Entry<String, Collection<String>> outer =
+  Iterables.getOnlyElement(groupByOuter.asMap().entrySet());
+  if (outer.getValue().size() == 1 && outer.getValue().contains("")) {
+// Names consisted of a single name without any slashes.
+return outer.getKey();
 
 Review comment:
   Note to self: Recursion anchor.
   




Issue Time Tracking
---

Worklog Id: (was: 172351)
Time Spent: 2h 40m  (was: 2.5h)



[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation

2018-12-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=172348&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172348
 ]

ASF GitHub Bot logged work on BEAM-5859:


Author: ASF GitHub Bot
Created on: 05/Dec/18 16:10
Start Date: 05/Dec/18 16:10
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #7208: 
[BEAM-5859] Better handle fused composite stage names.
URL: https://github.com/apache/beam/pull/7208#discussion_r239114817
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ExecutableStageTranslation.java
 ##
 @@ -45,19 +53,60 @@ public static String 
generateNameFromStagePayload(ExecutableStagePayload stagePa
 RunnerApi.Components components = stagePayload.getComponents();
 final int transformsCount = stagePayload.getTransformsCount();
 sb.append("[").append(transformsCount).append("]");
-sb.append("{");
+Collection<String> names = new ArrayList<>();
 for (int i = 0; i < transformsCount; i++) {
   String name = 
components.getTransformsOrThrow(stagePayload.getTransforms(i)).getUniqueName();
-  // Python: Remove the 'ref_AppliedPTransform_' prefix which just makes 
the name longer
-  name = name.replaceFirst("^ref_AppliedPTransform_", "");
   // Java: Remove the 'ParMultiDo(Anonymous)' suffix which just makes the 
name longer
   name = name.replaceFirst("/ParMultiDo\\(Anonymous\\)$", "");
-  sb.append(name);
-  if (i + 1 < transformsCount) {
-sb.append(", ");
-  }
+  names.add(name);
 }
-sb.append("}");
+sb.append(generateNameFromTransformNames(names, true));
 return sb.toString();
   }
+
+  public static String generateNameFromTransformNames(Collection<String> 
names, boolean truncate) {
 
 Review comment:
   Would `collapseComposites` be a better name than `truncate`?




Issue Time Tracking
---

Worklog Id: (was: 172348)
Time Spent: 2.5h  (was: 2h 20m)



[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation

2018-12-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=172352&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172352
 ]

ASF GitHub Bot logged work on BEAM-5859:


Author: ASF GitHub Bot
Created on: 05/Dec/18 16:10
Start Date: 05/Dec/18 16:10
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #7208: 
[BEAM-5859] Better handle fused composite stage names.
URL: https://github.com/apache/beam/pull/7208#discussion_r239104348
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ExecutableStageTranslation.java
 ##
 @@ -45,19 +53,60 @@ public static String 
generateNameFromStagePayload(ExecutableStagePayload stagePa
 RunnerApi.Components components = stagePayload.getComponents();
 final int transformsCount = stagePayload.getTransformsCount();
 sb.append("[").append(transformsCount).append("]");
-sb.append("{");
+Collection<String> names = new ArrayList<>();
 for (int i = 0; i < transformsCount; i++) {
   String name = 
components.getTransformsOrThrow(stagePayload.getTransforms(i)).getUniqueName();
-  // Python: Remove the 'ref_AppliedPTransform_' prefix which just makes 
the name longer
-  name = name.replaceFirst("^ref_AppliedPTransform_", "");
   // Java: Remove the 'ParMultiDo(Anonymous)' suffix which just makes the 
name longer
   name = name.replaceFirst("/ParMultiDo\\(Anonymous\\)$", "");
-  sb.append(name);
-  if (i + 1 < transformsCount) {
-sb.append(", ");
-  }
+  names.add(name);
 }
-sb.append("}");
+sb.append(generateNameFromTransformNames(names, true));
 return sb.toString();
   }
+
+  public static String generateNameFromTransformNames(Collection<String> 
names, boolean truncate) {
+Multimap<String, String> groupByOuter = LinkedHashMultimap.create();
+for (String name : names) {
+  int index = name.indexOf('/');
 
 Review comment:
   I suppose names can never contain slashes or they are escaped?
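
   To make the question concrete: the grouping step splits every unique name 
at its first `/`, so a slash that is part of a user-chosen label would be read 
as a composite boundary. The sketch below isolates just that split. 
`SlashSplitSketch`, `splitOuter`, and the label `Bytes/sec` are hypothetical, 
and whether Beam escapes or forbids such labels is not settled in this thread.

```java
import java.util.Arrays;

// Hypothetical stand-alone sketch; it only mirrors the indexOf('/') split
// shown in the hunk above and is not part of Beam.
class SlashSplitSketch {

  // Returns {outer, remainder}; remainder is "" when the name has no slash.
  static String[] splitOuter(String name) {
    int index = name.indexOf('/');
    if (index == -1) {
      return new String[] {name, ""};
    }
    return new String[] {name.substring(0, index), name.substring(index + 1)};
  }

  public static void main(String[] args) {
    // A plain name stays whole.
    System.out.println(Arrays.toString(splitOuter("MyName")));    // [MyName, ]
    // Only the first slash is used; deeper levels stay in the remainder.
    System.out.println(Arrays.toString(splitOuter("A/a2/a2.1"))); // [A, a2/a2.1]
    // Assumed label with a literal slash: it is split like a composite path.
    System.out.println(Arrays.toString(splitOuter("Bytes/sec"))); // [Bytes, sec]
  }
}
```

   If labels with literal slashes can occur, the generated operator name 
would show `Bytes` as an outer stage in this sketch, which is the ambiguity 
the question is about.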




Issue Time Tracking
---

Worklog Id: (was: 172352)
Time Spent: 2h 50m  (was: 2h 40m)



[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation

2018-12-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=172355&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172355
 ]

ASF GitHub Bot logged work on BEAM-5859:


Author: ASF GitHub Bot
Created on: 05/Dec/18 16:10
Start Date: 05/Dec/18 16:10
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #7208: 
[BEAM-5859] Better handle fused composite stage names.
URL: https://github.com/apache/beam/pull/7208#discussion_r239128952
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ExecutableStageTranslation.java
 ##
 @@ -45,19 +53,60 @@ public static String 
generateNameFromStagePayload(ExecutableStagePayload stagePa
 RunnerApi.Components components = stagePayload.getComponents();
 final int transformsCount = stagePayload.getTransformsCount();
 sb.append("[").append(transformsCount).append("]");
-sb.append("{");
+Collection<String> names = new ArrayList<>();
 for (int i = 0; i < transformsCount; i++) {
   String name = 
components.getTransformsOrThrow(stagePayload.getTransforms(i)).getUniqueName();
-  // Python: Remove the 'ref_AppliedPTransform_' prefix which just makes 
the name longer
-  name = name.replaceFirst("^ref_AppliedPTransform_", "");
   // Java: Remove the 'ParMultiDo(Anonymous)' suffix which just makes the 
name longer
   name = name.replaceFirst("/ParMultiDo\\(Anonymous\\)$", "");
-  sb.append(name);
-  if (i + 1 < transformsCount) {
-sb.append(", ");
-  }
+  names.add(name);
 }
-sb.append("}");
+sb.append(generateNameFromTransformNames(names, true));
 return sb.toString();
   }
+
+  public static String generateNameFromTransformNames(Collection<String> 
names, boolean truncate) {
+Multimap<String, String> groupByOuter = LinkedHashMultimap.create();
+for (String name : names) {
+  int index = name.indexOf('/');
+  if (index == -1) {
+groupByOuter.put(name, "");
+  } else {
+groupByOuter.put(name.substring(0, index), name.substring(index + 1));
+  }
+}
+if (groupByOuter.keySet().size() == 1) {
+  Map.Entry<String, Collection<String>> outer =
+  Iterables.getOnlyElement(groupByOuter.asMap().entrySet());
+  if (outer.getValue().size() == 1 && outer.getValue().contains("")) {
+// Names consisted of a single name without any slashes.
+return outer.getKey();
+  } else {
+// Everything is in the same outer stage, enumerate at one level down.
+return String.format(
+"%s/%s", outer.getKey(), 
generateNameFromTransformNames(outer.getValue(), truncate));
 
 Review comment:
   We go down a level here but only if we had multiple elements inside the same 
composite transform and no other composite transforms.
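
   The recursion described in this comment can be tried outside of Beam with 
a small JDK-only sketch. It is not the PR's code: the class name 
`StageNameSketch` and method name `generateName` are hypothetical, a 
`LinkedHashMap` of `LinkedHashSet`s stands in for Guava's 
`LinkedHashMultimap`, and the non-truncating branch is an assumption inferred 
from the expected strings in `testOperatorNameGenerationFromNames`, because 
the hunks quoted in this thread stop before that branch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-alone sketch of the naming scheme; not the Beam class.
class StageNameSketch {

  static String generateName(Collection<String> names, boolean truncate) {
    // Group names by their outermost component, keeping insertion order and
    // dropping duplicate remainders (the role LinkedHashMultimap plays in the PR).
    Map<String, Set<String>> groupByOuter = new LinkedHashMap<>();
    for (String name : names) {
      int index = name.indexOf('/');
      String outer = index == -1 ? name : name.substring(0, index);
      String inner = index == -1 ? "" : name.substring(index + 1);
      groupByOuter.computeIfAbsent(outer, k -> new LinkedHashSet<>()).add(inner);
    }
    if (groupByOuter.size() == 1) {
      Map.Entry<String, Set<String>> only = groupByOuter.entrySet().iterator().next();
      if (only.getValue().size() == 1 && only.getValue().contains("")) {
        // Recursion anchor: a single name without any slashes.
        return only.getKey();
      }
      // Everything shares one outer stage, so enumerate one level down.
      return only.getKey() + "/" + generateName(only.getValue(), truncate);
    }
    List<String> parts = new ArrayList<>();
    for (Map.Entry<String, Set<String>> entry : groupByOuter.entrySet()) {
      if (truncate) {
        // Sibling outer stages are listed without their inner structure.
        parts.add(entry.getKey());
      } else {
        // Assumed behaviour: expand each sibling, dropping the trailing slash
        // that a leaf (empty remainder) would otherwise leave behind.
        String expanded = entry.getKey() + "/" + generateName(entry.getValue(), false);
        parts.add(
            expanded.endsWith("/") ? expanded.substring(0, expanded.length() - 1) : expanded);
      }
    }
    return "{" + String.join(", ", parts) + "}";
  }

  public static void main(String[] args) {
    List<String> names = Arrays.asList("A/a1", "A/a2", "B", "C/c/cc");
    System.out.println(generateName(names, true));  // {A, B, C}
    System.out.println(generateName(names, false)); // {A/{a1, a2}, B, C/c/cc}
  }
}
```

   With a single outer composite, both modes descend: `A/a1`, `A/a2/a2.1` and 
`A/a2/a2.2` give `A/{a1, a2}` when truncating and `A/{a1, a2/{a2.1, a2.2}}` 
otherwise, matching the expectations in the test diff.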




Issue Time Tracking
---

Worklog Id: (was: 172355)
Time Spent: 3h  (was: 2h 50m)



[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation

2018-12-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=172279&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172279
 ]

ASF GitHub Bot logged work on BEAM-5859:


Author: ASF GitHub Bot
Created on: 05/Dec/18 13:53
Start Date: 05/Dec/18 13:53
Worklog Time Spent: 10m 
  Work Description: robertwb opened a new pull request #7208: [BEAM-5859] 
Better handle fused composite stage names.
URL: https://github.com/apache/beam/pull/7208
 
 
   I decided to default to truncated names because they both look
   nicer and they have more information if the UI is going to
   (implicitly or explicitly) truncate them on length anyways.
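
   A small illustration of that rationale, using the two expected strings 
from the test in this PR. The class `TruncationSketch`, the helper `clip`, 
and the 12-character limit are hypothetical; how the Flink UI actually 
shortens operator names is not specified here.

```java
// Hypothetical sketch: compares what survives a fixed-length cut for the
// truncated and the fully expanded stage name.
class TruncationSketch {

  // Cuts a label to at most max characters, as a length-limited UI might.
  static String clip(String label, int max) {
    return label.length() <= max ? label : label.substring(0, max);
  }

  public static void main(String[] args) {
    String truncated = "{A, B, C}";
    String full = "{A/{a1, a2}, B, C/c/cc}";
    int limit = 12; // assumed display limit, chosen only for illustration

    System.out.println(clip(truncated, limit)); // {A, B, C}
    System.out.println(clip(full, limit));      // {A/{a1, a2},
  }
}
```

   Under this assumed limit the truncated name still lists all three outer 
stages, while the expanded name is cut off after the first composite.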
   
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 172279)
Time Spent: 2h 20m  (was: 2h 10m)

> Improve Traceability o

[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=171679&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171679
 ]

ASF GitHub Bot logged work on BEAM-5859:


Author: ASF GitHub Bot
Created on: 03/Dec/18 19:10
Start Date: 03/Dec/18 19:10
Worklog Time Spent: 10m 
  Work Description: tweise closed pull request #7150: [BEAM-5859] Improve 
operator names for portable pipelines
URL: https://github.com/apache/beam/pull/7150
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ExecutableStageTranslation.java b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ExecutableStageTranslation.java
index c9bb4f6ea030..09d1a991c279 100644
--- a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ExecutableStageTranslation.java
+++ b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ExecutableStageTranslation.java
@@ -39,4 +39,25 @@ public static ExecutableStagePayload getExecutableStagePayload(
     checkArgument(ExecutableStage.URN.equals(transform.getSpec().getUrn()));
     return ExecutableStagePayload.parseFrom(transform.getSpec().getPayload());
   }
+
+  public static String generateNameFromStagePayload(ExecutableStagePayload stagePayload) {
+    StringBuilder sb = new StringBuilder();
+    RunnerApi.Components components = stagePayload.getComponents();
+    final int transformsCount = stagePayload.getTransformsCount();
+    sb.append("[").append(transformsCount).append("]");
+    sb.append("{");
+    for (int i = 0; i < transformsCount; i++) {
+      String name = components.getTransformsOrThrow(stagePayload.getTransforms(i)).getUniqueName();
+      // Python: Remove the 'ref_AppliedPTransform_' prefix which just makes the name longer
+      name = name.replaceFirst("^ref_AppliedPTransform_", "");
+      // Java: Remove the 'ParMultiDo(Anonymous)' suffix which just makes the name longer
+      name = name.replaceFirst("/ParMultiDo\\(Anonymous\\)$", "");
+      sb.append(name);
+      if (i + 1 < transformsCount) {
+        sb.append(", ");
+      }
+    }
+    sb.append("}");
+    return sb.toString();
+  }
 }
diff --git 
a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ExecutableStageTranslationTest.java
 
b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ExecutableStageTranslationTest.java
new file mode 100644
index ..c5b06f76db04
--- /dev/null
+++ 
b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ExecutableStageTranslationTest.java
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.core.construction;
+
+import static org.hamcrest.core.Is.is;
+import static org.junit.Assert.assertThat;
+
+import java.io.Serializable;
+import org.apache.beam.model.pipeline.v1.RunnerApi;
+import org.apache.beam.runners.core.construction.graph.ExecutableStage;
+import org.apache.beam.runners.core.construction.graph.GreedyPipelineFuser;
+import org.apache.beam.sdk.Pipeline;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.Impulse;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.junit.Test;
+
+/** Tests for {@link ExecutableStageTranslation}. */
+public class ExecutableStageTranslationTest implements Serializable {
+
+  @Test
+  /* Test for generating readable operator names during translation. */
+  public void testOperatorNameGeneration() throws Exception {
+Pipeline p = Pipeline.create();
+p.apply(Impulse.create())
+// Anonymous ParDo
+.apply(
+ParDo.of(
+new DoFn() {
+  @ProcessElement
+  public 

[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=171619&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171619
 ]

ASF GitHub Bot logged work on BEAM-5859:


Author: ASF GitHub Bot
Created on: 03/Dec/18 15:40
Start Date: 03/Dec/18 15:40
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #7150: 
[BEAM-5859] Improve operator names for portable pipelines
URL: https://github.com/apache/beam/pull/7150#discussion_r238318772
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/translation/utils/FlinkPipelineTranslatorUtils.java
 ##
 @@ -77,4 +77,25 @@ public static WindowingStrategy getWindowingStrategy(
   e);
 }
   }
+
+  public static String genOperatorNameFromStagePayload(
 
 Review comment:
   Perhaps `ExecutableStageTranslation` would be a good place to put this.




Issue Time Tracking
---

Worklog Id: (was: 171619)
Time Spent: 2h  (was: 1h 50m)

> Improve Traceability of Pipeline translation
> 
>
> Key: BEAM-5859
> URL: https://issues.apache.org/jira/browse/BEAM-5859
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Maximilian Michels
>Priority: Major
>  Labels: portability, portability-flink
> Attachments: tfx.png, wordcount.png
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Users often ask how they can reason about the pipeline translation. The Flink 
> UI displays a confusingly large graph without any trace of the original Beam 
> pipeline:
> WordCount:
>  !wordcount.png! 
> TFX:
>  !tfx.png! 
> Some aspects which make understanding these graphs hard:
>  * Users don't know how the Runner maps Beam to Flink concepts
>  * The UI is awfully slow / hangs when the pipeline is reasonably complex
>  * The operator names seem to use {{transform.getUniqueName()}} which doesn't 
> generate readable names
>  * So-called chaining combines operators into a single operator, which makes 
> understanding which Beam concept belongs to which Flink concept even harder
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=171618&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171618
 ]

ASF GitHub Bot logged work on BEAM-5859:


Author: ASF GitHub Bot
Created on: 03/Dec/18 15:38
Start Date: 03/Dec/18 15:38
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #7150: 
[BEAM-5859] Improve operator names for portable pipelines
URL: https://github.com/apache/beam/pull/7150#discussion_r238318117
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/translation/utils/FlinkPipelineTranslatorUtils.java
 ##
 @@ -77,4 +77,25 @@ public static WindowingStrategy getWindowingStrategy(
   e);
 }
   }
+
+  public static String genOperatorNameFromStagePayload(
+      RunnerApi.ExecutableStagePayload stagePayload) {
+    StringBuilder sb = new StringBuilder();
+    final int transformsCount = stagePayload.getTransformsCount();
+    sb.append("[").append(transformsCount).append("]");
+    sb.append("{");
+    for (int i = 0; i < transformsCount; i++) {
+      String name = stagePayload.getTransforms(i);
+      // Python: Remove the 'ref_AppliedPTransform_' prefix which just makes the name longer
 
 Review comment:
   Good suggestion. The unique name doesn't include the id, which makes it more 
readable: 
   
   ```
   // Master
   beam:env:docker:v1:1
   // Old PR version
   [5]{read/Read/Reshuffle/ReshufflePerKey/FlatMap(restore_timestamps)_14, read/Read/Reshuffle/RemoveRandomKeys_15, read/Read/ReadSplits_16, split_17, pair_with_one_18}
   // Using unique_name
   [5]{read/Read/Reshuffle/ReshufflePerKey/FlatMap(restore_timestamps), read/Read/Reshuffle/RemoveRandomKeys, read/Read/ReadSplits, split, pair_with_one}
   ```
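
The `[count]{name, name, ...}` layout compared above can be reproduced with a small standalone sketch. This is not the Beam implementation: `StageNameSketch` and `formatStageName` are hypothetical names, and plain strings stand in for the transforms of an `ExecutableStagePayload`.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for the stage naming logic; not part of Beam. */
public class StageNameSketch {

  /** Formats the given names as "[count]{name1, name2, ...}". */
  public static String formatStageName(List<String> transformNames) {
    return "[" + transformNames.size() + "]{" + String.join(", ", transformNames) + "}";
  }

  public static void main(String[] args) {
    // Sample unique names taken from the comparison above.
    List<String> uniqueNames = Arrays.asList("read/Read/ReadSplits", "split", "pair_with_one");
    System.out.println(formatStageName(uniqueNames));
  }
}
```

Using `String.join` avoids the manual "is this the last element" check that a `StringBuilder` loop needs for the separator.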




Issue Time Tracking
---

Worklog Id: (was: 171618)
Time Spent: 1h 50m  (was: 1h 40m)



[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=171552&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171552
 ]

ASF GitHub Bot logged work on BEAM-5859:


Author: ASF GitHub Bot
Created on: 03/Dec/18 12:42
Start Date: 03/Dec/18 12:42
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #7150: 
[BEAM-5859] Improve operator names for portable pipelines
URL: https://github.com/apache/beam/pull/7150#discussion_r238253133
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/translation/utils/FlinkPipelineTranslatorUtils.java
 ##
 @@ -77,4 +77,25 @@ public static WindowingStrategy getWindowingStrategy(
   e);
 }
   }
+
+  public static String genOperatorNameFromStagePayload(
+      RunnerApi.ExecutableStagePayload stagePayload) {
+    StringBuilder sb = new StringBuilder();
+    final int transformsCount = stagePayload.getTransformsCount();
+    sb.append("[").append(transformsCount).append("]");
+    sb.append("{");
+    for (int i = 0; i < transformsCount; i++) {
+      String name = stagePayload.getTransforms(i);
+      // Python: Remove the 'ref_AppliedPTransform_' prefix which just makes the name longer
 
 Review comment:
   It would probably be better to use 
stagePayload.components.transforms[name].unique_name rather than its id here. 
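
One way to picture this suggestion: the stage payload lists transform ids, and the components map resolves each id to a transform that carries a readable unique name. The sketch below models that lookup with plain Java collections only. `UniqueNameLookupSketch`, `resolveUniqueNames`, and the sample ids are hypothetical, and a `Map<String, String>` stands in for the proto components, so treat it as an illustration rather than the Beam API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of resolving transform ids to unique names; not part of Beam. */
public class UniqueNameLookupSketch {

  /** Returns the unique name for each id, failing fast on an unknown id. */
  public static List<String> resolveUniqueNames(
      List<String> transformIds, Map<String, String> uniqueNamesById) {
    List<String> names = new ArrayList<>();
    for (String id : transformIds) {
      String uniqueName = uniqueNamesById.get(id);
      if (uniqueName == null) {
        // Assumption: an unknown id is a programming error, so throw instead of skipping it.
        throw new IllegalArgumentException("Unknown transform id: " + id);
      }
      names.add(uniqueName);
    }
    return names;
  }

  public static void main(String[] args) {
    // Illustrative ids; a real payload would supply these.
    Map<String, String> uniqueNamesById = new LinkedHashMap<>();
    uniqueNamesById.put("ref_AppliedPTransform_split_17", "split");
    uniqueNamesById.put("ref_AppliedPTransform_pair_with_one_18", "pair_with_one");
    List<String> ids = new ArrayList<>(uniqueNamesById.keySet());
    System.out.println(resolveUniqueNames(ids, uniqueNamesById));
  }
}
```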




Issue Time Tracking
---

Worklog Id: (was: 171552)
Time Spent: 1h 40m  (was: 1.5h)



[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=171551&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171551
 ]

ASF GitHub Bot logged work on BEAM-5859:


Author: ASF GitHub Bot
Created on: 03/Dec/18 12:42
Start Date: 03/Dec/18 12:42
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #7150: 
[BEAM-5859] Improve operator names for portable pipelines
URL: https://github.com/apache/beam/pull/7150#discussion_r238252456
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/translation/utils/FlinkPipelineTranslatorUtils.java
 ##
 @@ -77,4 +77,25 @@ public static WindowingStrategy getWindowingStrategy(
   e);
 }
   }
+
+  public static String genOperatorNameFromStagePayload(
 
 Review comment:
   This isn't really Flink-specific. I wonder if it'd be better placed 
somewhere in core-construction. 




Issue Time Tracking
---

Worklog Id: (was: 171551)
Time Spent: 1.5h  (was: 1h 20m)



[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=171533&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171533
 ]

ASF GitHub Bot logged work on BEAM-5859:


Author: ASF GitHub Bot
Created on: 03/Dec/18 10:59
Start Date: 03/Dec/18 10:59
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #7150: [BEAM-5859] Improve 
operator names for portable pipelines
URL: https://github.com/apache/beam/pull/7150#issuecomment-443670703
 
 
   >New naming seems more intuitive albeit slightly increased real estate:
   
   Besides the increased screen real estate, they actually list more 
information about the stages in the harness. Those were hidden before.




Issue Time Tracking
---

Worklog Id: (was: 171533)
Time Spent: 1h 10m  (was: 1h)



[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=171536&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171536
 ]

ASF GitHub Bot logged work on BEAM-5859:


Author: ASF GitHub Bot
Created on: 03/Dec/18 11:17
Start Date: 03/Dec/18 11:17
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #7150: [BEAM-5859] Improve 
operator names for portable pipelines
URL: https://github.com/apache/beam/pull/7150#issuecomment-443675464
 
 
   We can make the Python pipelines even more readable by removing the 
`ref_AppliedPTransform_` prefix.
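
As a rough illustration of this cleanup, the sketch below applies the two anchored patterns that appear in the merged diff quoted in this thread: a leading `ref_AppliedPTransform_` prefix (Python) and a trailing `/ParMultiDo(Anonymous)` suffix (Java). The class and method names (`NameCleanupSketch`, `shorten`) are made up for the example.

```java
/** Hypothetical helper showing the prefix/suffix cleanup; not part of Beam. */
public class NameCleanupSketch {

  /** Strips SDK-generated noise from a transform name; both patterns are anchored. */
  public static String shorten(String name) {
    // Python SDK: drop the leading 'ref_AppliedPTransform_' prefix, if present.
    String shortened = name.replaceFirst("^ref_AppliedPTransform_", "");
    // Java SDK: drop a trailing '/ParMultiDo(Anonymous)' suffix, if present.
    return shortened.replaceFirst("/ParMultiDo\\(Anonymous\\)$", "");
  }

  public static void main(String[] args) {
    System.out.println(shorten("ref_AppliedPTransform_split_17"));
    System.out.println(shorten("MyName/ParMultiDo(Anonymous)"));
    System.out.println(shorten("pair_with_one"));
  }
}
```

Because `^` and `$` anchor the patterns, a name that merely contains one of these strings in the middle is left untouched.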




Issue Time Tracking
---

Worklog Id: (was: 171536)
Time Spent: 1h 20m  (was: 1h 10m)



[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=171530&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171530
 ]

ASF GitHub Bot logged work on BEAM-5859:


Author: ASF GitHub Bot
Created on: 03/Dec/18 10:56
Start Date: 03/Dec/18 10:56
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #7150: 
[BEAM-5859] Improve operator names for portable pipelines
URL: https://github.com/apache/beam/pull/7150#discussion_r238223008
 
 

 ##
 File path: 
runners/flink/src/test/java/org/apache/beam/runners/flink/translation/FlinkPipelineTranslatorUtilsTest.java
 ##
 @@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.flink.translation;
+
+import static org.hamcrest.core.Is.is;
+import static org.junit.Assert.assertThat;
+
+import java.io.Serializable;
+import org.apache.beam.model.pipeline.v1.RunnerApi;
+import org.apache.beam.runners.core.construction.PipelineTranslation;
+import org.apache.beam.runners.core.construction.graph.ExecutableStage;
+import org.apache.beam.runners.core.construction.graph.GreedyPipelineFuser;
+import 
org.apache.beam.runners.flink.translation.utils.FlinkPipelineTranslatorUtils;
+import org.apache.beam.sdk.Pipeline;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.Impulse;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.junit.Test;
+
+/**
+ * Tests for {@link 
org.apache.beam.runners.flink.translation.utils.FlinkPipelineTranslatorUtils}.
+ */
+public class FlinkPipelineTranslatorUtilsTest implements Serializable {
+
+  @Test
+  /* Test for generating readable operator names during translation. */
+  public void testOperatorNameGeneration() throws Exception {
+Pipeline p = Pipeline.create();
+p.apply(Impulse.create())
+// Anonymous ParDo
+.apply(
+ParDo.of(
+new DoFn() {
+  @ProcessElement
+  public void processElement(
+  ProcessContext processContext, OutputReceiver 
outputReceiver) {}
+}))
+// Name ParDo
+.apply(
+"MyName",
+ParDo.of(
+new DoFn() {
+  @ProcessElement
+  public void processElement(
+  ProcessContext processContext, OutputReceiver 
outputReceiver) {}
+}));
+
+ExecutableStage firstEnvStage =
+GreedyPipelineFuser.fuse(PipelineTranslation.toProto(p))
+.getFusedStages()
+.stream()
+.findFirst()
+.get();
+RunnerApi.ExecutableStagePayload basePayload =
+RunnerApi.ExecutableStagePayload.parseFrom(
+firstEnvStage.toPTransform("foo").getSpec().getPayload());
+
+    String executableStageName =
+        FlinkPipelineTranslatorUtils.genOperatorNameFromStagePayload(basePayload);
+
+    assertThat(executableStageName, is("ExecutableStage(2)[ParDo(Anonymous)][MyName]"));
 
 Review comment:
   I think we could skip the word `ExecutableStage` entirely. Maybe just 
enclosing it in `{}` would suffice to indicate that a harness is involved.
   
   I realized the names are less readable for Python. In Java we would just get 
the user-defined name (if set).




Issue Time Tracking
---

Worklog Id: (was: 171530)
Time Spent: 1h  (was: 50m)


[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation

2018-12-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=171456&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171456
 ]

ASF GitHub Bot logged work on BEAM-5859:


Author: ASF GitHub Bot
Created on: 03/Dec/18 04:18
Start Date: 03/Dec/18 04:18
Worklog Time Spent: 10m 
  Work Description: tweise commented on a change in pull request #7150: 
[BEAM-5859] Improve operator names for portable pipelines
URL: https://github.com/apache/beam/pull/7150#discussion_r238141279
 
 

 ##
 File path: 
runners/flink/src/test/java/org/apache/beam/runners/flink/translation/FlinkPipelineTranslatorUtilsTest.java
 ##
 @@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.flink.translation;
+
+import static org.hamcrest.core.Is.is;
+import static org.junit.Assert.assertThat;
+
+import java.io.Serializable;
+import org.apache.beam.model.pipeline.v1.RunnerApi;
+import org.apache.beam.runners.core.construction.PipelineTranslation;
+import org.apache.beam.runners.core.construction.graph.ExecutableStage;
+import org.apache.beam.runners.core.construction.graph.GreedyPipelineFuser;
+import 
org.apache.beam.runners.flink.translation.utils.FlinkPipelineTranslatorUtils;
+import org.apache.beam.sdk.Pipeline;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.Impulse;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.junit.Test;
+
+/**
+ * Tests for {@link 
org.apache.beam.runners.flink.translation.utils.FlinkPipelineTranslatorUtils}.
+ */
+public class FlinkPipelineTranslatorUtilsTest implements Serializable {
+
+  @Test
+  /* Test for generating readable operator names during translation. */
+  public void testOperatorNameGeneration() throws Exception {
+Pipeline p = Pipeline.create();
+p.apply(Impulse.create())
+// Anonymous ParDo
+.apply(
+ParDo.of(
+new DoFn() {
+  @ProcessElement
+  public void processElement(
+  ProcessContext processContext, OutputReceiver 
outputReceiver) {}
+}))
+// Name ParDo
+.apply(
+"MyName",
+ParDo.of(
+new DoFn() {
+  @ProcessElement
+  public void processElement(
+  ProcessContext processContext, OutputReceiver 
outputReceiver) {}
+}));
+
+ExecutableStage firstEnvStage =
+GreedyPipelineFuser.fuse(PipelineTranslation.toProto(p))
+.getFusedStages()
+.stream()
+.findFirst()
+.get();
+RunnerApi.ExecutableStagePayload basePayload =
+RunnerApi.ExecutableStagePayload.parseFrom(
+firstEnvStage.toPTransform("foo").getSpec().getPayload());
+
+    String executableStageName =
+        FlinkPipelineTranslatorUtils.genOperatorNameFromStagePayload(basePayload);
+
+    assertThat(executableStageName, is("ExecutableStage(2)[ParDo(Anonymous)][MyName]"));
 
 Review comment:
   Can we skip or shorten `ExecutableStage`? In the metric backend system, 
characters usually come at a premium, and it does not really add anything 
valuable for the user.




Issue Time Tracking
---

Worklog Id: (was: 171456)
Time Spent: 50m  (was: 40m)


[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation

2018-12-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=171447&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171447
 ]

ASF GitHub Bot logged work on BEAM-5859:


Author: ASF GitHub Bot
Created on: 03/Dec/18 03:48
Start Date: 03/Dec/18 03:48
Worklog Time Spent: 10m 
  Work Description: tweise commented on issue #7150: [BEAM-5859] Improve 
operator names for portable pipelines
URL: https://github.com/apache/beam/pull/7150#issuecomment-443580831
 
 
   Here is an example before and after:
   
   
![image](https://user-images.githubusercontent.com/263695/49351978-a0d3c700-f66a-11e8-9644-8bcc5bf3d64a.png)
   
   New naming seems more intuitive albeit slightly increased real estate:
   
   
![image](https://user-images.githubusercontent.com/263695/49352025-d11b6580-f66a-11e8-9ee2-bd5f0b1395e9.png)






Issue Time Tracking
---

Worklog Id: (was: 171447)
Time Spent: 40m  (was: 0.5h)

> Improve Traceability of Pipeline translation
> 
>
> Key: BEAM-5859
> URL: https://issues.apache.org/jira/browse/BEAM-5859
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Maximilian Michels
>Priority: Major
>  Labels: portability, portability-flink
> Attachments: tfx.png, wordcount.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Users often ask how they can reason about the pipeline translation. The Flink 
> UI displays a confusingly large graph without any trace of the original Beam 
> pipeline:
> WordCount:
>  !wordcount.png! 
> TFX:
>  !tfx.png! 
> Some aspects which make understanding these graphs hard:
>  * Users don't know how the Runner maps Beam to Flink concepts
>  * The UI is awfully slow / hangs when the pipeline is reasonably complex
>  * The operator names seem to use {{transform.getUniqueName()}}, which doesn't 
> generate readable names
>  * So-called chaining combines operators into a single operator, which makes 
> understanding which Beam concept belongs to which Flink concept even harder
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation

2018-11-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=170358&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-170358
 ]

ASF GitHub Bot logged work on BEAM-5859:


Author: ASF GitHub Bot
Created on: 28/Nov/18 18:37
Start Date: 28/Nov/18 18:37
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #7150: [BEAM-5859] Improve 
operator names for portable pipelines
URL: https://github.com/apache/beam/pull/7150#issuecomment-442556971
 
 
   Run Java PreCommit




Issue Time Tracking
---

Worklog Id: (was: 170358)
Time Spent: 0.5h  (was: 20m)






[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation

2018-11-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=170334&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-170334
 ]

ASF GitHub Bot logged work on BEAM-5859:


Author: ASF GitHub Bot
Created on: 28/Nov/18 17:32
Start Date: 28/Nov/18 17:32
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #7150: [BEAM-5859] Improve 
operator names for portable pipelines
URL: https://github.com/apache/beam/pull/7150#issuecomment-442534708
 
 
   Run Java PreCommit




Issue Time Tracking
---

Worklog Id: (was: 170334)
Time Spent: 20m  (was: 10m)






[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation

2018-11-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=170295&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-170295
 ]

ASF GitHub Bot logged work on BEAM-5859:


Author: ASF GitHub Bot
Created on: 28/Nov/18 16:33
Start Date: 28/Nov/18 16:33
Worklog Time Spent: 10m 
  Work Description: mxm opened a new pull request #7150: [BEAM-5859] 
Improve operator names for portable pipelines
URL: https://github.com/apache/beam/pull/7150
 
 
   This adds a more readable operator name for executable stages. It is of the
   form:
   
   ExecutableStage(numTransforms)[transformName][transformName2]..
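
To make the naming scheme above concrete, here is a minimal sketch of how such a name could be assembled. It is not the code from the pull request: the class `StageNameSketch`, the method `stageName`, and the sample transform names are hypothetical and only illustrate the `ExecutableStage(numTransforms)[transformName]...` form.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: a hypothetical helper, not part of Beam's API.
class StageNameSketch {

  // Builds "ExecutableStage(n)[name1][name2]..." from the given transform names.
  public static String stageName(List<String> transformNames) {
    StringBuilder sb = new StringBuilder("ExecutableStage(");
    sb.append(transformNames.size()).append(')');
    for (String name : transformNames) {
      // Each fused transform contributes one bracketed segment.
      sb.append('[').append(name).append(']');
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // Sample names are made up for the example.
    System.out.println(stageName(Arrays.asList("ReadLines", "CountWords")));
    // prints "ExecutableStage(2)[ReadLines][CountWords]"
  }
}
```

The name grows with the number of fused transforms, which is the trade-off in UI space that comes with this scheme.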
   




Issue Time Tracking
---

Worklog Id: (was: 170295)
Time Spent: 10m
Remaining Estimate: 0h

> Improve Traceability of Pipeline translation
> 
>
> Key: BEAM-5859
> URL: https://issues.apache.org/jira/browse/BEAM-5859
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Maximilian Michels
>Priority: Major
>  Labels: portability, portability-flink
> Attachments: tfx.png, wordcount.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Users often ask how they can reason about the pipeline translation. The Flink 
> UI display a confusingly large graph withou