[jira] [Work logged] (BEAM-6155) Migrate the Go SDK to the modern GCS library

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6155?focusedWorklogId=172911&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172911
 ]

ASF GitHub Bot logged work on BEAM-6155:


Author: ASF GitHub Bot
Created on: 07/Dec/18 02:03
Start Date: 07/Dec/18 02:03
Worklog Time Spent: 10m 
  Work Description: aaltay closed pull request #7182: [BEAM-6155] Updates 
the GCS library the Go SDK uses.
URL: https://github.com/apache/beam/pull/7182
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/go/pkg/beam/artifact/gcsproxy/retrieval.go 
b/sdks/go/pkg/beam/artifact/gcsproxy/retrieval.go
index dede7a51c1af..a59b81d93ffc 100644
--- a/sdks/go/pkg/beam/artifact/gcsproxy/retrieval.go
+++ b/sdks/go/pkg/beam/artifact/gcsproxy/retrieval.go
@@ -19,11 +19,11 @@ import (
"fmt"
"io"
 
+   "cloud.google.com/go/storage"
pb "github.com/apache/beam/sdks/go/pkg/beam/model/jobmanagement_v1"
"github.com/apache/beam/sdks/go/pkg/beam/util/gcsx"
"github.com/golang/protobuf/proto"
"golang.org/x/net/context"
-   "google.golang.org/api/storage/v1"
 )
 
 // RetrievalServer is a artifact retrieval server backed by Google
@@ -41,7 +41,7 @@ func ReadProxyManifest(ctx context.Context, object string) 
(*pb.ProxyManifest, e
return nil, fmt.Errorf("invalid manifest object %v: %v", 
object, err)
}
 
-   cl, err := gcsx.NewClient(ctx, storage.DevstorageReadOnlyScope)
+   cl, err := gcsx.NewClient(ctx, storage.ScopeReadOnly)
if err != nil {
return nil, fmt.Errorf("failed to create GCS client: %v", err)
}
@@ -88,22 +88,22 @@ func (s *RetrievalServer) GetArtifact(req 
*pb.GetArtifactRequest, stream pb.Arti
 
bucket, object := parseObject(blob)
 
-   client, err := gcsx.NewClient(stream.Context(), 
storage.DevstorageReadOnlyScope)
+   client, err := gcsx.NewClient(stream.Context(), storage.ScopeReadOnly)
if err != nil {
return fmt.Errorf("Failed to create client for %v: %v", key, 
err)
}
 
// Stream artifact in up to 1MB chunks.
-
-   resp, err := client.Objects.Get(bucket, object).Download()
+   ctx := context.TODO()
+   r, err := client.Bucket(bucket).Object(object).NewReader(ctx)
if err != nil {
return fmt.Errorf("Failed to read object for %v: %v", key, err)
}
-   defer resp.Body.Close()
+   defer r.Close()
 
data := make([]byte, 1<<20)
for {
-   n, err := resp.Body.Read(data)
+   n, err := r.Read(data)
if n > 0 {
if err := stream.Send(&pb.ArtifactChunk{Data: data[:n]}); err != nil {
return fmt.Errorf("chunk send failed: %v", err)
diff --git a/sdks/go/pkg/beam/artifact/gcsproxy/staging.go 
b/sdks/go/pkg/beam/artifact/gcsproxy/staging.go
index 51ffec6fe129..109eb29cdb43 100644
--- a/sdks/go/pkg/beam/artifact/gcsproxy/staging.go
+++ b/sdks/go/pkg/beam/artifact/gcsproxy/staging.go
@@ -26,11 +26,11 @@ import (
"path"
"sync"
 
+   "cloud.google.com/go/storage"
pb "github.com/apache/beam/sdks/go/pkg/beam/model/jobmanagement_v1"
"github.com/apache/beam/sdks/go/pkg/beam/util/gcsx"
"github.com/golang/protobuf/proto"
"golang.org/x/net/context"
-   "google.golang.org/api/storage/v1"
 )
 
 // StagingServer is a artifact staging server backed by Google Cloud Storage
@@ -81,7 +81,7 @@ func (s *StagingServer) CommitManifest(ctx context.Context, 
req *pb.CommitManife
return nil, fmt.Errorf("failed to marshal proxy manifest: %v", 
err)
}
 
-   cl, err := gcsx.NewClient(ctx, storage.DevstorageReadWriteScope)
+   cl, err := gcsx.NewClient(ctx, storage.ScopeReadWrite)
if err != nil {
return nil, fmt.Errorf("failed to create GCS client: %v", err)
}
@@ -135,7 +135,7 @@ func (s *StagingServer) PutArtifact(ps 
pb.ArtifactStagingService_PutArtifactServ
// Stream content to GCS. We don't have to worry about partial
// or abandoned writes, because object writes are atomic.
 
-   cl, err := gcsx.NewClient(ps.Context(), 
storage.DevstorageReadWriteScope)
+   cl, err := gcsx.NewClient(ps.Context(), storage.ScopeReadWrite)
if err != nil {
return fmt.Errorf("failed to create GCS client: %v", err)
}
diff --git a/sdks/go/pkg/beam/io/filesystem/gcs/gcs.go 
b/sdks/go/pkg/beam/io/filesystem/gcs/gcs.go
index 9a7b265796d7..356c718a024d 100644
--- a/sdks/go/pkg/beam/io/filesystem/gcs/gcs.go
+++ b/sdks/go/pkg/beam/io/filesystem/gcs/gcs.go
@@ 

[jira] [Created] (BEAM-6192) Release related scripts should be more resilient to failures

2018-12-06 Thread Chamikara Jayalath (JIRA)
Chamikara Jayalath created BEAM-6192:


 Summary: Release related scripts should be more resilient to 
failures
 Key: BEAM-6192
 URL: https://issues.apache.org/jira/browse/BEAM-6192
 Project: Beam
  Issue Type: Improvement
  Components: project-management, testing
Reporter: Chamikara Jayalath
Assignee: Boyuan Zhang


Currently we have a number of great release-related scripts that are mentioned
in the Beam release guide.

[https://beam.apache.org/contribute/release-guide/]

 

These tools are great, but given the length of these scripts, I think we should
make them more resilient to error conditions and have them retry commands
(after user action) as much as possible.

For example, the RC creation script might fail for a number of reasons (it
failed for me because tox was not available) and halt in an intermediate state
(some files staged, some commits pushed to GitHub, but the process incomplete).
Re-running the script then fails because of the leftover state. After this, it
is up to the release manager to figure out where the script failed, clean up,
and resume the process.
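
One way to picture the kind of resilience being asked for is a runner that
records each completed step and, on failure, waits for the release manager to
fix the problem before retrying. The following is only a minimal Python sketch
under assumptions: the function name `run_steps`, the JSON state file, and the
step names are hypothetical and are not taken from the existing Beam release
scripts.

```python
import json
import os
import subprocess


def run_steps(steps, state_path, confirm=input, run=subprocess.call):
    """Run (name, shell command) steps in order, resuming after failures.

    Hypothetical helper: completed step names are stored in a JSON file at
    state_path, so a re-run skips work that already succeeded.
    """
    done = []
    if os.path.exists(state_path):
        with open(state_path) as f:
            done = json.load(f)

    for name, command in steps:
        if name in done:
            continue  # Finished in an earlier run; do not repeat it.
        # Retry the same command until it exits with status 0 or the user quits.
        while run(command, shell=True) != 0:
            answer = confirm(
                "Step %r failed. Fix the problem and press Enter to retry, "
                "or type q to stop: " % name)
            if answer.strip().lower() == "q":
                return False
        done.append(name)
        # Persist progress after every step so a crash loses at most one step.
        with open(state_path, "w") as f:
            json.dump(done, f)
    return True
```

With something along these lines, a failure such as a missing tool would pause
the run at the failing step instead of leaving the release manager to work out
which steps had already completed.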

 

Also, have we considered using some release automation tooling instead of shell
scripts?

 

[~boyuanz] I am assigning this to you since you developed some of these
scripts. Please feel free to redirect.

 

cc: [~altay]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5850) Make process, finish and start run on the same thread to support metrics.

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5850?focusedWorklogId=172909&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172909
 ]

ASF GitHub Bot logged work on BEAM-5850:


Author: ASF GitHub Bot
Created on: 07/Dec/18 01:37
Start Date: 07/Dec/18 01:37
Worklog Time Spent: 10m 
  Work Description: lukecwik closed pull request #6786: [BEAM-5850] Add 
QueueingBeamFnDataClient and make process, finish and start run on the same 
thread to support metrics.
URL: https://github.com/apache/beam/pull/6786
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/control/RemoteExecutionTest.java
 
b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/control/RemoteExecutionTest.java
index 748efb47005a..f53257fe07f6 100644
--- 
a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/control/RemoteExecutionTest.java
+++ 
b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/control/RemoteExecutionTest.java
@@ -21,6 +21,7 @@
 import static org.hamcrest.Matchers.containsInAnyOrder;
 import static org.hamcrest.Matchers.equalTo;
 import static org.junit.Assert.assertThat;
+import static org.junit.Assert.assertTrue;
 import static org.junit.Assert.fail;
 
 import com.google.common.base.Optional;
@@ -278,6 +279,91 @@ public void process(ProcessContext ctxt) {
 }
   }
 
+  @Test
+  public void testBundleProcessorThrowsExecutionExceptionWhenUserCodeThrows() 
throws Exception {
+Pipeline p = Pipeline.create();
+p.apply("impulse", Impulse.create())
+.apply(
+"create",
+ParDo.of(
+new DoFn<byte[], KV<String, String>>() {
+  @ProcessElement
+  public void process(ProcessContext ctxt) throws Exception {
+String element =
+CoderUtils.decodeFromByteArray(StringUtf8Coder.of(), 
ctxt.element());
+if (element.equals("X")) {
+  throw new Exception("testBundleExecutionFailure");
+}
+ctxt.output(KV.of(element, element));
+  }
+}))
+.apply("gbk", GroupByKey.create());
+
+RunnerApi.Pipeline pipelineProto = PipelineTranslation.toProto(p);
+FusedPipeline fused = GreedyPipelineFuser.fuse(pipelineProto);
+checkState(fused.getFusedStages().size() == 1, "Expected exactly one fused 
stage");
+ExecutableStage stage = fused.getFusedStages().iterator().next();
+
+ExecutableProcessBundleDescriptor descriptor =
+ProcessBundleDescriptors.fromExecutableStage(
+"my_stage", stage, dataServer.getApiServiceDescriptor());
+
+BundleProcessor processor =
+controlClient.getProcessor(
+descriptor.getProcessBundleDescriptor(), 
descriptor.getRemoteInputDestinations());
+Map>> outputTargets = 
descriptor.getOutputTargetCoders();
+Map>> outputValues = new 
HashMap<>();
+Map> outputReceivers = new HashMap<>();
+for (Entry>> targetCoder : 
outputTargets.entrySet()) {
+  List> outputContents =
+  Collections.synchronizedList(new ArrayList<>());
+  outputValues.put(targetCoder.getKey(), outputContents);
+  outputReceivers.put(
+  targetCoder.getKey(),
+  RemoteOutputReceiver.of(
+  (Coder) targetCoder.getValue(),
+  (FnDataReceiver>) outputContents::add));
+}
+
+try (ActiveBundle bundle =
+processor.newBundle(outputReceivers, BundleProgressHandler.ignored())) 
{
+  Iterables.getOnlyElement(bundle.getInputReceivers().values())
+  .accept(
+  WindowedValue.valueInGlobalWindow(
+  CoderUtils.encodeToByteArray(StringUtf8Coder.of(), "Y")));
+}
+
+try {
+  try (ActiveBundle bundle =
+  processor.newBundle(outputReceivers, 
BundleProgressHandler.ignored())) {
+Iterables.getOnlyElement(bundle.getInputReceivers().values())
+.accept(
+WindowedValue.valueInGlobalWindow(
+CoderUtils.encodeToByteArray(StringUtf8Coder.of(), "X")));
+  }
+  // Fail the test if we reach this point and never threw the exception.
+  fail();
+} catch (ExecutionException e) {
+  assertTrue(e.getMessage().contains("testBundleExecutionFailure"));
+}
+
+try (ActiveBundle bundle =
+processor.newBundle(outputReceivers, BundleProgressHandler.ignored())) 
{
+  Iterables.getOnlyElement(bundle.getInputReceivers().values())
+  .accept(
+  WindowedValue.valueInGlobalWindow(

[jira] [Work logged] (BEAM-6155) Migrate the Go SDK to the modern GCS library

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6155?focusedWorklogId=172905&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172905
 ]

ASF GitHub Bot logged work on BEAM-6155:


Author: ASF GitHub Bot
Created on: 07/Dec/18 00:59
Start Date: 07/Dec/18 00:59
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #7182: [BEAM-6155] Updates 
the GCS library the Go SDK uses.
URL: https://github.com/apache/beam/pull/7182#issuecomment-445085546
 
 
   Run Go PostCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 172905)
Time Spent: 20m  (was: 10m)

> Migrate the Go SDK to the modern GCS library
> 
>
> Key: BEAM-6155
> URL: https://issues.apache.org/jira/browse/BEAM-6155
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Andrew Brampton
>Assignee: Robert Burke
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The gcsx package is using the google.golang.org/api/storage/v1 GCS library. 
> That library has been deprecated for ~6 months, and the recommendation is to 
> use the newer 
> [cloud.google.com/go/storage|https://godoc.org/cloud.google.com/go/storage] 
> package. That package supports newer features and has built-in connection
> pooling, timeout support, retry with exponential backoff, etc.





[jira] [Work logged] (BEAM-6178) Create BOM for Beam

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6178?focusedWorklogId=172902&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172902
 ]

ASF GitHub Bot logged work on BEAM-6178:


Author: ASF GitHub Bot
Created on: 07/Dec/18 00:27
Start Date: 07/Dec/18 00:27
Worklog Time Spent: 10m 
  Work Description: garrettjonesgoogle commented on issue #7197: 
[BEAM-6178] Adding beam-sdks-java-bom, adding exportJavadoc flag for 
applyJavaNature
URL: https://github.com/apache/beam/pull/7197#issuecomment-445079897
 
 
   Your high-level understanding is correct. In isolation, using a beam version 
property works just fine. The problem is when you want to set up another BOM 
that specifies the modules of Beam. In that case, it's not sustainable for that 
aggregated BOM to specify each module of Beam; it's much more sustainable for 
Beam to generate a BOM with Beam's module versions, which adds dependency 
declarations automatically when new modules are added to Beam, and then have 
the aggregated BOM just import the Beam BOM. We are trying to set up such an 
aggregator BOM at 
https://github.com/GoogleCloudPlatform/cloud-opensource-java/blob/master/boms/cloud-oss-bom/pom.xml
 .
   




Issue Time Tracking
---

Worklog Id: (was: 172902)
Time Spent: 2h 20m  (was: 2h 10m)

> Create BOM for Beam
> ---
>
> Key: BEAM-6178
> URL: https://issues.apache.org/jira/browse/BEAM-6178
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Affects Versions: 2.10.0
>Reporter: Garrett Jones
>Assignee: Garrett Jones
>Priority: Critical
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Add a module to Beam which generates a BOM for Beam modules.





[jira] [Work logged] (BEAM-5321) Finish Python 3 porting for transforms module

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5321?focusedWorklogId=172901&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172901
 ]

ASF GitHub Bot logged work on BEAM-5321:


Author: ASF GitHub Bot
Created on: 07/Dec/18 00:26
Start Date: 07/Dec/18 00:26
Worklog Time Spent: 10m 
  Work Description: aaltay closed pull request #7104: [BEAM-5321] Port 
transforms package to Python 3
URL: https://github.com/apache/beam/pull/7104
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/python/apache_beam/transforms/ptransform_test.py 
b/sdks/python/apache_beam/transforms/ptransform_test.py
index 95ea03f2ba28..ffb43317e221 100644
--- a/sdks/python/apache_beam/transforms/ptransform_test.py
+++ b/sdks/python/apache_beam/transforms/ptransform_test.py
@@ -668,7 +668,7 @@ def test_chained_ptransforms(self):
   def test_apply_to_list(self):
 self.assertCountEqual(
 [1, 2, 3], [0, 1, 2] | 'AddOne' >> beam.Map(lambda x: x + 1))
-self.assertItemsEqual([1],
+self.assertCountEqual([1],
   [0, 1, 2] | 'Odd' >> beam.Filter(lambda x: x % 2))
 self.assertCountEqual([1, 2, 100, 3],
   ([1, 2, 3], [100]) | beam.Flatten())
@@ -947,7 +947,7 @@ def process(self, element, prefix):
| 'Upper' >> beam.ParDo(ToUpperCaseWithPrefix(), 'hello'))
 
 self.assertEqual("Type hint violation for 'Upper': "
- "requires <type 'str'> but got <type 'int'> for element",
+ "requires {} but got {} for element".format(str, int),
  e.exception.args[0])
 
   def test_do_fn_pipeline_runtime_type_check_satisfied(self):
@@ -982,7 +982,7 @@ def process(self, element, num):
   self.p.run()
 
 self.assertEqual("Type hint violation for 'Add': "
- "requires <type 'int'> but got <type 'str'> for element",
+ "requires {} but got {} for element".format(int, str),
  e.exception.args[0])
 
   def test_pardo_does_not_type_check_using_type_hint_decorators(self):
@@ -999,7 +999,7 @@ def int_to_str(a):
| 'ToStr' >> beam.FlatMap(int_to_str))
 
 self.assertEqual("Type hint violation for 'ToStr': "
- "requires <type 'int'> but got <type 'str'> for a",
+ "requires {} but got {} for a".format(int, str),
  e.exception.args[0])
 
   def test_pardo_properly_type_checks_using_type_hint_decorators(self):
@@ -1031,7 +1031,7 @@ def 
test_pardo_does_not_type_check_using_type_hint_methods(self):
   .with_input_types(str).with_output_types(str)))
 
 self.assertEqual("Type hint violation for 'Upper': "
- "requires <type 'str'> but got <type 'int'> for x",
+ "requires {} but got {} for x".format(str, int),
  e.exception.args[0])
 
   def test_pardo_properly_type_checks_using_type_hint_methods(self):
@@ -1056,7 +1056,7 @@ def 
test_map_does_not_type_check_using_type_hints_methods(self):
.with_input_types(str).with_output_types(str))
 
 self.assertEqual("Type hint violation for 'Upper': "
- "requires <type 'str'> but got <type 'int'> for x",
+ "requires {} but got {} for x".format(str, int),
  e.exception.args[0])
 
   def test_map_properly_type_checks_using_type_hints_methods(self):
@@ -1082,7 +1082,7 @@ def upper(s):
| 'Upper' >> beam.Map(upper))
 
 self.assertEqual("Type hint violation for 'Upper': "
- "requires <type 'str'> but got <type 'int'> for s",
+ "requires {} but got {} for s".format(str, int),
  e.exception.args[0])
 
   def test_map_properly_type_checks_using_type_hints_decorator(self):
@@ -1109,7 +1109,7 @@ def 
test_filter_does_not_type_check_using_type_hints_method(self):
| 'Below 3' >> beam.Filter(lambda x: x < 3).with_input_types(int))
 
 self.assertEqual("Type hint violation for 'Below 3': "
- "requires <type 'int'> but got <type 'str'> for x",
+ "requires {} but got {} for x".format(int, str),
  e.exception.args[0])
 
   def test_filter_type_checks_using_type_hints_method(self):
@@ -1134,7 +1134,7 @@ def more_than_half(a):
| 'Half' >> beam.Filter(more_than_half))
 
 self.assertEqual("Type hint violation for 'Half': "
- "requires <type 'float'> but got <type 'int'> for a",
+ "requires {} but got {} for a".format(float, int),
  e.exception.args[0])
 
   def test_filter_type_checks_using_type_hints_decorator(self):
@@ -1183,7 +1183,7 @@ def test_group_by_key_only_does_not_type_check(self):
 
 self.assertEqual("Input type hint violation at F: "
  "expected 

[jira] [Work logged] (BEAM-6178) Create BOM for Beam

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6178?focusedWorklogId=172899&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172899
 ]

ASF GitHub Bot logged work on BEAM-6178:


Author: ASF GitHub Bot
Created on: 07/Dec/18 00:24
Start Date: 07/Dec/18 00:24
Worklog Time Spent: 10m 
  Work Description: garrettjonesgoogle commented on a change in pull 
request #7197: [BEAM-6178] Adding beam-sdks-java-bom, adding exportJavadoc flag 
for applyJavaNature
URL: https://github.com/apache/beam/pull/7197#discussion_r239662361
 
 

 ##
 File path: sdks/java/bom/build.gradle
 ##
 @@ -0,0 +1,160 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+apply plugin: org.apache.beam.gradle.BeamModulePlugin
+
+apply plugin: "java"
+apply plugin: "maven-publish"
+
+def isRelease(Project project) {
+  return project.hasProperty('isRelease')
+}
+
+ext {
+  mavenJavaDir = "$project.buildDir/publications/mavenJava"
+  mavenJavaBomOutputFile = file(mavenJavaDir + "/pom-default.xml")
+}
+
+for (String projName : rootProject.ext.allProjectNames) {
+  if (!projName.equals(':' + project.name)) {
+evaluationDependsOn(projName)
+  }
+}
+
+// Copy our pom.xml to the location where a generated POM would go
+task copyPom() {
+  doLast {
+def moduleNames = new ArrayList<>()
+for (String projName : rootProject.ext.allProjectNames) {
+  def subproject = project(projName)
+  if (subproject.ext.properties.containsKey('includeInJavaBom') &&
+  subproject.ext.properties.includeInJavaBom) {
+moduleNames.add(subproject.name)
+  }
+}
+
+new File(mavenJavaDir).mkdirs()
+copy {
+  from 'pom.xml.template'
+  into mavenJavaDir
+  rename 'pom.xml.template', 'pom-default.xml'
+  expand(version: project.version, modules: moduleNames)
+}
+  }
+}
+
+assemble.dependsOn copyPom
+
+// We want to use our own pom.xml instead of the generated one, so we disable
+// the pom.xml generation and have the publish tasks depend on `copyPom` 
instead.
+tasks.whenTaskAdded { task ->
+  if (task.name == 'generatePomFileForMavenJavaPublication') {
+task.enabled = false
+  } else if (task.name == 'publishMavenJavaPublicationToMavenLocal') {
+task.dependsOn copyPom
+  } else if (task.name == 'publishMavenJavaPublicationToMavenRepository') {
+task.dependsOn copyPom
+  }
+}
+
+jar.enabled = false
+
+// Remove the default jar archive which is added by the 'java' plugin.
+configurations.archives.artifacts.with { archives ->
+  def artifacts = []
+  archives.each {
+if (it.file =~ 'jar') {
+  // We can't just call `archives.remove(it)` here because it triggers
+  // a `ConcurrentModificationException`, so we add matching artifacts
+  // to another list, then remove those elements outside of this iteration.
+  artifacts.add(it)
+}
+  }
+  artifacts.each {
+archives.remove(it)
+  }
+}
+
+artifacts {
+  archives(mavenJavaBomOutputFile) {
+builtBy copyPom
+  }
+}
+
+afterEvaluate {
+  // We can't use the `publishing` section from applyJavaNature because
+  // we don't want all the Java artifacts, and we want to use our own pom.xml
+  // instead of the generated one.
+  publishing {
+publications {
+  mavenJava(MavenPublication) {
 
 Review comment:
   I think it should become the recommended method. I was thinking that the 
archetypes could be modified in a following PR. (FYI I tried it out on a 
project generated from an archetype locally and it worked great.)




Issue Time Tracking
---

Worklog Id: (was: 172899)
Time Spent: 2h 10m  (was: 2h)

> Create BOM for Beam
> ---
>
> Key: BEAM-6178
> URL: https://issues.apache.org/jira/browse/BEAM-6178
> Project: Beam
>  Issue 

[jira] [Work logged] (BEAM-6178) Create BOM for Beam

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6178?focusedWorklogId=172897&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172897
 ]

ASF GitHub Bot logged work on BEAM-6178:


Author: ASF GitHub Bot
Created on: 07/Dec/18 00:22
Start Date: 07/Dec/18 00:22
Worklog Time Spent: 10m 
  Work Description: garrettjonesgoogle commented on a change in pull 
request #7197: [BEAM-6178] Adding beam-sdks-java-bom, adding exportJavadoc flag 
for applyJavaNature
URL: https://github.com/apache/beam/pull/7197#discussion_r239662081
 
 

 ##
 File path: sdks/java/bom/build.gradle
 ##
 @@ -0,0 +1,160 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+apply plugin: org.apache.beam.gradle.BeamModulePlugin
+
+apply plugin: "java"
+apply plugin: "maven-publish"
+
+def isRelease(Project project) {
+  return project.hasProperty('isRelease')
+}
+
+ext {
+  mavenJavaDir = "$project.buildDir/publications/mavenJava"
+  mavenJavaBomOutputFile = file(mavenJavaDir + "/pom-default.xml")
+}
+
+for (String projName : rootProject.ext.allProjectNames) {
+  if (!projName.equals(':' + project.name)) {
+evaluationDependsOn(projName)
+  }
+}
+
+// Copy our pom.xml to the location where a generated POM would go
+task copyPom() {
+  doLast {
+def moduleNames = new ArrayList<>()
+for (String projName : rootProject.ext.allProjectNames) {
+  def subproject = project(projName)
+  if (subproject.ext.properties.containsKey('includeInJavaBom') &&
+  subproject.ext.properties.includeInJavaBom) {
+moduleNames.add(subproject.name)
+  }
+}
+
+new File(mavenJavaDir).mkdirs()
+copy {
+  from 'pom.xml.template'
+  into mavenJavaDir
+  rename 'pom.xml.template', 'pom-default.xml'
+  expand(version: project.version, modules: moduleNames)
+}
+  }
+}
+
+assemble.dependsOn copyPom
+
+// We want to use our own pom.xml instead of the generated one, so we disable
+// the pom.xml generation and have the publish tasks depend on `copyPom` 
instead.
+tasks.whenTaskAdded { task ->
+  if (task.name == 'generatePomFileForMavenJavaPublication') {
+task.enabled = false
+  } else if (task.name == 'publishMavenJavaPublicationToMavenLocal') {
+task.dependsOn copyPom
+  } else if (task.name == 'publishMavenJavaPublicationToMavenRepository') {
+task.dependsOn copyPom
+  }
+}
+
+jar.enabled = false
+
+// Remove the default jar archive which is added by the 'java' plugin.
+configurations.archives.artifacts.with { archives ->
 
 Review comment:
   Unfortunately, it is not sufficient (as I discovered when implementing BOM
support in gax-java). If I remember correctly, something else in the build
process still expects the jar archive, so setting only `jar.enabled = false`
produces an error.




Issue Time Tracking
---

Worklog Id: (was: 172897)
Time Spent: 1h 50m  (was: 1h 40m)

> Create BOM for Beam
> ---
>
> Key: BEAM-6178
> URL: https://issues.apache.org/jira/browse/BEAM-6178
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Affects Versions: 2.10.0
>Reporter: Garrett Jones
>Assignee: Garrett Jones
>Priority: Critical
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Add a module to Beam which generates a BOM for Beam modules.





[jira] [Work logged] (BEAM-6178) Create BOM for Beam

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6178?focusedWorklogId=172898&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172898
 ]

ASF GitHub Bot logged work on BEAM-6178:


Author: ASF GitHub Bot
Created on: 07/Dec/18 00:22
Start Date: 07/Dec/18 00:22
Worklog Time Spent: 10m 
  Work Description: garrettjonesgoogle commented on a change in pull 
request #7197: [BEAM-6178] Adding beam-sdks-java-bom, adding exportJavadoc flag 
for applyJavaNature
URL: https://github.com/apache/beam/pull/7197#discussion_r239662155
 
 

 ##
 File path: sdks/java/bom/build.gradle
 ##
 @@ -0,0 +1,160 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+apply plugin: org.apache.beam.gradle.BeamModulePlugin
+
+apply plugin: "java"
+apply plugin: "maven-publish"
+
+def isRelease(Project project) {
+  return project.hasProperty('isRelease')
+}
+
+ext {
+  mavenJavaDir = "$project.buildDir/publications/mavenJava"
+  mavenJavaBomOutputFile = file(mavenJavaDir + "/pom-default.xml")
+}
+
+for (String projName : rootProject.ext.allProjectNames) {
+  if (!projName.equals(':' + project.name)) {
+evaluationDependsOn(projName)
+  }
+}
+
+// Copy our pom.xml to the location where a generated POM would go
+task copyPom() {
+  doLast {
+def moduleNames = new ArrayList<>()
+for (String projName : rootProject.ext.allProjectNames) {
+  def subproject = project(projName)
+  if (subproject.ext.properties.containsKey('includeInJavaBom') &&
+  subproject.ext.properties.includeInJavaBom) {
+moduleNames.add(subproject.name)
+  }
+}
+
+new File(mavenJavaDir).mkdirs()
+copy {
+  from 'pom.xml.template'
+  into mavenJavaDir
+  rename 'pom.xml.template', 'pom-default.xml'
+  expand(version: project.version, modules: moduleNames)
+}
+  }
+}
+
+assemble.dependsOn copyPom
+
+// We want to use our own pom.xml instead of the generated one, so we disable
+// the pom.xml generation and have the publish tasks depend on `copyPom` 
instead.
+tasks.whenTaskAdded { task ->
+  if (task.name == 'generatePomFileForMavenJavaPublication') {
+task.enabled = false
+  } else if (task.name == 'publishMavenJavaPublicationToMavenLocal') {
+task.dependsOn copyPom
+  } else if (task.name == 'publishMavenJavaPublicationToMavenRepository') {
+task.dependsOn copyPom
+  }
+}
+
+jar.enabled = false
+
+// Remove the default jar archive which is added by the 'java' plugin.
+configurations.archives.artifacts.with { archives ->
+  def artifacts = []
+  archives.each {
+if (it.file =~ 'jar') {
+  // We can't just call `archives.remove(it)` here because it triggers
+  // a `ConcurrentModificationException`, so we add matching artifacts
+  // to another list, then remove those elements outside of this iteration.
+  artifacts.add(it)
+}
+  }
+  artifacts.each {
+archives.remove(it)
+  }
+}
+
+artifacts {
+  archives(mavenJavaBomOutputFile) {
+builtBy copyPom
+  }
+}
+
+afterEvaluate {
+  // We can't use the `publishing` section from applyJavaNature because
 
 Review comment:
   I played around with that a little but kept hitting evaluation ordering 
problems. I can try to take another crack at it.




Issue Time Tracking
---

Worklog Id: (was: 172898)
Time Spent: 2h  (was: 1h 50m)

> Create BOM for Beam
> ---
>
> Key: BEAM-6178
> URL: https://issues.apache.org/jira/browse/BEAM-6178
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Affects Versions: 2.10.0
>Reporter: Garrett Jones
>Assignee: Garrett Jones
>Priority: Critical
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Add a module to Beam which generates 

[jira] [Work logged] (BEAM-6178) Create BOM for Beam

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6178?focusedWorklogId=172896=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172896
 ]

ASF GitHub Bot logged work on BEAM-6178:


Author: ASF GitHub Bot
Created on: 07/Dec/18 00:20
Start Date: 07/Dec/18 00:20
Worklog Time Spent: 10m 
  Work Description: garrettjonesgoogle commented on a change in pull 
request #7197: [BEAM-6178] Adding beam-sdks-java-bom, adding exportJavadoc flag 
for applyJavaNature
URL: https://github.com/apache/beam/pull/7197#discussion_r239661750
 
 

 ##
 File path: runners/samza/build.gradle
 ##
 @@ -19,7 +19,7 @@
 import groovy.json.JsonOutput
 
 apply plugin: org.apache.beam.gradle.BeamModulePlugin
-applyJavaNature()
+applyJavaNature(exportJavadoc: false)
 
 Review comment:
   That was my thought process.




Issue Time Tracking
---

Worklog Id: (was: 172896)
Time Spent: 1h 40m  (was: 1.5h)

> Create BOM for Beam
> ---
>
> Key: BEAM-6178
> URL: https://issues.apache.org/jira/browse/BEAM-6178
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Affects Versions: 2.10.0
>Reporter: Garrett Jones
>Assignee: Garrett Jones
>Priority: Critical
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Add a module to Beam which generates a BOM for Beam modules.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6178) Create BOM for Beam

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6178?focusedWorklogId=172895=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172895
 ]

ASF GitHub Bot logged work on BEAM-6178:


Author: ASF GitHub Bot
Created on: 07/Dec/18 00:19
Start Date: 07/Dec/18 00:19
Worklog Time Spent: 10m 
  Work Description: garrettjonesgoogle commented on a change in pull 
request #7197: [BEAM-6178] Adding beam-sdks-java-bom, adding exportJavadoc flag 
for applyJavaNature
URL: https://github.com/apache/beam/pull/7197#discussion_r239661705
 
 

 ##
 File path: sdks/java/javadoc/build.gradle
 ##
 @@ -26,64 +26,33 @@
 description = "Apache Beam :: SDKs :: Java :: Aggregated Javadoc"
 apply plugin: 'java'
 
-def exportedJavadocProjects = [
-  ':beam-runners-apex',
-  ':beam-runners-core-construction-java',
-  ':beam-runners-core-java',
-  ':beam-runners-java-fn-execution',
-  ':beam-runners-local-java-core',
-  ':beam-runners-direct-java',
-  ':beam-runners-reference-java',
-  ':beam-runners-flink_2.11',
-  ':beam-runners-google-cloud-dataflow-java',
-  ':beam-runners-spark',
-  ':beam-runners-gearpump',
-  ':beam-sdks-java-core',
-  ':beam-sdks-java-fn-execution',
-  ':beam-sdks-java-extensions-google-cloud-platform-core',
-  ':beam-sdks-java-extensions-join-library',
-  ':beam-sdks-java-extensions-json-jackson',
-  ':beam-sdks-java-extensions-protobuf',
-  ':beam-sdks-java-extensions-sketching',
-  ':beam-sdks-java-extensions-sorter',
-  ':beam-sdks-java-extensions-sql',
-  ':beam-sdks-java-harness',
-  ':beam-sdks-java-io-amazon-web-services',
-  ':beam-sdks-java-io-amqp',
-  ':beam-sdks-java-io-cassandra',
-  ':beam-sdks-java-io-elasticsearch',
-  ':beam-sdks-java-io-elasticsearch-tests-2',
-  ':beam-sdks-java-io-elasticsearch-tests-5',
-  ':beam-sdks-java-io-elasticsearch-tests-6',
-  ':beam-sdks-java-io-elasticsearch-tests-common',
-  ':beam-sdks-java-io-google-cloud-platform',
-  ':beam-sdks-java-io-hadoop-common',
-  ':beam-sdks-java-io-hadoop-file-system',
-  ':beam-sdks-java-io-hadoop-input-format',
-  ':beam-sdks-java-io-hbase',
-  ':beam-sdks-java-io-hcatalog',
-  ':beam-sdks-java-io-jdbc',
-  ':beam-sdks-java-io-jms',
-  ':beam-sdks-java-io-kafka',
-  ':beam-sdks-java-io-kinesis',
-  ':beam-sdks-java-io-mongodb',
-  ':beam-sdks-java-io-mqtt',
-  ':beam-sdks-java-io-parquet',
-  ':beam-sdks-java-io-redis',
-  ':beam-sdks-java-io-solr',
-  ':beam-sdks-java-io-tika',
-  ':beam-sdks-java-io-xml',
-]
+for (String projName : rootProject.ext.allProjectNames) {
+  if (!projName.equals(':' + project.name) && !projName.equals(':beam-sdks-java-bom')) {
+evaluationDependsOn(projName)
 
 Review comment:
   The opposite of your first sentence is what is happening: this project 
depends on every other project. The rest of what you say seems true, though: 
you can encounter cycles. I actually did encounter a cycle because 
`beam-sdks-java-bom` also uses `evaluationDependsOn` for all modules. When 
there is a cycle, you just need to decide the order of precedence. My 
decision was all modules -> javadoc -> bom. That is why, further down in 
this file, `:beam-sdks-java-bom` is skipped. Thus, you can never have 
javadoc for the bom, but the bom can include the javadoc module. 
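
A minimal sketch of the precedence described in this comment, written in plain Java so it can run on its own. The class `EvaluationOrderSketch`, the method `order`, and the project name `:beam-sdks-java-javadoc` are illustrative assumptions, not Beam or Gradle APIs, and Gradle's real `evaluationDependsOn` handling is more involved than this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration only; not part of Beam or Gradle. */
final class EvaluationOrderSketch {

  /** Returns an order in which every project follows the projects it depends on. */
  static List<String> order(Map<String, List<String>> dependsOn) {
    List<String> result = new ArrayList<>();
    Set<String> done = new HashSet<>();
    Set<String> inProgress = new HashSet<>();
    for (String project : dependsOn.keySet()) {
      visit(project, dependsOn, done, inProgress, result);
    }
    return result;
  }

  private static void visit(
      String project,
      Map<String, List<String>> dependsOn,
      Set<String> done,
      Set<String> inProgress,
      List<String> result) {
    if (done.contains(project)) {
      return;
    }
    // Reaching a project that is still being visited means the edges form a cycle.
    if (!inProgress.add(project)) {
      throw new IllegalStateException("Evaluation cycle at " + project);
    }
    for (String dep : dependsOn.getOrDefault(project, Collections.emptyList())) {
      visit(dep, dependsOn, done, inProgress, result);
    }
    inProgress.remove(project);
    done.add(project);
    result.add(project);
  }

  public static void main(String[] args) {
    Map<String, List<String>> deps = new LinkedHashMap<>();
    deps.put(":beam-sdks-java-core", Collections.emptyList());
    // javadoc depends on the modules but skips the bom.
    deps.put(":beam-sdks-java-javadoc", Arrays.asList(":beam-sdks-java-core"));
    // The bom may depend on everything, including javadoc.
    deps.put(
        ":beam-sdks-java-bom",
        Arrays.asList(":beam-sdks-java-core", ":beam-sdks-java-javadoc"));
    System.out.println(order(deps));
  }
}
```

With the edges all modules -> javadoc -> bom, `order` lists the core module first, then javadoc, then the bom. If the javadoc entry also listed `:beam-sdks-java-bom`, `order` would throw an `IllegalStateException`, which corresponds to the cycle the comment describes.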
   




Issue Time Tracking
---

Worklog Id: (was: 172895)
Time Spent: 1.5h  (was: 1h 20m)

> Create BOM for Beam
> ---
>
> Key: BEAM-6178
> URL: https://issues.apache.org/jira/browse/BEAM-6178
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Affects Versions: 2.10.0
>Reporter: Garrett Jones
>Assignee: Garrett Jones
>Priority: Critical
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Add a module to Beam which generates a BOM for Beam modules.





[jira] [Work logged] (BEAM-6181) Utilize MetricInfo for reporting user metrics in Portable Dataflow Java Runner.

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6181?focusedWorklogId=172891=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172891
 ]

ASF GitHub Bot logged work on BEAM-6181:


Author: ASF GitHub Bot
Created on: 07/Dec/18 00:07
Start Date: 07/Dec/18 00:07
Worklog Time Spent: 10m 
  Work Description: swegner closed pull request #7202: [BEAM-6181] 
Reporting user counters via MonitoringInfos in Portable Dataflow Runner.
URL: https://github.com/apache/beam/pull/7202
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WorkItemStatusClient.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WorkItemStatusClient.java
index 2d840e3f4356..28f239784bf7 100644
--- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WorkItemStatusClient.java
+++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WorkItemStatusClient.java
@@ -263,6 +263,7 @@ private synchronized WorkItemStatus createStatusUpdate(boolean isFinal) {
 return status;
   }
 
+  // todo(migryz) this method should return List instead of updating member variable
   @VisibleForTesting
   synchronized void populateCounterUpdates(WorkItemStatus status) {
 if (worker == null) {
@@ -270,13 +271,18 @@ synchronized void populateCounterUpdates(WorkItemStatus status) {
 }
 
 boolean isFinalUpdate = Boolean.TRUE.equals(status.getCompleted());
-ImmutableList.Builder counterUpdatesBuilder = ImmutableList.builder();
-counterUpdatesBuilder.addAll(extractCounters(worker.getOutputCounters()));
-counterUpdatesBuilder.addAll(extractMetrics(isFinalUpdate));
-counterUpdatesBuilder.addAll(extractMsecCounters(isFinalUpdate));
-counterUpdatesBuilder.addAll(worker.extractMetricUpdates());
 
-ImmutableList counterUpdates = counterUpdatesBuilder.build();
+ImmutableList.Builder counterUpdatesListBuilder = ImmutableList.builder();
+// Output counters
+counterUpdatesListBuilder.addAll(extractCounters(worker.getOutputCounters()));
+// User metrics reported in Worker
+counterUpdatesListBuilder.addAll(extractMetrics(isFinalUpdate));
+// MSec counters reported in worker
+counterUpdatesListBuilder.addAll(extractMsecCounters(isFinalUpdate));
+// Metrics reported in SDK runner.
+counterUpdatesListBuilder.addAll(worker.extractMetricUpdates());
+
+ImmutableList counterUpdates = counterUpdatesListBuilder.build();
 status.setCounterUpdates(counterUpdates);
   }
 
diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/BeamFnMapTaskExecutor.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/BeamFnMapTaskExecutor.java
index bc0fb54d9521..6c1d43f952f9 100644
--- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/BeamFnMapTaskExecutor.java
+++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/BeamFnMapTaskExecutor.java
@@ -17,20 +17,25 @@
  */
 package org.apache.beam.runners.dataflow.worker.fn.control;
 
+import com.google.api.services.dataflow.model.CounterMetadata;
+import com.google.api.services.dataflow.model.CounterStructuredName;
+import com.google.api.services.dataflow.model.CounterStructuredNameAndMetadata;
 import com.google.api.services.dataflow.model.CounterUpdate;
 import com.google.common.annotations.VisibleForTesting;
 import com.google.common.base.Preconditions;
-import com.google.common.collect.FluentIterable;
 import com.google.common.collect.Iterables;
 import io.opencensus.common.Scope;
 import io.opencensus.trace.SpanBuilder;
 import io.opencensus.trace.Tracer;
 import io.opencensus.trace.Tracing;
+import java.util.ArrayList;
 import java.util.Arrays;
+import java.util.Collections;
 import java.util.HashMap;
 import java.util.List;
 import java.util.Map;
 import java.util.NoSuchElementException;
+import java.util.Objects;
 import java.util.concurrent.CancellationException;
 import java.util.concurrent.ExecutionException;
 import java.util.concurrent.Executors;
@@ -39,20 +44,27 @@
 import java.util.concurrent.TimeUnit;
 import java.util.concurrent.atomic.AtomicReference;
 import java.util.function.Consumer;
+import java.util.stream.Collectors;
+import java.util.stream.StreamSupport;
 import javax.annotation.Nullable;
 import 

[jira] [Work logged] (BEAM-5850) Make process, finish and start run on the same thread to support metrics.

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5850?focusedWorklogId=172892=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172892
 ]

ASF GitHub Bot logged work on BEAM-5850:


Author: ASF GitHub Bot
Created on: 07/Dec/18 00:08
Start Date: 07/Dec/18 00:08
Worklog Time Spent: 10m 
  Work Description: ajamato commented on issue #6786: [BEAM-5850] Add 
QueueingBeamFnDataClient and make process, finish and start run on the same 
thread to support metrics.
URL: https://github.com/apache/beam/pull/6786#issuecomment-445076183
 
 
   Run Java PreCommit




Issue Time Tracking
---

Worklog Id: (was: 172892)
Time Spent: 16h  (was: 15h 50m)

> Make process, finish and start run on the same thread to support metrics.
> -
>
> Key: BEAM-5850
> URL: https://issues.apache.org/jira/browse/BEAM-5850
> Project: Beam
>  Issue Type: New Feature
>  Components: java-fn-execution
>Reporter: Alex Amato
>Assignee: Alex Amato
>Priority: Major
>  Time Spent: 16h
>  Remaining Estimate: 0h
>
> Update BeamFnDataReceiver to place elements into a Queue, then consume them 
> and call the element processing receiver in blockTillReadFinishes





[jira] [Commented] (BEAM-6172) Flink metrics are not generated in standard format

2018-12-06 Thread Micah Wylde (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16712160#comment-16712160
 ] 

Micah Wylde commented on BEAM-6172:
---

In addition to those two changes, there's an extra "group" in the name. For 
example, a group-specific native flink metric (as output from the slf4j 
reporter with default configs) looks like 

{code}
10.100.208.242.taskmanager.4f13adf64e7315b7198911465ca44119.BeamApp-mwylde-1206234804-16211aec.group.0.numRecordsIn:
 41
{code}

while a beam one looks like
{code}
10.100.208.242.taskmanager.4f13adf64e7315b7198911465ca44119.BeamApp-mwylde-1206234804-16211aec.group.0.__counter__group__org.apache.beam.runners.core.ReduceFnRunner__droppedDueToClosedWindow:
 41
{code}

I think ideally the beam metric would be 
{code}
10.100.208.242.taskmanager.4f13adf64e7315b7198911465ca44119.BeamApp-mwylde-1206234804-16211aec.group.0.org.apache.beam.runners.core.ReduceFnRunner.droppedDueToClosedWindow:
 41
{code}

Our high-level goal is to be able to use the same metric parsing logic as we 
use for existing flink metrics (both internal and application-specific) and get 
reasonable results back.
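
A minimal sketch of the renaming suggested above, assuming the Beam metric suffix always has the layout `__<type>__group__<namespace>__<name>`. The class `MetricNameSketch` and the method `normalize` are hypothetical and not part of Beam or Flink; the delimiter is passed in as a parameter to stand in for Flink's configured scope delimiter.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not part of Beam or Flink. */
final class MetricNameSketch {

  // Separator the Beam metric names in the examples above use between components.
  private static final String BEAM_SEPARATOR = "__";

  /**
   * Drops the metric type and the extra "group" token, then joins the
   * remaining components (namespace and name) with the given delimiter.
   */
  static String normalize(String beamMetric, String delimiter) {
    List<String> tokens = new ArrayList<>();
    for (String token : beamMetric.split(BEAM_SEPARATOR)) {
      // The leading separator produces an empty first token; skip it.
      if (!token.isEmpty()) {
        tokens.add(token);
      }
    }
    if (tokens.size() < 4) {
      throw new IllegalArgumentException("Unexpected metric layout: " + beamMetric);
    }
    // Index 0 is the metric type (e.g. "counter"), index 1 is the extra "group".
    return String.join(delimiter, tokens.subList(2, tokens.size()));
  }

  public static void main(String[] args) {
    String beamSuffix =
        "__counter__group__org.apache.beam.runners.core.ReduceFnRunner__droppedDueToClosedWindow";
    System.out.println(normalize(beamSuffix, "."));
  }
}
```

Applied to the suffix from the Beam example with "." as the delimiter, this yields `org.apache.beam.runners.core.ReduceFnRunner.droppedDueToClosedWindow`, the form proposed in the comment. A namespace that itself contains "__" would need different handling.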

> Flink metrics are not generated in standard format
> --
>
> Key: BEAM-6172
> URL: https://issues.apache.org/jira/browse/BEAM-6172
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Affects Versions: 2.8.0
>Reporter: Micah Wylde
>Assignee: Maximilian Michels
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The metrics that the flink runner exports do not follow the standard format 
> used by Flink, and do not respect Flink metric configuration options. 
> For example (with the default metrics configuration) beam produces a metric:
> {code}
> 10-100-209-71.taskmanager.0f29b420b63fea58f6f321bc0cbf45f3.BeamApp-mwylde-1203224439-a7d8fdf6.group.0.__counter__group__org-apache-beam-runners-core-ReduceFnRunner__droppedDueToClosedWindow
> {code}
> whereas a native Flink metric looks like:
> {code}
> 10-100-209-71.taskmanager.0f29b420b63fea58f6f321bc0cbf45f3.BeamApp-mwylde-1203224439-a7d8fdf6.Source-Custom-Source-7Kinesis-None-beam-env-docker-v1-0-ToKeyedWorkItem.0.numRecordsOut
> {code}
> In particular, Beam should respect the 
> [metric.scope.delimiter|https://ci.apache.org/projects/flink/flink-docs-release-1.5/ops/config.html#metrics-scope-delimiter]
>  configuration for separating components of a metric (currently it uses 
> "__"), and should not include the type of metric (counter, gauge, etc.).





[jira] [Commented] (BEAM-3660) Port ReadSpannerSchemaTest off DoFnTester

2018-12-06 Thread Kenneth Knowles (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-3660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16712155#comment-16712155
 ] 

Kenneth Knowles commented on BEAM-3660:
---

I've added you to the Contributors role so you can assign tickets to yourself 
now.

> Port ReadSpannerSchemaTest off DoFnTester
> -
>
> Key: BEAM-3660
> URL: https://issues.apache.org/jira/browse/BEAM-3660
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-gcp
>Reporter: Kenneth Knowles
>Assignee: Evgeniy Musin
>Priority: Major
>  Labels: beginner, newbie, starter
>






[jira] [Assigned] (BEAM-3660) Port ReadSpannerSchemaTest off DoFnTester

2018-12-06 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-3660:
-

Assignee: Evgeniy Musin

> Port ReadSpannerSchemaTest off DoFnTester
> -
>
> Key: BEAM-3660
> URL: https://issues.apache.org/jira/browse/BEAM-3660
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-gcp
>Reporter: Kenneth Knowles
>Assignee: Evgeniy Musin
>Priority: Major
>  Labels: beginner, newbie, starter
>






[jira] [Work logged] (BEAM-5850) Make process, finish and start run on the same thread to support metrics.

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5850?focusedWorklogId=172887=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172887
 ]

ASF GitHub Bot logged work on BEAM-5850:


Author: ASF GitHub Bot
Created on: 06/Dec/18 23:36
Start Date: 06/Dec/18 23:36
Worklog Time Spent: 10m 
  Work Description: Ardagan commented on issue #6786: [BEAM-5850] Add 
QueueingBeamFnDataClient and make process, finish and start run on the same 
thread to support metrics.
URL: https://github.com/apache/beam/pull/6786#issuecomment-445069759
 
 
   Run Java PreCommit




Issue Time Tracking
---

Worklog Id: (was: 172887)
Time Spent: 15h 40m  (was: 15.5h)

> Make process, finish and start run on the same thread to support metrics.
> -
>
> Key: BEAM-5850
> URL: https://issues.apache.org/jira/browse/BEAM-5850
> Project: Beam
>  Issue Type: New Feature
>  Components: java-fn-execution
>Reporter: Alex Amato
>Assignee: Alex Amato
>Priority: Major
>  Time Spent: 15h 40m
>  Remaining Estimate: 0h
>
> Update BeamFnDataReceiver to place elements into a Queue, then consume them 
> and call the element processing receiver in blockTillReadFinishes



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5850) Make process, finish and start run on the same thread to support metrics.

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5850?focusedWorklogId=172888=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172888
 ]

ASF GitHub Bot logged work on BEAM-5850:


Author: ASF GitHub Bot
Created on: 06/Dec/18 23:37
Start Date: 06/Dec/18 23:37
Worklog Time Spent: 10m 
  Work Description: Ardagan edited a comment on issue #6786: [BEAM-5850] 
Add QueueingBeamFnDataClient and make process, finish and start run on the same 
thread to support metrics.
URL: https://github.com/apache/beam/pull/6786#issuecomment-445069662
 
 
   Failed precommit: 
https://builds.apache.org/job/beam_PreCommit_Java_Phrase/468/
   This one seems to be a flake.
   Will trigger another one.




Issue Time Tracking
---

Worklog Id: (was: 172888)
Time Spent: 15h 50m  (was: 15h 40m)

> Make process, finish and start run on the same thread to support metrics.
> -
>
> Key: BEAM-5850
> URL: https://issues.apache.org/jira/browse/BEAM-5850
> Project: Beam
>  Issue Type: New Feature
>  Components: java-fn-execution
>Reporter: Alex Amato
>Assignee: Alex Amato
>Priority: Major
>  Time Spent: 15h 50m
>  Remaining Estimate: 0h
>
> Update BeamFnDataReceiver to place elements into a Queue, then consume them 
> and call the element processing receiver in blockTillReadFinishes



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5850) Make process, finish and start run on the same thread to support metrics.

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5850?focusedWorklogId=172886=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172886
 ]

ASF GitHub Bot logged work on BEAM-5850:


Author: ASF GitHub Bot
Created on: 06/Dec/18 23:36
Start Date: 06/Dec/18 23:36
Worklog Time Spent: 10m 
  Work Description: Ardagan commented on issue #6786: [BEAM-5850] Add 
QueueingBeamFnDataClient and make process, finish and start run on the same 
thread to support metrics.
URL: https://github.com/apache/beam/pull/6786#issuecomment-445069662
 
 
   Failed precommit: 
https://builds.apache.org/job/beam_PreCommit_Java_Phrase/468/
   Will trigger another one.




Issue Time Tracking
---

Worklog Id: (was: 172886)
Time Spent: 15.5h  (was: 15h 20m)

> Make process, finish and start run on the same thread to support metrics.
> -
>
> Key: BEAM-5850
> URL: https://issues.apache.org/jira/browse/BEAM-5850
> Project: Beam
>  Issue Type: New Feature
>  Components: java-fn-execution
>Reporter: Alex Amato
>Assignee: Alex Amato
>Priority: Major
>  Time Spent: 15.5h
>  Remaining Estimate: 0h
>
> Update BeamFnDataReceiver to place elements into a Queue, then consume them 
> and call the element processing receiver in blockTillReadFinishes



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6181) Utilize MetricInfo for reporting user metrics in Portable Dataflow Java Runner.

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6181?focusedWorklogId=172885=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172885
 ]

ASF GitHub Bot logged work on BEAM-6181:


Author: ASF GitHub Bot
Created on: 06/Dec/18 23:34
Start Date: 06/Dec/18 23:34
Worklog Time Spent: 10m 
  Work Description: Ardagan commented on issue #7202: [BEAM-6181] Reporting 
user counters via MonitoringInfos in Portable Dataflow Runner.
URL: https://github.com/apache/beam/pull/7202#issuecomment-445069349
 
 
   Thank you Scott.




Issue Time Tracking
---

Worklog Id: (was: 172885)
Time Spent: 6h 20m  (was: 6h 10m)

> Utilize MetricInfo for reporting user metrics in Portable Dataflow Java 
> Runner.
> ---
>
> Key: BEAM-6181
> URL: https://issues.apache.org/jira/browse/BEAM-6181
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> New approach to report metrics in FnApi is to utilize MetricInfo structures.
> This approach is implemented in Python SDK and work is ongoing in Java SDK.
> This task includes plumbing User metrics reported via MetricInfos through 
> Dataflow Java Runner. 





[jira] [Work logged] (BEAM-6181) Utilize MetricInfo for reporting user metrics in Portable Dataflow Java Runner.

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6181?focusedWorklogId=172884=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172884
 ]

ASF GitHub Bot logged work on BEAM-6181:


Author: ASF GitHub Bot
Created on: 06/Dec/18 23:34
Start Date: 06/Dec/18 23:34
Worklog Time Spent: 10m 
  Work Description: swegner commented on issue #7202: [BEAM-6181] Reporting 
user counters via MonitoringInfos in Portable Dataflow Runner.
URL: https://github.com/apache/beam/pull/7202#issuecomment-445069348
 
 
   Pre-commits before my fixup are green:
   
   
![image](https://user-images.githubusercontent.com/674021/49618183-6456e280-f96c-11e8-937c-9e5f41604c88.png)
   




Issue Time Tracking
---

Worklog Id: (was: 172884)
Time Spent: 6h 10m  (was: 6h)

> Utilize MetricInfo for reporting user metrics in Portable Dataflow Java 
> Runner.
> ---
>
> Key: BEAM-6181
> URL: https://issues.apache.org/jira/browse/BEAM-6181
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> New approach to report metrics in FnApi is to utilize MetricInfo structures.
> This approach is implemented in Python SDK and work is ongoing in Java SDK.
> This task includes plumbing User metrics reported via MetricInfos through 
> Dataflow Java Runner. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6181) Utilize MetricInfo for reporting user metrics in Portable Dataflow Java Runner.

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6181?focusedWorklogId=172882=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172882
 ]

ASF GitHub Bot logged work on BEAM-6181:


Author: ASF GitHub Bot
Created on: 06/Dec/18 23:33
Start Date: 06/Dec/18 23:33
Worklog Time Spent: 10m 
  Work Description: swegner commented on a change in pull request #7202: 
[BEAM-6181] Reporting user counters via MonitoringInfos in Portable Dataflow 
Runner.
URL: https://github.com/apache/beam/pull/7202#discussion_r239653685
 
 

 ##
 File path: 
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/BeamFnMapTaskExecutor.java
 ##
 @@ -423,32 +420,28 @@ private CounterUpdate monitoringInfoToCounterUpdate(MonitoringInfo monitoringInf
 
 String type = monitoringInfo.getType();
 
-// todomigryz: run MonitoringInfo through validation process.
-// refer to https://github.com/apache/beam/pull/6799
-
+// todo(migryz): run MonitoringInfo through Proto validation process.
+// Requires https://github.com/apache/beam/pull/6799 to be merged.
 if (urn.startsWith(BEAM_METRICS_USER_PREFIX)) {
   if (!type.equals("beam:metrics:sum_int_64")) {
-LOG.warn(
+throw new RuntimeException(
 "Ignoring user-counter MonitoringInfo with unexpected type."
 
 Review comment:
   I pushed a commit with this fixup.




Issue Time Tracking
---

Worklog Id: (was: 172882)
Time Spent: 5h 50m  (was: 5h 40m)

> Utilize MetricInfo for reporting user metrics in Portable Dataflow Java 
> Runner.
> ---
>
> Key: BEAM-6181
> URL: https://issues.apache.org/jira/browse/BEAM-6181
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> New approach to report metrics in FnApi is to utilize MetricInfo structures.
> This approach is implemented in Python SDK and work is ongoing in Java SDK.
> This task includes plumbing User metrics reported via MetricInfos through 
> Dataflow Java Runner. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6181) Utilize MetricInfo for reporting user metrics in Portable Dataflow Java Runner.

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6181?focusedWorklogId=172883=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172883
 ]

ASF GitHub Bot logged work on BEAM-6181:


Author: ASF GitHub Bot
Created on: 06/Dec/18 23:33
Start Date: 06/Dec/18 23:33
Worklog Time Spent: 10m 
  Work Description: swegner commented on issue #7202: [BEAM-6181] Reporting 
user counters via MonitoringInfos in Portable Dataflow Runner.
URL: https://github.com/apache/beam/pull/7202#issuecomment-445069206
 
 
   LGTM, will squash and merge.




Issue Time Tracking
---

Worklog Id: (was: 172883)
Time Spent: 6h  (was: 5h 50m)

> Utilize MetricInfo for reporting user metrics in Portable Dataflow Java 
> Runner.
> ---
>
> Key: BEAM-6181
> URL: https://issues.apache.org/jira/browse/BEAM-6181
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> New approach to report metrics in FnApi is to utilize MetricInfo structures.
> This approach is implemented in Python SDK and work is ongoing in Java SDK.
> This task includes plumbing User metrics reported via MetricInfos through 
> Dataflow Java Runner. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6181) Utilize MetricInfo for reporting user metrics in Portable Dataflow Java Runner.

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6181?focusedWorklogId=172881=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172881
 ]

ASF GitHub Bot logged work on BEAM-6181:


Author: ASF GitHub Bot
Created on: 06/Dec/18 23:29
Start Date: 06/Dec/18 23:29
Worklog Time Spent: 10m 
  Work Description: swegner commented on a change in pull request #7202: 
[BEAM-6181] Reporting user counters via MonitoringInfos in Portable Dataflow 
Runner.
URL: https://github.com/apache/beam/pull/7202#discussion_r239652977
 
 

 ##
 File path: 
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/BeamFnMapTaskExecutor.java
 ##
 @@ -423,32 +420,28 @@ private CounterUpdate monitoringInfoToCounterUpdate(MonitoringInfo monitoringInf
 
 String type = monitoringInfo.getType();
 
-// todomigryz: run MonitoringInfo through validation process.
-// refer to https://github.com/apache/beam/pull/6799
-
+// todo(migryz): run MonitoringInfo through Proto validation process.
+// Requires https://github.com/apache/beam/pull/6799 to be merged.
 if (urn.startsWith(BEAM_METRICS_USER_PREFIX)) {
   if (!type.equals("beam:metrics:sum_int_64")) {
-LOG.warn(
+throw new RuntimeException(
 "Ignoring user-counter MonitoringInfo with unexpected type."
 
 Review comment:
   These error messages don't quite make sense for throwing exceptions. We're 
no longer ignoring them, we're failing execution. Suggestion:
   
   ```java
   throw new RuntimeException("Encountered MonitoringInfo with unexpected type. Expected [..]")
   ```
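
A self-contained sketch of how such a check might read once the message describes a failure rather than a skipped metric. The class `MonitoringInfoTypeCheckSketch`, the method `requireSumInt64`, and the exact wording after "Expected" are assumptions for illustration; only the type string `beam:metrics:sum_int_64` comes from the diff above.

```java
/** Hypothetical illustration only; not the code in BeamFnMapTaskExecutor. */
final class MonitoringInfoTypeCheckSketch {

  // Type string taken from the diff under review.
  static final String EXPECTED_TYPE = "beam:metrics:sum_int_64";

  /** Fails execution when a user-counter MonitoringInfo carries an unexpected type. */
  static void requireSumInt64(String urn, String type) {
    if (!EXPECTED_TYPE.equals(type)) {
      // The message states what was expected and what was seen, since we
      // are failing rather than ignoring the MonitoringInfo.
      throw new RuntimeException(
          String.format(
              "Encountered MonitoringInfo with unexpected type. Expected: %s, got: %s (urn: %s)",
              EXPECTED_TYPE, type, urn));
    }
  }

  public static void main(String[] args) {
    // "example:user:metric" is a made-up URN used only for this sketch.
    requireSumInt64("example:user:metric", "beam:metrics:sum_int_64");
    System.out.println("type check passed");
  }
}
```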




Issue Time Tracking
---

Worklog Id: (was: 172881)
Time Spent: 5h 40m  (was: 5.5h)

> Utilize MetricInfo for reporting user metrics in Portable Dataflow Java 
> Runner.
> ---
>
> Key: BEAM-6181
> URL: https://issues.apache.org/jira/browse/BEAM-6181
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> The new approach to reporting metrics in the FnApi is to use MetricInfo structures.
> This approach is implemented in the Python SDK, and work is ongoing in the Java SDK.
> This task includes plumbing user metrics reported via MetricInfos through 
> the Dataflow Java Runner. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6191) Redundant error messages for failures in Dataflow runner

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6191?focusedWorklogId=172880=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172880
 ]

ASF GitHub Bot logged work on BEAM-6191:


Author: ASF GitHub Bot
Created on: 06/Dec/18 23:25
Start Date: 06/Dec/18 23:25
Worklog Time Spent: 10m 
  Work Description: swegner commented on issue #7220: [BEAM-6191] Remove 
redundant error logging for Dataflow exception handling
URL: https://github.com/apache/beam/pull/7220#issuecomment-445067501
 
 
   R: @ajamato @foegler




Issue Time Tracking
---

Worklog Id: (was: 172880)
Time Spent: 20m  (was: 10m)

> Redundant error messages for failures in Dataflow runner
> 
>
> Key: BEAM-6191
> URL: https://issues.apache.org/jira/browse/BEAM-6191
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-dataflow
>Reporter: Scott Wegner
>Assignee: Scott Wegner
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The Dataflow runner harness has redundant error logging from a couple 
> different components, which creates log spam and confusion when failures do 
> occur. We should dedupe redundant logs.
> From a typical user-code exception, we see at least 3 error logs from the 
> worker:
> http://screen/QZxsJOVnvt6
> "Aborting operations"
> "Uncaught exception occurred during work unit execution. This will be 
> retried."
> "Failure processing work item"





[jira] [Work logged] (BEAM-6191) Redundant error messages for failures in Dataflow runner

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6191?focusedWorklogId=172879=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172879
 ]

ASF GitHub Bot logged work on BEAM-6191:


Author: ASF GitHub Bot
Created on: 06/Dec/18 23:19
Start Date: 06/Dec/18 23:19
Worklog Time Spent: 10m 
  Work Description: swegner opened a new pull request #7220: [BEAM-6191] 
Remove redundant error logging for Dataflow exception handling
URL: https://github.com/apache/beam/pull/7220
 
 
   The Dataflow runner harness has redundant error logging from a couple of different
   components, which creates log spam and confusion when failures do occur. This
   change deduplicates some of the redundant logs.
   
   
   




Issue Time Tracking
---

Worklog Id: (was: 172879)

[jira] [Work logged] (BEAM-5321) Finish Python 3 porting for transforms module

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5321?focusedWorklogId=172877=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172877
 ]

ASF GitHub Bot logged work on BEAM-5321:


Author: ASF GitHub Bot
Created on: 06/Dec/18 23:17
Start Date: 06/Dec/18 23:17
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #7104: [BEAM-5321] Port 
transforms package to Python 3
URL: https://github.com/apache/beam/pull/7104#issuecomment-445065635
 
 
   LGTM




Issue Time Tracking
---

Worklog Id: (was: 172877)
Time Spent: 2h 50m  (was: 2h 40m)

> Finish Python 3 porting for transforms module
> -
>
> Key: BEAM-5321
> URL: https://issues.apache.org/jira/browse/BEAM-5321
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Robbe
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>






[jira] [Created] (BEAM-6191) Redundant error messages for failures in Dataflow runner

2018-12-06 Thread Scott Wegner (JIRA)
Scott Wegner created BEAM-6191:
--

 Summary: Redundant error messages for failures in Dataflow runner
 Key: BEAM-6191
 URL: https://issues.apache.org/jira/browse/BEAM-6191
 Project: Beam
  Issue Type: New Feature
  Components: runner-dataflow
Reporter: Scott Wegner
Assignee: Scott Wegner


The Dataflow runner harness has redundant error logging from a couple different 
components, which creates log spam and confusion when failures do occur. We 
should dedupe redundant logs.

From a typical user-code exception, we see at least 3 error logs from the worker:
http://screen/QZxsJOVnvt6

"Aborting operations"

"Uncaught exception occurred during work unit execution. This will be retried."

"Failure processing work item"






[jira] [Reopened] (BEAM-4150) Standardize use of PCollection coder proto attribute

2018-12-06 Thread Robert Bradshaw (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Bradshaw reopened BEAM-4150:
---

We need to update the SDKs and workers (and remove fallback code).

> Standardize use of PCollection coder proto attribute
> 
>
> Key: BEAM-4150
> URL: https://issues.apache.org/jira/browse/BEAM-4150
> Project: Beam
>  Issue Type: Task
>  Components: beam-model
>Reporter: Robert Bradshaw
>Assignee: Kenneth Knowles
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> In some places it's expected to be a WindowedCoder, in others the raw 
> ElementCoder. We should use the same convention (decided in discussion to be 
> the raw ElementCoder) everywhere. The WindowCoder can be pulled out of the 
> attached windowing strategy, and the input/output ports should specify the 
> encoding directly rather than read the adjacent PCollection coder fields. 





[jira] [Commented] (BEAM-3660) Port ReadSpannerSchemaTest off DoFnTester

2018-12-06 Thread Evgeniy Musin (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-3660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16712125#comment-16712125
 ] 

Evgeniy Musin commented on BEAM-3660:
-

Hello! Can I get this ticket?

> Port ReadSpannerSchemaTest off DoFnTester
> -
>
> Key: BEAM-3660
> URL: https://issues.apache.org/jira/browse/BEAM-3660
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-gcp
>Reporter: Kenneth Knowles
>Priority: Major
>  Labels: beginner, newbie, starter
>






[jira] [Work logged] (BEAM-6120) Support retrieval of large gbk iterables over the state API.

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6120?focusedWorklogId=172874=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172874
 ]

ASF GitHub Bot logged work on BEAM-6120:


Author: ASF GitHub Bot
Created on: 06/Dec/18 22:50
Start Date: 06/Dec/18 22:50
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #7127: 
[BEAM-6120] Support retrieval of large gbk iterables over the state API.
URL: https://github.com/apache/beam/pull/7127#discussion_r239644468
 
 

 ##
 File path: sdks/python/apache_beam/coders/coder_impl.py
 ##
 @@ -690,6 +720,35 @@ def decode_from_stream(self, in_stream, nested):
   elements.append(self._elem_coder.decode_from_stream(in_stream, True))
 count = in_stream.read_var_int64()
 
+  if count == -1:
+if self._read_state is None:
+  raise ValueError(
+  'Cannot read state-written iterable without state reader.')
+
+class FullIterable(object):
+  def __init__(self, head, tail):
+self._head = head
+self._tail = tail
+
+  def __iter__(self):
+for elem in self._head:
+  yield elem
+for elem in self._tail:
+  yield elem
+
+  def __eq__(self, other):
+return list(self) == list(other)
+
+  def __hash__(self):
+raise NotImplementedError
+
+  def __reduce__(self):
+return list, (list(self),)
+
+state_token = in_stream.read_all(True)
 
 Review comment:
   Yes, that's what the `read_all([nested=]True)` does.
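
The head/tail composition in the diff above can be exercised in isolation. This is a minimal sketch, not the code under review: a plain list stands in for the state-backed tail, and neither `in_stream` nor the state reader is modeled.

```python
import pickle


class FullIterable(object):
    """Iterates an already-decoded head, then a separately supplied tail."""

    def __init__(self, head, tail):
        self._head = head
        self._tail = tail

    def __iter__(self):
        for elem in self._head:
            yield elem
        for elem in self._tail:
            yield elem

    def __eq__(self, other):
        return list(self) == list(other)

    def __hash__(self):
        raise NotImplementedError

    def __reduce__(self):
        # Pickle as a plain list so the tail's backing reader is never serialized.
        return list, (list(self),)


# Hypothetical data: the head was decoded inline, the tail would come from state.
full = FullIterable([1, 2], [3, 4])
restored = pickle.loads(pickle.dumps(full))
```

Because `__reduce__` materializes the elements, `restored` is an ordinary `list`, which is convenient for tests but reads the whole tail.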




Issue Time Tracking
---

Worklog Id: (was: 172874)
Time Spent: 3h 20m  (was: 3h 10m)

> Support retrieval of large gbk iterables over the state API.
> 
>
> Key: BEAM-6120
> URL: https://issues.apache.org/jira/browse/BEAM-6120
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-6120) Support retrieval of large gbk iterables over the state API.

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6120?focusedWorklogId=172873=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172873
 ]

ASF GitHub Bot logged work on BEAM-6120:


Author: ASF GitHub Bot
Created on: 06/Dec/18 22:48
Start Date: 06/Dec/18 22:48
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #7127: 
[BEAM-6120] Support retrieval of large gbk iterables over the state API.
URL: https://github.com/apache/beam/pull/7127#discussion_r239643988
 
 

 ##
 File path: sdks/python/apache_beam/coders/coder_impl.py
 ##
 @@ -637,20 +637,33 @@ class SequenceCoderImpl(StreamCoderImpl):
 countX element(0) element(1) ... element(countX - 1)
 0
 
+  If writing to state is enabled, the final terminating 0 will instead be
+  replaced with::
+
+-1
+len(state_token)
 
 Review comment:
   I would rather keep other negative values for possible future expansion. 9 
bytes is inconsequential once this code is hit.
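
The "9 bytes" figure can be checked with a small sketch. It assumes counts are written as base-128 varints over the 64-bit two's-complement value (the protobuf-style encoding); that this matches the stream's `write_var_int64` is an assumption here, and `encode_var_int64` is a hypothetical stand-in.

```python
def encode_var_int64(value):
    """Encode a signed 64-bit integer as a base-128 varint (assumed format)."""
    if value < 0:
        value += 1 << 64  # reinterpret as unsigned two's complement
    out = bytearray()
    while True:
        bits = value & 0x7F
        value >>= 7
        if value:
            out.append(bits | 0x80)  # more bytes follow
        else:
            out.append(bits)
            return bytes(out)


terminator_size = len(encode_var_int64(0))   # the usual end-of-sequence marker
sentinel_size = len(encode_var_int64(-1))    # the state-backed marker
overhead = sentinel_size - terminator_size
```

Under that assumption the terminating `0` costs 1 byte and `-1` costs 10, so the sentinel adds 9 bytes, paid once per iterable that is large enough to be written to state.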




Issue Time Tracking
---

Worklog Id: (was: 172873)
Time Spent: 3h 10m  (was: 3h)

> Support retrieval of large gbk iterables over the state API.
> 
>
> Key: BEAM-6120
> URL: https://issues.apache.org/jira/browse/BEAM-6120
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-5514) BigQueryIO doesn't handle quotaExceeded errors properly

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5514?focusedWorklogId=172867=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172867
 ]

ASF GitHub Bot logged work on BEAM-5514:


Author: ASF GitHub Bot
Created on: 06/Dec/18 21:55
Start Date: 06/Dec/18 21:55
Worklog Time Spent: 10m 
  Work Description: ihji commented on a change in pull request #7189: 
[BEAM-5514] BigQueryIO doesn't handle quotaExceeded errors properly
URL: https://github.com/apache/beam/pull/7189#discussion_r239629510
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java
 ##
 @@ -736,17 +736,21 @@ public void deleteDataset(String projectId, String 
datasetId)
 try {
   return insert.execute().getInsertErrors();
 } catch (IOException e) {
-  if (new ApiErrorExtractor().rateLimited(e)) {
+  if (ApiErrorExtractor.INSTANCE.rateLimited(e)) {
 LOG.info("BigQuery insertAll exceeded rate limit, 
retrying");
-try {
-  sleeper.sleep(backoff1.nextBackOffMillis());
-} catch (InterruptedException interrupted) {
-  throw new IOException(
-  "Interrupted while waiting before retrying 
insertAll");
-}
+  } else if (ApiErrorExtractor.INSTANCE
+  .getErrorMessage(e)
+  .startsWith("Quota exceeded")) {
 
 Review comment:
   > My biggest concern is that parsing text out of the error message seems 
very brittle. Rather than risk problems there, seems better to just retry all 
es here.
   
   ApiErrorExtractor doesn't have a method for determining if the given 
exception indicates 'quota exceeded'.
   
   
https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/util/src/main/java/com/google/cloud/hadoop/util/ApiErrorExtractor.java
   
   Is there another class from which we can safely extract this information?




Issue Time Tracking
---

Worklog Id: (was: 172867)
Time Spent: 2h 50m  (was: 2h 40m)

> BigQueryIO doesn't handle quotaExceeded errors properly
> ---
>
> Key: BEAM-5514
> URL: https://issues.apache.org/jira/browse/BEAM-5514
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Kevin Peterson
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> When exceeding a streaming quota for BigQuery insertAll requests, BigQuery 
> returns a 403 with reason "quotaExceeded".
> The current implementation of BigQueryIO does not consider this to be a rate 
> limited exception, and therefore does not perform exponential backoff 
> properly, leading to repeated calls to BQ.
> The actual error is in the 
> [ApiErrorExtractor|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/util/src/main/java/com/google/cloud/hadoop/util/ApiErrorExtractor.java#L263]
>  class, which is called from 
> [BigQueryServicesImpl|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L739]
>  to determine how to retry the failure.





[jira] [Work logged] (BEAM-5514) BigQueryIO doesn't handle quotaExceeded errors properly

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5514?focusedWorklogId=172861=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172861
 ]

ASF GitHub Bot logged work on BEAM-5514:


Author: ASF GitHub Bot
Created on: 06/Dec/18 21:40
Start Date: 06/Dec/18 21:40
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on a change in pull request 
#7189: [BEAM-5514] BigQueryIO doesn't handle quotaExceeded errors properly
URL: https://github.com/apache/beam/pull/7189#discussion_r239624631
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java
 ##
 @@ -736,17 +736,21 @@ public void deleteDataset(String projectId, String 
datasetId)
 try {
   return insert.execute().getInsertErrors();
 } catch (IOException e) {
-  if (new ApiErrorExtractor().rateLimited(e)) {
+  if (ApiErrorExtractor.INSTANCE.rateLimited(e)) {
 LOG.info("BigQuery insertAll exceeded rate limit, 
retrying");
-try {
-  sleeper.sleep(backoff1.nextBackOffMillis());
-} catch (InterruptedException interrupted) {
-  throw new IOException(
-  "Interrupted while waiting before retrying 
insertAll");
-}
+  } else if (ApiErrorExtractor.INSTANCE
+  .getErrorMessage(e)
+  .startsWith("Quota exceeded")) {
 
 Review comment:
   @rangadi should know the exact place. I think we inject a 10 second wait 
before re-trying a failed workitem for Dataflow streaming.




Issue Time Tracking
---

Worklog Id: (was: 172861)
Time Spent: 2h 40m  (was: 2.5h)

> BigQueryIO doesn't handle quotaExceeded errors properly
> ---
>
> Key: BEAM-5514
> URL: https://issues.apache.org/jira/browse/BEAM-5514
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Kevin Peterson
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> When exceeding a streaming quota for BigQuery insertAll requests, BigQuery 
> returns a 403 with reason "quotaExceeded".
> The current implementation of BigQueryIO does not consider this to be a rate 
> limited exception, and therefore does not perform exponential backoff 
> properly, leading to repeated calls to BQ.
> The actual error is in the 
> [ApiErrorExtractor|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/util/src/main/java/com/google/cloud/hadoop/util/ApiErrorExtractor.java#L263]
>  class, which is called from 
> [BigQueryServicesImpl|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L739]
>  to determine how to retry the failure.





[jira] [Work logged] (BEAM-5514) BigQueryIO doesn't handle quotaExceeded errors properly

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5514?focusedWorklogId=172856=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172856
 ]

ASF GitHub Bot logged work on BEAM-5514:


Author: ASF GitHub Bot
Created on: 06/Dec/18 21:33
Start Date: 06/Dec/18 21:33
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on a change in pull request #7189: 
[BEAM-5514] BigQueryIO doesn't handle quotaExceeded errors properly
URL: https://github.com/apache/beam/pull/7189#discussion_r239622676
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java
 ##
 @@ -736,17 +736,21 @@ public void deleteDataset(String projectId, String 
datasetId)
 try {
   return insert.execute().getInsertErrors();
 } catch (IOException e) {
-  if (new ApiErrorExtractor().rateLimited(e)) {
+  if (ApiErrorExtractor.INSTANCE.rateLimited(e)) {
 LOG.info("BigQuery insertAll exceeded rate limit, 
retrying");
-try {
-  sleeper.sleep(backoff1.nextBackOffMillis());
-} catch (InterruptedException interrupted) {
-  throw new IOException(
-  "Interrupted while waiting before retrying 
insertAll");
-}
+  } else if (ApiErrorExtractor.INSTANCE
+  .getErrorMessage(e)
+  .startsWith("Quota exceeded")) {
 
 Review comment:
   Where do you get a 10-second wait for failed workitem?




Issue Time Tracking
---

Worklog Id: (was: 172856)
Time Spent: 2.5h  (was: 2h 20m)

> BigQueryIO doesn't handle quotaExceeded errors properly
> ---
>
> Key: BEAM-5514
> URL: https://issues.apache.org/jira/browse/BEAM-5514
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Kevin Peterson
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> When exceeding a streaming quota for BigQuery insertAll requests, BigQuery 
> returns a 403 with reason "quotaExceeded".
> The current implementation of BigQueryIO does not consider this to be a rate 
> limited exception, and therefore does not perform exponential backoff 
> properly, leading to repeated calls to BQ.
> The actual error is in the 
> [ApiErrorExtractor|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/util/src/main/java/com/google/cloud/hadoop/util/ApiErrorExtractor.java#L263]
>  class, which is called from 
> [BigQueryServicesImpl|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L739]
>  to determine how to retry the failure.





[jira] [Work logged] (BEAM-5514) BigQueryIO doesn't handle quotaExceeded errors properly

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5514?focusedWorklogId=172840=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172840
 ]

ASF GitHub Bot logged work on BEAM-5514:


Author: ASF GitHub Bot
Created on: 06/Dec/18 21:21
Start Date: 06/Dec/18 21:21
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on a change in pull request 
#7189: [BEAM-5514] BigQueryIO doesn't handle quotaExceeded errors properly
URL: https://github.com/apache/beam/pull/7189#discussion_r239618701
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java
 ##
 @@ -736,17 +736,21 @@ public void deleteDataset(String projectId, String 
datasetId)
 try {
   return insert.execute().getInsertErrors();
 } catch (IOException e) {
-  if (new ApiErrorExtractor().rateLimited(e)) {
+  if (ApiErrorExtractor.INSTANCE.rateLimited(e)) {
 LOG.info("BigQuery insertAll exceeded rate limit, 
retrying");
-try {
-  sleeper.sleep(backoff1.nextBackOffMillis());
-} catch (InterruptedException interrupted) {
-  throw new IOException(
-  "Interrupted while waiting before retrying 
insertAll");
-}
+  } else if (ApiErrorExtractor.INSTANCE
+  .getErrorMessage(e)
+  .startsWith("Quota exceeded")) {
 
 Review comment:
   Error code rateLimitExceeded is well documented, and checking for it will probably be less 
brittle than checking for the text "Quota exceeded".
   https://cloud.google.com/bigquery/troubleshooting-errors
   
   Also, I think the main concern so far has been Beam sending a large number of 
messages to BQ even after the BQ service raises quota exceeded errors. I think this 
will be somewhat exacerbated by introducing exponential backoff here (more 
messages before the 10 second failed workitem wait), so this has to be combined 
with a solution where we perform exponential backoff across all BQ streaming 
write threads started by a given workitem (which can be a separate PR).
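
Checking the reason code rather than the message text might look like the sketch below, written in Python for brevity rather than the Java under review. The payload shape (`error.errors[].reason`) is an assumption based on the usual Google API JSON error body, and `should_back_off` is a hypothetical helper, not part of ApiErrorExtractor.

```python
# Reason codes named in this thread; both should trigger backoff.
RETRYABLE_REASONS = frozenset(["rateLimitExceeded", "quotaExceeded"])


def error_reasons(error_body):
    """Collect the machine-readable reason codes from a parsed error body."""
    errors = error_body.get("error", {}).get("errors", [])
    return [entry.get("reason") for entry in errors]


def should_back_off(error_body):
    """True if any reported reason is one we treat as rate or quota limiting."""
    return any(reason in RETRYABLE_REASONS for reason in error_reasons(error_body))
```

Matching on the reason code keeps the check independent of how the human-readable message happens to be worded.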




Issue Time Tracking
---

Worklog Id: (was: 172840)
Time Spent: 2h 10m  (was: 2h)

> BigQueryIO doesn't handle quotaExceeded errors properly
> ---
>
> Key: BEAM-5514
> URL: https://issues.apache.org/jira/browse/BEAM-5514
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Kevin Peterson
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> When exceeding a streaming quota for BigQuery insertAll requests, BigQuery 
> returns a 403 with reason "quotaExceeded".
> The current implementation of BigQueryIO does not consider this to be a rate 
> limited exception, and therefore does not perform exponential backoff 
> properly, leading to repeated calls to BQ.
> The actual error is in the 
> [ApiErrorExtractor|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/util/src/main/java/com/google/cloud/hadoop/util/ApiErrorExtractor.java#L263]
>  class, which is called from 
> [BigQueryServicesImpl|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L739]
>  to determine how to retry the failure.





[jira] [Work logged] (BEAM-5514) BigQueryIO doesn't handle quotaExceeded errors properly

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5514?focusedWorklogId=172842=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172842
 ]

ASF GitHub Bot logged work on BEAM-5514:


Author: ASF GitHub Bot
Created on: 06/Dec/18 21:22
Start Date: 06/Dec/18 21:22
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on a change in pull request 
#7189: [BEAM-5514] BigQueryIO doesn't handle quotaExceeded errors properly
URL: https://github.com/apache/beam/pull/7189#discussion_r239618917
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java
 ##
 @@ -736,17 +736,21 @@ public void deleteDataset(String projectId, String 
datasetId)
 try {
   return insert.execute().getInsertErrors();
 } catch (IOException e) {
-  if (new ApiErrorExtractor().rateLimited(e)) {
+  if (ApiErrorExtractor.INSTANCE.rateLimited(e)) {
 LOG.info("BigQuery insertAll exceeded rate limit, 
retrying");
-try {
-  sleeper.sleep(backoff1.nextBackOffMillis());
-} catch (InterruptedException interrupted) {
-  throw new IOException(
-  "Interrupted while waiting before retrying 
insertAll");
-}
+  } else if (ApiErrorExtractor.INSTANCE
+  .getErrorMessage(e)
+  .startsWith("Quota exceeded")) {
 
 Review comment:
   Sorry I meant error code quotaExceeded.




Issue Time Tracking
---

Worklog Id: (was: 172842)
Time Spent: 2h 20m  (was: 2h 10m)

> BigQueryIO doesn't handle quotaExceeded errors properly
> ---
>
> Key: BEAM-5514
> URL: https://issues.apache.org/jira/browse/BEAM-5514
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Kevin Peterson
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> When exceeding a streaming quota for BigQuery insertAll requests, BigQuery 
> returns a 403 with reason "quotaExceeded".
> The current implementation of BigQueryIO does not consider this to be a rate 
> limited exception, and therefore does not perform exponential backoff 
> properly, leading to repeated calls to BQ.
> The actual error is in the 
> [ApiErrorExtractor|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/util/src/main/java/com/google/cloud/hadoop/util/ApiErrorExtractor.java#L263]
>  class, which is called from 
> [BigQueryServicesImpl|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L739]
>  to determine how to retry the failure.





[jira] [Work logged] (BEAM-5514) BigQueryIO doesn't handle quotaExceeded errors properly

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5514?focusedWorklogId=172830=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172830
 ]

ASF GitHub Bot logged work on BEAM-5514:


Author: ASF GitHub Bot
Created on: 06/Dec/18 20:57
Start Date: 06/Dec/18 20:57
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on a change in pull request #7189: 
[BEAM-5514] BigQueryIO doesn't handle quotaExceeded errors properly
URL: https://github.com/apache/beam/pull/7189#discussion_r239611514
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java
 ##
 @@ -736,17 +736,21 @@ public void deleteDataset(String projectId, String 
datasetId)
 try {
   return insert.execute().getInsertErrors();
 } catch (IOException e) {
-  if (new ApiErrorExtractor().rateLimited(e)) {
+  if (ApiErrorExtractor.INSTANCE.rateLimited(e)) {
 LOG.info("BigQuery insertAll exceeded rate limit, 
retrying");
-try {
-  sleeper.sleep(backoff1.nextBackOffMillis());
-} catch (InterruptedException interrupted) {
-  throw new IOException(
-  "Interrupted while waiting before retrying 
insertAll");
-}
+  } else if (ApiErrorExtractor.INSTANCE
+  .getErrorMessage(e)
+  .startsWith("Quota exceeded")) {
 
 Review comment:
   Failing is very expensive. If we throw an exception, not only this record, 
but all other records in the bundle will be retried. Realistically those other 
errors will be retried anyway (as the whole work item will be retried).
   
   My biggest concern is that parsing text out of the error message seems very 
brittle. Rather than risk problems there, it seems better to just retry all 
errors here.
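
A minimal sketch of the brittleness concern, not the PR's code: it contrasts matching on the message prefix with matching on a structured reason field. `ApiError` and both helper functions are hypothetical names; the reason "quotaExceeded" comes from the issue description, while "rateLimitExceeded" is an assumption added for illustration.

```python
from dataclasses import dataclass

# "quotaExceeded" is quoted in the issue text; "rateLimitExceeded" is an
# assumed second transient reason, included only for illustration.
RETRYABLE_REASONS = frozenset({"quotaExceeded", "rateLimitExceeded"})


@dataclass(frozen=True)
class ApiError:
    """Hypothetical stand-in for a parsed API error response."""
    status: int
    reason: str
    message: str


def is_retryable_by_message(error: ApiError) -> bool:
    # Brittle: breaks as soon as the service rewords its message.
    return error.message.startswith("Quota exceeded")


def is_retryable_by_reason(error: ApiError) -> bool:
    # Relies only on the status code and the machine-readable reason.
    return error.status == 403 and error.reason in RETRYABLE_REASONS
```

If the service reworded the message, only the first check would change its answer; the second depends on fields meant for programs rather than for people.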




Issue Time Tracking
---

Worklog Id: (was: 172830)
Time Spent: 2h  (was: 1h 50m)

> BigQueryIO doesn't handle quotaExceeded errors properly
> ---
>
> Key: BEAM-5514
> URL: https://issues.apache.org/jira/browse/BEAM-5514
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Kevin Peterson
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> When exceeding a streaming quota for BigQuery insertAll requests, BigQuery 
> returns a 403 with reason "quotaExceeded".
> The current implementation of BigQueryIO does not consider this to be a rate 
> limited exception, and therefore does not perform exponential backoff 
> properly, leading to repeated calls to BQ.
> The actual error is in the 
> [ApiErrorExtractor|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L739]
>  class, which is called from 
> [BigQueryServicesImpl|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/util/src/main/java/com/google/cloud/hadoop/util/ApiErrorExtractor.java#L263]
>  to determine how to retry the failure.





[jira] [Work logged] (BEAM-5850) Make process, finish and start run on the same thread to support metrics.

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5850?focusedWorklogId=172831&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172831
 ]

ASF GitHub Bot logged work on BEAM-5850:


Author: ASF GitHub Bot
Created on: 06/Dec/18 20:59
Start Date: 06/Dec/18 20:59
Worklog Time Spent: 10m 
  Work Description: ajamato commented on issue #6786: [BEAM-5850] Add 
QueueingBeamFnDataClient and make process, finish and start run on the same 
thread to support metrics.
URL: https://github.com/apache/beam/pull/6786#issuecomment-445027620
 
 
   Run Java PreCommit




Issue Time Tracking
---

Worklog Id: (was: 172831)
Time Spent: 15h 20m  (was: 15h 10m)

> Make process, finish and start run on the same thread to support metrics.
> -
>
> Key: BEAM-5850
> URL: https://issues.apache.org/jira/browse/BEAM-5850
> Project: Beam
>  Issue Type: New Feature
>  Components: java-fn-execution
>Reporter: Alex Amato
>Assignee: Alex Amato
>Priority: Major
>  Time Spent: 15h 20m
>  Remaining Estimate: 0h
>
> Update BeamFnDataReceiver to place elements into a Queue, then consume them 
> and call the element processing receiver in blockTillReadFinishes





[jira] [Work logged] (BEAM-5514) BigQueryIO doesn't handle quotaExceeded errors properly

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5514?focusedWorklogId=172828&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172828
 ]

ASF GitHub Bot logged work on BEAM-5514:


Author: ASF GitHub Bot
Created on: 06/Dec/18 20:50
Start Date: 06/Dec/18 20:50
Worklog Time Spent: 10m 
  Work Description: ihji commented on a change in pull request #7189: 
[BEAM-5514] BigQueryIO doesn't handle quotaExceeded errors properly
URL: https://github.com/apache/beam/pull/7189#discussion_r239609349
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java
 ##
 @@ -736,17 +736,21 @@ public void deleteDataset(String projectId, String 
datasetId)
 try {
   return insert.execute().getInsertErrors();
 } catch (IOException e) {
-  if (new ApiErrorExtractor().rateLimited(e)) {
+  if (ApiErrorExtractor.INSTANCE.rateLimited(e)) {
 LOG.info("BigQuery insertAll exceeded rate limit, 
retrying");
-try {
-  sleeper.sleep(backoff1.nextBackOffMillis());
-} catch (InterruptedException interrupted) {
-  throw new IOException(
-  "Interrupted while waiting before retrying 
insertAll");
-}
+  } else if (ApiErrorExtractor.INSTANCE
+  .getErrorMessage(e)
+  .startsWith("Quota exceeded")) {
 
 Review comment:
   AFAIK, the worker will fail and retry on all other errors after ten seconds 
anyway. The question here is whether the given error needs to be silently (no 
explicit error log) retried with exponential backoff or not. I think it makes 
sense to use exponential backoff for `quota exceeded` and `rate limit exceeded` 
errors since they are transient and there's a high chance they will resolve by 
themselves within the next few retries. I'm not sure the same holds true for 
other possible errors like `field size too large`, `unauthorized`, or `user 
project missing`.
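
To make the distinction concrete, here is a hedged sketch of retrying only transient errors with exponential backoff. It is not Beam's implementation: `call_with_backoff`, `TransientError`, and all default values are hypothetical, and the sleeper and jitter source are injected so the behaviour can be checked without waiting.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for errors expected to clear up on their own."""


def call_with_backoff(operation, is_transient, max_attempts=5,
                      initial_delay=1.0, max_delay=60.0,
                      sleep=time.sleep, rng=random.random):
    """Run operation(), retrying transient failures with growing delays."""
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as error:
            # Non-transient errors and the final attempt propagate unchanged.
            if not is_transient(error) or attempt == max_attempts:
                raise
            # Jitter: sleep between 50% and 100% of the current delay.
            sleep(delay * (0.5 + rng() / 2))
            delay = min(delay * 2, max_delay)
```

Errors such as `unauthorized` would fail the `is_transient` check and surface on the first attempt, while quota-style errors are retried quietly with growing pauses.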




Issue Time Tracking
---

Worklog Id: (was: 172828)
Time Spent: 1h 50m  (was: 1h 40m)

> BigQueryIO doesn't handle quotaExceeded errors properly
> ---
>
> Key: BEAM-5514
> URL: https://issues.apache.org/jira/browse/BEAM-5514
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Kevin Peterson
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> When exceeding a streaming quota for BigQuery insertAll requests, BigQuery 
> returns a 403 with reason "quotaExceeded".
> The current implementation of BigQueryIO does not consider this to be a rate 
> limited exception, and therefore does not perform exponential backoff 
> properly, leading to repeated calls to BQ.
> The actual error is in the 
> [ApiErrorExtractor|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L739]
>  class, which is called from 
> [BigQueryServicesImpl|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/util/src/main/java/com/google/cloud/hadoop/util/ApiErrorExtractor.java#L263]
>  to determine how to retry the failure.





[jira] [Work logged] (BEAM-6120) Support retrieval of large gbk iterables over the state API.

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6120?focusedWorklogId=172824&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172824
 ]

ASF GitHub Bot logged work on BEAM-6120:


Author: ASF GitHub Bot
Created on: 06/Dec/18 20:29
Start Date: 06/Dec/18 20:29
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #7127: 
[BEAM-6120] Support retrieval of large gbk iterables over the state API.
URL: https://github.com/apache/beam/pull/7127#discussion_r239601740
 
 

 ##
 File path: sdks/python/apache_beam/coders/coder_impl.py
 ##
 @@ -690,6 +720,35 @@ def decode_from_stream(self, in_stream, nested):
   elements.append(self._elem_coder.decode_from_stream(in_stream, True))
 count = in_stream.read_var_int64()
 
+  if count == -1:
+if self._read_state is None:
+  raise ValueError(
+  'Cannot read state-written iterable without state reader.')
+
+class FullIterable(object):
+  def __init__(self, head, tail):
+self._head = head
+self._tail = tail
+
+  def __iter__(self):
+for elem in self._head:
+  yield elem
+for elem in self._tail:
+  yield elem
+
+  def __eq__(self, other):
+return list(self) == list(other)
+
+  def __hash__(self):
+raise NotImplementedError
+
+  def __reduce__(self):
+return list, (list(self),)
+
+state_token = in_stream.read_all(True)
 
 Review comment:
   shouldn't this read the length of the state token from in_stream?
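
For reference, a sketch of what reading a length-prefixed token from a stream looks like. The helper names are hypothetical and the varint format is assumed to be the usual base-128 encoding; the excerpt does not show whether `read_all(True)` already consumes such a prefix, so this illustrates the question rather than answering it.

```python
import io


def read_varint(stream):
    """Decode an unsigned base-128 varint from a binary stream."""
    result = 0
    shift = 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            raise EOFError("stream ended inside a varint")
        byte = chunk[0]
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear marks the last byte
            return result
        shift += 7


def read_length_prefixed(stream):
    """Read a varint length, then exactly that many bytes."""
    length = read_varint(stream)
    data = stream.read(length)
    if len(data) != length:
        raise EOFError("stream ended inside a length-prefixed value")
    return data
```

Because the length is read first, the token can be followed by further data in the same stream without the reader consuming it.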




Issue Time Tracking
---

Worklog Id: (was: 172824)
Time Spent: 3h  (was: 2h 50m)

> Support retrieval of large gbk iterables over the state API.
> 
>
> Key: BEAM-6120
> URL: https://issues.apache.org/jira/browse/BEAM-6120
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-6120) Support retrieval of large gbk iterables over the state API.

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6120?focusedWorklogId=172822&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172822
 ]

ASF GitHub Bot logged work on BEAM-6120:


Author: ASF GitHub Bot
Created on: 06/Dec/18 20:29
Start Date: 06/Dec/18 20:29
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #7127: 
[BEAM-6120] Support retrieval of large gbk iterables over the state API.
URL: https://github.com/apache/beam/pull/7127#discussion_r239593123
 
 

 ##
 File path: sdks/python/apache_beam/coders/coder_impl.py
 ##
 @@ -637,20 +637,33 @@ class SequenceCoderImpl(StreamCoderImpl):
 countX element(0) element(1) ... element(countX - 1)
 0
 
+  If writing to state is enabled, the final terminating 0 will instead be
+  repaced with::
+
+-1
+len(state_token)
 
 Review comment:
   important to state that len(state_token) is encoded as varint64
   
   nit: not that important but you could drop the `-1` and encode it as 
`-len(state_token)`. Will save a few bytes within the encoding since -1 will be 
9 bytes already as a varint64
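
The size argument can be checked with a small sketch. It assumes a base-128 varint in which negative numbers are encoded as their 64-bit two's-complement bit pattern; that is an assumption about the wire format, and `encode_varint64` is a hypothetical helper, not the PR's encoder. Under that assumption the sketch produces ten bytes for -1, slightly more than the figure quoted in the comment, so the cost the comment points at is real either way.

```python
def encode_varint64(value):
    """Encode a signed 64-bit integer as a base-128 varint."""
    if not -(1 << 63) <= value < (1 << 63):
        raise ValueError("value does not fit in a signed 64-bit integer")
    bits = value & 0xFFFFFFFFFFFFFFFF  # two's-complement bit pattern
    out = bytearray()
    while True:
        low = bits & 0x7F
        bits >>= 7
        if bits:
            out.append(low | 0x80)  # more bytes follow
        else:
            out.append(low)
            return bytes(out)
```

Small non-negative lengths fit in one or two bytes, whereas any negative marker takes the maximum width, which is why folding the marker and the length into a single negative number saves the bytes of a separate length field.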




Issue Time Tracking
---

Worklog Id: (was: 172822)
Time Spent: 2h 50m  (was: 2h 40m)

> Support retrieval of large gbk iterables over the state API.
> 
>
> Key: BEAM-6120
> URL: https://issues.apache.org/jira/browse/BEAM-6120
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-6120) Support retrieval of large gbk iterables over the state API.

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6120?focusedWorklogId=172823&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172823
 ]

ASF GitHub Bot logged work on BEAM-6120:


Author: ASF GitHub Bot
Created on: 06/Dec/18 20:29
Start Date: 06/Dec/18 20:29
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #7127: 
[BEAM-6120] Support retrieval of large gbk iterables over the state API.
URL: https://github.com/apache/beam/pull/7127#discussion_r239591997
 
 

 ##
 File path: model/pipeline/src/main/proto/beam_runner_api.proto
 ##
 @@ -588,6 +588,11 @@ message StandardCoders {
 // of the element
 // Components: The element coder and the window coder, in that order
 WINDOWED_VALUE = 8 [(beam_urn) = "beam:coder:windowed_value:v1"];
+
+// Encodes an iterable of elements, some of which may be stored in state.
 
 Review comment:
   Can we document the encoding here similar to TIMER?
   




Issue Time Tracking
---

Worklog Id: (was: 172823)

> Support retrieval of large gbk iterables over the state API.
> 
>
> Key: BEAM-6120
> URL: https://issues.apache.org/jira/browse/BEAM-6120
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-6181) Utilize MetricInfo for reporting user metrics in Portable Dataflow Java Runner.

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6181?focusedWorklogId=172807&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172807
 ]

ASF GitHub Bot logged work on BEAM-6181:


Author: ASF GitHub Bot
Created on: 06/Dec/18 19:49
Start Date: 06/Dec/18 19:49
Worklog Time Spent: 10m 
  Work Description: swegner commented on a change in pull request #7202: 
[BEAM-6181] Reporting user counters via MonitoringInfos in Portable Dataflow 
Runner.
URL: https://github.com/apache/beam/pull/7202#discussion_r239590504
 
 

 ##
 File path: 
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/BeamFnMapTaskExecutor.java
 ##
 @@ -351,7 +398,115 @@ void updateProgress() {
   }
 }
 
-private void updateMetrics(BeamFnApi.Metrics metrics) {
+// Keeping this as static class for this iteration. Will extract to 
separate file and generalize
+// when more counter types are added.
+// todomigryz: define counter transformer factory
+// that can provide respective counter transformer for different type of 
counters.
+// (ie RowCountCounterTranformer, MSecCounterTransformer, 
UserCounterTransformer, etc)
+private static class MonitoringInfoToCounterUpdateTransformer {
+
+  private final Map transformIdMapping;
+
+  public MonitoringInfoToCounterUpdateTransformer(
+  final Map transformIdMapping) {
+this.transformIdMapping = transformIdMapping;
+  }
+
+  // todo: search code for "beam:metrics"... and replace them with 
relevant enums from
+  // proto after rebasing above https://github.com/apache/beam/pull/6799 
that
+  // introduces relevant proto entries.
+  final String BEAM_METRICS_USER_PREFIX = "beam:metric:user";
+
+  private CounterUpdate monitoringInfoToCounterUpdate(MonitoringInfo 
monitoringInfo) {
+long value = 
monitoringInfo.getMetric().getCounterData().getInt64Value();
+String urn = monitoringInfo.getUrn();
+
+String type = monitoringInfo.getType();
+
+// todomigryz: run MonitoringInfo through validation process.
+// refer to https://github.com/apache/beam/pull/6799
+
+if (urn.startsWith(BEAM_METRICS_USER_PREFIX)) {
+  if (!type.equals("beam:metrics:sum_int_64")) {
+LOG.warn(
 
 Review comment:
   It would be good to understand the scenarios in which we could reach these 
states and design error handling appropriately.
   
   If the only scenario that would produce this is a bug in the SDK, then we 
should expose that bug as early as possible to get it fixed. Log-and-continue 
means that we'll ignore it, and unless we have an explicit test our users will 
notice before we do.
   
   I doubt we have a consistent methodology for error handling in Beam, but I 
remember this came up when designing DisplayData in the Java SDK. We decided 
errors in constructing DisplayData from user-defined PTransforms should throw 
and block job submission, rather than continuing with empty DisplayData. This 
feels similar. 
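
The trade-off can be sketched side by side. This is an illustration only, written in Python rather than the PR's Java: the two URN strings are taken from the diff above, while the function names and the exception class are hypothetical.

```python
import logging

LOG = logging.getLogger(__name__)

# Both strings appear in the diff under review.
USER_METRIC_PREFIX = "beam:metric:user"
EXPECTED_TYPE = "beam:metrics:sum_int_64"


class InvalidMonitoringInfoError(ValueError):
    """Hypothetical error for a monitoring info with an unexpected type."""


def counter_value_lenient(urn, type_urn, value):
    # Log-and-continue: the bad metric is dropped and the job keeps running.
    if urn.startswith(USER_METRIC_PREFIX) and type_urn != EXPECTED_TYPE:
        LOG.warning("Unexpected type %s for user metric %s", type_urn, urn)
        return None
    return value


def counter_value_strict(urn, type_urn, value):
    # Fail fast: an SDK bug surfaces at the first malformed metric.
    if urn.startswith(USER_METRIC_PREFIX) and type_urn != EXPECTED_TYPE:
        raise InvalidMonitoringInfoError(
            "Unexpected type %s for user metric %s" % (type_urn, urn))
    return value
```

The lenient variant hides the problem behind a log line, which matches the concern raised above; the strict variant turns the same condition into a failure that a test or an early job run would catch.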




Issue Time Tracking
---

Worklog Id: (was: 172807)
Time Spent: 5.5h  (was: 5h 20m)

> Utilize MetricInfo for reporting user metrics in Portable Dataflow Java 
> Runner.
> ---
>
> Key: BEAM-6181
> URL: https://issues.apache.org/jira/browse/BEAM-6181
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> The new approach to reporting metrics in the Fn API is to use MetricInfo structures.
> This approach is implemented in the Python SDK and work is ongoing in the Java SDK.
> This task includes plumbing user metrics reported via MetricInfos through the 
> Dataflow Java Runner. 





[jira] [Resolved] (BEAM-5920) Add additional OWNERS

2018-12-06 Thread Scott Wegner (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Wegner resolved BEAM-5920.

   Resolution: Fixed
Fix Version/s: 2.10.0

> Add additional OWNERS
> -
>
> Key: BEAM-5920
> URL: https://issues.apache.org/jira/browse/BEAM-5920
> Project: Beam
>  Issue Type: Sub-task
>  Components: project-management
>Reporter: Scott Wegner
>Assignee: Scott Wegner
>Priority: Minor
> Fix For: 2.10.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> We should spread knowledge and ownership of this new project. We have two 
> owners [currently 
> documented|https://github.com/apache/beam/blob/master/.test-infra/metrics/OWNERS].
>  I plan to monitor the new infrastructure closely during its initial rollout, 
> and then once it's stabilized add additional owners.
> I'd like to add additional owners in ~1 month





[jira] [Work logged] (BEAM-6178) Create BOM for Beam

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6178?focusedWorklogId=172794&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172794
 ]

ASF GitHub Bot logged work on BEAM-6178:


Author: ASF GitHub Bot
Created on: 06/Dec/18 19:16
Start Date: 06/Dec/18 19:16
Worklog Time Spent: 10m 
  Work Description: swegner commented on a change in pull request #7197: 
[BEAM-6178] Adding beam-sdks-java-bom, adding exportJavadoc flag for 
applyJavaNature
URL: https://github.com/apache/beam/pull/7197#discussion_r239569411
 
 

 ##
 File path: sdks/java/bom/build.gradle
 ##
 @@ -0,0 +1,160 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * License); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+apply plugin: org.apache.beam.gradle.BeamModulePlugin
+
+apply plugin: "java"
+apply plugin: "maven-publish"
+
+def isRelease(Project project) {
+  return project.hasProperty('isRelease')
+}
+
+ext {
+  mavenJavaDir = "$project.buildDir/publications/mavenJava"
+  mavenJavaBomOutputFile = file(mavenJavaDir + "/pom-default.xml")
+}
+
+for (String projName : rootProject.ext.allProjectNames) {
+  if (!projName.equals(':' + project.name)) {
+evaluationDependsOn(projName)
+  }
+}
+
+// Copy our pom.xml to the location where a generated POM would go
+task copyPom() {
+  doLast {
+def moduleNames = new ArrayList<>()
+for (String projName : rootProject.ext.allProjectNames) {
+  def subproject = project(projName)
+  if (subproject.ext.properties.containsKey('includeInJavaBom') &&
+  subproject.ext.properties.includeInJavaBom) {
+moduleNames.add(subproject.name)
+  }
+}
+
+new File(mavenJavaDir).mkdirs()
+copy {
 
 Review comment:
   I believe it would be more idiomatic to make the `copyPom` task `type: 
Copy`, and move this logic outside of the `doLast {}` block. That would move 
the `moduleNames` aggregation to the evaluation phase, which also seems 
idiomatic.




Issue Time Tracking
---

Worklog Id: (was: 172794)
Time Spent: 1h  (was: 50m)

> Create BOM for Beam
> ---
>
> Key: BEAM-6178
> URL: https://issues.apache.org/jira/browse/BEAM-6178
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Affects Versions: 2.10.0
>Reporter: Garrett Jones
>Assignee: Garrett Jones
>Priority: Critical
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Add a module to Beam which generates a BOM for Beam modules.





[jira] [Work logged] (BEAM-6178) Create BOM for Beam

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6178?focusedWorklogId=172792&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172792
 ]

ASF GitHub Bot logged work on BEAM-6178:


Author: ASF GitHub Bot
Created on: 06/Dec/18 19:16
Start Date: 06/Dec/18 19:16
Worklog Time Spent: 10m 
  Work Description: swegner commented on a change in pull request #7197: 
[BEAM-6178] Adding beam-sdks-java-bom, adding exportJavadoc flag for 
applyJavaNature
URL: https://github.com/apache/beam/pull/7197#discussion_r239541511
 
 

 ##
 File path: build.gradle
 ##
 @@ -281,3 +281,107 @@ release {
 requireBranch = 'release-.*|master'
   }
 }
+
+project.ext.allProjectNames = [
 
 Review comment:
   This set is already defined in 
[settings.gradle](https://github.com/apache/beam/blob/master/settings.gradle). 
Would it work to reference it via 
[`Project.allprojects`](https://docs.gradle.org/current/dsl/org.gradle.api.Project.html#org.gradle.api.Project:allprojects)?




Issue Time Tracking
---

Worklog Id: (was: 172792)

> Create BOM for Beam
> ---
>
> Key: BEAM-6178
> URL: https://issues.apache.org/jira/browse/BEAM-6178
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Affects Versions: 2.10.0
>Reporter: Garrett Jones
>Assignee: Garrett Jones
>Priority: Critical
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Add a module to Beam which generates a BOM for Beam modules.





[jira] [Work logged] (BEAM-6178) Create BOM for Beam

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6178?focusedWorklogId=172797&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172797
 ]

ASF GitHub Bot logged work on BEAM-6178:


Author: ASF GitHub Bot
Created on: 06/Dec/18 19:16
Start Date: 06/Dec/18 19:16
Worklog Time Spent: 10m 
  Work Description: swegner commented on a change in pull request #7197: 
[BEAM-6178] Adding beam-sdks-java-bom, adding exportJavadoc flag for 
applyJavaNature
URL: https://github.com/apache/beam/pull/7197#discussion_r239578916
 
 

 ##
 File path: sdks/java/bom/build.gradle
 ##
 @@ -0,0 +1,160 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * License); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+apply plugin: org.apache.beam.gradle.BeamModulePlugin
+
+apply plugin: "java"
+apply plugin: "maven-publish"
+
+def isRelease(Project project) {
+  return project.hasProperty('isRelease')
+}
+
+ext {
+  mavenJavaDir = "$project.buildDir/publications/mavenJava"
+  mavenJavaBomOutputFile = file(mavenJavaDir + "/pom-default.xml")
+}
+
+for (String projName : rootProject.ext.allProjectNames) {
+  if (!projName.equals(':' + project.name)) {
+evaluationDependsOn(projName)
+  }
+}
+
+// Copy our pom.xml to the location where a generated POM would go
+task copyPom() {
+  doLast {
+def moduleNames = new ArrayList<>()
+for (String projName : rootProject.ext.allProjectNames) {
+  def subproject = project(projName)
+  if (subproject.ext.properties.containsKey('includeInJavaBom') &&
+  subproject.ext.properties.includeInJavaBom) {
+moduleNames.add(subproject.name)
+  }
+}
+
+new File(mavenJavaDir).mkdirs()
+copy {
+  from 'pom.xml.template'
+  into mavenJavaDir
+  rename 'pom.xml.template', 'pom-default.xml'
+  expand(version: project.version, modules: moduleNames)
+}
+  }
+}
+
+assemble.dependsOn copyPom
+
+// We want to use our own pom.xml instead of the generated one, so we disable
+// the pom.xml generation and have the publish tasks depend on `copyPom` 
instead.
+tasks.whenTaskAdded { task ->
+  if (task.name == 'generatePomFileForMavenJavaPublication') {
+task.enabled = false
+  } else if (task.name == 'publishMavenJavaPublicationToMavenLocal') {
+task.dependsOn copyPom
+  } else if (task.name == 'publishMavenJavaPublicationToMavenRepository') {
+task.dependsOn copyPom
+  }
+}
+
+jar.enabled = false
+
+// Remove the default jar archive which is added by the 'java' plugin.
+configurations.archives.artifacts.with { archives ->
+  def artifacts = []
+  archives.each {
+if (it.file =~ 'jar') {
+  // We can't just call `archives.remove(it)` here because it triggers
+  // a `ConcurrentModificationException`, so we add matching artifacts
+  // to another list, then remove those elements outside of this iteration.
+  artifacts.add(it)
+}
+  }
+  artifacts.each {
+archives.remove(it)
+  }
+}
+
+artifacts {
+  archives(mavenJavaBomOutputFile) {
+builtBy copyPom
+  }
+}
+
+afterEvaluate {
+  // We can't use the `publishing` section from applyJavaNature because
+  // we don't want all the Java artifacts, and we want to use our own pom.xml
+  // instead of the generated one.
+  publishing {
+publications {
+  mavenJava(MavenPublication) {
 
 Review comment:
   Should consuming this BOM become the recommended way to depend on Beam? If 
so, perhaps we should implement it in the 
[maven-archetypes](https://github.com/apache/beam/tree/master/sdks/java/maven-archetypes).
 The archetype projects also get continuous testing with nightly releases, so 
that would help ensure our produced BOM is valid.




Issue Time Tracking
---

Worklog Id: (was: 172797)
Time Spent: 1h 20m  (was: 1h 10m)

> Create BOM for Beam
> ---
>
> Key: BEAM-6178
> 

[jira] [Work logged] (BEAM-6178) Create BOM for Beam

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6178?focusedWorklogId=172795&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172795
 ]

ASF GitHub Bot logged work on BEAM-6178:


Author: ASF GitHub Bot
Created on: 06/Dec/18 19:16
Start Date: 06/Dec/18 19:16
Worklog Time Spent: 10m 
  Work Description: swegner commented on a change in pull request #7197: 
[BEAM-6178] Adding beam-sdks-java-bom, adding exportJavadoc flag for 
applyJavaNature
URL: https://github.com/apache/beam/pull/7197#discussion_r239574541
 
 

 ##
 File path: sdks/java/bom/build.gradle
 ##
 @@ -0,0 +1,160 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * License); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+apply plugin: org.apache.beam.gradle.BeamModulePlugin
+
+apply plugin: "java"
+apply plugin: "maven-publish"
+
+def isRelease(Project project) {
+  return project.hasProperty('isRelease')
+}
+
+ext {
+  mavenJavaDir = "$project.buildDir/publications/mavenJava"
+  mavenJavaBomOutputFile = file(mavenJavaDir + "/pom-default.xml")
+}
+
+for (String projName : rootProject.ext.allProjectNames) {
+  if (!projName.equals(':' + project.name)) {
+    evaluationDependsOn(projName)
+  }
+}
+
+// Copy our pom.xml to the location where a generated POM would go
+task copyPom() {
+  doLast {
+    def moduleNames = new ArrayList<>()
+    for (String projName : rootProject.ext.allProjectNames) {
+      def subproject = project(projName)
+      if (subproject.ext.properties.containsKey('includeInJavaBom') &&
+          subproject.ext.properties.includeInJavaBom) {
+        moduleNames.add(subproject.name)
+      }
+    }
+
+    new File(mavenJavaDir).mkdirs()
+    copy {
+      from 'pom.xml.template'
+      into mavenJavaDir
+      rename 'pom.xml.template', 'pom-default.xml'
+      expand(version: project.version, modules: moduleNames)
+    }
+  }
+}
+
+assemble.dependsOn copyPom
+
+// We want to use our own pom.xml instead of the generated one, so we disable
+// the pom.xml generation and have the publish tasks depend on `copyPom` instead.
+tasks.whenTaskAdded { task ->
+  if (task.name == 'generatePomFileForMavenJavaPublication') {
+    task.enabled = false
+  } else if (task.name == 'publishMavenJavaPublicationToMavenLocal') {
+    task.dependsOn copyPom
+  } else if (task.name == 'publishMavenJavaPublicationToMavenRepository') {
+    task.dependsOn copyPom
+  }
+}
+
+jar.enabled = false
+
+// Remove the default jar archive which is added by the 'java' plugin.
+configurations.archives.artifacts.with { archives ->
+  def artifacts = []
+  archives.each {
+    if (it.file =~ 'jar') {
+      // We can't just call `archives.remove(it)` here because it triggers
+      // a `ConcurrentModificationException`, so we add matching artifacts
+      // to another list, then remove those elements outside of this iteration.
+      artifacts.add(it)
+    }
+  }
+  artifacts.each {
+    archives.remove(it)
+  }
+}
+
+artifacts {
+  archives(mavenJavaBomOutputFile) {
+    builtBy copyPom
+  }
+}
+
+afterEvaluate {
+  // We can't use the `publishing` section from applyJavaNature because
 
 Review comment:
   Same comment as above: this is a lot of redundancy, and I worry that the 
implementations will drift as they are maintained over time. Can we refactor the 
main publishing config to have the necessary hooks for this task?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 172795)
Time Spent: 1h 10m  (was: 1h)

> Create BOM for Beam
> ---
>
> Key: BEAM-6178
> URL: https://issues.apache.org/jira/browse/BEAM-6178
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Affects Versions: 2.10.0
>Reporter: Garrett Jones
>Assignee: Garrett Jones
>Priority: Critical
>

[jira] [Work logged] (BEAM-6178) Create BOM for Beam

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6178?focusedWorklogId=172789&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172789
 ]

ASF GitHub Bot logged work on BEAM-6178:


Author: ASF GitHub Bot
Created on: 06/Dec/18 19:16
Start Date: 06/Dec/18 19:16
Worklog Time Spent: 10m 
  Work Description: swegner commented on a change in pull request #7197: 
[BEAM-6178] Adding beam-sdks-java-bom, adding exportJavadoc flag for 
applyJavaNature
URL: https://github.com/apache/beam/pull/7197#discussion_r239572487
 
 

 ##
 File path: sdks/java/bom/build.gradle
 ##
 @@ -0,0 +1,160 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * License); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+apply plugin: org.apache.beam.gradle.BeamModulePlugin
+
+apply plugin: "java"
+apply plugin: "maven-publish"
+
+def isRelease(Project project) {
+  return project.hasProperty('isRelease')
+}
+
+ext {
+  mavenJavaDir = "$project.buildDir/publications/mavenJava"
+  mavenJavaBomOutputFile = file(mavenJavaDir + "/pom-default.xml")
+}
+
+for (String projName : rootProject.ext.allProjectNames) {
+  if (!projName.equals(':' + project.name)) {
+    evaluationDependsOn(projName)
+  }
+}
+
+// Copy our pom.xml to the location where a generated POM would go
+task copyPom() {
+  doLast {
+    def moduleNames = new ArrayList<>()
+    for (String projName : rootProject.ext.allProjectNames) {
+      def subproject = project(projName)
+      if (subproject.ext.properties.containsKey('includeInJavaBom') &&
+          subproject.ext.properties.includeInJavaBom) {
+        moduleNames.add(subproject.name)
+      }
+    }
+
+    new File(mavenJavaDir).mkdirs()
+    copy {
+      from 'pom.xml.template'
+      into mavenJavaDir
+      rename 'pom.xml.template', 'pom-default.xml'
+      expand(version: project.version, modules: moduleNames)
+    }
+  }
+}
+
+assemble.dependsOn copyPom
+
+// We want to use our own pom.xml instead of the generated one, so we disable
+// the pom.xml generation and have the publish tasks depend on `copyPom` instead.
+tasks.whenTaskAdded { task ->
+  if (task.name == 'generatePomFileForMavenJavaPublication') {
 
 Review comment:
   This introduces a second method of pom file generation and a duplicated set 
of POM boilerplate. I worry that when one gets updated we'll forget to update 
the other.
   
   Do you think we could consolidate on a single way of generating POMs?




Issue Time Tracking
---

Worklog Id: (was: 172789)
Time Spent: 50m  (was: 40m)

> Create BOM for Beam
> ---
>
> Key: BEAM-6178
> URL: https://issues.apache.org/jira/browse/BEAM-6178
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Affects Versions: 2.10.0
>Reporter: Garrett Jones
>Assignee: Garrett Jones
>Priority: Critical
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Add a module to Beam which generates a BOM for Beam modules.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6178) Create BOM for Beam

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6178?focusedWorklogId=172788&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172788
 ]

ASF GitHub Bot logged work on BEAM-6178:


Author: ASF GitHub Bot
Created on: 06/Dec/18 19:16
Start Date: 06/Dec/18 19:16
Worklog Time Spent: 10m 
  Work Description: swegner commented on a change in pull request #7197: 
[BEAM-6178] Adding beam-sdks-java-bom, adding exportJavadoc flag for 
applyJavaNature
URL: https://github.com/apache/beam/pull/7197#discussion_r239566618
 
 

 ##
 File path: runners/samza/build.gradle
 ##
 @@ -19,7 +19,7 @@
 import groovy.json.JsonOutput
 
 apply plugin: org.apache.beam.gradle.BeamModulePlugin
-applyJavaNature()
+applyJavaNature(exportJavadoc: false)
 
 Review comment:
   I suspect that some of these exclusions are oversights. I appreciate that 
you've matched the existing behavior. It will be easy to follow up later and 
enable javadocs for the modules that should probably have them.




Issue Time Tracking
---

Worklog Id: (was: 172788)
Time Spent: 50m  (was: 40m)

> Create BOM for Beam
> ---
>
> Key: BEAM-6178
> URL: https://issues.apache.org/jira/browse/BEAM-6178
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Affects Versions: 2.10.0
>Reporter: Garrett Jones
>Assignee: Garrett Jones
>Priority: Critical
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Add a module to Beam which generates a BOM for Beam modules.





[jira] [Work logged] (BEAM-6178) Create BOM for Beam

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6178?focusedWorklogId=172790&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172790
 ]

ASF GitHub Bot logged work on BEAM-6178:


Author: ASF GitHub Bot
Created on: 06/Dec/18 19:16
Start Date: 06/Dec/18 19:16
Worklog Time Spent: 10m 
  Work Description: swegner commented on a change in pull request #7197: 
[BEAM-6178] Adding beam-sdks-java-bom, adding exportJavadoc flag for 
applyJavaNature
URL: https://github.com/apache/beam/pull/7197#discussion_r239546811
 
 

 ##
 File path: sdks/java/javadoc/build.gradle
 ##
 @@ -26,64 +26,33 @@
 description = "Apache Beam :: SDKs :: Java :: Aggregated Javadoc"
 apply plugin: 'java'
 
-def exportedJavadocProjects = [
-  ':beam-runners-apex',
-  ':beam-runners-core-construction-java',
-  ':beam-runners-core-java',
-  ':beam-runners-java-fn-execution',
-  ':beam-runners-local-java-core',
-  ':beam-runners-direct-java',
-  ':beam-runners-reference-java',
-  ':beam-runners-flink_2.11',
-  ':beam-runners-google-cloud-dataflow-java',
-  ':beam-runners-spark',
-  ':beam-runners-gearpump',
-  ':beam-sdks-java-core',
-  ':beam-sdks-java-fn-execution',
-  ':beam-sdks-java-extensions-google-cloud-platform-core',
-  ':beam-sdks-java-extensions-join-library',
-  ':beam-sdks-java-extensions-json-jackson',
-  ':beam-sdks-java-extensions-protobuf',
-  ':beam-sdks-java-extensions-sketching',
-  ':beam-sdks-java-extensions-sorter',
-  ':beam-sdks-java-extensions-sql',
-  ':beam-sdks-java-harness',
-  ':beam-sdks-java-io-amazon-web-services',
-  ':beam-sdks-java-io-amqp',
-  ':beam-sdks-java-io-cassandra',
-  ':beam-sdks-java-io-elasticsearch',
-  ':beam-sdks-java-io-elasticsearch-tests-2',
-  ':beam-sdks-java-io-elasticsearch-tests-5',
-  ':beam-sdks-java-io-elasticsearch-tests-6',
-  ':beam-sdks-java-io-elasticsearch-tests-common',
-  ':beam-sdks-java-io-google-cloud-platform',
-  ':beam-sdks-java-io-hadoop-common',
-  ':beam-sdks-java-io-hadoop-file-system',
-  ':beam-sdks-java-io-hadoop-input-format',
-  ':beam-sdks-java-io-hbase',
-  ':beam-sdks-java-io-hcatalog',
-  ':beam-sdks-java-io-jdbc',
-  ':beam-sdks-java-io-jms',
-  ':beam-sdks-java-io-kafka',
-  ':beam-sdks-java-io-kinesis',
-  ':beam-sdks-java-io-mongodb',
-  ':beam-sdks-java-io-mqtt',
-  ':beam-sdks-java-io-parquet',
-  ':beam-sdks-java-io-redis',
-  ':beam-sdks-java-io-solr',
-  ':beam-sdks-java-io-tika',
-  ':beam-sdks-java-io-xml',
-]
+for (String projName : rootProject.ext.allProjectNames) {
+  if (!projName.equals(':' + project.name) && !projName.equals(':beam-sdks-java-bom')) {
+    evaluationDependsOn(projName)
 
 Review comment:
   Forcing every other project to depend on this one feels a bit dirty. If 
another project used the same pattern, would it create a build cycle?
   
   The previous behavior was still hacky, but it only included the projects 
listed here for javadoc support; so if there were another uber project using 
`evaluationDependsOn`, it wouldn't be contained in the previous list and would 
thus avoid a build cycle.
   
   However, I had a lot of trouble finding a better way of structuring this. 
Some relevant docs/threads:
   
   * [Multi-project Builds: Depending on the task output produced by another project](https://docs.gradle.org/current/userguide/multi_project_builds.html#sec:depending_on_output_of_another_project)
   * [Discuss: Best approach Gradle multi-module project: generate just one global javadoc](https://discuss.gradle.org/t/best-approach-gradle-multi-module-project-generate-just-one-global-javadoc/18657/22)
   * [gradle-aggregate-javadocs-plugin](https://github.com/nebula-plugins/gradle-aggregate-javadocs-plugin)
   
   One thought is that it seems easier to structure aggregation logic by 
putting it in the root build script. By default the root build script is 
configured first so you have some guarantees on ordering.
   
   
   With all that said, I'm happy deferring this. You've stepped into an awkward 
part of the build process, and I don't expect you to fix it here.




Issue Time Tracking
---

Worklog Id: (was: 172790)
Time Spent: 50m  (was: 40m)

> Create BOM for Beam
> ---
>
> Key: BEAM-6178
> URL: https://issues.apache.org/jira/browse/BEAM-6178
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Affects Versions: 2.10.0
>Reporter: Garrett Jones
>Assignee: Garrett Jones
>Priority: Critical
>  Time Spent: 50m
>  

[jira] [Work logged] (BEAM-6178) Create BOM for Beam

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6178?focusedWorklogId=172796&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172796
 ]

ASF GitHub Bot logged work on BEAM-6178:


Author: ASF GitHub Bot
Created on: 06/Dec/18 19:16
Start Date: 06/Dec/18 19:16
Worklog Time Spent: 10m 
  Work Description: swegner commented on a change in pull request #7197: 
[BEAM-6178] Adding beam-sdks-java-bom, adding exportJavadoc flag for 
applyJavaNature
URL: https://github.com/apache/beam/pull/7197#discussion_r239573794
 
 

 ##
 File path: sdks/java/bom/build.gradle
 ##
 @@ -0,0 +1,160 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * License); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+apply plugin: org.apache.beam.gradle.BeamModulePlugin
+
+apply plugin: "java"
+apply plugin: "maven-publish"
+
+def isRelease(Project project) {
+  return project.hasProperty('isRelease')
+}
+
+ext {
+  mavenJavaDir = "$project.buildDir/publications/mavenJava"
+  mavenJavaBomOutputFile = file(mavenJavaDir + "/pom-default.xml")
+}
+
+for (String projName : rootProject.ext.allProjectNames) {
+  if (!projName.equals(':' + project.name)) {
+    evaluationDependsOn(projName)
+  }
+}
+
+// Copy our pom.xml to the location where a generated POM would go
+task copyPom() {
+  doLast {
+    def moduleNames = new ArrayList<>()
+    for (String projName : rootProject.ext.allProjectNames) {
+      def subproject = project(projName)
+      if (subproject.ext.properties.containsKey('includeInJavaBom') &&
+          subproject.ext.properties.includeInJavaBom) {
+        moduleNames.add(subproject.name)
+      }
+    }
+
+    new File(mavenJavaDir).mkdirs()
+    copy {
+      from 'pom.xml.template'
+      into mavenJavaDir
+      rename 'pom.xml.template', 'pom-default.xml'
+      expand(version: project.version, modules: moduleNames)
+    }
+  }
+}
+
+assemble.dependsOn copyPom
+
+// We want to use our own pom.xml instead of the generated one, so we disable
+// the pom.xml generation and have the publish tasks depend on `copyPom` instead.
+tasks.whenTaskAdded { task ->
+  if (task.name == 'generatePomFileForMavenJavaPublication') {
+    task.enabled = false
+  } else if (task.name == 'publishMavenJavaPublicationToMavenLocal') {
+    task.dependsOn copyPom
+  } else if (task.name == 'publishMavenJavaPublicationToMavenRepository') {
+    task.dependsOn copyPom
+  }
+}
+
+jar.enabled = false
+
+// Remove the default jar archive which is added by the 'java' plugin.
+configurations.archives.artifacts.with { archives ->
 
 Review comment:
   Why is this needed? Is the above `jar.enabled = false` not sufficient?




Issue Time Tracking
---

Worklog Id: (was: 172796)

> Create BOM for Beam
> ---
>
> Key: BEAM-6178
> URL: https://issues.apache.org/jira/browse/BEAM-6178
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Affects Versions: 2.10.0
>Reporter: Garrett Jones
>Assignee: Garrett Jones
>Priority: Critical
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Add a module to Beam which generates a BOM for Beam modules.





[jira] [Work logged] (BEAM-6178) Create BOM for Beam

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6178?focusedWorklogId=172793&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172793
 ]

ASF GitHub Bot logged work on BEAM-6178:


Author: ASF GitHub Bot
Created on: 06/Dec/18 19:16
Start Date: 06/Dec/18 19:16
Worklog Time Spent: 10m 
  Work Description: swegner commented on a change in pull request #7197: 
[BEAM-6178] Adding beam-sdks-java-bom, adding exportJavadoc flag for 
applyJavaNature
URL: https://github.com/apache/beam/pull/7197#discussion_r239567323
 
 

 ##
 File path: sdks/java/bom/build.gradle
 ##
 @@ -0,0 +1,160 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * License); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+apply plugin: org.apache.beam.gradle.BeamModulePlugin
+
+apply plugin: "java"
+apply plugin: "maven-publish"
+
+def isRelease(Project project) {
+  return project.hasProperty('isRelease')
+}
+
+ext {
+  mavenJavaDir = "$project.buildDir/publications/mavenJava"
+  mavenJavaBomOutputFile = file(mavenJavaDir + "/pom-default.xml")
+}
+
+for (String projName : rootProject.ext.allProjectNames) {
 
 Review comment:
   Same comment from javadocs; can this instead use `allProjects` to remove the 
redundant `allProjectNames` array?




Issue Time Tracking
---

Worklog Id: (was: 172793)

> Create BOM for Beam
> ---
>
> Key: BEAM-6178
> URL: https://issues.apache.org/jira/browse/BEAM-6178
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Affects Versions: 2.10.0
>Reporter: Garrett Jones
>Assignee: Garrett Jones
>Priority: Critical
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Add a module to Beam which generates a BOM for Beam modules.





[jira] [Work logged] (BEAM-6178) Create BOM for Beam

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6178?focusedWorklogId=172791&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172791
 ]

ASF GitHub Bot logged work on BEAM-6178:


Author: ASF GitHub Bot
Created on: 06/Dec/18 19:16
Start Date: 06/Dec/18 19:16
Worklog Time Spent: 10m 
  Work Description: swegner commented on a change in pull request #7197: 
[BEAM-6178] Adding beam-sdks-java-bom, adding exportJavadoc flag for 
applyJavaNature
URL: https://github.com/apache/beam/pull/7197#discussion_r239567036
 
 

 ##
 File path: sdks/java/bom/build.gradle
 ##
 @@ -0,0 +1,160 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * License); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+apply plugin: org.apache.beam.gradle.BeamModulePlugin
+
+apply plugin: "java"
 
 Review comment:
   Can you add a small comment up here about what a BOM is and why it's useful? 
Feel free to link to existing context/documentation.




Issue Time Tracking
---

Worklog Id: (was: 172791)
Time Spent: 50m  (was: 40m)

> Create BOM for Beam
> ---
>
> Key: BEAM-6178
> URL: https://issues.apache.org/jira/browse/BEAM-6178
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Affects Versions: 2.10.0
>Reporter: Garrett Jones
>Assignee: Garrett Jones
>Priority: Critical
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Add a module to Beam which generates a BOM for Beam modules.





[jira] [Created] (BEAM-6190) "Processing stuck" messages should be visible on Pantheon

2018-12-06 Thread Dustin Rhodes (JIRA)
Dustin Rhodes created BEAM-6190:
---

 Summary: "Processing stuck" messages should be visible on Pantheon
 Key: BEAM-6190
 URL: https://issues.apache.org/jira/browse/BEAM-6190
 Project: Beam
  Issue Type: Improvement
  Components: runner-dataflow
Affects Versions: 2.8.0
 Environment: Running on Google Cloud Dataflow
Reporter: Dustin Rhodes
Assignee: Tyler Akidau
 Fix For: Not applicable


When user processing results in an exception, it is clearly visible on the 
Pantheon landing page for a streaming Dataflow job. But when user processing 
becomes stuck, there is no indication there, even though the worker logs it. 
Most users don't check worker logs, and checking them is not convenient. 
Ideally a stuck worker would result in a visible error on the Pantheon 
landing page.





[jira] [Work logged] (BEAM-5850) Make process, finish and start run on the same thread to support metrics.

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5850?focusedWorklogId=172780&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172780
 ]

ASF GitHub Bot logged work on BEAM-5850:


Author: ASF GitHub Bot
Created on: 06/Dec/18 18:52
Start Date: 06/Dec/18 18:52
Worklog Time Spent: 10m 
  Work Description: ajamato commented on issue #6786: [BEAM-5850] Add 
QueueingBeamFnDataClient and make process, finish and start run on the same 
thread to support metrics.
URL: https://github.com/apache/beam/pull/6786#issuecomment-444985648
 
 
   @lukecwik Rebased this PR after #7214 was merged into master.
   Would you please merge this after precommit finishes?




Issue Time Tracking
---

Worklog Id: (was: 172780)
Time Spent: 15h 10m  (was: 15h)

> Make process, finish and start run on the same thread to support metrics.
> -
>
> Key: BEAM-5850
> URL: https://issues.apache.org/jira/browse/BEAM-5850
> Project: Beam
>  Issue Type: New Feature
>  Components: java-fn-execution
>Reporter: Alex Amato
>Assignee: Alex Amato
>Priority: Major
>  Time Spent: 15h 10m
>  Remaining Estimate: 0h
>
> Update BeamFnDataReceiver to place elements into a Queue, then consume them 
> and call the element processing receiver in blockTillReadFinishes
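
The queue hand-off described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in the PR: `QueueingReceiverSketch`, `accept`, `close`, and `drainUntilClosed` are hypothetical names, and the real `QueueingBeamFnDataClient` may be structured differently. The idea shown is that the data-reading thread only enqueues elements, while the thread that calls the drain method runs the element processor, so processing stays on a single thread.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical sketch: elements are enqueued by a reader thread and
// processed on whichever thread calls drainUntilClosed().
final class QueueingReceiverSketch<T> {
  // Sentinel marking the end of the stream; never handed to the processor.
  private static final Object DONE = new Object();

  private final BlockingQueue<Object> queue = new LinkedBlockingQueue<>();
  private final Consumer<T> processor;

  QueueingReceiverSketch(Consumer<T> processor) {
    this.processor = processor;
  }

  // Called from the data-reading thread: only enqueues, never processes.
  void accept(T element) {
    queue.add(element);
  }

  // Called from the data-reading thread when no more elements will arrive.
  void close() {
    queue.add(DONE);
  }

  // Called from the processing thread: blocks until close() has been seen,
  // running the processor on the calling thread for every element.
  @SuppressWarnings("unchecked")
  void drainUntilClosed() {
    try {
      while (true) {
        Object next = queue.take();
        if (next == DONE) {
          return;
        }
        processor.accept((T) next);
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while draining", e);
    }
  }

  public static void main(String[] args) {
    QueueingReceiverSketch<String> receiver =
        new QueueingReceiverSketch<>(
            e -> System.out.println(Thread.currentThread().getName() + " processed " + e));
    Thread reader =
        new Thread(
            () -> {
              receiver.accept("a");
              receiver.accept("b");
              receiver.close();
            });
    reader.start();
    receiver.drainUntilClosed();
  }
}
```

Because the processor always runs on the draining thread, thread-local state such as a metrics container would be visible to every element, which is the motivation given in the issue title.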





[jira] [Work logged] (BEAM-5920) Add additional OWNERS

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5920?focusedWorklogId=172765&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172765
 ]

ASF GitHub Bot logged work on BEAM-5920:


Author: ASF GitHub Bot
Created on: 06/Dec/18 18:34
Start Date: 06/Dec/18 18:34
Worklog Time Spent: 10m 
  Work Description: yifanzou commented on issue #7186: [BEAM-5920] Add 
additional owners for Community Metrics
URL: https://github.com/apache/beam/pull/7186#issuecomment-444979962
 
 
   LGTM, thanks.




Issue Time Tracking
---

Worklog Id: (was: 172765)
Time Spent: 40m  (was: 0.5h)

> Add additional OWNERS
> -
>
> Key: BEAM-5920
> URL: https://issues.apache.org/jira/browse/BEAM-5920
> Project: Beam
>  Issue Type: Sub-task
>  Components: project-management
>Reporter: Scott Wegner
>Assignee: Scott Wegner
>Priority: Minor
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> We should spread knowledge and ownership of this new project. We have two 
> owners [currently 
> documented|https://github.com/apache/beam/blob/master/.test-infra/metrics/OWNERS].
>  I plan to monitor the new infrastructure closely during its initial rollout, 
> and then once it's stabilized add additional owners.
> I'd like to add additional owners in ~1 month





[jira] [Work logged] (BEAM-6181) Utilize MetricInfo for reporting user metrics in Portable Dataflow Java Runner.

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6181?focusedWorklogId=172762&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172762
 ]

ASF GitHub Bot logged work on BEAM-6181:


Author: ASF GitHub Bot
Created on: 06/Dec/18 18:29
Start Date: 06/Dec/18 18:29
Worklog Time Spent: 10m 
  Work Description: Ardagan commented on a change in pull request #7202: 
[BEAM-6181] Reporting user counters via MonitoringInfos in Portable Dataflow 
Runner.
URL: https://github.com/apache/beam/pull/7202#discussion_r239541582
 
 

 ##
 File path: runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/BeamFnMapTaskExecutor.java
 ##
 @@ -351,7 +398,115 @@ void updateProgress() {
       }
     }
 
-    private void updateMetrics(BeamFnApi.Metrics metrics) {
+    // Keeping this as static class for this iteration. Will extract to separate file and generalize
+    // when more counter types are added.
+    // todomigryz: define counter transformer factory
+    // that can provide respective counter transformer for different type of counters.
+    // (ie RowCountCounterTranformer, MSecCounterTransformer, UserCounterTransformer, etc)
+    private static class MonitoringInfoToCounterUpdateTransformer {
+
+      private final Map transformIdMapping;
+
+      public MonitoringInfoToCounterUpdateTransformer(
+          final Map transformIdMapping) {
+        this.transformIdMapping = transformIdMapping;
+      }
+
+      // todo: search code for "beam:metrics"... and replace them with relevant enums from
+      // proto after rebasing above https://github.com/apache/beam/pull/6799 that
+      // introduces relevant proto entries.
+      final String BEAM_METRICS_USER_PREFIX = "beam:metric:user";
+
+      private CounterUpdate monitoringInfoToCounterUpdate(MonitoringInfo monitoringInfo) {
+        long value = monitoringInfo.getMetric().getCounterData().getInt64Value();
+        String urn = monitoringInfo.getUrn();
+
+        String type = monitoringInfo.getType();
+
+        // todomigryz: run MonitoringInfo through validation process.
+        // refer to https://github.com/apache/beam/pull/6799
+
+        if (urn.startsWith(BEAM_METRICS_USER_PREFIX)) {
+          if (!type.equals("beam:metrics:sum_int_64")) {
+            LOG.warn(
 
 Review comment:
   I believe that writing a warning is a good balance here. Most of the time you 
want your job to process data, and use counters for monitoring. In this 
situation you usually do not want to lose data due to monitoring issues.
   This approach allows SDK authors to find errors in their SDK implementations 
on the one hand, without breaking user pipelines on the other.
   
   @ajamato Alex, what's your opinion on this topic?
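
To make the trade-off concrete, here is a rough sketch of the warn-and-skip behavior, assuming a simplified stand-in for `MonitoringInfo`. The class `CounterWarnSketch`, its nested `Info` type, and the use of `java.util.logging` are illustrative assumptions and not the code in this PR; only the two string constants are taken from the diff. An unexpected counter type produces a log warning and is dropped, so the remaining counters are still reported and the pipeline keeps running.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.logging.Logger;

// Hypothetical sketch of "warn and skip" handling for unsupported counters.
final class CounterWarnSketch {
  private static final Logger LOG = Logger.getLogger(CounterWarnSketch.class.getName());

  // String constants as they appear in the diff under review.
  static final String USER_PREFIX = "beam:metric:user";
  static final String SUM_INT64_TYPE = "beam:metrics:sum_int_64";

  // Simplified stand-in for the MonitoringInfo proto (assumption).
  static final class Info {
    final String urn;
    final String type;
    final long value;

    Info(String urn, String type, long value) {
      this.urn = urn;
      this.type = type;
      this.value = value;
    }
  }

  // Returns the counter value, or empty if the info is not a supported user counter.
  static Optional<Long> toCounterValue(Info info) {
    if (!info.urn.startsWith(USER_PREFIX)) {
      return Optional.empty();
    }
    if (!info.type.equals(SUM_INT64_TYPE)) {
      // Warn instead of throwing: a monitoring problem should not fail the job.
      LOG.warning("Unsupported counter type " + info.type + " for " + info.urn + "; skipping");
      return Optional.empty();
    }
    return Optional.of(info.value);
  }

  // Converts every supported info and silently omits the skipped ones.
  static List<Long> convertAll(List<Info> infos) {
    List<Long> values = new ArrayList<>();
    for (Info info : infos) {
      toCounterValue(info).ifPresent(values::add);
    }
    return values;
  }

  public static void main(String[] args) {
    List<Info> infos =
        Arrays.asList(
            new Info("beam:metric:user:good", SUM_INT64_TYPE, 3L),
            new Info("beam:metric:user:bad", "some:other:type", 9L));
    System.out.println(convertAll(infos));
  }
}
```

The alternative, throwing on the unexpected type, would surface SDK bugs faster but would fail user work items over a monitoring-only problem.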




Issue Time Tracking
---

Worklog Id: (was: 172762)
Time Spent: 5h 20m  (was: 5h 10m)

> Utilize MetricInfo for reporting user metrics in Portable Dataflow Java 
> Runner.
> ---
>
> Key: BEAM-6181
> URL: https://issues.apache.org/jira/browse/BEAM-6181
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> New approach to report metrics in FnApi is to utilize MetricInfo structures.
> This approach is implemented in Python SDK and work is ongoing in Java SDK.
> This task includes plumbing User metrics reported via MetricInfos through 
> Dataflow Java Runner. 





[jira] [Work logged] (BEAM-6159) Dataflow portable runner harness should use ExecutableStage to process bundle

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6159?focusedWorklogId=172750&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172750
 ]

ASF GitHub Bot logged work on BEAM-6159:


Author: ASF GitHub Bot
Created on: 06/Dec/18 18:13
Start Date: 06/Dec/18 18:13
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #7015: 
[BEAM-6159] Migrate dataflow portable worker using shared library to process 
bundle 
URL: https://github.com/apache/beam/pull/7015#discussion_r239556933
 
 

 ##
 File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/StaticRemoteEnvironmentFactory.java
 ##
 @@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.environment;
+
+import org.apache.beam.model.pipeline.v1.RunnerApi.Environment;
+import org.apache.beam.runners.fnexecution.GrpcFnServer;
+import org.apache.beam.runners.fnexecution.artifact.ArtifactRetrievalService;
+import org.apache.beam.runners.fnexecution.control.ControlClientPool;
+import org.apache.beam.runners.fnexecution.control.FnApiControlClientPoolService;
+import org.apache.beam.runners.fnexecution.control.InstructionRequestHandler;
+import org.apache.beam.runners.fnexecution.logging.GrpcLoggingService;
+import org.apache.beam.runners.fnexecution.provisioning.StaticGrpcProvisionService;
+import org.apache.beam.sdk.fn.IdGenerator;
+
+/**
+ * An {@link EnvironmentFactory} that creates StaticRemoteEnvironment used by Dataflow runner
+ * harness.
+ */
+public class StaticRemoteEnvironmentFactory implements EnvironmentFactory {
+  public static StaticRemoteEnvironmentFactory forService(
+      InstructionRequestHandler instructionRequestHandler) {
+    StaticRemoteEnvironmentFactory factory = new StaticRemoteEnvironmentFactory();
 
 Review comment:
   nit: In a follow-up PR, why not define and use a constructor that takes this 
parameter? This would allow you to make the member variable final.
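
   A minimal sketch of the constructor-based shape suggested here. The names `Handler` and `HandlerFactory` are hypothetical stand-ins, not Beam classes; the point is only that a field assigned once in a constructor can be declared final:

```java
import java.util.Objects;

// Hypothetical stand-in for a request handler; not a Beam interface.
interface Handler {
  String handle(String request);
}

// Hypothetical factory illustrating the suggestion; not the PR's code.
final class HandlerFactory {
  // Can be final because it is assigned exactly once, in the constructor.
  private final Handler handler;

  private HandlerFactory(Handler handler) {
    this.handler = Objects.requireNonNull(handler, "handler");
  }

  // Static factory method that delegates to the constructor instead of
  // creating an empty instance and setting the field afterwards.
  static HandlerFactory forService(Handler handler) {
    return new HandlerFactory(handler);
  }

  Handler getHandler() {
    return handler;
  }
}
```

   With this shape the instance is fully initialized when `forService` returns, so callers can never observe it without a handler.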


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 172750)

> Dataflow portable runner harness should use ExecutableStage to process bundle
> -
>
> Key: BEAM-6159
> URL: https://issues.apache.org/jira/browse/BEAM-6159
> Project: Beam
>  Issue Type: Task
>  Components: runner-dataflow
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6159) Dataflow portable runner harness should use ExecutableStage to process bundle

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6159?focusedWorklogId=172753&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172753
 ]

ASF GitHub Bot logged work on BEAM-6159:


Author: ASF GitHub Bot
Created on: 06/Dec/18 18:13
Start Date: 06/Dec/18 18:13
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #7015: 
[BEAM-6159] Migrate dataflow portable worker using shared library to process 
bundle 
URL: https://github.com/apache/beam/pull/7015#discussion_r239554369
 
 

 ##
 File path: 
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/graph/CreateExecutableStageNodeFunction.java
 ##
 @@ -0,0 +1,449 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.dataflow.worker.graph;
+
+import static com.google.common.base.Preconditions.checkArgument;
+import static org.apache.beam.runners.dataflow.util.Structs.getBytes;
+import static org.apache.beam.runners.dataflow.util.Structs.getString;
+
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.google.api.services.dataflow.model.InstructionOutput;
+import com.google.api.services.dataflow.model.MapTask;
+import com.google.api.services.dataflow.model.MultiOutputInfo;
+import com.google.api.services.dataflow.model.ParDoInstruction;
+import com.google.api.services.dataflow.model.ParallelInstruction;
+import com.google.api.services.dataflow.model.ReadInstruction;
+import com.google.common.collect.ImmutableMap;
+import com.google.common.collect.Iterables;
+import com.google.common.graph.MutableNetwork;
+import com.google.common.graph.Network;
+import java.io.IOException;
+import java.util.Base64;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.Map;
+import java.util.Set;
+import java.util.function.Function;
+import javax.annotation.Nullable;
+import org.apache.beam.model.pipeline.v1.RunnerApi;
+import org.apache.beam.model.pipeline.v1.RunnerApi.Environment;
+import org.apache.beam.model.pipeline.v1.RunnerApi.StandardPTransforms;
+import org.apache.beam.runners.core.construction.BeamUrns;
+import org.apache.beam.runners.core.construction.CoderTranslation;
+import org.apache.beam.runners.core.construction.Environments;
+import org.apache.beam.runners.core.construction.PTransformTranslation;
+import org.apache.beam.runners.core.construction.ParDoTranslation;
+import org.apache.beam.runners.core.construction.RehydratedComponents;
+import org.apache.beam.runners.core.construction.SdkComponents;
+import org.apache.beam.runners.core.construction.WindowingStrategyTranslation;
+import org.apache.beam.runners.core.construction.graph.ExecutableStage;
+import 
org.apache.beam.runners.core.construction.graph.ImmutableExecutableStage;
+import org.apache.beam.runners.core.construction.graph.PipelineNode;
+import 
org.apache.beam.runners.core.construction.graph.PipelineNode.PCollectionNode;
+import 
org.apache.beam.runners.core.construction.graph.PipelineNode.PTransformNode;
+import org.apache.beam.runners.core.construction.graph.SideInputReference;
+import org.apache.beam.runners.core.construction.graph.TimerReference;
+import org.apache.beam.runners.core.construction.graph.UserStateReference;
+import org.apache.beam.runners.dataflow.util.CloudObject;
+import org.apache.beam.runners.dataflow.util.CloudObjects;
+import org.apache.beam.runners.dataflow.util.PropertyNames;
+import org.apache.beam.runners.dataflow.worker.CombinePhase;
+import org.apache.beam.runners.dataflow.worker.counters.NameContext;
+import org.apache.beam.runners.dataflow.worker.graph.Edges.DefaultEdge;
+import org.apache.beam.runners.dataflow.worker.graph.Edges.Edge;
+import org.apache.beam.runners.dataflow.worker.graph.Edges.MultiOutputInfoEdge;
+import org.apache.beam.runners.dataflow.worker.graph.Nodes.ExecutableStageNode;
+import 
org.apache.beam.runners.dataflow.worker.graph.Nodes.InstructionOutputNode;
+import org.apache.beam.runners.dataflow.worker.graph.Nodes.Node;
+import 
org.apache.beam.runners.dataflow.worker.graph.Nodes.ParallelInstructionNode;
+import 

[jira] [Work logged] (BEAM-6159) Dataflow portable runner harness should use ExecutableStage to process bundle

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6159?focusedWorklogId=172749&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172749
 ]

ASF GitHub Bot logged work on BEAM-6159:


Author: ASF GitHub Bot
Created on: 06/Dec/18 18:13
Start Date: 06/Dec/18 18:13
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #7015: 
[BEAM-6159] Migrate dataflow portable worker using shared library to process 
bundle 
URL: https://github.com/apache/beam/pull/7015#discussion_r239550773
 
 

 ##
 File path: 
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/ProcessRemoteBundleOperation.java
 ##
 @@ -0,0 +1,102 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.dataflow.worker.fn.control;
+
+import com.google.common.collect.Iterables;
+import java.io.Closeable;
+import 
org.apache.beam.runners.dataflow.worker.util.common.worker.OperationContext;
+import 
org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver;
+import 
org.apache.beam.runners.dataflow.worker.util.common.worker.ReceivingOperation;
+import org.apache.beam.runners.fnexecution.control.BundleProgressHandler;
+import org.apache.beam.runners.fnexecution.control.OutputReceiverFactory;
+import org.apache.beam.runners.fnexecution.control.RemoteBundle;
+import org.apache.beam.runners.fnexecution.control.StageBundleFactory;
+import org.apache.beam.runners.fnexecution.state.StateRequestHandler;
+import org.apache.beam.sdk.fn.data.FnDataReceiver;
+import org.apache.beam.sdk.util.WindowedValue;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * This {@link 
org.apache.beam.runners.dataflow.worker.util.common.worker.Operation} is 
responsible
+ * for communicating with the SDK harness and asking it to process a bundle of 
work. This operation
+ * request a RemoteBundle{@link 
org.apache.beam.runners.fnexecution.control.RemoteBundle}, send data
+ * elements to SDK and receive processed results from SDK, then pass these 
elements to next
+ * Operations.
+ */
 
 Review comment:
   In a follow-up PR:
   ```
* This {@link 
org.apache.beam.runners.dataflow.worker.util.common.worker.Operation} is 
responsible
* for communicating with the SDK harness and asking it to process a bundle 
of work. This operation
* requests a {@link 
org.apache.beam.runners.fnexecution.control.RemoteBundle}, sends 
* elements to the SDK and receives processed results from the SDK, passing these 
elements downstream.
   ```




Issue Time Tracking
---

Worklog Id: (was: 172749)
Time Spent: 1h 10m  (was: 1h)

> Dataflow portable runner harness should use ExecutableStage to process bundle
> -
>
> Key: BEAM-6159
> URL: https://issues.apache.org/jira/browse/BEAM-6159
> Project: Beam
>  Issue Type: Task
>  Components: runner-dataflow
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6159) Dataflow portable runner harness should use ExecutableStage to process bundle

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6159?focusedWorklogId=172748&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172748
 ]

ASF GitHub Bot logged work on BEAM-6159:


Author: ASF GitHub Bot
Created on: 06/Dec/18 18:13
Start Date: 06/Dec/18 18:13
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #7015: 
[BEAM-6159] Migrate dataflow portable worker using shared library to process 
bundle 
URL: https://github.com/apache/beam/pull/7015#discussion_r239557879
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/StaticRemoteEnvironmentFactory.java
 ##
 @@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.environment;
+
+import org.apache.beam.model.pipeline.v1.RunnerApi.Environment;
+import org.apache.beam.runners.fnexecution.GrpcFnServer;
+import org.apache.beam.runners.fnexecution.artifact.ArtifactRetrievalService;
+import org.apache.beam.runners.fnexecution.control.ControlClientPool;
+import 
org.apache.beam.runners.fnexecution.control.FnApiControlClientPoolService;
+import org.apache.beam.runners.fnexecution.control.InstructionRequestHandler;
+import org.apache.beam.runners.fnexecution.logging.GrpcLoggingService;
+import 
org.apache.beam.runners.fnexecution.provisioning.StaticGrpcProvisionService;
+import org.apache.beam.sdk.fn.IdGenerator;
+
+/**
+ * An {@link EnvironmentFactory} that creates StaticRemoteEnvironment used by 
Dataflow runner
 
 Review comment:
   In a follow-up PR:
   This isn't specific to Dataflow; anyone should be able to use an already 
existing InstructionRequestHandler if they choose.




Issue Time Tracking
---

Worklog Id: (was: 172748)
Time Spent: 1h 10m  (was: 1h)

> Dataflow portable runner harness should use ExecutableStage to process bundle
> -
>
> Key: BEAM-6159
> URL: https://issues.apache.org/jira/browse/BEAM-6159
> Project: Beam
>  Issue Type: Task
>  Components: runner-dataflow
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6159) Dataflow portable runner harness should use ExecutableStage to process bundle

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6159?focusedWorklogId=172747&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172747
 ]

ASF GitHub Bot logged work on BEAM-6159:


Author: ASF GitHub Bot
Created on: 06/Dec/18 18:13
Start Date: 06/Dec/18 18:13
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #7015: 
[BEAM-6159] Migrate dataflow portable worker using shared library to process 
bundle 
URL: https://github.com/apache/beam/pull/7015#discussion_r23962
 
 

 ##
 File path: 
runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/IntrinsicMapTaskExecutorFactoryTest.java
 ##
 @@ -174,8 +174,8 @@ public void testCreateMapTaskExecutor() throws Exception {
 mapTaskExecutorFactory.create(
 null /* beamFnControlClientHandler */,
 null /* beamFnDataService */,
-null, /* dataApiServiceDescriptor */
 null /* beamFnStateService */,
+null,
 
 Review comment:
   nit: in a follow-up PR, add a comment stating what the `null` represents.




Issue Time Tracking
---

Worklog Id: (was: 172747)
Time Spent: 1h 10m  (was: 1h)

> Dataflow portable runner harness should use ExecutableStage to process bundle
> -
>
> Key: BEAM-6159
> URL: https://issues.apache.org/jira/browse/BEAM-6159
> Project: Beam
>  Issue Type: Task
>  Components: runner-dataflow
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6159) Dataflow portable runner harness should use ExecutableStage to process bundle

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6159?focusedWorklogId=172754&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172754
 ]

ASF GitHub Bot logged work on BEAM-6159:


Author: ASF GitHub Bot
Created on: 06/Dec/18 18:13
Start Date: 06/Dec/18 18:13
Worklog Time Spent: 10m 
  Work Description: lukecwik closed pull request #7015: [BEAM-6159] Migrate 
dataflow portable worker using shared library to process bundle 
URL: https://github.com/apache/beam/pull/7015
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/portable/ReferenceRunner.java
 
b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/portable/ReferenceRunner.java
index 3feb7064b895..1d1c56cb80f3 100644
--- 
a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/portable/ReferenceRunner.java
+++ 
b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/portable/ReferenceRunner.java
@@ -197,7 +197,7 @@ public void execute() throws Exception {
   EnvironmentFactory environmentFactory =
   createEnvironmentFactory(control, logging, artifact, provisioning, 
controlClientPool);
   JobBundleFactory jobBundleFactory =
-  SingleEnvironmentInstanceJobBundleFactory.create(environmentFactory, 
data, state);
+  SingleEnvironmentInstanceJobBundleFactory.create(environmentFactory, 
data, state, null);
 
   TransformEvaluatorRegistry transformRegistry =
   TransformEvaluatorRegistry.portableRegistry(
diff --git 
a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/portable/RemoteStageEvaluatorFactoryTest.java
 
b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/portable/RemoteStageEvaluatorFactoryTest.java
index f0f877283154..2dfaac966827 100644
--- 
a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/portable/RemoteStageEvaluatorFactoryTest.java
+++ 
b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/portable/RemoteStageEvaluatorFactoryTest.java
@@ -111,7 +111,7 @@ public void setup() throws Exception {
 bundleFactory = ImmutableListBundleFactory.create();
 JobBundleFactory jobBundleFactory =
 SingleEnvironmentInstanceJobBundleFactory.create(
-environmentFactory, dataServer, stateServer);
+environmentFactory, dataServer, stateServer, null);
 factory = new RemoteStageEvaluatorFactory(bundleFactory, jobBundleFactory);
   }
 
diff --git 
a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/BatchDataflowWorker.java
 
b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/BatchDataflowWorker.java
index 1433702f714d..f0e8ddb73456 100644
--- 
a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/BatchDataflowWorker.java
+++ 
b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/BatchDataflowWorker.java
@@ -37,6 +37,7 @@
 import 
org.apache.beam.runners.dataflow.worker.apiary.FixMultiOutputInfosOnParDoInstructions;
 import org.apache.beam.runners.dataflow.worker.counters.CounterSet;
 import 
org.apache.beam.runners.dataflow.worker.graph.CloneAmbiguousFlattensFunction;
+import 
org.apache.beam.runners.dataflow.worker.graph.CreateExecutableStageNodeFunction;
 import 
org.apache.beam.runners.dataflow.worker.graph.CreateRegisterFnOperationFunction;
 import 
org.apache.beam.runners.dataflow.worker.graph.DeduceFlattenLocationsFunction;
 import 
org.apache.beam.runners.dataflow.worker.graph.DeduceNodeLocationsFunction;
@@ -218,18 +219,32 @@ protected BatchDataflowWorker(
 // TODO: this conditional -> two implementations of common interface, or
 // param/injection
 if (DataflowRunner.hasExperiment(options, "beam_fn_api")) {
-  Function, Node> sdkFusedStage =
-  pipeline == null
-  ? RegisterNodeFunction.withoutPipeline(
-  idGenerator, 
sdkHarnessRegistry.beamFnStateApiServiceDescriptor())
-  : RegisterNodeFunction.forPipeline(
-  pipeline, idGenerator, 
sdkHarnessRegistry.beamFnStateApiServiceDescriptor());
+  Function, MutableNetwork> 
transformToRunnerNetwork;
+  Function, Node> sdkFusedStage;
   Function, MutableNetwork> 
lengthPrefixUnknownCoders =
   LengthPrefixUnknownCoders::forSdkNetwork;
-  Function, MutableNetwork> 
transformToRunnerNetwork =
-  new CreateRegisterFnOperationFunction(
-  idGenerator, this::createPortNode, 

[jira] [Work logged] (BEAM-6159) Dataflow portable runner harness should use ExecutableStage to process bundle

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6159?focusedWorklogId=172752&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172752
 ]

ASF GitHub Bot logged work on BEAM-6159:


Author: ASF GitHub Bot
Created on: 06/Dec/18 18:13
Start Date: 06/Dec/18 18:13
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #7015: 
[BEAM-6159] Migrate dataflow portable worker using shared library to process 
bundle 
URL: https://github.com/apache/beam/pull/7015#discussion_r239556452
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/StaticRemoteEnvironment.java
 ##
 @@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.environment;
+
+import net.jcip.annotations.ThreadSafe;
+import org.apache.beam.model.pipeline.v1.RunnerApi.Environment;
+import org.apache.beam.runners.fnexecution.control.InstructionRequestHandler;
+
+/** A {@link RemoteEnvironment} that connects to Dataflow runner harness. */
+@ThreadSafe
+public class StaticRemoteEnvironment implements RemoteEnvironment {
+
+  static StaticRemoteEnvironment create(
+  Environment environment, InstructionRequestHandler 
instructionRequestHandler) {
+return new StaticRemoteEnvironment(environment, instructionRequestHandler);
+  }
+
+  private final Object lock = new Object();
+  private final Environment environment;
+  private final InstructionRequestHandler instructionRequestHandler;
+
+  private boolean isClosed = false;
+
+  private StaticRemoteEnvironment(
+  Environment environment, InstructionRequestHandler 
instructionRequestHandler) {
+this.environment = environment;
+this.instructionRequestHandler = instructionRequestHandler;
+  }
+
+  @Override
+  public Environment getEnvironment() {
+return this.environment;
+  }
+
+  @Override
+  public InstructionRequestHandler getInstructionRequestHandler() {
+return this.instructionRequestHandler;
+  }
+
+  @Override
+  public void close() throws Exception {
 
 Review comment:
   nit: in a follow-up PR, instead of using a lock object, make this method 
synchronized. Also, prefer guard-style statements:
   ```
   if (isClosed) {
     return;
   }
   ... do other stuff to perform close ...
   ```
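
   A fuller sketch of that suggestion, under stated assumptions: `ClosableEnvironment` and its `onClose` callback are hypothetical stand-ins for the class under review, and the real `close()` declares `throws Exception`, which is dropped here to keep the example small.

```java
// Hypothetical stand-in for the environment under review; not the Beam class.
final class ClosableEnvironment implements AutoCloseable {
  private final Runnable onClose; // hypothetical cleanup action
  private boolean isClosed = false;

  ClosableEnvironment(Runnable onClose) {
    this.onClose = onClose;
  }

  // Synchronizing the method replaces the separate lock object: the
  // instance monitor guards isClosed.
  @Override
  public synchronized void close() {
    // Guard clause: return early instead of nesting the close logic.
    if (isClosed) {
      return;
    }
    isClosed = true;
    onClose.run();
  }
}
```

   Repeated calls to `close()` run the cleanup action only once, and concurrent callers are serialized on the instance monitor.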




Issue Time Tracking
---

Worklog Id: (was: 172752)

> Dataflow portable runner harness should use ExecutableStage to process bundle
> -
>
> Key: BEAM-6159
> URL: https://issues.apache.org/jira/browse/BEAM-6159
> Project: Beam
>  Issue Type: Task
>  Components: runner-dataflow
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-6189) Deprecate and cleanup BeamFnApi.Metrics from Dataflow Worker

2018-12-06 Thread Mikhail Gryzykhin (JIRA)
Mikhail Gryzykhin created BEAM-6189:
---

 Summary: Deprecate and cleanup BeamFnApi.Metrics from Dataflow 
Worker
 Key: BEAM-6189
 URL: https://issues.apache.org/jira/browse/BEAM-6189
 Project: Beam
  Issue Type: Bug
  Components: java-fn-execution
Reporter: Mikhail Gryzykhin


The current Dataflow Worker code contains multiple references to the Metrics proto.

Some of these references are fallbacks for cases where migration to 
MonitoringInfo has not happened yet.

Once all SDKs have migrated to the new approach for reporting monitoring 
information, we need to clean up the Worker code and completely deprecate the 
Metrics proto. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6181) Utilize MetricInfo for reporting user metrics in Portable Dataflow Java Runner.

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6181?focusedWorklogId=172714&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172714
 ]

ASF GitHub Bot logged work on BEAM-6181:


Author: ASF GitHub Bot
Created on: 06/Dec/18 17:24
Start Date: 06/Dec/18 17:24
Worklog Time Spent: 10m 
  Work Description: Ardagan commented on a change in pull request #7202: 
[BEAM-6181] Reporting user counters via MonitoringInfos in Portable Dataflow 
Runner.
URL: https://github.com/apache/beam/pull/7202#discussion_r239541582
 
 

 ##
 File path: 
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/BeamFnMapTaskExecutor.java
 ##
 @@ -351,7 +398,115 @@ void updateProgress() {
   }
 }
 
-private void updateMetrics(BeamFnApi.Metrics metrics) {
+// Keeping this as static class for this iteration. Will extract to 
separate file and generalize
+// when more counter types are added.
+// todomigryz: define counter transformer factory
+// that can provide respective counter transformer for different type of 
counters.
+// (ie RowCountCounterTranformer, MSecCounterTransformer, 
UserCounterTransformer, etc)
+private static class MonitoringInfoToCounterUpdateTransformer {
+
+  private final Map transformIdMapping;
+
+  public MonitoringInfoToCounterUpdateTransformer(
+  final Map transformIdMapping) {
+this.transformIdMapping = transformIdMapping;
+  }
+
+  // todo: search code for "beam:metrics"... and replace them with 
relevant enums from
+  // proto after rebasing above https://github.com/apache/beam/pull/6799 
that
+  // introduces relevant proto entries.
+  final String BEAM_METRICS_USER_PREFIX = "beam:metric:user";
+
+  private CounterUpdate monitoringInfoToCounterUpdate(MonitoringInfo 
monitoringInfo) {
+long value = 
monitoringInfo.getMetric().getCounterData().getInt64Value();
+String urn = monitoringInfo.getUrn();
+
+String type = monitoringInfo.getType();
+
+// todomigryz: run MonitoringInfo through validation process.
+// refer to https://github.com/apache/beam/pull/6799
+
+if (urn.startsWith(BEAM_METRICS_USER_PREFIX)) {
+  if (!type.equals("beam:metrics:sum_int_64")) {
+LOG.warn(
 
 Review comment:
   I believe that logging a warning is a good balance here. Most of the time 
you want your job to keep processing data and use counters only for monitoring.
   This approach lets SDK authors find errors in their SDK implementations 
without breaking user pipelines.
   
   @ajamato Alex, what's your opinion on this topic?
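
   A minimal sketch of the warn-and-skip behavior described above. `UserCounterConverter` and `convert` are hypothetical names, `java.util.logging` stands in for the logger used in the PR, and only the two string constants are taken from the quoted diff:

```java
import java.util.Optional;
import java.util.logging.Logger;

// Hypothetical converter illustrating warn-and-skip; not the PR's class.
final class UserCounterConverter {
  private static final Logger LOG =
      Logger.getLogger(UserCounterConverter.class.getName());

  // String constants as they appear in the quoted diff.
  static final String USER_PREFIX = "beam:metric:user";
  static final String SUM_INT64_TYPE = "beam:metrics:sum_int_64";

  /** Returns the counter value, or empty if the metric should be skipped. */
  static Optional<Long> convert(String urn, String type, long value) {
    if (!urn.startsWith(USER_PREFIX)) {
      // Not a user metric; this sketch simply ignores it.
      return Optional.empty();
    }
    if (!SUM_INT64_TYPE.equals(type)) {
      // Warn rather than throw, so a misbehaving SDK does not fail the job.
      LOG.warning("Unsupported type " + type + " for user metric " + urn + "; skipping.");
      return Optional.empty();
    }
    return Optional.of(value);
  }
}
```

   The trade-off is that a malformed metric is dropped silently from the job's point of view and shows up only in the worker logs.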




Issue Time Tracking
---

Worklog Id: (was: 172714)
Time Spent: 5h 10m  (was: 5h)

> Utilize MetricInfo for reporting user metrics in Portable Dataflow Java 
> Runner.
> ---
>
> Key: BEAM-6181
> URL: https://issues.apache.org/jira/browse/BEAM-6181
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> The new approach to reporting metrics in FnApi is to utilize MetricInfo structures.
> This approach is implemented in the Python SDK, and work is ongoing in the Java SDK.
> This task includes plumbing user metrics reported via MetricInfos through the 
> Dataflow Java Runner. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6181) Utilize MetricInfo for reporting user metrics in Portable Dataflow Java Runner.

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6181?focusedWorklogId=172708&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172708
 ]

ASF GitHub Bot logged work on BEAM-6181:


Author: ASF GitHub Bot
Created on: 06/Dec/18 17:20
Start Date: 06/Dec/18 17:20
Worklog Time Spent: 10m 
  Work Description: Ardagan commented on a change in pull request #7202: 
[BEAM-6181] Reporting user counters via MonitoringInfos in Portable Dataflow 
Runner.
URL: https://github.com/apache/beam/pull/7202#discussion_r239540159
 
 

 ##
 File path: 
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/BeamFnMapTaskExecutor.java
 ##
 @@ -351,7 +398,115 @@ void updateProgress() {
   }
 }
 
-private void updateMetrics(BeamFnApi.Metrics metrics) {
+// Keeping this as static class for this iteration. Will extract to 
separate file and generalize
+// when more counter types are added.
+// todomigryz: define counter transformer factory
+// that can provide respective counter transformer for different type of 
counters.
+// (ie RowCountCounterTranformer, MSecCounterTransformer, 
UserCounterTransformer, etc)
+private static class MonitoringInfoToCounterUpdateTransformer {
+
+  private final Map transformIdMapping;
+
+  public MonitoringInfoToCounterUpdateTransformer(
+  final Map transformIdMapping) {
+this.transformIdMapping = transformIdMapping;
+  }
+
+  // todo: search code for "beam:metrics"... and replace them with 
relevant enums from
 
 Review comment:
   This depends on https://github.com/apache/beam/pull/6799.
   It will be implemented in follow-up PRs.




Issue Time Tracking
---

Worklog Id: (was: 172708)
Time Spent: 4h 50m  (was: 4h 40m)

> Utilize MetricInfo for reporting user metrics in Portable Dataflow Java 
> Runner.
> ---
>
> Key: BEAM-6181
> URL: https://issues.apache.org/jira/browse/BEAM-6181
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> The new approach to reporting metrics in FnApi is to utilize MetricInfo structures.
> This approach is implemented in the Python SDK, and work is ongoing in the Java SDK.
> This task includes plumbing user metrics reported via MetricInfos through the 
> Dataflow Java Runner. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6181) Utilize MetricInfo for reporting user metrics in Portable Dataflow Java Runner.

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6181?focusedWorklogId=172710&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172710
 ]

ASF GitHub Bot logged work on BEAM-6181:


Author: ASF GitHub Bot
Created on: 06/Dec/18 17:20
Start Date: 06/Dec/18 17:20
Worklog Time Spent: 10m 
  Work Description: Ardagan commented on a change in pull request #7202: 
[BEAM-6181] Reporting user counters via MonitoringInfos in Portable Dataflow 
Runner.
URL: https://github.com/apache/beam/pull/7202#discussion_r239538937
 
 

 ##
 File path: 
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/BeamFnMapTaskExecutor.java
 ##
 @@ -351,7 +398,115 @@ void updateProgress() {
   }
 }
 
-private void updateMetrics(BeamFnApi.Metrics metrics) {
+// Keeping this as static class for this iteration. Will extract to separate file and generalize
+// when more counter types are added.
+// todomigryz: define counter transformer factory
+// that can provide respective counter transformer for different type of counters.
+// (ie RowCountCounterTranformer, MSecCounterTransformer, UserCounterTransformer, etc)
+private static class MonitoringInfoToCounterUpdateTransformer {
+
+  private final Map transformIdMapping;
+
+  public MonitoringInfoToCounterUpdateTransformer(
+  final Map transformIdMapping) {
+this.transformIdMapping = transformIdMapping;
+  }
+
+  // todo: search code for "beam:metrics"... and replace them with relevant enums from
+  // proto after rebasing above https://github.com/apache/beam/pull/6799 that
+  // introduces relevant proto entries.
+  final String BEAM_METRICS_USER_PREFIX = "beam:metric:user";
+
+  private CounterUpdate monitoringInfoToCounterUpdate(MonitoringInfo monitoringInfo) {
+long value = monitoringInfo.getMetric().getCounterData().getInt64Value();
+String urn = monitoringInfo.getUrn();
+
+String type = monitoringInfo.getType();
+
+// todomigryz: run MonitoringInfo through validation process.
 
 Review comment:
   This requires the PR mentioned on the next line to go in first 
(https://github.com/apache/beam/pull/6799).
   Technically, I could copy the data from that PR, but the format of counters is 
still being defined, so copying would also cause more merging and conflict 
resolution in both PRs.
   I suggest we keep it as is for this PR and generalize/fix it in a following PR.
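
   The generalization deferred here (a factory that picks a counter transformer per counter type) could look roughly like the sketch below. It is only an illustration under stated assumptions: a monitoring info is modelled as a plain dict, a counter update as a `(name, value)` tuple, and `user_counter_transformer`, `TRANSFORMERS`, `transformer_for` and the sample URN are hypothetical names, not part of the PR or of Beam. Only the `beam:metric:user` prefix comes from the diff above.

```python
# Hypothetical stand-ins: a monitoring info is a plain dict and a counter
# update is a (name, value) tuple; neither mirrors the real proto messages.
BEAM_METRICS_USER_PREFIX = "beam:metric:user"  # prefix taken from the diff


def user_counter_transformer(monitoring_info):
    """Turn a user-counter monitoring info into a (name, value) update."""
    # Drop the prefix and the separator that follows it.
    name = monitoring_info["urn"][len(BEAM_METRICS_USER_PREFIX):].lstrip(":")
    return (name, monitoring_info["value"])


# Registry keyed by URN prefix; further counter types would be added here.
TRANSFORMERS = {
    BEAM_METRICS_USER_PREFIX: user_counter_transformer,
}


def transformer_for(urn):
    """Return the transformer whose prefix matches the URN, or None."""
    for prefix, transformer in TRANSFORMERS.items():
        if urn.startswith(prefix):
            return transformer
    return None


# Illustrative URN layout; the real format was still being defined.
info = {"urn": "beam:metric:user:my_namespace:my_counter", "value": 7}
transformer = transformer_for(info["urn"])
update = transformer(info) if transformer else None
```

   Keeping the registry as data means a new counter type only needs a new entry, which is one way to read the "transformer factory" note in the code comment.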




Issue Time Tracking
---

Worklog Id: (was: 172710)
Time Spent: 5h  (was: 4h 50m)

> Utilize MetricInfo for reporting user metrics in Portable Dataflow Java 
> Runner.
> ---
>
> Key: BEAM-6181
> URL: https://issues.apache.org/jira/browse/BEAM-6181
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> The new approach to reporting metrics in FnApi is to utilize MetricInfo structures.
> This approach is implemented in the Python SDK, and work is ongoing in the Java SDK.
> This task includes plumbing user metrics reported via MetricInfos through the 
> Dataflow Java Runner.





[jira] [Work logged] (BEAM-6181) Utilize MetricInfo for reporting user metrics in Portable Dataflow Java Runner.

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6181?focusedWorklogId=172706&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172706
 ]

ASF GitHub Bot logged work on BEAM-6181:


Author: ASF GitHub Bot
Created on: 06/Dec/18 17:16
Start Date: 06/Dec/18 17:16
Worklog Time Spent: 10m 
  Work Description: Ardagan commented on a change in pull request #7202: 
[BEAM-6181] Reporting user counters via MonitoringInfos in Portable Dataflow 
Runner.
URL: https://github.com/apache/beam/pull/7202#discussion_r239538937
 
 

 ##
 File path: 
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/BeamFnMapTaskExecutor.java
 ##
 @@ -351,7 +398,115 @@ void updateProgress() {
   }
 }
 
-private void updateMetrics(BeamFnApi.Metrics metrics) {
+// Keeping this as static class for this iteration. Will extract to separate file and generalize
+// when more counter types are added.
+// todomigryz: define counter transformer factory
+// that can provide respective counter transformer for different type of counters.
+// (ie RowCountCounterTranformer, MSecCounterTransformer, UserCounterTransformer, etc)
+private static class MonitoringInfoToCounterUpdateTransformer {
+
+  private final Map transformIdMapping;
+
+  public MonitoringInfoToCounterUpdateTransformer(
+  final Map transformIdMapping) {
+this.transformIdMapping = transformIdMapping;
+  }
+
+  // todo: search code for "beam:metrics"... and replace them with relevant enums from
+  // proto after rebasing above https://github.com/apache/beam/pull/6799 that
+  // introduces relevant proto entries.
+  final String BEAM_METRICS_USER_PREFIX = "beam:metric:user";
+
+  private CounterUpdate monitoringInfoToCounterUpdate(MonitoringInfo monitoringInfo) {
+long value = monitoringInfo.getMetric().getCounterData().getInt64Value();
+String urn = monitoringInfo.getUrn();
+
+String type = monitoringInfo.getType();
+
+// todomigryz: run MonitoringInfo through validation process.
 
 Review comment:
   This requires the PR mentioned below to go in first.
   Technically, I could copy the data from that PR, but the format of counters is 
still being defined, so copying would cause more merging and conflict resolution in both PRs.
   I suggest we keep it as is for this PR and generalize/fix it in a following PR.




Issue Time Tracking
---

Worklog Id: (was: 172706)
Time Spent: 4h 40m  (was: 4.5h)

> Utilize MetricInfo for reporting user metrics in Portable Dataflow Java 
> Runner.
> ---
>
> Key: BEAM-6181
> URL: https://issues.apache.org/jira/browse/BEAM-6181
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> The new approach to reporting metrics in FnApi is to utilize MetricInfo structures.
> This approach is implemented in the Python SDK, and work is ongoing in the Java SDK.
> This task includes plumbing user metrics reported via MetricInfos through the 
> Dataflow Java Runner.





[jira] [Work logged] (BEAM-6067) Dataflow runner should include portable pipeline coder id in CloudObject coder representation

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6067?focusedWorklogId=172701&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172701
 ]

ASF GitHub Bot logged work on BEAM-6067:


Author: ASF GitHub Bot
Created on: 06/Dec/18 17:08
Start Date: 06/Dec/18 17:08
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #7081: [BEAM-6067] In 
Python SDK, specify pipeline_proto_coder_id property in non-Beam-standard 
CloudObject coders
URL: https://github.com/apache/beam/pull/7081#issuecomment-444950367
 
 
   Tests look good. Done. 




Issue Time Tracking
---

Worklog Id: (was: 172701)
Time Spent: 7.5h  (was: 7h 20m)
Remaining Estimate: 160.5h  (was: 160h 40m)

> Dataflow runner should include portable pipeline coder id in CloudObject 
> coder representation
> -
>
> Key: BEAM-6067
> URL: https://issues.apache.org/jira/browse/BEAM-6067
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Craig Chambers
>Assignee: Craig Chambers
>Priority: Major
>   Original Estimate: 168h
>  Time Spent: 7.5h
>  Remaining Estimate: 160.5h
>
> When translating a BeamJava Coder into the DataflowRunner's CloudObject 
> property map, include a property that specifies the id in the Beam model 
> Pipeline coders map corresponding to that Coder.  This will allow the 
> DataflowRunner to reference the corresponding Beam coder in the FnAPI 
> processing bundle.
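
The property described in this issue can be sketched in a few lines. This is a toy illustration under stated assumptions, not the SDK's implementation: `CodersContext` and `SketchCoder` are hypothetical stand-ins, the id format `ref_Coder_N` is invented for the example, and only the `coders_context` parameter, the `get_id` call and the `pipeline_proto_coder_id` key mirror names used in the pull request.

```python
class CodersContext(object):
    """Hypothetical stand-in that hands out a stable id per coder."""

    def __init__(self):
        self._ids = {}

    def get_id(self, coder):
        # Reuse the id if this coder was seen before, else mint a new one.
        if coder not in self._ids:
            self._ids[coder] = "ref_Coder_%d" % (len(self._ids) + 1)
        return self._ids[coder]


class SketchCoder(object):
    """Toy non-standard coder with optional component coders."""

    def __init__(self, type_name, components=()):
        self.type_name = type_name
        self.components = list(components)

    def as_cloud_object(self, coders_context=None):
        value = {
            "@type": self.type_name,
            # The context is threaded through to every component coder.
            "component_encodings": [
                c.as_cloud_object(coders_context) for c in self.components
            ],
        }
        # Only annotate when a context is supplied, so callers that do not
        # pass one keep getting the previous output.
        if coders_context:
            value["pipeline_proto_coder_id"] = coders_context.get_id(self)
        return value


context = CodersContext()
inner = SketchCoder("custom_inner")
outer = SketchCoder("custom_outer", [inner, inner])
with_ids = outer.as_cloud_object(context)
without_ids = outer.as_cloud_object()
```

Making `coders_context` optional keeps existing call sites working unchanged, while a runner that has the pipeline proto at hand can pass the context and look the coder up by id later.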





[jira] [Work logged] (BEAM-6067) Dataflow runner should include portable pipeline coder id in CloudObject coder representation

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6067?focusedWorklogId=172700&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172700
 ]

ASF GitHub Bot logged work on BEAM-6067:


Author: ASF GitHub Bot
Created on: 06/Dec/18 17:08
Start Date: 06/Dec/18 17:08
Worklog Time Spent: 10m 
  Work Description: robertwb closed pull request #7081: [BEAM-6067] In 
Python SDK, specify pipeline_proto_coder_id property in non-Beam-standard 
CloudObject coders
URL: https://github.com/apache/beam/pull/7081
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/python/apache_beam/coders/coders.py b/sdks/python/apache_beam/coders/coders.py
index f5c90a8ec99e..f2a4b2ee724a 100644
--- a/sdks/python/apache_beam/coders/coders.py
+++ b/sdks/python/apache_beam/coders/coders.py
@@ -190,7 +190,7 @@ def _get_component_coders(self):
 # refined in user-defined Coders.
 return []
 
-  def as_cloud_object(self):
+  def as_cloud_object(self, coders_context=None):
 """For internal use only; no backwards-compatibility guarantees.
 
 Returns Google Cloud Dataflow API description of this coder."""
@@ -201,12 +201,17 @@ def as_cloud_object(self):
 # We pass coders in the form "$" to make the
 # job description JSON more readable.  Data before the $ is ignored by
 # the worker.
-'@type': serialize_coder(self),
-'component_encodings': list(
-component.as_cloud_object()
+'@type':
+serialize_coder(self),
+'component_encodings': [
+component.as_cloud_object(coders_context)
 for component in self._get_component_coders()
-),
+],
 }
+
+if coders_context:
+  value['pipeline_proto_coder_id'] = coders_context.get_id(self)
+
 return value
 
   def __repr__(self):
@@ -370,7 +375,7 @@ def _create_impl(self):
   def is_deterministic(self):
 return True
 
-  def as_cloud_object(self):
+  def as_cloud_object(self, coders_context=None):
 return {
 '@type': 'kind:bytes',
 }
@@ -394,7 +399,7 @@ def _create_impl(self):
   def is_deterministic(self):
 return True
 
-  def as_cloud_object(self):
+  def as_cloud_object(self, coders_context=None):
 return {
 '@type': 'kind:varint',
 }
@@ -516,8 +521,8 @@ def is_deterministic(self):
 # GroupByKey operations.
 return False
 
-  def as_cloud_object(self, is_pair_like=True):
-value = super(_PickleCoderBase, self).as_cloud_object()
+  def as_cloud_object(self, coders_context=None, is_pair_like=True):
+value = super(_PickleCoderBase, self).as_cloud_object(coders_context)
 # We currently use this coder in places where we cannot infer the coder to
 # use for the value type in a more granular way.  In places where the
 # service expects a pair, it checks for the "is_pair_like" key, in which
@@ -525,8 +530,8 @@ def as_cloud_object(self, is_pair_like=True):
 if is_pair_like:
   value['is_pair_like'] = True
   value['component_encodings'] = [
-  self.as_cloud_object(is_pair_like=False),
-  self.as_cloud_object(is_pair_like=False)
+  self.as_cloud_object(coders_context, is_pair_like=False),
+  self.as_cloud_object(coders_context, is_pair_like=False)
   ]
 
 return value
@@ -615,8 +620,8 @@ def as_deterministic_coder(self, step_label, error_message=None):
 else:
   return DeterministicFastPrimitivesCoder(self, step_label)
 
-  def as_cloud_object(self, is_pair_like=True):
-value = super(FastCoder, self).as_cloud_object()
+  def as_cloud_object(self, coders_context=None, is_pair_like=True):
+value = super(FastCoder, self).as_cloud_object(coders_context)
 # We currently use this coder in places where we cannot infer the coder to
 # use for the value type in a more granular way.  In places where the
 # service expects a pair, it checks for the "is_pair_like" key, in which
@@ -624,8 +629,8 @@ def as_cloud_object(self, is_pair_like=True):
 if is_pair_like:
   value['is_pair_like'] = True
   value['component_encodings'] = [
-  self.as_cloud_object(is_pair_like=False),
-  self.as_cloud_object(is_pair_like=False)
+  self.as_cloud_object(coders_context, is_pair_like=False),
+  self.as_cloud_object(coders_context, is_pair_like=False)
   ]
 
 return value
@@ -744,18 +749,20 @@ def as_deterministic_coder(self, step_label, error_message=None):
   def from_type_hint(typehint, registry):
 return TupleCoder([registry.get_coder(t) for t in typehint.tuple_types])
 
-  def as_cloud_object(self):
+  def as_cloud_object(self, coders_context=None):
 

[jira] [Work logged] (BEAM-6178) Create BOM for Beam

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6178?focusedWorklogId=172698&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172698
 ]

ASF GitHub Bot logged work on BEAM-6178:


Author: ASF GitHub Bot
Created on: 06/Dec/18 17:02
Start Date: 06/Dec/18 17:02
Worklog Time Spent: 10m 
  Work Description: garrettjonesgoogle commented on issue #7197: 
[BEAM-6178] Adding beam-sdks-java-bom, adding exportJavadoc flag for 
applyJavaNature
URL: https://github.com/apache/beam/pull/7197#issuecomment-444948074
 
 
   *sigh* it was clean 2 days ago. I will resolve the conflicts.




Issue Time Tracking
---

Worklog Id: (was: 172698)
Time Spent: 40m  (was: 0.5h)

> Create BOM for Beam
> ---
>
> Key: BEAM-6178
> URL: https://issues.apache.org/jira/browse/BEAM-6178
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Affects Versions: 2.10.0
>Reporter: Garrett Jones
>Assignee: Garrett Jones
>Priority: Critical
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Add a module to Beam which generates a BOM for Beam modules.





[jira] [Work logged] (BEAM-6067) Dataflow runner should include portable pipeline coder id in CloudObject coder representation

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6067?focusedWorklogId=172696&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172696
 ]

ASF GitHub Bot logged work on BEAM-6067:


Author: ASF GitHub Bot
Created on: 06/Dec/18 17:00
Start Date: 06/Dec/18 17:00
Worklog Time Spent: 10m 
  Work Description: CraigChambersG commented on issue #7081: [BEAM-6067] In 
Python SDK, specify pipeline_proto_coder_id property in non-Beam-standard 
CloudObject coders
URL: https://github.com/apache/beam/pull/7081#issuecomment-444947320
 
 
   Is there something I need to do at this point to get this PR checked in?




Issue Time Tracking
---

Worklog Id: (was: 172696)
Time Spent: 7h 10m  (was: 7h)
Remaining Estimate: 160h 50m  (was: 161h)

> Dataflow runner should include portable pipeline coder id in CloudObject 
> coder representation
> -
>
> Key: BEAM-6067
> URL: https://issues.apache.org/jira/browse/BEAM-6067
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Craig Chambers
>Assignee: Craig Chambers
>Priority: Major
>   Original Estimate: 168h
>  Time Spent: 7h 10m
>  Remaining Estimate: 160h 50m
>
> When translating a BeamJava Coder into the DataflowRunner's CloudObject 
> property map, include a property that specifies the id in the Beam model 
> Pipeline coders map corresponding to that Coder.  This will allow the 
> DataflowRunner to reference the corresponding Beam coder in the FnAPI 
> processing bundle.





[jira] [Work logged] (BEAM-6178) Create BOM for Beam

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6178?focusedWorklogId=172695&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172695
 ]

ASF GitHub Bot logged work on BEAM-6178:


Author: ASF GitHub Bot
Created on: 06/Dec/18 16:59
Start Date: 06/Dec/18 16:59
Worklog Time Spent: 10m 
  Work Description: swegner commented on issue #7197: [BEAM-6178] Adding 
beam-sdks-java-bom, adding exportJavadoc flag for applyJavaNature
URL: https://github.com/apache/beam/pull/7197#issuecomment-444947221
 
 
   FYI, there are some merge conflicts. I will get started on the review, but 
before we can merge, please resolve the conflicts by either merging in master or 
rebasing your branch on top of it (`git fetch origin && git rebase origin/master`).




Issue Time Tracking
---

Worklog Id: (was: 172695)
Time Spent: 0.5h  (was: 20m)

> Create BOM for Beam
> ---
>
> Key: BEAM-6178
> URL: https://issues.apache.org/jira/browse/BEAM-6178
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Affects Versions: 2.10.0
>Reporter: Garrett Jones
>Assignee: Garrett Jones
>Priority: Critical
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Add a module to Beam which generates a BOM for Beam modules.





[jira] [Work logged] (BEAM-6181) Utilize MetricInfo for reporting user metrics in Portable Dataflow Java Runner.

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6181?focusedWorklogId=172686&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172686
 ]

ASF GitHub Bot logged work on BEAM-6181:


Author: ASF GitHub Bot
Created on: 06/Dec/18 16:51
Start Date: 06/Dec/18 16:51
Worklog Time Spent: 10m 
  Work Description: swegner commented on a change in pull request #7202: 
[BEAM-6181] Reporting user counters via MonitoringInfos in Portable Dataflow 
Runner.
URL: https://github.com/apache/beam/pull/7202#discussion_r239525848
 
 

 ##
 File path: 
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/BeamFnMapTaskExecutor.java
 ##
 @@ -351,7 +398,115 @@ void updateProgress() {
   }
 }
 
-private void updateMetrics(BeamFnApi.Metrics metrics) {
+// Keeping this as static class for this iteration. Will extract to separate file and generalize
+// when more counter types are added.
+// todomigryz: define counter transformer factory
+// that can provide respective counter transformer for different type of counters.
+// (ie RowCountCounterTranformer, MSecCounterTransformer, UserCounterTransformer, etc)
+private static class MonitoringInfoToCounterUpdateTransformer {
+
+  private final Map transformIdMapping;
+
+  public MonitoringInfoToCounterUpdateTransformer(
+  final Map transformIdMapping) {
+this.transformIdMapping = transformIdMapping;
+  }
+
+  // todo: search code for "beam:metrics"... and replace them with relevant enums from
 
 Review comment:
   Should this be fixed before merging?




Issue Time Tracking
---

Worklog Id: (was: 172686)
Time Spent: 4h 10m  (was: 4h)

> Utilize MetricInfo for reporting user metrics in Portable Dataflow Java 
> Runner.
> ---
>
> Key: BEAM-6181
> URL: https://issues.apache.org/jira/browse/BEAM-6181
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> The new approach to reporting metrics in FnApi is to utilize MetricInfo structures.
> This approach is implemented in the Python SDK, and work is ongoing in the Java SDK.
> This task includes plumbing user metrics reported via MetricInfos through the 
> Dataflow Java Runner.





[jira] [Work logged] (BEAM-6181) Utilize MetricInfo for reporting user metrics in Portable Dataflow Java Runner.

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6181?focusedWorklogId=172690&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172690
 ]

ASF GitHub Bot logged work on BEAM-6181:


Author: ASF GitHub Bot
Created on: 06/Dec/18 16:51
Start Date: 06/Dec/18 16:51
Worklog Time Spent: 10m 
  Work Description: swegner commented on a change in pull request #7202: 
[BEAM-6181] Reporting user counters via MonitoringInfos in Portable Dataflow 
Runner.
URL: https://github.com/apache/beam/pull/7202#discussion_r239524179
 
 

 ##
 File path: 
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/BeamFnMapTaskExecutor.java
 ##
 @@ -307,10 +344,17 @@ void updateProgress() {
   grpcWriteOperation.abortWait();
 }
 
-BeamFnApi.Metrics metrics = MoreFutures.get(bundleProcessOperation.getMetrics());
+//TODO: Replace getProcessBundleProgress with getMonitoringInfos when Metrics is deprecated.
 
 Review comment:
   +1 to tag a JIRA. It's useful to have some additional context for when the 
TODO inevitably gets stale and needs clarification to be actionable.
   
   The JIRA doesn't have to be explicitly for doing this cleanup. It should 
track when this TODO is ready to implement or obsolete. In this case, it would 
make sense to have a JIRA for deprecating the legacy Metrics.
   
   Google's public style guides [offer similar advice](https://google.github.io/styleguide/cppguide.html#TODO_Comments) 
(we don't follow Google style guides in Beam, although the advice is useful):
   
   > Use TODO comments for code that is temporary, a short-term solution, or good-enough but not perfect.
   > 
   > TODOs should include the string TODO in all caps, followed by the name, e-mail address, bug ID, or other identifier of the person or issue with the best context about the problem referenced by the TODO. The main purpose is to have a consistent TODO that can be searched to find out how to get more details upon request. A TODO is not a commitment that the person referenced will fix the problem. Thus when you create a TODO with a name, it is almost always your name that is given.
   > 
   > `// TODO(k...@gmail.com): Use a "*" here for concatenation operator.`
   > `// TODO(Zeke) change this to use relations.`
   > `// TODO(bug 12345): remove the "Last visitors" feature`
   > 
   > If your TODO is of the form "At a future date do something" make sure that you either include a very specific date ("Fix by November 2005") or a very specific event ("Remove this code when all clients can handle XML responses.").
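
   As a rough illustration of why a consistent `TODO(<identifier>)` form is searchable, the sketch below flags TODO comments that lack a tag. It is a hypothetical helper (`find_untagged_todos` and both regular expressions are made up for this example, not a Beam or Google tool), and the `BEAM-6181` key in the sample is used only as an illustrative identifier.

```python
import re

# Matches "TODO(<identifier>)"; the identifier may be a name, an e-mail
# address, or an issue key.
TAGGED_TODO = re.compile(r"\bTODO\(([^)]+)\)")
# Matches any spelling of "todo" so that untagged variants are caught too.
ANY_TODO = re.compile(r"todo", re.IGNORECASE)


def find_untagged_todos(lines):
    """Return (line_number, text) pairs for TODOs lacking a TODO(<id>) tag."""
    untagged = []
    for number, text in enumerate(lines, start=1):
        if ANY_TODO.search(text) and not TAGGED_TODO.search(text):
            untagged.append((number, text.strip()))
    return untagged


sample = [
    "// todomigryz: define counter transformer factory",
    "//TODO: Replace getProcessBundleProgress with getMonitoringInfos",
    "// TODO(migryz): define counter transformer factory",
    "// TODO(BEAM-6181): utilize monitoringInfos here.",  # illustrative key
]
flagged = find_untagged_todos(sample)
```

   With this sample, the first two comments are flagged and the two tagged forms pass, which matches the `todomigryz` -> `TODO(migryz)` rename requested elsewhere in this review.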




Issue Time Tracking
---

Worklog Id: (was: 172690)
Time Spent: 4.5h  (was: 4h 20m)

> Utilize MetricInfo for reporting user metrics in Portable Dataflow Java 
> Runner.
> ---
>
> Key: BEAM-6181
> URL: https://issues.apache.org/jira/browse/BEAM-6181
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> The new approach to reporting metrics in FnApi is to utilize MetricInfo structures.
> This approach is implemented in the Python SDK, and work is ongoing in the Java SDK.
> This task includes plumbing user metrics reported via MetricInfos through the 
> Dataflow Java Runner.





[jira] [Work logged] (BEAM-6181) Utilize MetricInfo for reporting user metrics in Portable Dataflow Java Runner.

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6181?focusedWorklogId=172691&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172691
 ]

ASF GitHub Bot logged work on BEAM-6181:


Author: ASF GitHub Bot
Created on: 06/Dec/18 16:51
Start Date: 06/Dec/18 16:51
Worklog Time Spent: 10m 
  Work Description: swegner commented on a change in pull request #7202: 
[BEAM-6181] Reporting user counters via MonitoringInfos in Portable Dataflow 
Runner.
URL: https://github.com/apache/beam/pull/7202#discussion_r239518531
 
 

 ##
 File path: 
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WorkItemStatusClient.java
 ##
 @@ -263,20 +263,26 @@ private synchronized WorkItemStatus createStatusUpdate(boolean isFinal) {
 return status;
   }
 
+  // todomigryz this method should return List instead of updating member variable
 
 Review comment:
   `todomigryz` -> `TODO(migryz)`




Issue Time Tracking
---

Worklog Id: (was: 172691)
Time Spent: 4.5h  (was: 4h 20m)

> Utilize MetricInfo for reporting user metrics in Portable Dataflow Java 
> Runner.
> ---
>
> Key: BEAM-6181
> URL: https://issues.apache.org/jira/browse/BEAM-6181
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> The new approach to reporting metrics in FnApi is to utilize MetricInfo structures.
> This approach is implemented in the Python SDK, and work is ongoing in the Java SDK.
> This task includes plumbing user metrics reported via MetricInfos through the 
> Dataflow Java Runner.





[jira] [Work logged] (BEAM-6181) Utilize MetricInfo for reporting user metrics in Portable Dataflow Java Runner.

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6181?focusedWorklogId=172688&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172688
 ]

ASF GitHub Bot logged work on BEAM-6181:


Author: ASF GitHub Bot
Created on: 06/Dec/18 16:51
Start Date: 06/Dec/18 16:51
Worklog Time Spent: 10m 
  Work Description: swegner commented on a change in pull request #7202: 
[BEAM-6181] Reporting user counters via MonitoringInfos in Portable Dataflow 
Runner.
URL: https://github.com/apache/beam/pull/7202#discussion_r239524339
 
 

 ##
 File path: 
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/BeamFnMapTaskExecutor.java
 ##
 @@ -307,10 +347,17 @@ void updateProgress() {
   grpcWriteOperation.abortWait();
 }
 
-BeamFnApi.Metrics metrics = MoreFutures.get(bundleProcessOperation.getMetrics());
+//TODO: Replace getProcessBundleProgress with getMonitoringInfos when Metrics is deprecated.
+ProcessBundleProgressResponse processBundleProgressResponse =
+MoreFutures.get(bundleProcessOperation.getProcessBundleProgress());
+updateMetrics(processBundleProgressResponse.getMonitoringInfosList());
 
-updateMetrics(metrics);
+// Supporting deprecated metrics until all supported runners are migrated to using
+// MonitoringInfos
+Metrics metrics = processBundleProgressResponse.getMetrics();
+updateMetricsDeprecated(metrics);
 
+// todomigryz: utilize monitoringInfos here.
 
 Review comment:
   `todomigryz` -> `TODO(migryz)`




Issue Time Tracking
---

Worklog Id: (was: 172688)
Time Spent: 4.5h  (was: 4h 20m)

> Utilize MetricInfo for reporting user metrics in Portable Dataflow Java 
> Runner.
> ---
>
> Key: BEAM-6181
> URL: https://issues.apache.org/jira/browse/BEAM-6181
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> The new approach to reporting metrics in FnApi is to utilize MetricInfo structures.
> This approach is implemented in the Python SDK, and work is ongoing in the Java SDK.
> This task includes plumbing user metrics reported via MetricInfos through the 
> Dataflow Java Runner.





[jira] [Work logged] (BEAM-6181) Utilize MetricInfo for reporting user metrics in Portable Dataflow Java Runner.

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6181?focusedWorklogId=172685&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172685
 ]

ASF GitHub Bot logged work on BEAM-6181:


Author: ASF GitHub Bot
Created on: 06/Dec/18 16:51
Start Date: 06/Dec/18 16:51
Worklog Time Spent: 10m 
  Work Description: swegner commented on a change in pull request #7202: 
[BEAM-6181] Reporting user counters via MonitoringInfos in Portable Dataflow 
Runner.
URL: https://github.com/apache/beam/pull/7202#discussion_r239524498
 
 

 ##
 File path: 
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/BeamFnMapTaskExecutor.java
 ##
 @@ -351,7 +398,115 @@ void updateProgress() {
   }
 }
 
-private void updateMetrics(BeamFnApi.Metrics metrics) {
+// Keeping this as static class for this iteration. Will extract to separate file and generalize
+// when more counter types are added.
+// todomigryz: define counter transformer factory
 
 Review comment:
   `todomigryz` -> `TODO(migryz)`




Issue Time Tracking
---

Worklog Id: (was: 172685)
Time Spent: 4h  (was: 3h 50m)

> Utilize MetricInfo for reporting user metrics in Portable Dataflow Java 
> Runner.
> ---
>
> Key: BEAM-6181
> URL: https://issues.apache.org/jira/browse/BEAM-6181
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> The new approach to reporting metrics in FnApi is to utilize MetricInfo structures.
> This approach is implemented in the Python SDK, and work is ongoing in the Java SDK.
> This task includes plumbing user metrics reported via MetricInfos through the 
> Dataflow Java Runner.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6181) Utilize MetricInfo for reporting user metrics in Portable Dataflow Java Runner.

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6181?focusedWorklogId=172687=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172687
 ]

ASF GitHub Bot logged work on BEAM-6181:


Author: ASF GitHub Bot
Created on: 06/Dec/18 16:51
Start Date: 06/Dec/18 16:51
Worklog Time Spent: 10m 
  Work Description: swegner commented on a change in pull request #7202: 
[BEAM-6181] Reporting user counters via MonitoringInfos in Portable Dataflow 
Runner.
URL: https://github.com/apache/beam/pull/7202#discussion_r239527436
 
 

 ##
 File path: 
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/BeamFnMapTaskExecutor.java
 ##
 @@ -351,7 +398,115 @@ void updateProgress() {
   }
 }
 
-private void updateMetrics(BeamFnApi.Metrics metrics) {
+// Keeping this as static class for this iteration. Will extract to separate file and generalize
+// when more counter types are added.
+// todomigryz: define counter transformer factory
+// that can provide respective counter transformer for different type of counters.
+// (ie RowCountCounterTranformer, MSecCounterTransformer, UserCounterTransformer, etc)
+private static class MonitoringInfoToCounterUpdateTransformer {
+
+  private final Map transformIdMapping;
+
+  public MonitoringInfoToCounterUpdateTransformer(
+  final Map transformIdMapping) {
+this.transformIdMapping = transformIdMapping;
+  }
+
+  // todo: search code for "beam:metrics"... and replace them with relevant enums from
+  // proto after rebasing above https://github.com/apache/beam/pull/6799 that
+  // introduces relevant proto entries.
+  final String BEAM_METRICS_USER_PREFIX = "beam:metric:user";
+
+  private CounterUpdate monitoringInfoToCounterUpdate(MonitoringInfo monitoringInfo) {
+long value = monitoringInfo.getMetric().getCounterData().getInt64Value();
+String urn = monitoringInfo.getUrn();
+
+String type = monitoringInfo.getType();
+
+// todomigryz: run MonitoringInfo through validation process.
+// refer to https://github.com/apache/beam/pull/6799
+
+if (urn.startsWith(BEAM_METRICS_USER_PREFIX)) {
+  if (!type.equals("beam:metrics:sum_int_64")) {
+LOG.warn(
 
 Review comment:
   Are there scenarios where we expect invalid metric types which should be 
ignored? If so, please comment when this might be the case.
   
   If a correct implementation should always send valid data, it's best to fail 
rather than be resilient such that it's easier to find and fix bugs.
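
The fail-fast option raised in this comment could look roughly like the sketch below. It is only an illustration: the class and method names are hypothetical and not part of the PR, and only the URN prefix and the type string are taken from the diff under review.

```java
public class MetricTypeCheck {
  // Both constants are copied from the diff under review.
  public static final String BEAM_METRICS_USER_PREFIX = "beam:metric:user";
  public static final String SUM_INT_64_TYPE = "beam:metrics:sum_int_64";

  /**
   * Hypothetical helper: throws instead of logging a warning when a user
   * counter arrives with a type other than the one the transformer supports.
   * URNs outside the user prefix are not checked here.
   */
  public static void requireSupportedUserCounter(String urn, String type) {
    if (urn.startsWith(BEAM_METRICS_USER_PREFIX) && !SUM_INT_64_TYPE.equals(type)) {
      throw new IllegalArgumentException(
          "Unsupported metric type '" + type + "' for user counter '" + urn + "'");
    }
  }

  public static void main(String[] args) {
    // A supported type passes silently.
    requireSupportedUserCounter("beam:metric:user:ns:name", SUM_INT_64_TYPE);
    try {
      // An unexpected type surfaces immediately as an exception.
      requireSupportedUserCounter("beam:metric:user:ns:name", "beam:metrics:other");
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Throwing makes an SDK that sends malformed data visible at the first bad update, whereas a logged warning can go unnoticed while counters are silently dropped.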




Issue Time Tracking
---

Worklog Id: (was: 172687)
Time Spent: 4h 20m  (was: 4h 10m)

> Utilize MetricInfo for reporting user metrics in Portable Dataflow Java 
> Runner.
> ---
>
> Key: BEAM-6181
> URL: https://issues.apache.org/jira/browse/BEAM-6181
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> The new approach to reporting metrics in the FnApi is to use MetricInfo structures.
> This approach is implemented in the Python SDK, and work is ongoing in the Java SDK.
> This task includes plumbing user metrics reported via MetricInfos through the 
> Dataflow Java Runner. 





[jira] [Work logged] (BEAM-6181) Utilize MetricInfo for reporting user metrics in Portable Dataflow Java Runner.

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6181?focusedWorklogId=172689=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172689
 ]

ASF GitHub Bot logged work on BEAM-6181:


Author: ASF GitHub Bot
Created on: 06/Dec/18 16:51
Start Date: 06/Dec/18 16:51
Worklog Time Spent: 10m 
  Work Description: swegner commented on a change in pull request #7202: 
[BEAM-6181] Reporting user counters via MonitoringInfos in Portable Dataflow 
Runner.
URL: https://github.com/apache/beam/pull/7202#discussion_r239525981
 
 

 ##
 File path: 
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/BeamFnMapTaskExecutor.java
 ##
 @@ -351,7 +398,115 @@ void updateProgress() {
   }
 }
 
-private void updateMetrics(BeamFnApi.Metrics metrics) {
+// Keeping this as static class for this iteration. Will extract to separate file and generalize
+// when more counter types are added.
+// todomigryz: define counter transformer factory
+// that can provide respective counter transformer for different type of counters.
+// (ie RowCountCounterTranformer, MSecCounterTransformer, UserCounterTransformer, etc)
+private static class MonitoringInfoToCounterUpdateTransformer {
+
+  private final Map transformIdMapping;
+
+  public MonitoringInfoToCounterUpdateTransformer(
+  final Map transformIdMapping) {
+this.transformIdMapping = transformIdMapping;
+  }
+
+  // todo: search code for "beam:metrics"... and replace them with relevant enums from
+  // proto after rebasing above https://github.com/apache/beam/pull/6799 that
+  // introduces relevant proto entries.
+  final String BEAM_METRICS_USER_PREFIX = "beam:metric:user";
+
+  private CounterUpdate monitoringInfoToCounterUpdate(MonitoringInfo monitoringInfo) {
+long value = monitoringInfo.getMetric().getCounterData().getInt64Value();
+String urn = monitoringInfo.getUrn();
+
+String type = monitoringInfo.getType();
+
+// todomigryz: run MonitoringInfo through validation process.
 
 Review comment:
   Should this be fixed before merging?




Issue Time Tracking
---

Worklog Id: (was: 172689)
Time Spent: 4.5h  (was: 4h 20m)

> Utilize MetricInfo for reporting user metrics in Portable Dataflow Java 
> Runner.
> ---
>
> Key: BEAM-6181
> URL: https://issues.apache.org/jira/browse/BEAM-6181
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> The new approach to reporting metrics in the FnApi is to use MetricInfo structures.
> This approach is implemented in the Python SDK, and work is ongoing in the Java SDK.
> This task includes plumbing user metrics reported via MetricInfos through the 
> Dataflow Java Runner. 





[jira] [Work logged] (BEAM-6181) Utilize MetricInfo for reporting user metrics in Portable Dataflow Java Runner.

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6181?focusedWorklogId=172684=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172684
 ]

ASF GitHub Bot logged work on BEAM-6181:


Author: ASF GitHub Bot
Created on: 06/Dec/18 16:51
Start Date: 06/Dec/18 16:51
Worklog Time Spent: 10m 
  Work Description: swegner commented on a change in pull request #7202: 
[BEAM-6181] Reporting user counters via MonitoringInfos in Portable Dataflow 
Runner.
URL: https://github.com/apache/beam/pull/7202#discussion_r239520301
 
 

 ##
 File path: 
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/BeamFnMapTaskExecutor.java
 ##
 @@ -139,30 +150,44 @@ private AutoCloseable progressTrackerCloseable(ProgressTracker progressTracker)
*/
   @Override
   public Iterable extractMetricUpdates() {
+List result = progressTracker.extractCounterUpdates();
+if ((result != null) && (result.size() > 0)) {
+  return result;
+}
+
+// BeamFnApi.Metrics was deprecated and replaced with more flexible MonitoringInfo.
+// However some of SDKs have implementations that utilize BeamFnApi.Metrics and were not yet
+// migrated to using new approach.
+// Falling back to using deprecated approach until all officially supported SDKs complete migration.
 
 Review comment:
   Is there a JIRA that tracks this migration?
   
   More generally: is it possible to make this comment more actionable? 
Something of the form: `once this condition is true (all SDKs are migrated), do 
this action (remove this fallback)`. It should be straightforward for anybody 
to check on the migration status and then delete the code.




Issue Time Tracking
---

Worklog Id: (was: 172684)
Time Spent: 3h 50m  (was: 3h 40m)

> Utilize MetricInfo for reporting user metrics in Portable Dataflow Java 
> Runner.
> ---
>
> Key: BEAM-6181
> URL: https://issues.apache.org/jira/browse/BEAM-6181
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> The new approach to reporting metrics in the FnApi is to use MetricInfo structures.
> This approach is implemented in the Python SDK, and work is ongoing in the Java SDK.
> This task includes plumbing user metrics reported via MetricInfos through the 
> Dataflow Java Runner. 





[jira] [Work logged] (BEAM-6181) Utilize MetricInfo for reporting user metrics in Portable Dataflow Java Runner.

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6181?focusedWorklogId=172692=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172692
 ]

ASF GitHub Bot logged work on BEAM-6181:


Author: ASF GitHub Bot
Created on: 06/Dec/18 16:51
Start Date: 06/Dec/18 16:51
Worklog Time Spent: 10m 
  Work Description: swegner commented on a change in pull request #7202: 
[BEAM-6181] Reporting user counters via MonitoringInfos in Portable Dataflow 
Runner.
URL: https://github.com/apache/beam/pull/7202#discussion_r239525644
 
 

 ##
 File path: 
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/BeamFnMapTaskExecutor.java
 ##
 @@ -351,7 +398,115 @@ void updateProgress() {
   }
 }
 
-private void updateMetrics(BeamFnApi.Metrics metrics) {
+// Keeping this as static class for this iteration. Will extract to separate file and generalize
 
 Review comment:
   This first sentence comment seems to refer specifically to the PR. I suggest 
removing it as the rest of the comment is sufficiently descriptive.




Issue Time Tracking
---

Worklog Id: (was: 172692)
Time Spent: 4.5h  (was: 4h 20m)

> Utilize MetricInfo for reporting user metrics in Portable Dataflow Java 
> Runner.
> ---
>
> Key: BEAM-6181
> URL: https://issues.apache.org/jira/browse/BEAM-6181
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> The new approach to reporting metrics in the FnApi is to use MetricInfo structures.
> This approach is implemented in the Python SDK, and work is ongoing in the Java SDK.
> This task includes plumbing user metrics reported via MetricInfos through the 
> Dataflow Java Runner. 





[jira] [Created] (BEAM-6188) Create unbounded synthetic source

2018-12-06 Thread Lukasz Gajowy (JIRA)
Lukasz Gajowy created BEAM-6188:
---

 Summary: Create unbounded synthetic source
 Key: BEAM-6188
 URL: https://issues.apache.org/jira/browse/BEAM-6188
 Project: Beam
  Issue Type: Bug
  Components: testing
Reporter: Lukasz Gajowy
Assignee: Lukasz Gajowy


It is needed for streaming scenarios. It should provide ways to reason about 
time and to recover from checkpoints. 





[jira] [Work logged] (BEAM-6178) Create BOM for Beam

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6178?focusedWorklogId=172674=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172674
 ]

ASF GitHub Bot logged work on BEAM-6178:


Author: ASF GitHub Bot
Created on: 06/Dec/18 16:22
Start Date: 06/Dec/18 16:22
Worklog Time Spent: 10m 
  Work Description: garrettjonesgoogle commented on issue #7197: 
[BEAM-6178] Adding beam-sdks-java-bom, adding exportJavadoc flag for 
applyJavaNature
URL: https://github.com/apache/beam/pull/7197#issuecomment-444931932
 
 
   R: @swegner 
   




Issue Time Tracking
---

Worklog Id: (was: 172674)
Time Spent: 20m  (was: 10m)

> Create BOM for Beam
> ---
>
> Key: BEAM-6178
> URL: https://issues.apache.org/jira/browse/BEAM-6178
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Affects Versions: 2.10.0
>Reporter: Garrett Jones
>Assignee: Garrett Jones
>Priority: Critical
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Add a module to Beam which generates a BOM for Beam modules.





[jira] [Work logged] (BEAM-5321) Finish Python 3 porting for transforms module

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5321?focusedWorklogId=172675=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172675
 ]

ASF GitHub Bot logged work on BEAM-5321:


Author: ASF GitHub Bot
Created on: 06/Dec/18 16:23
Start Date: 06/Dec/18 16:23
Worklog Time Spent: 10m 
  Work Description: RobbeSneyders commented on issue #7104: [BEAM-5321] 
Port transforms package to Python 3
URL: https://github.com/apache/beam/pull/7104#issuecomment-444932183
 
 
   PTAL




Issue Time Tracking
---

Worklog Id: (was: 172675)
Time Spent: 2h 40m  (was: 2.5h)

> Finish Python 3 porting for transforms module
> -
>
> Key: BEAM-5321
> URL: https://issues.apache.org/jira/browse/BEAM-5321
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Robbe
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>






[jira] [Created] (BEAM-6187) Drop Scala suffix of FlinkRunner artifacts

2018-12-06 Thread Maximilian Michels (JIRA)
Maximilian Michels created BEAM-6187:


 Summary: Drop Scala suffix of FlinkRunner artifacts
 Key: BEAM-6187
 URL: https://issues.apache.org/jira/browse/BEAM-6187
 Project: Beam
  Issue Type: Improvement
  Components: runner-flink
Reporter: Maximilian Michels
Assignee: Maximilian Michels
 Fix For: 2.10.0


With BEAM-5419 we will build multiple versions of the Flink Runner against 
different Flink versions. The new artifacts will lead to confusing names like 
{{beam-runners-flink1.5_2.11}}. I think it is time to drop the Scala suffix and 
just build against the most stable Flink Scala version. 

Projects like Scio have the option to cross-compile to different Scala versions.





[jira] [Commented] (BEAM-6187) Drop Scala suffix of FlinkRunner artifacts

2018-12-06 Thread Maximilian Michels (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16711654#comment-16711654
 ] 

Maximilian Michels commented on BEAM-6187:
--

What do you think [~thw]?

> Drop Scala suffix of FlinkRunner artifacts
> --
>
> Key: BEAM-6187
> URL: https://issues.apache.org/jira/browse/BEAM-6187
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
> Fix For: 2.10.0
>
>
> With BEAM-5419 we will build multiple versions of the Flink Runner against 
> different Flink versions. The new artifacts will lead to confusing names like 
> {{beam-runners-flink1.5_2.11}}. I think it is time to drop the Scala suffix 
> and just build against the most stable Flink Scala version. 
> Projects like Scio have the option to cross-compile to different Scala 
> versions.





[jira] [Work logged] (BEAM-6185) Upgrade to Spark 2.4.0

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6185?focusedWorklogId=172673=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172673
 ]

ASF GitHub Bot logged work on BEAM-6185:


Author: ASF GitHub Bot
Created on: 06/Dec/18 16:18
Start Date: 06/Dec/18 16:18
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #7216: [BEAM-6185] Upgrade 
to Spark 2.4.0
URL: https://github.com/apache/beam/pull/7216#issuecomment-444930062
 
 
   Run Spark ValidatesRunner




Issue Time Tracking
---

Worklog Id: (was: 172673)
Time Spent: 20m  (was: 10m)

> Upgrade to Spark 2.4.0
> --
>
> Key: BEAM-6185
> URL: https://issues.apache.org/jira/browse/BEAM-6185
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Jean-Baptiste Onofré
>Assignee: Jean-Baptiste Onofré
>Priority: Major
> Fix For: 2.10.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (BEAM-5410) Fail SpannerIO early for unsupported streaming mode

2018-12-06 Thread Niel Markwick (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niel Markwick resolved BEAM-5410.
-
   Resolution: Fixed
Fix Version/s: 2.9.0

Fixed by BEAM-4796 in 2.9.0

> Fail SpannerIO early for unsupported streaming mode
> ---
>
> Key: BEAM-5410
> URL: https://issues.apache.org/jira/browse/BEAM-5410
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Affects Versions: 2.8.0
>Reporter: Chamikara Jayalath
>Assignee: Chamikara Jayalath
>Priority: Major
> Fix For: 2.9.0
>
>
> Currently SpannerIO does not support streaming mode. We should fail with a 
> clear error until this is fixed and also update the documentation.





[jira] [Created] (BEAM-6186) Cleanup FnApiRunner optimization phases.

2018-12-06 Thread Robert Bradshaw (JIRA)
Robert Bradshaw created BEAM-6186:
-

 Summary: Cleanup FnApiRunner optimization phases.
 Key: BEAM-6186
 URL: https://issues.apache.org/jira/browse/BEAM-6186
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py-core
Reporter: Robert Bradshaw
Assignee: Ahmet Altay


They are currently expressed as functions with closures. It would be good to 
pull them out with explicit dependencies, both to make the code easier to 
follow and to make the phases testable and reusable.





[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=172650=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172650
 ]

ASF GitHub Bot logged work on BEAM-5859:


Author: ASF GitHub Bot
Created on: 06/Dec/18 14:17
Start Date: 06/Dec/18 14:17
Worklog Time Spent: 10m 
  Work Description: mxm closed pull request #7208: [BEAM-5859] Better 
handle fused composite stage names.
URL: https://github.com/apache/beam/pull/7208
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ExecutableStageTranslation.java b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ExecutableStageTranslation.java
index 09d1a991c279..4fe4a0f3669e 100644
--- a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ExecutableStageTranslation.java
+++ b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ExecutableStageTranslation.java
@@ -19,7 +19,15 @@
 
 import static com.google.common.base.Preconditions.checkArgument;
 
+import com.google.common.base.Joiner;
+import com.google.common.collect.Iterables;
+import com.google.common.collect.LinkedHashMultimap;
+import com.google.common.collect.Multimap;
 import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Map;
+import java.util.stream.Collectors;
 import org.apache.beam.model.pipeline.v1.RunnerApi;
 import org.apache.beam.model.pipeline.v1.RunnerApi.ExecutableStagePayload;
 import org.apache.beam.runners.core.construction.graph.ExecutableStage;
@@ -45,19 +53,80 @@ public static String generateNameFromStagePayload(ExecutableStagePayload stagePa
 RunnerApi.Components components = stagePayload.getComponents();
 final int transformsCount = stagePayload.getTransformsCount();
 sb.append("[").append(transformsCount).append("]");
-sb.append("{");
+Collection names = new ArrayList<>();
 for (int i = 0; i < transformsCount; i++) {
   String name = components.getTransformsOrThrow(stagePayload.getTransforms(i)).getUniqueName();
-  // Python: Remove the 'ref_AppliedPTransform_' prefix which just makes the name longer
-  name = name.replaceFirst("^ref_AppliedPTransform_", "");
   // Java: Remove the 'ParMultiDo(Anonymous)' suffix which just makes the name longer
   name = name.replaceFirst("/ParMultiDo\\(Anonymous\\)$", "");
-  sb.append(name);
-  if (i + 1 < transformsCount) {
-sb.append(", ");
-  }
+  names.add(name);
 }
-sb.append("}");
+sb.append(generateNameFromTransformNames(names, true));
 return sb.toString();
   }
+
+  /**
+   * Creates a human-readable name for a set of stage names that occur in a single stage.
+   *
+   * This name reflects the nested structure of the stages, as inferred by slashes in the stage
+   * names. Sibling stages will be listed as {A, B}, nested stages as A/B, and according to the
+   * value of truncateSiblingComposites the nesting stops at the first level that siblings are
+   * encountered.
+   *
+   * This is best understood via examples, of which there are several in the tests for this
+   * class.
+   *
+   * @param names a list of full stage names in this fused operation
+   * @param truncateSiblingComposites whether to recursively descent into composite operations that
+   * have simblings, or stop the recursion at that level.
+   * @return a single string representation of all the stages in this fused operation
+  public static String generateNameFromTransformNames(
+  Collection names, boolean truncateSiblingComposites) {
+Multimap groupByOuter = LinkedHashMultimap.create();
+for (String name : names) {
+  int index = name.indexOf('/');
+  if (index == -1) {
+groupByOuter.put(name, "");
+  } else {
+groupByOuter.put(name.substring(0, index), name.substring(index + 1));
+  }
+}
+if (groupByOuter.keySet().size() == 1) {
+  Map.Entry> outer =
+  Iterables.getOnlyElement(groupByOuter.asMap().entrySet());
+  if (outer.getValue().size() == 1 && outer.getValue().contains("")) {
+// Names consisted of a single name without any slashes.
+return outer.getKey();
+  } else {
+// Everything is in the same outer stage, enumerate at one level down.
+return String.format(
+"%s/%s",
+outer.getKey(),
+generateNameFromTransformNames(outer.getValue(), truncateSiblingComposites));
+  }
+} else {
+  Collection parts;
+ 
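
The diff is cut off before the sibling branch, so the naming rule described in the Javadoc can only be sketched here. The following is a self-contained approximation using plain JDK collections instead of Guava. The class name `StageNameSketch` and the method name `nameFor` are hypothetical, and the handling of siblings (list only the outer names when truncating, otherwise recurse into each outer group) is an assumption inferred from the Javadoc, not code taken from the PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

public class StageNameSketch {

  /** Hypothetical stand-in for generateNameFromTransformNames. */
  public static String nameFor(Collection<String> names, boolean truncateSiblingComposites) {
    // Group each name by its first path component, keeping insertion order
    // and dropping duplicates (the role LinkedHashMultimap plays in the diff).
    Map<String, LinkedHashSet<String>> groupByOuter = new LinkedHashMap<>();
    for (String name : names) {
      int index = name.indexOf('/');
      String outer = index == -1 ? name : name.substring(0, index);
      String rest = index == -1 ? "" : name.substring(index + 1);
      groupByOuter.computeIfAbsent(outer, k -> new LinkedHashSet<>()).add(rest);
    }
    if (groupByOuter.size() == 1) {
      Map.Entry<String, LinkedHashSet<String>> outer = groupByOuter.entrySet().iterator().next();
      if (outer.getValue().size() == 1 && outer.getValue().contains("")) {
        // A single name without any slashes.
        return outer.getKey();
      }
      // Everything shares one outer stage: enumerate one level down.
      return outer.getKey() + "/" + nameFor(outer.getValue(), truncateSiblingComposites);
    }
    // ASSUMPTION: this branch is not visible in the truncated diff.
    List<String> parts = new ArrayList<>();
    for (Map.Entry<String, LinkedHashSet<String>> entry : groupByOuter.entrySet()) {
      if (truncateSiblingComposites) {
        // Stop at the first level where siblings appear.
        parts.add(entry.getKey());
      } else {
        // Rebuild the full names under this outer stage and recurse.
        List<String> fullNames = new ArrayList<>();
        for (String rest : entry.getValue()) {
          fullNames.add(rest.isEmpty() ? entry.getKey() : entry.getKey() + "/" + rest);
        }
        parts.add(nameFor(fullNames, false));
      }
    }
    return "{" + String.join(", ", parts) + "}";
  }

  public static void main(String[] args) {
    System.out.println(nameFor(Arrays.asList("A/B", "A/C"), true));           // A/{B, C}
    System.out.println(nameFor(Arrays.asList("A/B/C", "A/B/D", "E/F"), true));  // {A, E}
    System.out.println(nameFor(Arrays.asList("A/B/C", "A/B/D", "E/F"), false)); // {A/B/{C, D}, E/F}
  }
}
```

With truncation enabled, the name stays short for large fused stages; with it disabled, the full nesting is spelled out, which is more readable for small stages but grows quickly.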

[jira] [Work logged] (BEAM-6167) Create a Class to read content of a file keeping track of the file path (python)

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6167?focusedWorklogId=172646=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172646
 ]

ASF GitHub Bot logged work on BEAM-6167:


Author: ASF GitHub Bot
Created on: 06/Dec/18 14:02
Start Date: 06/Dec/18 14:02
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #7193: [BEAM-6167] Add 
class ReadFromTextWithFilename (Python)
URL: https://github.com/apache/beam/pull/7193#issuecomment-444880407
 
 
   Thanks, looks good modulo some precommit issues:
   
   ```
   $ tox -e docs,py27-lint
   ...
   12:09:32 /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/apache_beam/io/textio.py:docstring of apache_beam.io.textio.ReadFromTextWithFilename:1: WARNING: py:class reference target not found: apache_beam.io.ReadFromText
   ...
   12:10:32 C: 41, 0: Line too long (88/80) (line-too-long)
   12:10:32 C:325, 0: Line too long (94/80) (line-too-long)
   12:10:32 C:543, 0: Line too long (88/80) (line-too-long)
   ```
   
   




Issue Time Tracking
---

Worklog Id: (was: 172646)
Time Spent: 1h 20m  (was: 1h 10m)

> Create a Class to read content of a file keeping track of the file path 
> (python)
> 
>
> Key: BEAM-6167
> URL: https://issues.apache.org/jira/browse/BEAM-6167
> Project: Beam
>  Issue Type: Improvement
>  Components: io-ideas
>Affects Versions: 2.8.0
>Reporter: Lorenzo Caggioni
>Assignee: Eugene Kirpichov
>Priority: Minor
> Fix For: Not applicable
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Add a class that reads the content of a file while keeping track of the file 
> path each element comes from.
> This is an improvement to the current python/apache_beam/io/textio.py





[jira] [Resolved] (BEAM-3912) Add batching support for HadoopOutputFormatIO

2018-12-06 Thread Alexey Romanenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko resolved BEAM-3912.

   Resolution: Fixed
Fix Version/s: 2.10.0

> Add batching support for HadoopOutputFormatIO
> -
>
> Key: BEAM-3912
> URL: https://issues.apache.org/jira/browse/BEAM-3912
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-hadoop
>Reporter: Alexey Romanenko
>Assignee: Alexey Romanenko
>Priority: Minor
> Fix For: 2.10.0
>
>  Time Spent: 10h 20m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (BEAM-1533) Switch split/splitBasedOnRegions to use Beam’s Byte classes for HBaseIO

2018-12-06 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía resolved BEAM-1533.

   Resolution: Fixed
Fix Version/s: Not applicable

> Switch split/splitBasedOnRegions to use Beam’s Byte classes for HBaseIO
> ---
>
> Key: BEAM-1533
> URL: https://issues.apache.org/jira/browse/BEAM-1533
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-hbase
>Reporter: Ismaël Mejía
>Priority: Minor
>  Labels: newbie, starter
> Fix For: Not applicable
>
>
> The current implementation uses classic Java byte array manipulation, which 
> is error-prone. It could benefit from using the existing ByteKey, 
> ByteKeyRange, and ByteKeyRangeTracker classes from the SDK. 





[jira] [Commented] (BEAM-1533) Switch split/splitBasedOnRegions to use Beam’s Byte classes for HBaseIO

2018-12-06 Thread JIRA


[ 
https://issues.apache.org/jira/browse/BEAM-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16711467#comment-16711467
 ] 

Ismaël Mejía commented on BEAM-1533:


Yes, you are right, split is now based on ByteKey. Closing it for now, thanks.

> Switch split/splitBasedOnRegions to use Beam’s Byte classes for HBaseIO
> ---
>
> Key: BEAM-1533
> URL: https://issues.apache.org/jira/browse/BEAM-1533
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-hbase
>Reporter: Ismaël Mejía
>Priority: Minor
>  Labels: newbie, starter
> Fix For: Not applicable
>
>
> The current implementation uses classic Java byte array manipulation, which 
> is error-prone. It could benefit from using the existing ByteKey, 
> ByteKeyRange, and ByteKeyRangeTracker classes from the SDK. 





[jira] [Work logged] (BEAM-5817) Nexmark test of joining stream to files

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5817?focusedWorklogId=172639=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172639
 ]

ASF GitHub Bot logged work on BEAM-5817:


Author: ASF GitHub Bot
Created on: 06/Dec/18 13:35
Start Date: 06/Dec/18 13:35
Worklog Time Spent: 10m 
  Work Description: echauchot commented on issue #7205: [BEAM-5817] Add SQL 
bounded side input join to queries that are actually run
URL: https://github.com/apache/beam/pull/7205#issuecomment-444872395
 
 
   @kennknowles sorry I did not have time to take a look before the merge, I'm a 
bit busy on the Spark runner right now.




Issue Time Tracking
---

Worklog Id: (was: 172639)
Time Spent: 8h 20m  (was: 8h 10m)

> Nexmark test of joining stream to files
> ---
>
> Key: BEAM-5817
> URL: https://issues.apache.org/jira/browse/BEAM-5817
> Project: Beam
>  Issue Type: New Feature
>  Components: examples-nexmark
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Major
> Fix For: 2.10.0
>
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h
>
> Nexmark is a convenient framework for testing the use case of large scale 
> stream enrichment. One way is joining a stream to files, and it can be tested 
> via any source that Nexmark supports.





[jira] [Work logged] (BEAM-6184) PortableRunner dependency missed in wordcount example maven artifact

2018-12-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6184?focusedWorklogId=172644=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172644
 ]

ASF GitHub Bot logged work on BEAM-6184:


Author: ASF GitHub Bot
Created on: 06/Dec/18 13:42
Start Date: 06/Dec/18 13:42
Worklog Time Spent: 10m 
  Work Description: robertwb closed pull request #7213: [BEAM-6184] Add 
portable-runner dependency to example pom.xml
URL: https://github.com/apache/beam/pull/7213
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/java/maven-archetypes/examples/src/main/resources/archetype-resources/pom.xml b/sdks/java/maven-archetypes/examples/src/main/resources/archetype-resources/pom.xml
index dcd9748311a9..5f1ba8b3490e 100644
--- a/sdks/java/maven-archetypes/examples/src/main/resources/archetype-resources/pom.xml
+++ b/sdks/java/maven-archetypes/examples/src/main/resources/archetype-resources/pom.xml
@@ -167,6 +167,22 @@
   
 
 
+   
+  portable-runner
+  
+true
+  
+  
+  
+
+  org.apache.beam
+  beam-runners-reference-java
+  ${beam.version}
+  runtime
+
+  
+
+
 
   apex-runner
   


 




Issue Time Tracking
---

Worklog Id: (was: 172644)
Time Spent: 0.5h  (was: 20m)

> PortableRunner dependency missed in wordcount example maven artifact
> 
>
> Key: BEAM-6184
> URL: https://issues.apache.org/jira/browse/BEAM-6184
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Ruoyun Huang
>Assignee: Ruoyun Huang
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
>  
>  
> more context: 
> https://lists.apache.org/thread.html/8dd60395424425f7502d62888c49014430d1d3b06c026606f3db28ab@%3Cuser.beam.apache.org%3E




