[GitHub] [beam] suztomo edited a comment on issue #11168: [BEAM-9542] Limit and clarify the effect of "force" in Java build

2020-03-20 Thread GitBox
suztomo edited a comment on issue #11168: [BEAM-9542] Limit and clarify the 
effect of "force" in Java build
URL: https://github.com/apache/beam/pull/11168#issuecomment-601987472
 
 
   Because the effect of `force` is only for tests. The compileJava 
configuration still fetches the latest google-api-client, which is 1.30.8, 
which has a bug of android annotation dependency.
   
   ```
   17:21:41 Execution failed for task 
':sdks:java:testing:test-utils:compileJava'.
   17:21:41 > Could not resolve all files for configuration 
':sdks:java:testing:test-utils:compileClasspath'.
   17:21:41> Could not find androidx.annotation:annotation:1.1.0.
   17:21:41  Searched in the following locations:
   17:21:41- 
file:/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Java_PR/src/sdks/java/testing/test-utils/offline-repository/androidx/annotation/annotation/1.1.0/annotation-1.1.0.pom
   17:21:41- 
file:/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Java_PR/src/sdks/java/testing/test-utils/offline-repository/androidx/annotation/annotation/1.1.0/annotation-1.1.0.jar
   17:21:41- 
https://repo.maven.apache.org/maven2/androidx/annotation/annotation/1.1.0/annotation-1.1.0.pom
   17:21:41- 
https://repo.maven.apache.org/maven2/androidx/annotation/annotation/1.1.0/annotation-1.1.0.jar
   17:21:41- 
file:/home/jenkins/.m2/repository/androidx/annotation/annotation/1.1.0/annotation-1.1.0.pom
   17:21:41- 
file:/home/jenkins/.m2/repository/androidx/annotation/annotation/1.1.0/annotation-1.1.0.jar
   17:21:41- 
https://jcenter.bintray.com/androidx/annotation/annotation/1.1.0/annotation-1.1.0.pom
   17:21:41- 
https://jcenter.bintray.com/androidx/annotation/annotation/1.1.0/annotation-1.1.0.jar
   17:21:41- 
https://oss.sonatype.org/content/repositories/staging/androidx/annotation/annotation/1.1.0/annotation-1.1.0.pom
   17:21:41- 
https://oss.sonatype.org/content/repositories/staging/androidx/annotation/annotation/1.1.0/annotation-1.1.0.jar
   17:21:41- 
https://repository.apache.org/snapshots/androidx/annotation/annotation/1.1.0/annotation-1.1.0.pom
   17:21:41- 
https://repository.apache.org/snapshots/androidx/annotation/annotation/1.1.0/annotation-1.1.0.jar
   17:21:41- 
https://repository.apache.org/content/repositories/releases/androidx/annotation/annotation/1.1.0/annotation-1.1.0.pom
   17:21:41- 
https://repository.apache.org/content/repositories/releases/androidx/annotation/annotation/1.1.0/annotation-1.1.0.jar
   17:21:41  Required by:
   17:21:41  project :sdks:java:testing:test-utils > project 
:sdks:java:extensions:google-cloud-platform-core > 
com.google.api-client:google-api-client:1.30.8
   ```
   
   Maybe I'll need to update google-api-client.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] suztomo edited a comment on issue #11168: [BEAM-9542] Limit and clarify the effect of "force" in Java build

2020-03-20 Thread GitBox
suztomo edited a comment on issue #11168: [BEAM-9542] Limit and clarify the 
effect of "force" in Java build
URL: https://github.com/apache/beam/pull/11168#issuecomment-601987472
 
 
   Because the effect of `force` is only for tests. The compileJava 
configuration still fetches the latest google-api-client, which is 1.30.8, 
which has a bug of android annotation dependency.
   
   ```
   17:21:41 Execution failed for task 
':sdks:java:testing:test-utils:compileJava'.
   17:21:41 > Could not resolve all files for configuration 
':sdks:java:testing:test-utils:compileClasspath'.
   17:21:41> Could not find androidx.annotation:annotation:1.1.0.
   17:21:41  Searched in the following locations:
   17:21:41- 
file:/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Java_PR/src/sdks/java/testing/test-utils/offline-repository/androidx/annotation/annotation/1.1.0/annotation-1.1.0.pom
   17:21:41- 
file:/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Java_PR/src/sdks/java/testing/test-utils/offline-repository/androidx/annotation/annotation/1.1.0/annotation-1.1.0.jar
   17:21:41- 
https://repo.maven.apache.org/maven2/androidx/annotation/annotation/1.1.0/annotation-1.1.0.pom
   17:21:41- 
https://repo.maven.apache.org/maven2/androidx/annotation/annotation/1.1.0/annotation-1.1.0.jar
   17:21:41- 
file:/home/jenkins/.m2/repository/androidx/annotation/annotation/1.1.0/annotation-1.1.0.pom
   17:21:41- 
file:/home/jenkins/.m2/repository/androidx/annotation/annotation/1.1.0/annotation-1.1.0.jar
   17:21:41- 
https://jcenter.bintray.com/androidx/annotation/annotation/1.1.0/annotation-1.1.0.pom
   17:21:41- 
https://jcenter.bintray.com/androidx/annotation/annotation/1.1.0/annotation-1.1.0.jar
   17:21:41- 
https://oss.sonatype.org/content/repositories/staging/androidx/annotation/annotation/1.1.0/annotation-1.1.0.pom
   17:21:41- 
https://oss.sonatype.org/content/repositories/staging/androidx/annotation/annotation/1.1.0/annotation-1.1.0.jar
   17:21:41- 
https://repository.apache.org/snapshots/androidx/annotation/annotation/1.1.0/annotation-1.1.0.pom
   17:21:41- 
https://repository.apache.org/snapshots/androidx/annotation/annotation/1.1.0/annotation-1.1.0.jar
   17:21:41- 
https://repository.apache.org/content/repositories/releases/androidx/annotation/annotation/1.1.0/annotation-1.1.0.pom
   17:21:41- 
https://repository.apache.org/content/repositories/releases/androidx/annotation/annotation/1.1.0/annotation-1.1.0.jar
   17:21:41  Required by:
   17:21:41  project :sdks:java:testing:test-utils > project 
:sdks:java:extensions:google-cloud-platform-core > 
com.google.api-client:google-api-client:1.30.8
   ```
   
   Maybe I'll need to upgrade google-api-client to the latest version that has 
the fix.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] suztomo commented on issue #11168: [BEAM-9542] Limit and clarify the effect of "force" in Java build

2020-03-20 Thread GitBox
suztomo commented on issue #11168: [BEAM-9542] Limit and clarify the effect of 
"force" in Java build
URL: https://github.com/apache/beam/pull/11168#issuecomment-601987472
 
 
   Because the effect of `force` is only for tests. The compileJava 
configuration still fetches the latest google-cloud-api, which is 1.30.8, which 
has a bug of android annotation dependency.
   
   ```
   17:21:41 Execution failed for task 
':sdks:java:testing:test-utils:compileJava'.
   17:21:41 > Could not resolve all files for configuration 
':sdks:java:testing:test-utils:compileClasspath'.
   17:21:41> Could not find androidx.annotation:annotation:1.1.0.
   17:21:41  Searched in the following locations:
   17:21:41- 
file:/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Java_PR/src/sdks/java/testing/test-utils/offline-repository/androidx/annotation/annotation/1.1.0/annotation-1.1.0.pom
   17:21:41- 
file:/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Java_PR/src/sdks/java/testing/test-utils/offline-repository/androidx/annotation/annotation/1.1.0/annotation-1.1.0.jar
   17:21:41- 
https://repo.maven.apache.org/maven2/androidx/annotation/annotation/1.1.0/annotation-1.1.0.pom
   17:21:41- 
https://repo.maven.apache.org/maven2/androidx/annotation/annotation/1.1.0/annotation-1.1.0.jar
   17:21:41- 
file:/home/jenkins/.m2/repository/androidx/annotation/annotation/1.1.0/annotation-1.1.0.pom
   17:21:41- 
file:/home/jenkins/.m2/repository/androidx/annotation/annotation/1.1.0/annotation-1.1.0.jar
   17:21:41- 
https://jcenter.bintray.com/androidx/annotation/annotation/1.1.0/annotation-1.1.0.pom
   17:21:41- 
https://jcenter.bintray.com/androidx/annotation/annotation/1.1.0/annotation-1.1.0.jar
   17:21:41- 
https://oss.sonatype.org/content/repositories/staging/androidx/annotation/annotation/1.1.0/annotation-1.1.0.pom
   17:21:41- 
https://oss.sonatype.org/content/repositories/staging/androidx/annotation/annotation/1.1.0/annotation-1.1.0.jar
   17:21:41- 
https://repository.apache.org/snapshots/androidx/annotation/annotation/1.1.0/annotation-1.1.0.pom
   17:21:41- 
https://repository.apache.org/snapshots/androidx/annotation/annotation/1.1.0/annotation-1.1.0.jar
   17:21:41- 
https://repository.apache.org/content/repositories/releases/androidx/annotation/annotation/1.1.0/annotation-1.1.0.pom
   17:21:41- 
https://repository.apache.org/content/repositories/releases/androidx/annotation/annotation/1.1.0/annotation-1.1.0.jar
   17:21:41  Required by:
   17:21:41  project :sdks:java:testing:test-utils > project 
:sdks:java:extensions:google-cloud-platform-core > 
com.google.api-client:google-api-client:1.30.8
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] udim commented on issue #10914: [BEAM-8078] streaming_wordcount_debugging.py is missing a test

2020-03-20 Thread GitBox
udim commented on issue #10914: [BEAM-8078] streaming_wordcount_debugging.py is 
missing a test
URL: https://github.com/apache/beam/pull/10914#issuecomment-601976783
 
 
   will merge once I get postcommit to run this test and verify it passes


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] udim commented on issue #10914: [BEAM-8078] streaming_wordcount_debugging.py is missing a test

2020-03-20 Thread GitBox
udim commented on issue #10914: [BEAM-8078] streaming_wordcount_debugging.py is 
missing a test
URL: https://github.com/apache/beam/pull/10914#issuecomment-601976548
 
 
   run python 3.7 postcommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] jaketf commented on a change in pull request #11151: [BEAM-9468] Hl7v2 io

2020-03-20 Thread GitBox
jaketf commented on a change in pull request #11151: [BEAM-9468]  Hl7v2 io
URL: https://github.com/apache/beam/pull/11151#discussion_r395946114
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IO.java
 ##
 @@ -0,0 +1,620 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.gcp.healthcare;
+
+import com.google.api.services.healthcare.v1alpha2.model.IngestMessageResponse;
+import com.google.api.services.healthcare.v1alpha2.model.Message;
+import com.google.auto.value.AutoValue;
+import java.io.IOException;
+import java.text.ParseException;
+import java.util.Collections;
+import java.util.List;
+import org.apache.beam.sdk.io.gcp.datastore.AdaptiveThrottler;
+import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
+import org.apache.beam.sdk.metrics.Counter;
+import org.apache.beam.sdk.metrics.Metrics;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.util.Sleeper;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollectionTuple;
+import org.apache.beam.sdk.values.PDone;
+import org.apache.beam.sdk.values.TupleTag;
+import org.apache.beam.sdk.values.TupleTagList;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Throwables;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link HL7v2IO} provides an API for reading from and writing to https://cloud.google.com/healthcare/docs/concepts/hl7v2;>Google Cloud 
Healthcare HL7v2 API.
+ * 
+ *
+ * Read HL7v2 Messages are fetched from the HL7v2 store based on the {@link 
PCollection} of of
+ * message IDs {@link String}s produced by the {@link 
AutoValue_HL7v2IO_Read#getMessageIDTransform}
+ * as {@link PCollectionTuple}*** containing an {@link HL7v2IO.Read#OUT} tag 
for successfully
+ * fetched messages and a {@link HL7v2IO.Read#DEAD_LETTER} tag for message IDs 
that could not be
+ * fetched.
+ *
+ * HL7v2 stores can be read in several ways: - Unbounded: based on the 
Pub/Sub Notification
+ * Channel {@link HL7v2IO#readNotificationSubscription(String)} - Bounded: 
based on reading an
+ * entire HL7v2 store (or stores) {@link HL7v2IO#readHL7v2Store(String)} - 
Bounded: based on reading
+ * an HL7v2 store with a filter
+ *
+ * Note, due to the flexibility of this Read transform, this must output a 
dead letter queue.
+ * This handles the scenario where the the PTransform that populates a 
PCollection of message IDs
+ * contains message IDs that do not exist in the HL7v2 stores.
+ *
+ * Example:
+ *
+ * {@code
+ * PipelineOptions options = ...;
+ * Pipeline pipeline = Pipeline.create(options)
+ *
+ *
+ * PCollectionTuple messages = pipeline.apply(
+ * new HLv2IO.readNotifications(options.getNotificationSubscription())
+ *
+ * // Write errors to your favorite dead letter  queue (e.g. Pub/Sub, GCS, 
BigQuery)
+ * messages.get(PubsubNotificationToHL7v2Message.DEAD_LETTER)
+ *.apply("WriteToDeadLetterQueue", ...);
+ *
+ * PCollection fetchedMessages = 
fetchResults.get(PubsubNotificationToHL7v2Message.OUT)
+ *.apply("ExtractFetchedMessage",
+ *MapElements
+ *.into(TypeDescriptor.of(Message.class))
+ *.via(FailsafeElement::getPayload));
+ *
+ * // Go about your happy path transformations.
+ * PCollection out = fetchedMessages.apply("ProcessFetchedMessages", 
...);
+ *
+ * // Write using the Message.Ingest method of the HL7v2 REST API.
+ * out.apply(HL7v2IO.ingestMessages(options.getOutputHL7v2Store()));
+ *
+ * pipeline.run();
+ *
+ * }***
+ * 
+ */
+public class HL7v2IO {
+  // TODO add metrics for failed records.
+
+  private static Read.Builder read(PTransform> 
messageIDTransform) {
 
 Review comment:
   refactored to implement both  `HL7v2IO.[Read,Write].Result`


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For 

[GitHub] [beam] robertwb commented on a change in pull request #11185: [BEAM-8019] Some generalizations to support cross-language transforms.

2020-03-20 Thread GitBox
robertwb commented on a change in pull request #11185: [BEAM-8019] Some 
generalizations to support cross-language transforms.
URL: https://github.com/apache/beam/pull/11185#discussion_r395944033
 
 

 ##
 File path: sdks/python/apache_beam/coders/coders.py
 ##
 @@ -1383,22 +1377,67 @@ def from_runner_api_parameter(payload, components, 
context):
 write_state_threshold=int(payload))
 
 
-class RunnerAPICoderHolder(Coder):
-  """A `Coder` that holds a runner API `Coder` proto.
+class ElementTypeHolder(typehints.TypeConstraint):
+  """A dummy element type for external coders that cannot be parsed in 
Python"""
 
-  This is used for coders for which corresponding objects cannot be
-  initialized in Python SDK. For example, coders for remote SDKs that may
-  be available in Python SDK transform graph when expanding a cross-language
-  transform.
-  """
-  def __init__(self, proto):
-self._proto = proto
+  def __init__(self, coder, context):
+self.coder = coder
+self.context = context
 
-  def proto(self):
-return self._proto
 
-  def to_runner_api(self, context):
-return self._proto
+class ExternalCoder(Coder):
 
-  def to_type_hint(self):
-return Any
+  coder_count = 0
+
+  def __init__(self, element_type_holder):
+self.element_type_holder = element_type_holder
+
+  def as_cloud_object(self, coders_context=None):
+if not coders_context:
+  raise Exception(
+  'coders_context must be specified to correctly encode external 
coders')
+coder_id = coders_context.get_by_proto(
+self.element_type_holder.coder, deduplicate=True)
+
+coder_proto = self.element_type_holder.coder
+
+
+kind_str = 'kind:external' + str(ExternalCoder.coder_count)
+ExternalCoder.coder_count = ExternalCoder.coder_count + 1
+component_encodings = []
+if coder_proto.spec.urn == 'beam:coder:kv:v1':
+  kind_str = 'kind:pair'
+  for component_coder_id in coder_proto.component_coder_ids:
+component_encodings.append({
+'@type': 'kind:external' + str(ExternalCoder.coder_count),
+'pipeline_proto_coder_id': component_coder_id
+})
+ExternalCoder.coder_count = ExternalCoder.coder_count + 1
+
+value = {
+# This is a placeholder type. Dataflow will get the actual coder from
+# pipeline proto using the pipeline_proto_coder_id property.
+'@type': kind_str,
+'pipeline_proto_coder_id': coder_id
+}
+if component_encodings:
+  value['is_pair_like'] = True
+  value['component_encodings'] = component_encodings
+
+return value
+
+  @staticmethod
+  def from_type_hint(typehint, unused_registry):
+if isinstance(typehint, ElementTypeHolder):
+  return ExternalCoder(typehint)
+else:
+  raise ValueError((
+  'Expected an instance of ElementTypeHolder'
+  ', but got a %s' % typehint))
+
 
 Review comment:
   `yapf -ir path/to/files/...`


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] robertwb commented on a change in pull request #11185: [BEAM-8019] Some generalizations to support cross-language transforms.

2020-03-20 Thread GitBox
robertwb commented on a change in pull request #11185: [BEAM-8019] Some 
generalizations to support cross-language transforms.
URL: https://github.com/apache/beam/pull/11185#discussion_r395944333
 
 

 ##
 File path: sdks/python/apache_beam/pipeline.py
 ##
 @@ -1127,30 +1133,79 @@ def transform_to_runner_api(transform,  # type: 
Optional[ptransform.PTransform]
   def from_runner_api(proto,  # type: beam_runner_api_pb2.PTransform
   context  # type: PipelineContext
  ):
+side_input_tags = []
+if common_urns.primitives.PAR_DO.urn == proto.spec.urn:
+  # Preserving side input tags.
+  from apache_beam.utils import proto_utils
+  from apache_beam.portability.api import beam_runner_api_pb2
+  payload = (
+  proto_utils.parse_Bytes(
+  proto.spec.payload, beam_runner_api_pb2.ParDoPayload))
+  for tag, si in payload.side_inputs.items():
+side_input_tags.append(tag)
+
 # type: (...) -> AppliedPTransform
-def is_side_input(tag):
-  # type: (str) -> bool
+def is_python_side_input(tag):
   # As per named_inputs() above.
-  return tag.startswith('side')
+  return re.match(SIDE_INPUT_REGEX, tag)
+
+all_input_tags = [tag for tag, id in proto.inputs.items()]
+
+# All side inputs have to be available in input tags
+python_indexed_side_inputs = False
+for side_tag in side_input_tags:
+  if side_tag not in all_input_tags:
+raise Exception(
+'Side input tag %s is not available in list of input tags %r' %
+(side_tag, all_input_tags))
+
+  # We process Python and external side inputs differently. We fail early
+  # here if we cannot decide which way to go.
+  if is_python_side_input(side_tag):
+python_indexed_side_inputs = True
+  else:
+if python_indexed_side_inputs:
+  raise Exception(
+  'Cannot process side inputs due to inconsistent sideinput '
+  'naming. If using an external transform consider re-naming side '
+  'inputs to not match Python indexed format %s' %
+  SIDE_INPUT_REGEX)
 
 main_inputs = [
 context.pcollections.get_by_id(id) for tag,
-id in proto.inputs.items() if not is_side_input(tag)
+id in proto.inputs.items() if tag not in side_input_tags
 ]
 
-# Ordering is important here.
-indexed_side_inputs = [
-(get_sideinput_index(tag), context.pcollections.get_by_id(id)) for tag,
-id in proto.inputs.items() if is_side_input(tag)
-]
-side_inputs = [si for _, si in sorted(indexed_side_inputs)]
+if python_indexed_side_inputs:
+  # Ordering is important here.
 
 Review comment:
   This all seems rather fragile. Would it be possible to just make side_inputs 
a dict everywhere in the internal SDK representation? (Or is there 
introspection with the legacy worker code that would make this hard?)


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] ibzib commented on issue #11189: [BEAM-9446] Retain unknown arguments when using uber jar job server.

2020-03-20 Thread GitBox
ibzib commented on issue #11189: [BEAM-9446] Retain unknown arguments when 
using uber jar job server.
URL: https://github.com/apache/beam/pull/11189#issuecomment-601960942
 
 
   Run PortableJar_Flink PostCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] robertwb commented on a change in pull request #11184: [WIP][BEAM-4374] Update protos related to MonitoringInfo.

2020-03-20 Thread GitBox
robertwb commented on a change in pull request #11184: [WIP][BEAM-4374] Update 
protos related to MonitoringInfo.
URL: https://github.com/apache/beam/pull/11184#discussion_r395935001
 
 

 ##
 File path: model/pipeline/src/main/proto/metrics.proto
 ##
 @@ -229,101 +215,127 @@ message MonitoringInfo {
 NAMESPACE = 5 [(label_props) = { name: "NAMESPACE" }];
 NAME = 6 [(label_props) = { name: "NAME" }];
   }
+
   // A set of key+value labels which define the scope of the metric.
   // Either a well defined entity id for matching the enum names in
   // the MonitoringInfoLabels enum or any arbitrary label
   // set by a custom metric or user metric.
+  //
   // A monitoring system is expected to be able to aggregate the metrics
   // together for all updates having the same URN and labels. Some systems such
   // as Stackdriver will be able to aggregate the metrics using a subset of the
   // provided labels
-  map labels = 5;
-
-  // The walltime of the most recent update.
-  // Useful for aggregation for latest types such as LatestInt64.
-  google.protobuf.Timestamp timestamp = 6;
+  map labels = 4;
 }
 
 message MonitoringInfoTypeUrns {
   enum Enum {
+// Represents an integer counter where values are summed across bundles.
+//
+// Encoding: 
+//   - value: beam:coder:varint:v1
 SUM_INT64_TYPE = 0 [(org.apache.beam.model.pipeline.v1.beam_urn) =
 
 Review comment:
   Rather than manually write out the cross product, how about we define 
`{sum,min,max,top_n,bottom_n,distribtuion,latest}_{int64,double,string}` types 
as having known semantics and encoding? 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] robertwb commented on a change in pull request #11184: [WIP][BEAM-4374] Update protos related to MonitoringInfo.

2020-03-20 Thread GitBox
robertwb commented on a change in pull request #11184: [WIP][BEAM-4374] Update 
protos related to MonitoringInfo.
URL: https://github.com/apache/beam/pull/11184#discussion_r395936249
 
 

 ##
 File path: model/pipeline/src/main/proto/metrics.proto
 ##
 @@ -52,38 +55,157 @@ message Annotation {
   string value = 2;
 }
 
-// Populated MonitoringInfoSpecs for specific URNs.
-// Indicating the required fields to be set.
-// SDKs and RunnerHarnesses can load these instances into memory and write a
-// validator or code generator to assist with populating and validating
-// MonitoringInfo protos.
+// A set of well known MonitoringInfo specifications.
 message MonitoringInfoSpecs {
   enum Enum {
-// TODO(BEAM-6926): Add the PTRANSFORM name as a required label after
-// upgrading the python SDK.
-USER_COUNTER = 0 [(monitoring_info_spec) = {
-  urn: "beam:metric:user",
-  type_urn: "beam:metrics:sum_int_64",
+// Represents an integer counter where values are summed across bundles.
+USER_SUM_INT64 = 0 [(monitoring_info_spec) = {
+  urn: "beam:metric:user:v1",
+  type: "beam:metrics:sum_int64:v1",
   required_labels: ["PTRANSFORM", "NAMESPACE", "NAME"],
   annotations: [{
 key: "description",
-value: "URN utilized to report user numeric counters."
+value: "URN utilized to report user metric."
   }]
 }];
 
-ELEMENT_COUNT = 1 [(monitoring_info_spec) = {
+// Represents a double counter where values are summed across bundles.
+USER_SUM_DOUBLE = 1 [(monitoring_info_spec) = {
+  urn: "beam:metric:user:v1",
 
 Review comment:
   Should it be legal to have two counters with the same URN but different 
types. (This seems to fly agains the idea of a URN being a Unique identifier.) 
   
   Seeing this explosion of types, however, makes it feel like we should not be 
manually be enumerating them (or at least I'm struggling to see the value in 
that over just saying that user counters may have any type).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] robertwb commented on a change in pull request #11184: [WIP][BEAM-4374] Update protos related to MonitoringInfo.

2020-03-20 Thread GitBox
robertwb commented on a change in pull request #11184: [WIP][BEAM-4374] Update 
protos related to MonitoringInfo.
URL: https://github.com/apache/beam/pull/11184#discussion_r395933544
 
 

 ##
 File path: model/pipeline/src/main/proto/metrics.proto
 ##
 @@ -229,121 +330,148 @@ message MonitoringInfo {
 NAMESPACE = 5 [(label_props) = { name: "NAMESPACE" }];
 NAME = 6 [(label_props) = { name: "NAME" }];
   }
-  // A set of key+value labels which define the scope of the metric.
+
+  // A set of key and value labels which define the scope of the metric. For
+  // well known URNs, the set of required labels is provided by the associated
+  // MonitoringInfoSpec.
+  //
   // Either a well defined entity id for matching the enum names in
   // the MonitoringInfoLabels enum or any arbitrary label
   // set by a custom metric or user metric.
+  //
   // A monitoring system is expected to be able to aggregate the metrics
   // together for all updates having the same URN and labels. Some systems such
   // as Stackdriver will be able to aggregate the metrics using a subset of the
   // provided labels
-  map labels = 5;
-
-  // The walltime of the most recent update.
-  // Useful for aggregation for latest types such as LatestInt64.
-  google.protobuf.Timestamp timestamp = 6;
+  map labels = 4;
 }
 
+// A set of well known URNs that specify the encoding and aggregation method.
 message MonitoringInfoTypeUrns {
   enum Enum {
+// Represents an integer counter where values are summed across bundles.
+//
+// Encoding: 
+//   - value: beam:coder:varint:v1
 SUM_INT64_TYPE = 0 [(org.apache.beam.model.pipeline.v1.beam_urn) =
-"beam:metrics:sum_int_64"];
-
-DISTRIBUTION_INT64_TYPE = 1 [(org.apache.beam.model.pipeline.v1.beam_urn) =
- "beam:metrics:distribution_int_64"];
-
-LATEST_INT64_TYPE = 2 [(org.apache.beam.model.pipeline.v1.beam_urn) =
-   "beam:metrics:latest_int_64"];
-
-// iterable is encoded with a beam:coder:double:v1 coder for each
-// element.
-LATEST_DOUBLES_TYPE = 3 [(org.apache.beam.model.pipeline.v1.beam_urn) =
- "beam:metrics:latest_doubles"];
-  }
-}
-
-message Metric {
-  // (Required) The data for this metric.
-  oneof data {
-CounterData counter_data = 1;
-DistributionData distribution_data = 2;
-ExtremaData extrema_data = 3;
-  }
-}
-
-// Data associated with a Counter or Gauge metric.
-// This is designed to be compatible with metric collection
-// systems such as DropWizard.
-message CounterData {
-  oneof value {
-int64 int64_value = 1;
-double double_value = 2;
-string string_value = 3;
-  }
-}
-
-// Extrema messages are used for calculating
-// Top-N/Bottom-N metrics.
-message ExtremaData {
-  oneof extrema {
-IntExtremaData int_extrema_data = 1;
-DoubleExtremaData double_extrema_data = 2;
-  }
-}
-
-message IntExtremaData {
-  repeated int64 int_values = 1;
-}
+"beam:metrics:sum_int64:v1"];
+
+// Represents a double counter where values are summed across bundles.
+//
+// Encoding: 
+//   value: beam:coder:double:v1
+SUM_DOUBLE_TYPE = 1 [(org.apache.beam.model.pipeline.v1.beam_urn) =
+"beam:metrics:sum_double:v1"];
+
+// Represents a distribution of an integer value where:
+//   - count: represents the number of values seen across all bundles
+//   - sum: represents the total of the value across all bundles
+//   - min: represents the smallest value seen across all bundles
+//   - max: represents the largest value seen across all bundles
+//
+// Encoding: 
+//   - count: beam:coder:varint:v1
+//   - sum:   beam:coder:varint:v1
+//   - min:   beam:coder:varint:v1
+//   - max:   beam:coder:varint:v1
+DISTRIBUTION_INT64_TYPE = 2 [(org.apache.beam.model.pipeline.v1.beam_urn) =
+ "beam:metrics:distribution_int64:v1"];
+
+// Represents a distribution of a double value where:
+//   - count: represents the number of values seen across all bundles
+//   - sum: represents the total of the value across all bundles
+//   - min: represents the smallest value seen across all bundles
+//   - max: represents the largest value seen across all bundles
+//
+// Encoding: 
+//   - count: beam:coder:varint:v1
+//   - sum:   beam:coder:double:v1
+//   - min:   beam:coder:double:v1
+//   - max:   beam:coder:double:v1
+DISTRIBUTION_DOUBLE_TYPE = 3 [(org.apache.beam.model.pipeline.v1.beam_urn) 
=
+ "beam:metrics:distribution_double:v1"];
+
+// Represents the latest seen integer value. The timestamp is used to
+// provide an "ordering" over multiple values to determine which is the
+// latest.
+//
+// Encoding: 
+//   - timestamp: beam:coder:varint:v1  

[GitHub] [beam] ibzib opened a new pull request #11189: [BEAM-9446] Retain unknown arguments when using uber jar job server.

2020-03-20 Thread GitBox
ibzib opened a new pull request #11189: [BEAM-9446] Retain unknown arguments 
when using uber jar job server.
URL: https://github.com/apache/beam/pull/11189
 
 
   See #11052 for context.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build
 

[GitHub] [beam] youngoli commented on issue #11188: [BEAM-3301] Adding restriction trackers and validation.

2020-03-20 Thread GitBox
youngoli commented on issue #11188: [BEAM-3301] Adding restriction trackers and 
validation.
URL: https://github.com/apache/beam/pull/11188#issuecomment-601956825
 
 
   Run Go PostCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] youngoli opened a new pull request #11188: [BEAM-3301] Adding restriction trackers and validation.

2020-03-20 Thread GitBox
youngoli opened a new pull request #11188: [BEAM-3301] Adding restriction 
trackers and validation.
URL: https://github.com/apache/beam/pull/11188
 
 
   Adding RTrackers as an interface, and adding them to the SDF validation.
   
   I think this is the last real code involved in SDF validation, assuming I'm 
not forgetting anything. I might do a second pass on the error messages because 
they seem inconsistent with the old error messages, but the next major task is 
going to be working on the SDF exec code and doing some testing with the Flink 
runner.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [x] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [x] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 

[GitHub] [beam] udim merged pull request #10717: [BEAM-8280] Enable type hint annotations

2020-03-20 Thread GitBox
udim merged pull request #10717: [BEAM-8280] Enable type hint annotations
URL: https://github.com/apache/beam/pull/10717
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[beam] branch master updated (56c15b9 -> 0351b49)

2020-03-20 Thread udim
This is an automated email from the ASF dual-hosted git repository.

udim pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 56c15b9  Merge pull request #11153 from [BEAM-9537] Adding a new 
module for FnApiRunner
 add 636de82  [BEAM-8280] Enable type hint annotations
 add 0351b49  Merge pull request #1071: [BEAM-8280] Enable type hint 
annotations

No new revisions were added by this update.

Summary of changes:
 CHANGES.md| 13 +
 sdks/python/apache_beam/typehints/decorators.py   | 16 +---
 sdks/python/apache_beam/typehints/decorators_test.py  | 12 ++--
 .../apache_beam/typehints/decorators_test_py3.py  | 19 ++-
 .../apache_beam/typehints/typed_pipeline_test.py  |  3 ---
 .../apache_beam/typehints/typed_pipeline_test_py3.py  |  3 ---
 .../apache_beam/typehints/typehints_test_py3.py   |  3 ---
 7 files changed, 54 insertions(+), 15 deletions(-)



[GitHub] [beam] davidyan74 commented on a change in pull request #11174: [BEAM-7923] Pop failed transform when error is raised

2020-03-20 Thread GitBox
davidyan74 commented on a change in pull request #11174: [BEAM-7923] Pop failed 
transform when error is raised
URL: https://github.com/apache/beam/pull/11174#discussion_r395933604
 
 

 ##
 File path: sdks/python/apache_beam/pipeline.py
 ##
 @@ -307,58 +307,61 @@ def _replace_if_needed(self, original_transform_node):
   elif len(inputs) == 0:
 input_node = pvalue.PBegin(self.pipeline)
 
-  # We have to add the new AppliedTransform to the stack before 
expand()
-  # and pop it out later to make sure that parts get added correctly.
-  self.pipeline.transforms_stack.append(replacement_transform_node)
-
-  # Keeping the same label for the replaced node but recursively
-  # removing labels of child transforms of original transform since 
they
-  # will be replaced during the expand below. This is needed in case
-  # the replacement contains children that have labels that conflicts
-  # with labels of the children of the original.
-  self.pipeline._remove_labels_recursively(original_transform_node)
-
-  new_output = replacement_transform.expand(input_node)
-  assert isinstance(
-  new_output, (dict, pvalue.PValue, pvalue.DoOutputsTuple))
-
-  if isinstance(new_output, pvalue.PValue):
-new_output.element_type = None
-self.pipeline._infer_result_type(
-replacement_transform, inputs, new_output)
-
-  if isinstance(new_output, dict):
-for new_tag, new_pcoll in new_output.items():
-  replacement_transform_node.add_output(new_pcoll, new_tag)
-  elif isinstance(new_output, pvalue.DoOutputsTuple):
-replacement_transform_node.add_output(
-new_output, new_output._main_tag)
-  else:
-replacement_transform_node.add_output(new_output, new_output.tag)
-
-  # Recording updated outputs. This cannot be done in the same visitor
-  # since if we dynamically update output type here, we'll run into
-  # errors when visiting child nodes.
-  #
-  # NOTE: When replacing multiple outputs, the replacement PCollection
-  # tags must have a matching tag in the original transform.
-  if isinstance(new_output, pvalue.PValue):
-if not new_output.producer:
-  new_output.producer = replacement_transform_node
-output_map[original_transform_node.outputs[new_output.tag]] = \
-new_output
-  elif isinstance(new_output, (pvalue.DoOutputsTuple, tuple)):
-for pcoll in new_output:
-  if not pcoll.producer:
-pcoll.producer = replacement_transform_node
-  output_map[original_transform_node.outputs[pcoll.tag]] = pcoll
-  elif isinstance(new_output, dict):
-for tag, pcoll in new_output.items():
-  if not pcoll.producer:
-pcoll.producer = replacement_transform_node
-  output_map[original_transform_node.outputs[tag]] = pcoll
-
-  self.pipeline.transforms_stack.pop()
+  try:
+# We have to add the new AppliedTransform to the stack before
+# expand() and pop it out later to make sure that parts get added
+# correctly.
+self.pipeline.transforms_stack.append(replacement_transform_node)
 
 Review comment:
   This is a python newbie question. Does list.append() ever throw an 
exception? If so, should we move this out of the try block so that we don't 
pop() if list.append() fails?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] robertwb commented on a change in pull request #11184: [WIP][BEAM-4374] Update protos related to MonitoringInfo.

2020-03-20 Thread GitBox
robertwb commented on a change in pull request #11184: [WIP][BEAM-4374] Update 
protos related to MonitoringInfo.
URL: https://github.com/apache/beam/pull/11184#discussion_r395933271
 
 

 ##
 File path: model/pipeline/src/main/proto/metrics.proto
 ##
 @@ -229,101 +215,127 @@ message MonitoringInfo {
 NAMESPACE = 5 [(label_props) = { name: "NAMESPACE" }];
 NAME = 6 [(label_props) = { name: "NAME" }];
   }
+
   // A set of key+value labels which define the scope of the metric.
   // Either a well defined entity id for matching the enum names in
   // the MonitoringInfoLabels enum or any arbitrary label
   // set by a custom metric or user metric.
+  //
   // A monitoring system is expected to be able to aggregate the metrics
   // together for all updates having the same URN and labels. Some systems such
   // as Stackdriver will be able to aggregate the metrics using a subset of the
   // provided labels
-  map labels = 5;
-
-  // The walltime of the most recent update.
-  // Useful for aggregation for latest types such as LatestInt64.
-  google.protobuf.Timestamp timestamp = 6;
+  map labels = 4;
 }
 
 message MonitoringInfoTypeUrns {
   enum Enum {
+// Represents an integer counter where values are summed across bundles.
+//
+// Encoding: 
+//   - value: beam:coder:varint:v1
 SUM_INT64_TYPE = 0 [(org.apache.beam.model.pipeline.v1.beam_urn) =
-"beam:metrics:sum_int_64"];
-
-DISTRIBUTION_INT64_TYPE = 1 [(org.apache.beam.model.pipeline.v1.beam_urn) =
- "beam:metrics:distribution_int_64"];
-
-LATEST_INT64_TYPE = 2 [(org.apache.beam.model.pipeline.v1.beam_urn) =
-   "beam:metrics:latest_int_64"];
+"beam:metrics:sum_int64:v1"];
+
+// Represents a double counter where values are summed across bundles.
+//
+// Encoding: 
+//   value: beam:coder:double:v1
+SUM_DOUBLE_TYPE = 1 [(org.apache.beam.model.pipeline.v1.beam_urn) =
+"beam:metrics:sum_int64:v1"];
+
+// Represents a distribution of an integer value where:
+//   - count: represents the number of values seen across all bundles
+//   - sum: represents the total of the value across all bundles
+//   - min: represents the smallest value seen across all bundles
+//   - max: represents the largest value seen across all bundles
+//
+// Encoding: 
+//   - count: beam:coder:varint:v1
+//   - sum:   beam:coder:varint:v1
+//   - min:   beam:coder:varint:v1
+//   - max:   beam:coder:varint:v1
+DISTRIBUTION_INT64_TYPE = 2 [(org.apache.beam.model.pipeline.v1.beam_urn) =
+ "beam:metrics:distribution_int64:v1"];
+
+// Represents a distribution of a double value where:
+//   - count: represents the number of values seen across all bundles
+//   - sum: represents the total of the value across all bundles
+//   - min: represents the smallest value seen across all bundles
+//   - max: represents the largest value seen across all bundles
+//
+// Encoding: 
+//   - count: beam:coder:varint:v1
+//   - sum:   beam:coder:double:v1
+//   - min:   beam:coder:double:v1
+//   - max:   beam:coder:double:v1
+DISTRIBUTION_DOUBLE_TYPE = 3 [(org.apache.beam.model.pipeline.v1.beam_urn) 
=
+ "beam:metrics:distribution_int64:v1"];
+
+// Represents the latest seen integer value. The timestamp is used to
+// provide an "ordering" over multiple values to determine which is the
+// latest.
+//
+// Encoding: 
+//   - timestamp: beam:coder:varint:v1 (milliseconds since epoch)
+//   - value: beam:coder:varint:v1
+LATEST_INT64_TYPE = 4 [(org.apache.beam.model.pipeline.v1.beam_urn) =
+   "beam:metrics:latest_int64:v1"];
+
+// Represents the latest seen integer value. The timestamp is used to
+// provide an "ordering" over multiple values to determine which is the
+// latest.
+//
+// Encoding: 
+//   - timestamp: beam:coder:varint:v1 (milliseconds since epoch)
+//   - value: beam:coder:double:v1
+LATEST_DOUBLE_TYPE = 5 [(org.apache.beam.model.pipeline.v1.beam_urn) =
+   "beam:metrics:latest_int64:v1"];
+
+// Represents the largest set of integer values seen across bundles.
+//
+// Encoding: ...
+//   - valueX: beam:coder:varint:v1
+TOP_N_INT64_TYPE = 6 [(org.apache.beam.model.pipeline.v1.beam_urn) =
+   "beam:metrics:top_n_int64:v1"];
+
+// Represents the largest set of double values seen across bundles.
+//
+// Encoding: ...
+//   - valueX: beam:coder:double:v1
+TOP_N_DOUBLE_TYPE = 7 [(org.apache.beam.model.pipeline.v1.beam_urn) =
+"beam:metrics:top_n_int64:v1"];
+
+// Represents the smallest set of 

[GitHub] [beam] pabloem merged pull request #11153: [BEAM-9537] Adding a new module for FnApiRunner

2020-03-20 Thread GitBox
pabloem merged pull request #11153: [BEAM-9537] Adding a new module for 
FnApiRunner
URL: https://github.com/apache/beam/pull/11153
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[beam] branch master updated (0389f54 -> 56c15b9)

2020-03-20 Thread pabloem
This is an automated email from the ASF dual-hosted git repository.

pabloem pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 0389f54  Merge pull request #11173 [BEAM-9558] Add explicit end bit 
for data channel.
 add 56c15b9  Merge pull request #11153 from [BEAM-9537] Adding a new 
module for FnApiRunner

No new revisions were added by this update.

Summary of changes:
 sdks/python/apache_beam/pipeline.py|  6 +-
 .../runners/dataflow/dataflow_runner.py|  9 +--
 .../portability/fn_api_runner}/__init__.py |  1 +
 .../fn_runner.py}  | 69 +++---
 .../fn_runner_test.py} | 13 ++--
 .../translations.py}   |  0
 .../runners/portability/local_job_service.py   | 10 ++--
 .../runners/portability/portable_runner.py | 30 +-
 .../runners/portability/portable_runner_test.py|  4 +-
 9 files changed, 74 insertions(+), 68 deletions(-)
 copy sdks/python/apache_beam/{examples => 
runners/portability/fn_api_runner}/__init__.py (91%)
 rename sdks/python/apache_beam/runners/portability/{fn_api_runner.py => 
fn_api_runner/fn_runner.py} (97%)
 rename sdks/python/apache_beam/runners/portability/{fn_api_runner_test.py => 
fn_api_runner/fn_runner_test.py} (99%)
 rename 
sdks/python/apache_beam/runners/portability/{fn_api_runner_transforms.py => 
fn_api_runner/translations.py} (100%)



[GitHub] [beam] robertwb commented on issue #11110: [BEAM-9398] runtime_type_check: support setup

2020-03-20 Thread GitBox
robertwb commented on issue #0: [BEAM-9398] runtime_type_check: support 
setup
URL: https://github.com/apache/beam/pull/0#issuecomment-601952041
 
 
   Run Python PreCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[beam] branch master updated (2f25e62 -> 0389f54)

2020-03-20 Thread robertwb
This is an automated email from the ASF dual-hosted git repository.

robertwb pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 2f25e62  Merge pull request #11179: [BEAM-3301] Bugfix in DoFn 
validation.
 add eb05a91  [BEAM-9558] Add an explicit end field to the data channel 
protos.
 add 02cccac  [BEAM-9558] Regenerate go protos.
 add c15a7e2  [BEAM-9558] Produce and respect data channel end bit in 
runners and SDKs.
 new 0389f54  Merge pull request #11173 [BEAM-9558] Add explicit end bit 
for data channel.

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../fn-execution/src/main/proto/beam_fn_api.proto  |  13 +-
 .../worker/fn/data/BeamFnDataGrpcServiceTest.java  |   5 +-
 .../fnexecution/data/GrpcDataServiceTest.java  |   5 +-
 sdks/go/pkg/beam/core/runtime/harness/datamgr.go   |   4 +-
 .../beam/model/fnexecution_v1/beam_fn_api.pb.go| 454 ++---
 .../sdk/fn/data/BeamFnDataInboundObserver.java |   3 +-
 ...amFnDataSizeBasedBufferingOutboundObserver.java |   3 +-
 ...DataSizeBasedBufferingOutboundObserverTest.java |  21 +-
 .../fn/harness/data/BeamFnDataGrpcClientTest.java  |   6 +-
 .../apache_beam/runners/worker/data_plane.py   |  14 +-
 10 files changed, 276 insertions(+), 252 deletions(-)



[beam] 01/01: Merge pull request #11173 [BEAM-9558] Add explicit end bit for data channel.

2020-03-20 Thread robertwb
This is an automated email from the ASF dual-hosted git repository.

robertwb pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit 0389f54ccd5c018f75c883bafa6be959503a9b58
Merge: 2f25e62 c15a7e2
Author: Robert Bradshaw 
AuthorDate: Fri Mar 20 16:23:37 2020 -0700

Merge pull request #11173 [BEAM-9558] Add explicit end bit for data channel.

 .../fn-execution/src/main/proto/beam_fn_api.proto  |  13 +-
 .../worker/fn/data/BeamFnDataGrpcServiceTest.java  |   5 +-
 .../fnexecution/data/GrpcDataServiceTest.java  |   5 +-
 sdks/go/pkg/beam/core/runtime/harness/datamgr.go   |   4 +-
 .../beam/model/fnexecution_v1/beam_fn_api.pb.go| 454 ++---
 .../sdk/fn/data/BeamFnDataInboundObserver.java |   3 +-
 ...amFnDataSizeBasedBufferingOutboundObserver.java |   3 +-
 ...DataSizeBasedBufferingOutboundObserverTest.java |  21 +-
 .../fn/harness/data/BeamFnDataGrpcClientTest.java  |   6 +-
 .../apache_beam/runners/worker/data_plane.py   |  14 +-
 10 files changed, 276 insertions(+), 252 deletions(-)



[GitHub] [beam] robertwb merged pull request #11173: [BEAM-9558] Add explicit end bit for data channel.

2020-03-20 Thread GitBox
robertwb merged pull request #11173: [BEAM-9558] Add explicit end bit for data 
channel.
URL: https://github.com/apache/beam/pull/11173
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] aaltay commented on issue #11187: optionally import grpc

2020-03-20 Thread GitBox
aaltay commented on issue #11187: optionally import grpc
URL: https://github.com/apache/beam/pull/11187#issuecomment-601949594
 
 
   @Hannah-Jiang feel free to self merge, once you address my comment and tests 
pass.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] aaltay commented on a change in pull request #11187: optionally import grpc

2020-03-20 Thread GitBox
aaltay commented on a change in pull request #11187: optionally import grpc
URL: https://github.com/apache/beam/pull/11187#discussion_r395928394
 
 

 ##
 File path: sdks/python/apache_beam/runners/direct/test_stream_impl.py
 ##
 @@ -29,13 +29,20 @@
 
 import itertools
 
-import grpc
+try:
+  import grpc
+  from apache_beam.portability.api import beam_runner_api_pb2_grpc
+except:
+  grpc = None
+  beam_runner_api_pb2_grpc = None
+  # A workaround for some internal tests which are missing grpc dependencyy.
 
 Review comment:
   maybe change "for some internal tests" to "directrunner users who would 
cannot depend on grpc"


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] aaltay commented on issue #11174: [BEAM-7923] Pop failed transform when error is raised

2020-03-20 Thread GitBox
aaltay commented on issue #11174: [BEAM-7923] Pop failed transform when error 
is raised
URL: https://github.com/apache/beam/pull/11174#issuecomment-601949155
 
 
   retest this please


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] Hannah-Jiang opened a new pull request #11187: optionally import grpc

2020-03-20 Thread GitBox
Hannah-Jiang opened a new pull request #11187: optionally import grpc
URL: https://github.com/apache/beam/pull/11187
 
 
   R: @aaltay  
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build
 

[GitHub] [beam] robertwb commented on issue #11165: [BEAM-9340] Populate requirements for Java.

2020-03-20 Thread GitBox
robertwb commented on issue #11165: [BEAM-9340] Populate requirements for Java.
URL: https://github.com/apache/beam/pull/11165#issuecomment-601947065
 
 
   Run Java PreCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] robertwb commented on issue #11165: [BEAM-9340] Populate requirements for Java.

2020-03-20 Thread GitBox
robertwb commented on issue #11165: [BEAM-9340] Populate requirements for Java.
URL: https://github.com/apache/beam/pull/11165#issuecomment-601946285
 
 
   This passes locally. 
org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInFinishBundleStateful
 seems unrelated. Retrying again. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] robertwb commented on issue #11165: [BEAM-9340] Populate requirements for Java.

2020-03-20 Thread GitBox
robertwb commented on issue #11165: [BEAM-9340] Populate requirements for Java.
URL: https://github.com/apache/beam/pull/11165#issuecomment-601946329
 
 
   Run Java PreCommit
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] iemejia opened a new pull request #11186: [BEAM-9564] Remove insecure ssl options from MongoDBIO

2020-03-20 Thread GitBox
iemejia opened a new pull request #11186: [BEAM-9564] Remove insecure ssl 
options from MongoDBIO
URL: https://github.com/apache/beam/pull/11186
 
 
   These changes are not backwards compatible but this is intended to solve the 
potential security issues and also because MongoDBIO does not have strong 
backwards compatibility yet (aka it is still tagged as `@Experimental`).
   
   R: @alexvanboxel
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] pabloem commented on issue #11153: [BEAM-9537] Adding a new module for FnApiRunner

2020-03-20 Thread GitBox
pabloem commented on issue #11153: [BEAM-9537] Adding a new module for 
FnApiRunner
URL: https://github.com/apache/beam/pull/11153#issuecomment-601938785
 
 
   Run Python PreCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] iemejia commented on issue #11008: Update comment to tell user this is not secure

2020-03-20 Thread GitBox
iemejia commented on issue #11008: Update comment to tell user this is not 
secure
URL: https://github.com/apache/beam/pull/11008#issuecomment-601937789
 
 
   I filled https://issues.apache.org/jira/browse/BEAM-9564 to remove this 
risky option since it seems we agree on this. I will open a PR for this soon.
   I am going to close this ticket. Thanks for bringing awareness on this issue 
@YYTVicky


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] iemejia closed pull request #11008: Update comment to tell user this is not secure

2020-03-20 Thread GitBox
iemejia closed pull request #11008: Update comment to tell user this is not 
secure
URL: https://github.com/apache/beam/pull/11008
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] youngoli merged pull request #11179: [BEAM-3301] Bugfix in DoFn validation.

2020-03-20 Thread GitBox
youngoli merged pull request #11179: [BEAM-3301] Bugfix in DoFn validation.
URL: https://github.com/apache/beam/pull/11179
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[beam] branch master updated (cd8a00c -> 2f25e62)

2020-03-20 Thread danoliveira
This is an automated email from the ASF dual-hosted git repository.

danoliveira pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from cd8a00c  Merge pull request #11162 from lukecwik/proto3
 add 78141f2  [BEAM-3301] Bugfix in DoFn validation.
 add 2f25e62  Merge pull request #11179: [BEAM-3301] Bugfix in DoFn 
validation.

No new revisions were added by this update.

Summary of changes:
 sdks/go/pkg/beam/core/graph/fn.go  | 33 -
 sdks/go/pkg/beam/core/graph/fn_test.go |  2 +-
 sdks/go/pkg/beam/pardo.go  |  3 ++-
 3 files changed, 15 insertions(+), 23 deletions(-)



[GitHub] [beam] aaltay commented on a change in pull request #11156: [BEAM-9444] Use GCP Libraries BOM for Google Cloud Dependencies

2020-03-20 Thread GitBox
aaltay commented on a change in pull request #11156: [BEAM-9444] Use GCP 
Libraries BOM for Google Cloud Dependencies
URL: https://github.com/apache/beam/pull/11156#discussion_r395911140
 
 

 ##
 File path: 
buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
 ##
 @@ -444,45 +439,46 @@ class BeamModulePlugin implements Plugin {
 commons_lang3   : 
"org.apache.commons:commons-lang3:3.9",
 commons_math3   : 
"org.apache.commons:commons-math3:3.6.1",
 error_prone_annotations : 
"com.google.errorprone:error_prone_annotations:2.0.15",
-gax : 
"com.google.api:gax:$gax_version",
-gax_grpc: 
"com.google.api:gax-grpc:$gax_version",
+gax : "com.google.api:gax",
+gax_grpc: 
"com.google.api:gax-grpc",
 google_api_client   : 
"com.google.api-client:google-api-client:$google_clients_version",
 google_api_client_jackson2  : 
"com.google.api-client:google-api-client-jackson2:$google_clients_version",
 google_api_client_java6 : 
"com.google.api-client:google-api-client-java6:$google_clients_version",
-google_api_common   : 
"com.google.api:api-common:1.8.1",
+google_api_common   : 
"com.google.api:api-common",
 
 Review comment:
   A question for my learning. Previously this file had specific dependency 
versions. For any released Beam version we could check the file in the release 
branch and get a list of dependencies and their versions.
   
   After this change, how can we do the same thing? Would it happen through a 
generated BOM file in the release branch?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] pabloem commented on a change in pull request #11086: [BEAM-8910] Make custom BQ source read from Avro

2020-03-20 Thread GitBox
pabloem commented on a change in pull request #11086: [BEAM-8910] Make custom 
BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#discussion_r395910855
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/bigquery_read_it_test.py
 ##
 @@ -236,11 +251,12 @@ def create_table(cls, table_name):
 cls.bigquery_client.insert_rows(
 cls.project, cls.dataset_id, table_name, table_data)
 
-  def get_expected_data(self):
+  def get_expected_data(self, native=True):
+byts = b'\xab\xac'
 expected_row = {
 'float': 0.33,
 'numeric': Decimal('10'),
-'bytes': base64.b64encode(b'\xab\xac'),
+'bytes': base64.b64encode(byts) if native else byts,
 
 Review comment:
   The behavior will be different for different transforms. Users will need to 
explicitly change the transform in their code. We can make them aware of the 
new transform, and its typing differences in release notes, and possibly in 
Pydoc for the new transform as well. Thoughts? 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] lukecwik commented on a change in pull request #11184: [WIP][BEAM-4374] Update protos related to MonitoringInfo.

2020-03-20 Thread GitBox
lukecwik commented on a change in pull request #11184: [WIP][BEAM-4374] Update 
protos related to MonitoringInfo.
URL: https://github.com/apache/beam/pull/11184#discussion_r395908901
 
 

 ##
 File path: model/pipeline/src/main/proto/metrics.proto
 ##
 @@ -229,101 +215,127 @@ message MonitoringInfo {
 NAMESPACE = 5 [(label_props) = { name: "NAMESPACE" }];
 NAME = 6 [(label_props) = { name: "NAME" }];
   }
+
   // A set of key+value labels which define the scope of the metric.
   // Either a well defined entity id for matching the enum names in
   // the MonitoringInfoLabels enum or any arbitrary label
   // set by a custom metric or user metric.
+  //
   // A monitoring system is expected to be able to aggregate the metrics
   // together for all updates having the same URN and labels. Some systems such
   // as Stackdriver will be able to aggregate the metrics using a subset of the
   // provided labels
-  map labels = 5;
-
-  // The walltime of the most recent update.
-  // Useful for aggregation for latest types such as LatestInt64.
-  google.protobuf.Timestamp timestamp = 6;
+  map labels = 4;
 }
 
 message MonitoringInfoTypeUrns {
   enum Enum {
+// Represents an integer counter where values are summed across bundles.
+//
+// Encoding: 
+//   - value: beam:coder:varint:v1
 SUM_INT64_TYPE = 0 [(org.apache.beam.model.pipeline.v1.beam_urn) =
-"beam:metrics:sum_int_64"];
-
-DISTRIBUTION_INT64_TYPE = 1 [(org.apache.beam.model.pipeline.v1.beam_urn) =
- "beam:metrics:distribution_int_64"];
-
-LATEST_INT64_TYPE = 2 [(org.apache.beam.model.pipeline.v1.beam_urn) =
-   "beam:metrics:latest_int_64"];
+"beam:metrics:sum_int64:v1"];
+
+// Represents a double counter where values are summed across bundles.
+//
+// Encoding: 
+//   value: beam:coder:double:v1
+SUM_DOUBLE_TYPE = 1 [(org.apache.beam.model.pipeline.v1.beam_urn) =
+"beam:metrics:sum_int64:v1"];
+
+// Represents a distribution of an integer value where:
+//   - count: represents the number of values seen across all bundles
+//   - sum: represents the total of the value across all bundles
+//   - min: represents the smallest value seen across all bundles
+//   - max: represents the largest value seen across all bundles
+//
+// Encoding: 
+//   - count: beam:coder:varint:v1
+//   - sum:   beam:coder:varint:v1
+//   - min:   beam:coder:varint:v1
+//   - max:   beam:coder:varint:v1
+DISTRIBUTION_INT64_TYPE = 2 [(org.apache.beam.model.pipeline.v1.beam_urn) =
+ "beam:metrics:distribution_int64:v1"];
+
+// Represents a distribution of a double value where:
+//   - count: represents the number of values seen across all bundles
+//   - sum: represents the total of the value across all bundles
+//   - min: represents the smallest value seen across all bundles
+//   - max: represents the largest value seen across all bundles
+//
+// Encoding: 
+//   - count: beam:coder:varint:v1
+//   - sum:   beam:coder:double:v1
+//   - min:   beam:coder:double:v1
+//   - max:   beam:coder:double:v1
+DISTRIBUTION_DOUBLE_TYPE = 3 [(org.apache.beam.model.pipeline.v1.beam_urn) 
=
+ "beam:metrics:distribution_int64:v1"];
+
+// Represents the latest seen integer value. The timestamp is used to
+// provide an "ordering" over multiple values to determine which is the
+// latest.
+//
+// Encoding: 
+//   - timestamp: beam:coder:varint:v1 (milliseconds since epoch)
+//   - value: beam:coder:varint:v1
+LATEST_INT64_TYPE = 4 [(org.apache.beam.model.pipeline.v1.beam_urn) =
+   "beam:metrics:latest_int64:v1"];
+
+// Represents the latest seen integer value. The timestamp is used to
+// provide an "ordering" over multiple values to determine which is the
+// latest.
+//
+// Encoding: 
+//   - timestamp: beam:coder:varint:v1 (milliseconds since epoch)
+//   - value: beam:coder:double:v1
+LATEST_DOUBLE_TYPE = 5 [(org.apache.beam.model.pipeline.v1.beam_urn) =
+   "beam:metrics:latest_int64:v1"];
+
+// Represents the largest set of integer values seen across bundles.
+//
+// Encoding: ...
+//   - valueX: beam:coder:varint:v1
+TOP_N_INT64_TYPE = 6 [(org.apache.beam.model.pipeline.v1.beam_urn) =
+   "beam:metrics:top_n_int64:v1"];
+
+// Represents the largest set of double values seen across bundles.
+//
+// Encoding: ...
+//   - valueX: beam:coder:double:v1
+TOP_N_DOUBLE_TYPE = 7 [(org.apache.beam.model.pipeline.v1.beam_urn) =
+"beam:metrics:top_n_int64:v1"];
+
+// Represents the smallest set of 

[GitHub] [beam] robertwb commented on issue #11173: [BEAM-9558] Add explicit end bit for data channel.

2020-03-20 Thread GitBox
robertwb commented on issue #11173: [BEAM-9558] Add explicit end bit for data 
channel.
URL: https://github.com/apache/beam/pull/11173#issuecomment-601930536
 
 
   Run Java PreCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] robertwb commented on issue #11174: [BEAM-7923] Pop failed transform when error is raised

2020-03-20 Thread GitBox
robertwb commented on issue #11174: [BEAM-7923] Pop failed transform when error 
is raised
URL: https://github.com/apache/beam/pull/11174#issuecomment-601929637
 
 
   retest this please


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] lukecwik commented on a change in pull request #11184: [WIP][BEAM-4374] Update protos related to MonitoringInfo.

2020-03-20 Thread GitBox
lukecwik commented on a change in pull request #11184: [WIP][BEAM-4374] Update 
protos related to MonitoringInfo.
URL: https://github.com/apache/beam/pull/11184#discussion_r395908901
 
 

 ##
 File path: model/pipeline/src/main/proto/metrics.proto
 ##
 @@ -229,101 +215,127 @@ message MonitoringInfo {
 NAMESPACE = 5 [(label_props) = { name: "NAMESPACE" }];
 NAME = 6 [(label_props) = { name: "NAME" }];
   }
+
   // A set of key+value labels which define the scope of the metric.
   // Either a well defined entity id for matching the enum names in
   // the MonitoringInfoLabels enum or any arbitrary label
   // set by a custom metric or user metric.
+  //
   // A monitoring system is expected to be able to aggregate the metrics
   // together for all updates having the same URN and labels. Some systems such
   // as Stackdriver will be able to aggregate the metrics using a subset of the
   // provided labels
-  map labels = 5;
-
-  // The walltime of the most recent update.
-  // Useful for aggregation for latest types such as LatestInt64.
-  google.protobuf.Timestamp timestamp = 6;
+  map labels = 4;
 }
 
 message MonitoringInfoTypeUrns {
   enum Enum {
+// Represents an integer counter where values are summed across bundles.
+//
+// Encoding: 
+//   - value: beam:coder:varint:v1
 SUM_INT64_TYPE = 0 [(org.apache.beam.model.pipeline.v1.beam_urn) =
-"beam:metrics:sum_int_64"];
-
-DISTRIBUTION_INT64_TYPE = 1 [(org.apache.beam.model.pipeline.v1.beam_urn) =
- "beam:metrics:distribution_int_64"];
-
-LATEST_INT64_TYPE = 2 [(org.apache.beam.model.pipeline.v1.beam_urn) =
-   "beam:metrics:latest_int_64"];
+"beam:metrics:sum_int64:v1"];
+
+// Represents a double counter where values are summed across bundles.
+//
+// Encoding: 
+//   value: beam:coder:double:v1
+SUM_DOUBLE_TYPE = 1 [(org.apache.beam.model.pipeline.v1.beam_urn) =
+"beam:metrics:sum_int64:v1"];
+
+// Represents a distribution of an integer value where:
+//   - count: represents the number of values seen across all bundles
+//   - sum: represents the total of the value across all bundles
+//   - min: represents the smallest value seen across all bundles
+//   - max: represents the largest value seen across all bundles
+//
+// Encoding: 
+//   - count: beam:coder:varint:v1
+//   - sum:   beam:coder:varint:v1
+//   - min:   beam:coder:varint:v1
+//   - max:   beam:coder:varint:v1
+DISTRIBUTION_INT64_TYPE = 2 [(org.apache.beam.model.pipeline.v1.beam_urn) =
+ "beam:metrics:distribution_int64:v1"];
+
+// Represents a distribution of a double value where:
+//   - count: represents the number of values seen across all bundles
+//   - sum: represents the total of the value across all bundles
+//   - min: represents the smallest value seen across all bundles
+//   - max: represents the largest value seen across all bundles
+//
+// Encoding: 
+//   - count: beam:coder:varint:v1
+//   - sum:   beam:coder:double:v1
+//   - min:   beam:coder:double:v1
+//   - max:   beam:coder:double:v1
+DISTRIBUTION_DOUBLE_TYPE = 3 [(org.apache.beam.model.pipeline.v1.beam_urn) 
=
+ "beam:metrics:distribution_int64:v1"];
+
+// Represents the latest seen integer value. The timestamp is used to
+// provide an "ordering" over multiple values to determine which is the
+// latest.
+//
+// Encoding: 
+//   - timestamp: beam:coder:varint:v1 (milliseconds since epoch)
+//   - value: beam:coder:varint:v1
+LATEST_INT64_TYPE = 4 [(org.apache.beam.model.pipeline.v1.beam_urn) =
+   "beam:metrics:latest_int64:v1"];
+
+// Represents the latest seen integer value. The timestamp is used to
+// provide an "ordering" over multiple values to determine which is the
+// latest.
+//
+// Encoding: 
+//   - timestamp: beam:coder:varint:v1 (milliseconds since epoch)
+//   - value: beam:coder:double:v1
+LATEST_DOUBLE_TYPE = 5 [(org.apache.beam.model.pipeline.v1.beam_urn) =
+   "beam:metrics:latest_int64:v1"];
+
+// Represents the largest set of integer values seen across bundles.
+//
+// Encoding: ...
+//   - valueX: beam:coder:varint:v1
+TOP_N_INT64_TYPE = 6 [(org.apache.beam.model.pipeline.v1.beam_urn) =
+   "beam:metrics:top_n_int64:v1"];
+
+// Represents the largest set of double values seen across bundles.
+//
+// Encoding: ...
+//   - valueX: beam:coder:double:v1
+TOP_N_DOUBLE_TYPE = 7 [(org.apache.beam.model.pipeline.v1.beam_urn) =
+"beam:metrics:top_n_int64:v1"];
+
+// Represents the smallest set of 

[GitHub] [beam] iemejia commented on issue #11099: [BEAM-9420] Configurable timeout for blocking kafka API call(s)

2020-03-20 Thread GitBox
iemejia commented on issue #11099: [BEAM-9420] Configurable timeout for 
blocking kafka API call(s)
URL: https://github.com/apache/beam/pull/11099#issuecomment-601928283
 
 
   @aromanenko-dev maybe since he has been maintaing KafkaIo for a while.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] chamikaramj opened a new pull request #11185: [BEAM-8019] Some generalizations to support cross-language transforms.

2020-03-20 Thread GitBox
chamikaramj opened a new pull request #11185: [BEAM-8019] Some generalizations 
to support cross-language transforms.
URL: https://github.com/apache/beam/pull/11185
 
 
   These are needed for for runners that need to build a Python object graph 
from a runner API proto with external transforms (for example, Dataflow).
   
   Some generalizations to support cross-language transforms.
   
   Testing - cross-language test suite [1] works for Dataflow with these 
changes (will be enabled separately).
   
   WIP: not all tests pass yet
   
   [1] 
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/validate_runner_xlang_test.py#L51
   
   **Please** add a meaningful description for your change here
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 

[GitHub] [beam] chamikaramj commented on issue #11185: [BEAM-8019] Some generalizations to support cross-language transforms.

2020-03-20 Thread GitBox
chamikaramj commented on issue #11185: [BEAM-8019] Some generalizations to 
support cross-language transforms.
URL: https://github.com/apache/beam/pull/11185#issuecomment-601927190
 
 
   cc: @robertwb @lukecwik 
   
   Still addressing some unit test failures but sharing for any early comments.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] lukecwik commented on a change in pull request #11184: [WIP][BEAM-4374] Update protos related to MonitoringInfo.

2020-03-20 Thread GitBox
lukecwik commented on a change in pull request #11184: [WIP][BEAM-4374] Update 
protos related to MonitoringInfo.
URL: https://github.com/apache/beam/pull/11184#discussion_r395905193
 
 

 ##
 File path: model/pipeline/src/main/proto/metrics.proto
 ##
 @@ -229,101 +215,127 @@ message MonitoringInfo {
 NAMESPACE = 5 [(label_props) = { name: "NAMESPACE" }];
 NAME = 6 [(label_props) = { name: "NAME" }];
   }
+
   // A set of key+value labels which define the scope of the metric.
   // Either a well defined entity id for matching the enum names in
   // the MonitoringInfoLabels enum or any arbitrary label
   // set by a custom metric or user metric.
+  //
   // A monitoring system is expected to be able to aggregate the metrics
   // together for all updates having the same URN and labels. Some systems such
   // as Stackdriver will be able to aggregate the metrics using a subset of the
   // provided labels
-  map labels = 5;
-
-  // The walltime of the most recent update.
-  // Useful for aggregation for latest types such as LatestInt64.
-  google.protobuf.Timestamp timestamp = 6;
+  map labels = 4;
 }
 
 message MonitoringInfoTypeUrns {
 
 Review comment:
   Done


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] pabloem commented on issue #11153: [BEAM-9537] Adding a new module for FnApiRunner

2020-03-20 Thread GitBox
pabloem commented on issue #11153: [BEAM-9537] Adding a new module for 
FnApiRunner
URL: https://github.com/apache/beam/pull/11153#issuecomment-601923625
 
 
   Run Python PreCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] udim commented on issue #10717: [BEAM-8280] Enable type hint annotations

2020-03-20 Thread GitBox
udim commented on issue #10717: [BEAM-8280] Enable type hint annotations
URL: https://github.com/apache/beam/pull/10717#issuecomment-601923382
 
 
   Run Python PreCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] robertwb commented on issue #11165: [BEAM-9340] Populate requirements for Java.

2020-03-20 Thread GitBox
robertwb commented on issue #11165: [BEAM-9340] Populate requirements for Java.
URL: https://github.com/apache/beam/pull/11165#issuecomment-601921071
 
 
   Run Java PreCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] robertwb commented on a change in pull request #10760: [BEAM-9545] Dataframe transforms

2020-03-20 Thread GitBox
robertwb commented on a change in pull request #10760: [BEAM-9545] Dataframe 
transforms
URL: https://github.com/apache/beam/pull/10760#discussion_r395900936
 
 

 ##
 File path: sdks/python/apache_beam/dataframe/transforms.py
 ##
 @@ -0,0 +1,246 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+
+import pandas as pd
+
+import apache_beam as beam
+from apache_beam import transforms
+from apache_beam.dataframe import expressions
+from apache_beam.dataframe import frame_base
+from apache_beam.dataframe import frames  # pylint: disable=unused-import
+
+
+class DataframeTransform(transforms.PTransform):
+  """A PTransform for applying function that takes and returns dataframes
+  to one or more PCollections.
+
+  For example, if pcoll is a PCollection of dataframes, one could write::
+
+  pcoll | DataframeTransform(lambda df: df.group_by('key').sum(), 
proxy=...)
+
+  To pass multiple PCollections, pass a tuple of PCollections wich will be
+  passed to the callable as positional arguments, or a dictionary of
+  PCollections, in which case they will be passed as keyword arguments.
+  """
+  def __init__(self, func, proxy):
+self._func = func
+self._proxy = proxy
+
+  def expand(self, input_pcolls):
+def wrap_as_dict(values):
+  if isinstance(values, dict):
+return values
+  elif isinstance(values, tuple):
+return dict(enumerate(values))
+  else:
+return {None: values}
+
+# TODO: Infer the proxy from the input schema.
+def proxy(key):
+  if key is None:
+return self._proxy
+  else:
+return self._proxy[key]
+
+# The input can be a dictionary, tuple, or plain PCollection.
+# Wrap as a dict for homogeneity.
+# TODO: Possibly inject batching here.
+input_dict = wrap_as_dict(input_pcolls)
+placeholders = {
+key: frame_base.DeferredFrame.wrap(
+expressions.PlaceholderExpression(proxy(key)))
+for key in input_dict.keys()
+}
+
+# The calling convention of the user-supplied func varies according to the
+# type of the input.
+if isinstance(input_pcolls, dict):
+  result_frames = self._func(**placeholders)
+elif isinstance(input_pcolls, tuple):
+  result_frames = self._func(
+  *(value for _, value in sorted(placeholders.items(
+else:
+  result_frames = self._func(placeholders[None])
+
+# Likewise the output may be a dict, tuple, or raw (deferred) Dataframe.
+result_dict = wrap_as_dict(result_frames)
+
+result_pcolls = self._apply_deferred_ops(
+{placeholders[key]._expr: pcoll
+ for key, pcoll in input_dict.items()},
+{key: df._expr
+ for key, df in result_dict.items()})
+
+# Convert the result back into a set of PCollections.
+if isinstance(result_frames, dict):
+  return result_pcolls
+elif isinstance(result_frames, tuple):
+  return tuple((value for _, value in sorted(result_pcolls.items(
+else:
+  return result_pcolls[None]
+
+  def _apply_deferred_ops(
+  self,
+  inputs,  # type: Dict[PlaceholderExpr, PCollection]
+  outputs,  # type: Dict[Any, Expression]
+  ):  # -> Dict[Any, PCollection]
+"""Construct a Beam graph that evaluates a set of expressions on a set of
+input PCollections.
+
+:param inputs: A mapping of placeholder expressions to PCollections.
+:param outputs: A mapping of keys to expressions defined in terms of the
+placeholders of inputs.
+
+Returns a dictionary whose keys are those of outputs, and whose values are
+PCollections corresponding to the values of outputs evaluated at the
+values of inputs.
+
+Logically, `_apply_deferred_ops({x: a, y: b}, {f: F(x, y), g: G(x, y)})`
+returns `{f: F(a, b), g: G(a, b)}`.
+"""
+class ComputeStage(beam.PTransform):
+  """A helper transform that computes a single stage of operations.
+  """
+  def __init__(self, stage):
+self.stage = stage
+
+  def default_label(self):
+return '%s:%s' % (self.stage.ops, id(self))
+
+  def expand(self, pcolls):
+if self.stage.is_grouping:
+  # Arrange such that 

[GitHub] [beam] lukecwik commented on a change in pull request #11184: [WIP][BEAM-4374] Update protos related to MonitoringInfo.

2020-03-20 Thread GitBox
lukecwik commented on a change in pull request #11184: [WIP][BEAM-4374] Update 
protos related to MonitoringInfo.
URL: https://github.com/apache/beam/pull/11184#discussion_r395900617
 
 

 ##
 File path: model/pipeline/src/main/proto/metrics.proto
 ##
 @@ -194,33 +188,25 @@ extend google.protobuf.EnumValueOptions {
 }
 
 message MonitoringInfo {
-  // The name defining the metric or monitored state.
+  // The name defining the semantic meaning of the metric or monitored state.
+  //
+  // See MonitoringInfoSpecs.Enum for the set of well known metrics/monitored
+  // state.
   string urn = 1;
 
-  // This is specified as a URN that implies:
-  // A message class: (Distribution, Counter, Extrema, MonitoringDataTable).
-  // Sub types like field formats - int64, double, string.
-  // Aggregation methods - SUM, LATEST, TOP-N, BOTTOM-N, DISTRIBUTION
-  // valid values are:
-  // beam:metrics:[sum_int_64|latest_int_64|top_n_int_64|bottom_n_int_64|
-  // sum_double|latest_double|top_n_double|bottom_n_double|
-  // distribution_int_64|distribution_double|monitoring_data_table|
-  // latest_doubles
+  // This is specified as a URN that implies the encoding and aggregation
+  // method. See MonitoringInfoTypeUrns.Enum for the set of well known types.
   string type = 2;
 
-  // The Metric or monitored state.
-  oneof data {
-MonitoringTableData monitoring_table_data = 3;
-Metric metric = 4;
-bytes payload = 7;
-  }
+  // The monitored state encoded as per the specification defined by the type.
+  bytes payload = 3;
 
 Review comment:
   Yup


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] lukecwik commented on a change in pull request #11184: [WIP][BEAM-4374] Update protos related to MonitoringInfo.

2020-03-20 Thread GitBox
lukecwik commented on a change in pull request #11184: [WIP][BEAM-4374] Update 
protos related to MonitoringInfo.
URL: https://github.com/apache/beam/pull/11184#discussion_r395900505
 
 

 ##
 File path: model/pipeline/src/main/proto/metrics.proto
 ##
 @@ -229,101 +215,127 @@ message MonitoringInfo {
 NAMESPACE = 5 [(label_props) = { name: "NAMESPACE" }];
 NAME = 6 [(label_props) = { name: "NAME" }];
   }
+
   // A set of key+value labels which define the scope of the metric.
   // Either a well defined entity id for matching the enum names in
   // the MonitoringInfoLabels enum or any arbitrary label
   // set by a custom metric or user metric.
+  //
   // A monitoring system is expected to be able to aggregate the metrics
   // together for all updates having the same URN and labels. Some systems such
   // as Stackdriver will be able to aggregate the metrics using a subset of the
   // provided labels
-  map labels = 5;
-
-  // The walltime of the most recent update.
-  // Useful for aggregation for latest types such as LatestInt64.
-  google.protobuf.Timestamp timestamp = 6;
+  map labels = 4;
 }
 
 message MonitoringInfoTypeUrns {
   enum Enum {
+// Represents an integer counter where values are summed across bundles.
+//
+// Encoding: 
+//   - value: beam:coder:varint:v1
 SUM_INT64_TYPE = 0 [(org.apache.beam.model.pipeline.v1.beam_urn) =
-"beam:metrics:sum_int_64"];
-
-DISTRIBUTION_INT64_TYPE = 1 [(org.apache.beam.model.pipeline.v1.beam_urn) =
- "beam:metrics:distribution_int_64"];
-
-LATEST_INT64_TYPE = 2 [(org.apache.beam.model.pipeline.v1.beam_urn) =
-   "beam:metrics:latest_int_64"];
+"beam:metrics:sum_int64:v1"];
+
+// Represents a double counter where values are summed across bundles.
+//
+// Encoding: 
+//   value: beam:coder:double:v1
+SUM_DOUBLE_TYPE = 1 [(org.apache.beam.model.pipeline.v1.beam_urn) =
+"beam:metrics:sum_int64:v1"];
+
+// Represents a distribution of an integer value where:
+//   - count: represents the number of values seen across all bundles
+//   - sum: represents the total of the value across all bundles
+//   - min: represents the smallest value seen across all bundles
+//   - max: represents the largest value seen across all bundles
+//
+// Encoding: 
+//   - count: beam:coder:varint:v1
+//   - sum:   beam:coder:varint:v1
+//   - min:   beam:coder:varint:v1
+//   - max:   beam:coder:varint:v1
+DISTRIBUTION_INT64_TYPE = 2 [(org.apache.beam.model.pipeline.v1.beam_urn) =
+ "beam:metrics:distribution_int64:v1"];
+
+// Represents a distribution of a double value where:
+//   - count: represents the number of values seen across all bundles
+//   - sum: represents the total of the value across all bundles
+//   - min: represents the smallest value seen across all bundles
+//   - max: represents the largest value seen across all bundles
+//
+// Encoding: 
+//   - count: beam:coder:varint:v1
+//   - sum:   beam:coder:double:v1
+//   - min:   beam:coder:double:v1
+//   - max:   beam:coder:double:v1
+DISTRIBUTION_DOUBLE_TYPE = 3 [(org.apache.beam.model.pipeline.v1.beam_urn) 
=
+ "beam:metrics:distribution_int64:v1"];
+
+// Represents the latest seen integer value. The timestamp is used to
+// provide an "ordering" over multiple values to determine which is the
+// latest.
+//
+// Encoding: 
+//   - timestamp: beam:coder:varint:v1 (milliseconds since epoch)
+//   - value: beam:coder:varint:v1
+LATEST_INT64_TYPE = 4 [(org.apache.beam.model.pipeline.v1.beam_urn) =
+   "beam:metrics:latest_int64:v1"];
+
+// Represents the latest seen integer value. The timestamp is used to
+// provide an "ordering" over multiple values to determine which is the
+// latest.
+//
+// Encoding: 
+//   - timestamp: beam:coder:varint:v1 (milliseconds since epoch)
+//   - value: beam:coder:double:v1
+LATEST_DOUBLE_TYPE = 5 [(org.apache.beam.model.pipeline.v1.beam_urn) =
+   "beam:metrics:latest_int64:v1"];
+
+// Represents the largest set of integer values seen across bundles.
+//
+// Encoding: ...
+//   - valueX: beam:coder:varint:v1
+TOP_N_INT64_TYPE = 6 [(org.apache.beam.model.pipeline.v1.beam_urn) =
+   "beam:metrics:top_n_int64:v1"];
+
+// Represents the largest set of double values seen across bundles.
+//
+// Encoding: ...
+//   - valueX: beam:coder:double:v1
+TOP_N_DOUBLE_TYPE = 7 [(org.apache.beam.model.pipeline.v1.beam_urn) =
+"beam:metrics:top_n_int64:v1"];
+
+// Represents the smallest set of 

[GitHub] [beam] iemejia commented on issue #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files

2020-03-20 Thread GitBox
iemejia commented on issue #10290: [BEAM-8561] Add ThriftIO to support IO for 
Thrift files
URL: https://github.com/apache/beam/pull/10290#issuecomment-601918008
 
 
   We let it like this because we had a regression, if we want to generate and 
compile the class we need to have thrift installed in all Beam workers (and as 
a requirement for everyone building Beam) which I think is clearly overklll 
just for a test.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] robertwb commented on a change in pull request #11153: [BEAM-9537] Adding a new module for FnApiRunner

2020-03-20 Thread GitBox
robertwb commented on a change in pull request #11153: [BEAM-9537] Adding a new 
module for FnApiRunner
URL: https://github.com/apache/beam/pull/11153#discussion_r395897316
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
 ##
 @@ -269,8 +269,8 @@ def visit_transform(self, transform_node):
   pcoll.element_type, transform_node.full_label)
   key_type, value_type = pcoll.element_type.tuple_types
   if transform_node.outputs:
-from apache_beam.runners.portability.fn_api_runner_transforms 
import \
-  only_element
+from apache_beam.runners.portability.fn_api_runner.transforms \
 
 Review comment:
   I meant that utilities like only_element should probably be in a more common 
place rather than imported elsewhere from here. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] ajamato commented on a change in pull request #11184: [WIP][BEAM-4374] Update protos related to MonitoringInfo.

2020-03-20 Thread GitBox
ajamato commented on a change in pull request #11184: [WIP][BEAM-4374] Update 
protos related to MonitoringInfo.
URL: https://github.com/apache/beam/pull/11184#discussion_r395893496
 
 

 ##
 File path: model/pipeline/src/main/proto/metrics.proto
 ##
 @@ -229,101 +215,127 @@ message MonitoringInfo {
 NAMESPACE = 5 [(label_props) = { name: "NAMESPACE" }];
 NAME = 6 [(label_props) = { name: "NAME" }];
   }
+
   // A set of key+value labels which define the scope of the metric.
   // Either a well defined entity id for matching the enum names in
   // the MonitoringInfoLabels enum or any arbitrary label
   // set by a custom metric or user metric.
+  //
   // A monitoring system is expected to be able to aggregate the metrics
   // together for all updates having the same URN and labels. Some systems such
   // as Stackdriver will be able to aggregate the metrics using a subset of the
   // provided labels
-  map labels = 5;
-
-  // The walltime of the most recent update.
-  // Useful for aggregation for latest types such as LatestInt64.
-  google.protobuf.Timestamp timestamp = 6;
+  map labels = 4;
 }
 
 message MonitoringInfoTypeUrns {
   enum Enum {
+// Represents an integer counter where values are summed across bundles.
+//
+// Encoding: 
+//   - value: beam:coder:varint:v1
 SUM_INT64_TYPE = 0 [(org.apache.beam.model.pipeline.v1.beam_urn) =
-"beam:metrics:sum_int_64"];
-
-DISTRIBUTION_INT64_TYPE = 1 [(org.apache.beam.model.pipeline.v1.beam_urn) =
- "beam:metrics:distribution_int_64"];
-
-LATEST_INT64_TYPE = 2 [(org.apache.beam.model.pipeline.v1.beam_urn) =
-   "beam:metrics:latest_int_64"];
+"beam:metrics:sum_int64:v1"];
+
+// Represents a double counter where values are summed across bundles.
+//
+// Encoding: 
+//   value: beam:coder:double:v1
+SUM_DOUBLE_TYPE = 1 [(org.apache.beam.model.pipeline.v1.beam_urn) =
+"beam:metrics:sum_int64:v1"];
+
+// Represents a distribution of an integer value where:
+//   - count: represents the number of values seen across all bundles
+//   - sum: represents the total of the value across all bundles
+//   - min: represents the smallest value seen across all bundles
+//   - max: represents the largest value seen across all bundles
+//
+// Encoding: 
+//   - count: beam:coder:varint:v1
+//   - sum:   beam:coder:varint:v1
+//   - min:   beam:coder:varint:v1
+//   - max:   beam:coder:varint:v1
+DISTRIBUTION_INT64_TYPE = 2 [(org.apache.beam.model.pipeline.v1.beam_urn) =
+ "beam:metrics:distribution_int64:v1"];
+
+// Represents a distribution of a double value where:
+//   - count: represents the number of values seen across all bundles
+//   - sum: represents the total of the value across all bundles
+//   - min: represents the smallest value seen across all bundles
+//   - max: represents the largest value seen across all bundles
+//
+// Encoding: 
+//   - count: beam:coder:varint:v1
+//   - sum:   beam:coder:double:v1
+//   - min:   beam:coder:double:v1
+//   - max:   beam:coder:double:v1
+DISTRIBUTION_DOUBLE_TYPE = 3 [(org.apache.beam.model.pipeline.v1.beam_urn) 
=
+ "beam:metrics:distribution_int64:v1"];
+
+// Represents the latest seen integer value. The timestamp is used to
+// provide an "ordering" over multiple values to determine which is the
+// latest.
+//
+// Encoding: 
+//   - timestamp: beam:coder:varint:v1 (milliseconds since epoch)
+//   - value: beam:coder:varint:v1
+LATEST_INT64_TYPE = 4 [(org.apache.beam.model.pipeline.v1.beam_urn) =
+   "beam:metrics:latest_int64:v1"];
+
+// Represents the latest seen integer value. The timestamp is used to
+// provide an "ordering" over multiple values to determine which is the
+// latest.
+//
+// Encoding: 
+//   - timestamp: beam:coder:varint:v1 (milliseconds since epoch)
+//   - value: beam:coder:double:v1
+LATEST_DOUBLE_TYPE = 5 [(org.apache.beam.model.pipeline.v1.beam_urn) =
+   "beam:metrics:latest_int64:v1"];
+
+// Represents the largest set of integer values seen across bundles.
+//
+// Encoding: ...
+//   - valueX: beam:coder:varint:v1
+TOP_N_INT64_TYPE = 6 [(org.apache.beam.model.pipeline.v1.beam_urn) =
+   "beam:metrics:top_n_int64:v1"];
+
+// Represents the largest set of double values seen across bundles.
+//
+// Encoding: ...
+//   - valueX: beam:coder:double:v1
+TOP_N_DOUBLE_TYPE = 7 [(org.apache.beam.model.pipeline.v1.beam_urn) =
+"beam:metrics:top_n_int64:v1"];
+
+// Represents the smallest set of integer 

[GitHub] [beam] ajamato commented on a change in pull request #11184: [WIP][BEAM-4374] Update protos related to MonitoringInfo.

2020-03-20 Thread GitBox
ajamato commented on a change in pull request #11184: [WIP][BEAM-4374] Update 
protos related to MonitoringInfo.
URL: https://github.com/apache/beam/pull/11184#discussion_r395892463
 
 

 ##
 File path: model/pipeline/src/main/proto/metrics.proto
 ##
 @@ -229,101 +215,127 @@ message MonitoringInfo {
 NAMESPACE = 5 [(label_props) = { name: "NAMESPACE" }];
 NAME = 6 [(label_props) = { name: "NAME" }];
   }
+
   // A set of key+value labels which define the scope of the metric.
   // Either a well defined entity id for matching the enum names in
   // the MonitoringInfoLabels enum or any arbitrary label
   // set by a custom metric or user metric.
+  //
   // A monitoring system is expected to be able to aggregate the metrics
   // together for all updates having the same URN and labels. Some systems such
   // as Stackdriver will be able to aggregate the metrics using a subset of the
   // provided labels
-  map labels = 5;
-
-  // The walltime of the most recent update.
-  // Useful for aggregation for latest types such as LatestInt64.
-  google.protobuf.Timestamp timestamp = 6;
+  map labels = 4;
 }
 
 message MonitoringInfoTypeUrns {
 
 Review comment:
   Let's add some comments to make it clear the type is referring to what is 
collected in each MonitoringInfo update, and how they should be aggregated 
together


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] aaltay commented on issue #10914: [BEAM-8078] streaming_wordcount_debugging.py is missing a test

2020-03-20 Thread GitBox
aaltay commented on issue #10914: [BEAM-8078] streaming_wordcount_debugging.py 
is missing a test
URL: https://github.com/apache/beam/pull/10914#issuecomment-601913507
 
 
   @Tesio - unfortunately, due to an ongoing issue, jenkins only starts test 
when a committer requests it. This should be resolved but in the meantime, if 
you need to trigger tests feel free to ask on the dev@ list for someone to do 
it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] aaltay commented on issue #10914: [BEAM-8078] streaming_wordcount_debugging.py is missing a test

2020-03-20 Thread GitBox
aaltay commented on issue #10914: [BEAM-8078] streaming_wordcount_debugging.py 
is missing a test
URL: https://github.com/apache/beam/pull/10914#issuecomment-601913195
 
 
   retest this please


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] ajamato commented on a change in pull request #11184: [WIP][BEAM-4374] Update protos related to MonitoringInfo.

2020-03-20 Thread GitBox
ajamato commented on a change in pull request #11184: [WIP][BEAM-4374] Update 
protos related to MonitoringInfo.
URL: https://github.com/apache/beam/pull/11184#discussion_r395888595
 
 

 ##
 File path: model/pipeline/src/main/proto/metrics.proto
 ##
 @@ -229,101 +215,127 @@ message MonitoringInfo {
 NAMESPACE = 5 [(label_props) = { name: "NAMESPACE" }];
 NAME = 6 [(label_props) = { name: "NAME" }];
   }
+
   // A set of key+value labels which define the scope of the metric.
   // Either a well defined entity id for matching the enum names in
   // the MonitoringInfoLabels enum or any arbitrary label
   // set by a custom metric or user metric.
+  //
   // A monitoring system is expected to be able to aggregate the metrics
   // together for all updates having the same URN and labels. Some systems such
   // as Stackdriver will be able to aggregate the metrics using a subset of the
   // provided labels
-  map labels = 5;
-
-  // The walltime of the most recent update.
-  // Useful for aggregation for latest types such as LatestInt64.
-  google.protobuf.Timestamp timestamp = 6;
+  map labels = 4;
 }
 
 message MonitoringInfoTypeUrns {
   enum Enum {
+// Represents an integer counter where values are summed across bundles.
+//
+// Encoding: 
+//   - value: beam:coder:varint:v1
 SUM_INT64_TYPE = 0 [(org.apache.beam.model.pipeline.v1.beam_urn) =
 
 Review comment:
   I think there are only a few of these types being used now. SUM_INT64_TYPE 
and DISTRIBUTION_INT64_TYPE. I hope we can make it very simple to add new ones 
of these with minimal changes (Adding MonitoringInfoSpec and reusing existing 
framework/libraries in the SDK, runners can mostly pass them through to a 
service to aggregate across multiple workers)
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] ajamato commented on a change in pull request #11184: [WIP][BEAM-4374] Update protos related to MonitoringInfo.

2020-03-20 Thread GitBox
ajamato commented on a change in pull request #11184: [WIP][BEAM-4374] Update 
protos related to MonitoringInfo.
URL: https://github.com/apache/beam/pull/11184#discussion_r395889104
 
 

 ##
 File path: model/pipeline/src/main/proto/metrics.proto
 ##
 @@ -194,33 +188,25 @@ extend google.protobuf.EnumValueOptions {
 }
 
 message MonitoringInfo {
-  // The name defining the metric or monitored state.
+  // The name defining the semantic meaning of the metric or monitored state.
+  //
+  // See MonitoringInfoSpecs.Enum for the set of well known metrics/monitored
+  // state.
   string urn = 1;
 
-  // This is specified as a URN that implies:
-  // A message class: (Distribution, Counter, Extrema, MonitoringDataTable).
-  // Sub types like field formats - int64, double, string.
-  // Aggregation methods - SUM, LATEST, TOP-N, BOTTOM-N, DISTRIBUTION
-  // valid values are:
-  // beam:metrics:[sum_int_64|latest_int_64|top_n_int_64|bottom_n_int_64|
-  // sum_double|latest_double|top_n_double|bottom_n_double|
-  // distribution_int_64|distribution_double|monitoring_data_table|
-  // latest_doubles
+  // This is specified as a URN that implies the encoding and aggregation
+  // method. See MonitoringInfoTypeUrns.Enum for the set of well known types.
   string type = 2;
 
-  // The Metric or monitored state.
-  oneof data {
-MonitoringTableData monitoring_table_data = 3;
-Metric metric = 4;
-bytes payload = 7;
-  }
+  // The monitored state encoded as per the specification defined by the type.
+  bytes payload = 3;
 
 Review comment:
   My biggest concern is losing the ability to print debug strings, which are 
helpful when people are trying to learn how these are populated. But maybe we 
can just add a few obvious places to dump debug logs, debug files, etc with the 
MonitoringInfors parses properly.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] ajamato commented on a change in pull request #11184: [WIP][BEAM-4374] Update protos related to MonitoringInfo.

2020-03-20 Thread GitBox
ajamato commented on a change in pull request #11184: [WIP][BEAM-4374] Update 
protos related to MonitoringInfo.
URL: https://github.com/apache/beam/pull/11184#discussion_r395888786
 
 

 ##
 File path: model/pipeline/src/main/proto/metrics.proto
 ##
 @@ -194,33 +188,25 @@ extend google.protobuf.EnumValueOptions {
 }
 
 message MonitoringInfo {
-  // The name defining the metric or monitored state.
+  // The name defining the semantic meaning of the metric or monitored state.
+  //
+  // See MonitoringInfoSpecs.Enum for the set of well known metrics/monitored
+  // state.
   string urn = 1;
 
-  // This is specified as a URN that implies:
-  // A message class: (Distribution, Counter, Extrema, MonitoringDataTable).
-  // Sub types like field formats - int64, double, string.
-  // Aggregation methods - SUM, LATEST, TOP-N, BOTTOM-N, DISTRIBUTION
-  // valid values are:
-  // beam:metrics:[sum_int_64|latest_int_64|top_n_int_64|bottom_n_int_64|
-  // sum_double|latest_double|top_n_double|bottom_n_double|
-  // distribution_int_64|distribution_double|monitoring_data_table|
-  // latest_doubles
+  // This is specified as a URN that implies the encoding and aggregation
+  // method. See MonitoringInfoTypeUrns.Enum for the set of well known types.
   string type = 2;
 
-  // The Metric or monitored state.
-  oneof data {
-MonitoringTableData monitoring_table_data = 3;
-Metric metric = 4;
-bytes payload = 7;
-  }
+  // The monitored state encoded as per the specification defined by the type.
+  bytes payload = 3;
 
 Review comment:
   Probably you are already planning on doing this. But having helper functions 
to easily encode/decode these would be great.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] ajamato commented on a change in pull request #11184: [WIP][BEAM-4374] Update protos related to MonitoringInfo.

2020-03-20 Thread GitBox
ajamato commented on a change in pull request #11184: [WIP][BEAM-4374] Update 
protos related to MonitoringInfo.
URL: https://github.com/apache/beam/pull/11184#discussion_r395888153
 
 

 ##
 File path: model/pipeline/src/main/proto/metrics.proto
 ##
 @@ -229,101 +215,127 @@ message MonitoringInfo {
 NAMESPACE = 5 [(label_props) = { name: "NAMESPACE" }];
 NAME = 6 [(label_props) = { name: "NAME" }];
   }
+
   // A set of key+value labels which define the scope of the metric.
   // Either a well defined entity id for matching the enum names in
   // the MonitoringInfoLabels enum or any arbitrary label
   // set by a custom metric or user metric.
+  //
   // A monitoring system is expected to be able to aggregate the metrics
   // together for all updates having the same URN and labels. Some systems such
   // as Stackdriver will be able to aggregate the metrics using a subset of the
   // provided labels
-  map labels = 5;
-
-  // The walltime of the most recent update.
-  // Useful for aggregation for latest types such as LatestInt64.
-  google.protobuf.Timestamp timestamp = 6;
+  map labels = 4;
 }
 
 message MonitoringInfoTypeUrns {
   enum Enum {
+// Represents an integer counter where values are summed across bundles.
+//
+// Encoding: 
+//   - value: beam:coder:varint:v1
 SUM_INT64_TYPE = 0 [(org.apache.beam.model.pipeline.v1.beam_urn) =
-"beam:metrics:sum_int_64"];
-
-DISTRIBUTION_INT64_TYPE = 1 [(org.apache.beam.model.pipeline.v1.beam_urn) =
- "beam:metrics:distribution_int_64"];
-
-LATEST_INT64_TYPE = 2 [(org.apache.beam.model.pipeline.v1.beam_urn) =
-   "beam:metrics:latest_int_64"];
+"beam:metrics:sum_int64:v1"];
+
+// Represents a double counter where values are summed across bundles.
+//
+// Encoding: 
+//   value: beam:coder:double:v1
+SUM_DOUBLE_TYPE = 1 [(org.apache.beam.model.pipeline.v1.beam_urn) =
+"beam:metrics:sum_int64:v1"];
+
+// Represents a distribution of an integer value where:
+//   - count: represents the number of values seen across all bundles
+//   - sum: represents the total of the value across all bundles
+//   - min: represents the smallest value seen across all bundles
+//   - max: represents the largest value seen across all bundles
+//
+// Encoding: 
+//   - count: beam:coder:varint:v1
+//   - sum:   beam:coder:varint:v1
+//   - min:   beam:coder:varint:v1
+//   - max:   beam:coder:varint:v1
+DISTRIBUTION_INT64_TYPE = 2 [(org.apache.beam.model.pipeline.v1.beam_urn) =
+ "beam:metrics:distribution_int64:v1"];
+
+// Represents a distribution of a double value where:
+//   - count: represents the number of values seen across all bundles
+//   - sum: represents the total of the value across all bundles
+//   - min: represents the smallest value seen across all bundles
+//   - max: represents the largest value seen across all bundles
+//
+// Encoding: 
+//   - count: beam:coder:varint:v1
+//   - sum:   beam:coder:double:v1
+//   - min:   beam:coder:double:v1
+//   - max:   beam:coder:double:v1
+DISTRIBUTION_DOUBLE_TYPE = 3 [(org.apache.beam.model.pipeline.v1.beam_urn) 
=
+ "beam:metrics:distribution_int64:v1"];
+
+// Represents the latest seen integer value. The timestamp is used to
+// provide an "ordering" over multiple values to determine which is the
+// latest.
+//
+// Encoding: 
+//   - timestamp: beam:coder:varint:v1 (milliseconds since epoch)
+//   - value: beam:coder:varint:v1
+LATEST_INT64_TYPE = 4 [(org.apache.beam.model.pipeline.v1.beam_urn) =
+   "beam:metrics:latest_int64:v1"];
+
+// Represents the latest seen integer value. The timestamp is used to
+// provide an "ordering" over multiple values to determine which is the
+// latest.
+//
+// Encoding: 
+//   - timestamp: beam:coder:varint:v1 (milliseconds since epoch)
+//   - value: beam:coder:double:v1
+LATEST_DOUBLE_TYPE = 5 [(org.apache.beam.model.pipeline.v1.beam_urn) =
+   "beam:metrics:latest_int64:v1"];
+
+// Represents the largest set of integer values seen across bundles.
+//
+// Encoding: ...
+//   - valueX: beam:coder:varint:v1
+TOP_N_INT64_TYPE = 6 [(org.apache.beam.model.pipeline.v1.beam_urn) =
+   "beam:metrics:top_n_int64:v1"];
+
+// Represents the largest set of double values seen across bundles.
+//
+// Encoding: ...
+//   - valueX: beam:coder:double:v1
+TOP_N_DOUBLE_TYPE = 7 [(org.apache.beam.model.pipeline.v1.beam_urn) =
+"beam:metrics:top_n_int64:v1"];
+
+// Represents the smallest set of integer 

[GitHub] [beam] ajamato commented on a change in pull request #11184: [WIP][BEAM-4374] Update protos related to MonitoringInfo.

2020-03-20 Thread GitBox
ajamato commented on a change in pull request #11184: [WIP][BEAM-4374] Update 
protos related to MonitoringInfo.
URL: https://github.com/apache/beam/pull/11184#discussion_r395887788
 
 

 ##
 File path: model/pipeline/src/main/proto/metrics.proto
 ##
 @@ -139,7 +137,7 @@ message MonitoringInfoSpecs {
 
 USER_DISTRIBUTION_COUNTER = 6 [(monitoring_info_spec) = {
   urn: "beam:metric:user_distribution",
-  type_urn: "beam:metrics:distribution_int_64",
+  type_urn: "beam:metrics:distribution_int64",
 
 Review comment:
   Add a :v1 here


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] KevinGG commented on a change in pull request #11174: [BEAM-7923] Pop failed transform in CombineGlobally

2020-03-20 Thread GitBox
KevinGG commented on a change in pull request #11174: [BEAM-7923] Pop failed 
transform in CombineGlobally
URL: https://github.com/apache/beam/pull/11174#discussion_r395890026
 
 

 ##
 File path: sdks/python/apache_beam/transforms/core.py
 ##
 @@ -1811,6 +1811,8 @@ def add_input_types(transform):
   return view
 else:
   if pcoll.windowing.windowfn != GlobalWindows():
+# Remove the broken transform when running into value error.
+pcoll.pipeline.transforms_stack.pop()
 
 Review comment:
   Yes, agree with it! I'll make the change.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] robertwb commented on issue #11173: [BEAM-9558] Add explicit end bit for data channel.

2020-03-20 Thread GitBox
robertwb commented on issue #11173: [BEAM-9558] Add explicit end bit for data 
channel.
URL: https://github.com/apache/beam/pull/11173#issuecomment-601910742
 
 
   Rebased on #11177 and updated the proto for timers.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] Tesio commented on issue #10914: [BEAM-8078] streaming_wordcount_debugging.py is missing a test

2020-03-20 Thread GitBox
Tesio commented on issue #10914: [BEAM-8078] streaming_wordcount_debugging.py 
is missing a test
URL: https://github.com/apache/beam/pull/10914#issuecomment-601910749
 
 
   retest this please


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] pabloem commented on a change in pull request #11153: [BEAM-9537] Adding a new module for FnApiRunner

2020-03-20 Thread GitBox
pabloem commented on a change in pull request #11153: [BEAM-9537] Adding a new 
module for FnApiRunner
URL: https://github.com/apache/beam/pull/11153#discussion_r395888936
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner.py
 ##
 @@ -78,12 +78,12 @@
 from apache_beam.runners import pipeline_context
 from apache_beam.runners import runner
 from apache_beam.runners.portability import artifact_service
-from apache_beam.runners.portability import fn_api_runner_transforms
 from apache_beam.runners.portability import portable_metrics
-from apache_beam.runners.portability.fn_api_runner_transforms import 
create_buffer_id
-from apache_beam.runners.portability.fn_api_runner_transforms import 
only_element
-from apache_beam.runners.portability.fn_api_runner_transforms import 
split_buffer_id
-from apache_beam.runners.portability.fn_api_runner_transforms import 
unique_name
+from apache_beam.runners.portability.fn_api_runner import transforms
 
 Review comment:
   Done. Went with translations. But I also like optimizations... Maybe I'll 
rename if it becomes a little bit more optimizations than translations down the 
road.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] pabloem commented on a change in pull request #11153: [BEAM-9537] Adding a new module for FnApiRunner

2020-03-20 Thread GitBox
pabloem commented on a change in pull request #11153: [BEAM-9537] Adding a new 
module for FnApiRunner
URL: https://github.com/apache/beam/pull/11153#discussion_r395889275
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
 ##
 @@ -269,8 +269,8 @@ def visit_transform(self, transform_node):
   pcoll.element_type, transform_node.full_label)
   key_type, value_type = pcoll.element_type.tuple_types
   if transform_node.outputs:
-from apache_beam.runners.portability.fn_api_runner_transforms 
import \
-  only_element
+from apache_beam.runners.portability.fn_api_runner.transforms \
 
 Review comment:
   You mean the newly named `translations` module may need to exist in 
`apache_beam/utils`?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] pabloem commented on a change in pull request #11153: [BEAM-9537] Adding a new module for FnApiRunner

2020-03-20 Thread GitBox
pabloem commented on a change in pull request #11153: [BEAM-9537] Adding a new 
module for FnApiRunner
URL: https://github.com/apache/beam/pull/11153#discussion_r395888936
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner.py
 ##
 @@ -78,12 +78,12 @@
 from apache_beam.runners import pipeline_context
 from apache_beam.runners import runner
 from apache_beam.runners.portability import artifact_service
-from apache_beam.runners.portability import fn_api_runner_transforms
 from apache_beam.runners.portability import portable_metrics
-from apache_beam.runners.portability.fn_api_runner_transforms import 
create_buffer_id
-from apache_beam.runners.portability.fn_api_runner_transforms import 
only_element
-from apache_beam.runners.portability.fn_api_runner_transforms import 
split_buffer_id
-from apache_beam.runners.portability.fn_api_runner_transforms import 
unique_name
+from apache_beam.runners.portability.fn_api_runner import transforms
 
 Review comment:
   Done.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] lukecwik edited a comment on issue #11184: [WIP][BEAM-4374] Update protos related to MonitoringInfo.

2020-03-20 Thread GitBox
lukecwik edited a comment on issue #11184: [WIP][BEAM-4374] Update protos 
related to MonitoringInfo.
URL: https://github.com/apache/beam/pull/11184#issuecomment-601897399
 
 
   CC: @ajamato @robertwb 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] robertwb commented on a change in pull request #11153: [BEAM-9537] Adding a new module for FnApiRunner

2020-03-20 Thread GitBox
robertwb commented on a change in pull request #11153: [BEAM-9537] Adding a new 
module for FnApiRunner
URL: https://github.com/apache/beam/pull/11153#discussion_r395885958
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner.py
 ##
 @@ -78,12 +78,12 @@
 from apache_beam.runners import pipeline_context
 from apache_beam.runners import runner
 from apache_beam.runners.portability import artifact_service
-from apache_beam.runners.portability import fn_api_runner_transforms
 from apache_beam.runners.portability import portable_metrics
-from apache_beam.runners.portability.fn_api_runner_transforms import 
create_buffer_id
-from apache_beam.runners.portability.fn_api_runner_transforms import 
only_element
-from apache_beam.runners.portability.fn_api_runner_transforms import 
split_buffer_id
-from apache_beam.runners.portability.fn_api_runner_transforms import 
unique_name
+from apache_beam.runners.portability.fn_api_runner import transforms
 
 Review comment:
   Seeing this unqualified makes me realize that it's ambiguous with the notion 
of a PTransform (and the apache_beam.transforms package). Maybe we should call 
it `optimizations` or `translations`? 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] robertwb commented on a change in pull request #11153: [BEAM-9537] Adding a new module for FnApiRunner

2020-03-20 Thread GitBox
robertwb commented on a change in pull request #11153: [BEAM-9537] Adding a new 
module for FnApiRunner
URL: https://github.com/apache/beam/pull/11153#discussion_r395884481
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
 ##
 @@ -269,8 +269,8 @@ def visit_transform(self, transform_node):
   pcoll.element_type, transform_node.full_label)
   key_type, value_type = pcoll.element_type.tuple_types
   if transform_node.outputs:
-from apache_beam.runners.portability.fn_api_runner_transforms 
import \
-  only_element
+from apache_beam.runners.portability.fn_api_runner.transforms \
 
 Review comment:
   You don't have to do this in this PR, but this should probably be in 
apache_beam/utils. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] robertwb commented on a change in pull request #11174: [BEAM-7923] Pop failed transform in CombineGlobally

2020-03-20 Thread GitBox
robertwb commented on a change in pull request #11174: [BEAM-7923] Pop failed 
transform in CombineGlobally
URL: https://github.com/apache/beam/pull/11174#discussion_r395883196
 
 

 ##
 File path: sdks/python/apache_beam/transforms/core.py
 ##
 @@ -1811,6 +1811,8 @@ def add_input_types(transform):
   return view
 else:
   if pcoll.windowing.windowfn != GlobalWindows():
+# Remove the broken transform when running into value error.
+pcoll.pipeline.transforms_stack.pop()
 
 Review comment:
   E.g. put this 
https://github.com/apache/beam/blob/release-2.19.0/sdks/python/apache_beam/pipeline.py#L330
 in a finally clause of a try block that starts where it's pushed. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] robertwb commented on a change in pull request #11174: [BEAM-7923] Pop failed transform in CombineGlobally

2020-03-20 Thread GitBox
robertwb commented on a change in pull request #11174: [BEAM-7923] Pop failed 
transform in CombineGlobally
URL: https://github.com/apache/beam/pull/11174#discussion_r395882293
 
 

 ##
 File path: sdks/python/apache_beam/transforms/core.py
 ##
 @@ -1811,6 +1811,8 @@ def add_input_types(transform):
   return view
 else:
   if pcoll.windowing.windowfn != GlobalWindows():
+# Remove the broken transform when running into value error.
+pcoll.pipeline.transforms_stack.pop()
 
 Review comment:
   This is not the right place to pop this (internal) stack. Instead, we should 
popping from the stack in a finally clause of a try block that pushes to the 
stack. (Alternatively, we could manage the stack with a Python context, but 
that might be overkill.)


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] je-ik commented on issue #11168: [BEAM-9542] Limit and clarify the effect of "force" in Java build

2020-03-20 Thread GitBox
je-ik commented on issue #11168: [BEAM-9542] Limit and clarify the effect of 
"force" in Java build
URL: https://github.com/apache/beam/pull/11168#issuecomment-601899121
 
 
   Run Java PostCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] je-ik commented on issue #11168: [BEAM-9542] Limit and clarify the effect of "force" in Java build

2020-03-20 Thread GitBox
je-ik commented on issue #11168: [BEAM-9542] Limit and clarify the effect of 
"force" in Java build
URL: https://github.com/apache/beam/pull/11168#issuecomment-601899388
 
 
   Run SQL Postcommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] je-ik commented on issue #11168: [BEAM-9542] Limit and clarify the effect of "force" in Java build

2020-03-20 Thread GitBox
je-ik commented on issue #11168: [BEAM-9542] Limit and clarify the effect of 
"force" in Java build
URL: https://github.com/apache/beam/pull/11168#issuecomment-601899334
 
 
   Run Spark ValidatesRunner


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] je-ik commented on issue #11168: [BEAM-9542] Limit and clarify the effect of "force" in Java build

2020-03-20 Thread GitBox
je-ik commented on issue #11168: [BEAM-9542] Limit and clarify the effect of 
"force" in Java build
URL: https://github.com/apache/beam/pull/11168#issuecomment-601899254
 
 
   Run Dataflow ValidatesRunner


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] je-ik commented on issue #11168: [BEAM-9542] Limit and clarify the effect of "force" in Java build

2020-03-20 Thread GitBox
je-ik commented on issue #11168: [BEAM-9542] Limit and clarify the effect of 
"force" in Java build
URL: https://github.com/apache/beam/pull/11168#issuecomment-601899190
 
 
   Run Java HadoopFormatIO Performance Test


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] lukecwik commented on issue #11184: [WIP][BEAM-4374] Update protos related to MonitoringInfo.

2020-03-20 Thread GitBox
lukecwik commented on issue #11184: [WIP][BEAM-4374] Update protos related to 
MonitoringInfo.
URL: https://github.com/apache/beam/pull/11184#issuecomment-601897399
 
 
   CC: @ajamato 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] lukecwik opened a new pull request #11184: [WIP][BEAM-4374] Update protos related to MonitoringInfo.

2020-03-20 Thread GitBox
lukecwik opened a new pull request #11184: [WIP][BEAM-4374] Update protos 
related to MonitoringInfo.
URL: https://github.com/apache/beam/pull/11184
 
 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build
 

[GitHub] [beam] apilloud commented on issue #11041: Use beam join api in sql

2020-03-20 Thread GitBox
apilloud commented on issue #11041: Use beam join api in sql
URL: https://github.com/apache/beam/pull/11041#issuecomment-601896403
 
 
   Run SQL Postcommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] suztomo commented on issue #11168: [BEAM-9542] Limit and clarify the effect of "force" in Java build

2020-03-20 Thread GitBox
suztomo commented on issue #11168: [BEAM-9542] Limit and clarify the effect of 
"force" in Java build
URL: https://github.com/apache/beam/pull/11168#issuecomment-601894622
 
 
   Yes, please


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] je-ik commented on issue #11168: [BEAM-9542] Limit and clarify the effect of "force" in Java build

2020-03-20 Thread GitBox
je-ik commented on issue #11168: [BEAM-9542] Limit and clarify the effect of 
"force" in Java build
URL: https://github.com/apache/beam/pull/11168#issuecomment-601894197
 
 
   @suztomo do you need to rerun the checks that were not triggered? 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] apilloud commented on issue #11041: Use beam join api in sql

2020-03-20 Thread GitBox
apilloud commented on issue #11041: Use beam join api in sql
URL: https://github.com/apache/beam/pull/11041#issuecomment-601893513
 
 
   Run Java PreCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] Hannah-Jiang commented on issue #11067: [BEAM-9136]Add licenses for dependencies

2020-03-20 Thread GitBox
Hannah-Jiang commented on issue #11067: [BEAM-9136]Add licenses for dependencies
URL: https://github.com/apache/beam/pull/11067#issuecomment-601889129
 
 
   @tvalentyn , I fixed all your comments about Python. PTAL when you have time.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] Hannah-Jiang commented on a change in pull request #11067: [BEAM-9136]Add licenses for dependencies

2020-03-20 Thread GitBox
Hannah-Jiang commented on a change in pull request #11067: [BEAM-9136]Add 
licenses for dependencies
URL: https://github.com/apache/beam/pull/11067#discussion_r395865763
 
 

 ##
 File path: licenses/scripts/pull_licenses_py.py
 ##
 @@ -0,0 +1,157 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+"""A script to pull licenses for Python.
+"""
+import json
+import os
+import shutil
+import subprocess
+import sys
+import yaml
+
+from tenacity import retry
+from tenacity import stop_after_attempt
+
+
+def run_bash_command(command):
+  process = subprocess.Popen(command.split(),
+ stdout=subprocess.PIPE,
+ stderr=subprocess.PIPE)
+  result, error = process.communicate()
+  if error:
+raise RuntimeError('Error occurred when running a bash command.',
+   'command: ', command, 'error message: ',
+   error.decode('utf-8'))
+  return result.decode('utf-8')
+
+
+try:
+  import wget
+except:
+  command = 'pip install wget --no-cache-dir'
+  run_bash_command(command)
+  import wget
+
+
+def install_pip_licenses():
+  command = 'pip install pip-licenses --no-cache-dir'
+  run_bash_command(command)
+
+
+def run_pip_licenses():
+  command = 'pip-licenses --with-license-file --format=json'
+  dependencies = run_bash_command(command)
+  return json.loads(dependencies)
+
+
+@retry(stop=stop_after_attempt(3))
+def copy_license_files(dep):
+  source_license_file = dep['LicenseFile']
+  if source_license_file.lower() == 'unknown':
+return False
+  name = dep['Name']
+  dest_dir = '/'.join([license_dir, name.lower()])
+  try:
+if not os.path.isdir(dest_dir):
+  os.mkdir(dest_dir)
+shutil.copy(source_license_file, dest_dir + '/LICENSE')
+return True
+  except Exception as e:
+print(e)
+return False
+
+
+@retry(stop=stop_after_attempt(3))
+def pull_from_url(dep, configs):
+  '''
+  :param dep: name of a dependency
+  :param configs: a dict from dep_urls_py.yaml
+  :return: boolean
+
+  It downloads files form urls to a temp directory first in order to avoid
+  to deal with any temp files. It helps keep clean final directory.
+  '''
+  if dep in configs.keys():
+config = configs[dep]
+dest_dir = '/'.join([license_dir, dep])
+cur_temp_dir = 'temp_license_' + dep
+os.mkdir(cur_temp_dir)
+try:
+  is_file_available = False
+  # license is required, but not all dependencies have license.
+  # In case we have to skip, print out a message.
+  if config['license'] != 'skip':
+wget.download(config['license'], cur_temp_dir + '/LICENSE')
+is_file_available = True
+  # notice is optional.
+  if 'notice' in config:
+wget.download(config['notice'], cur_temp_dir + '/NOTICE')
+is_file_available = True
+  # copy from temp dir to final dir only when either file is abailable.
+  if is_file_available:
+if os.path.isdir(dest_dir):
+  shutil.rmtree(dest_dir)
+shutil.copytree(cur_temp_dir, dest_dir)
+  result = True
+except Exception as e:
+  print('Error occurred when pull license from url.', 'dependency =',
+dep, 'url =', config, 'error = ', e.decode('utf-8'))
+  result = False
+finally:
+  shutil.rmtree(cur_temp_dir)
+  return result
+  else:
+return False
+
+
+if __name__ == "__main__":
+  cur_dir = os.getcwd()
+  if cur_dir.split('/')[-1] != 'beam':
+raise RuntimeError('This script should run from ~/beam directory.')
+  license_dir = os.getcwd() + '/licenses/python'
+  no_licenses = []
+
+  with open('licenses/scripts/dep_urls_py.yaml') as file:
+dep_config = yaml.load(file)
+
+  install_pip_licenses()
 
 Review comment:
   The script is running at a virtual environment with [apache-beam, test, gcp, 
docs, aws, interactive] (this list is referred from setup.py) installed. With 
this virtual environment, pyyaml, tenacity etc are installed, we only need to 
install wget and pip-licenses. I moved this list to `license_requirements.txt` 
and call it within the script. 
   
   The reason I call the `pip install -r` within the script is simply the UI 
for users. Users need to run this script to add licenses when new 

[GitHub] [beam] youngoli commented on a change in pull request #11179: [BEAM-3301] Bugfix in DoFn validation.

2020-03-20 Thread GitBox
youngoli commented on a change in pull request #11179: [BEAM-3301] Bugfix in 
DoFn validation.
URL: https://github.com/apache/beam/pull/11179#discussion_r395850295
 
 

 ##
 File path: sdks/go/pkg/beam/core/graph/fn.go
 ##
 @@ -446,23 +444,16 @@ func validateMainInputs(fn *Fn, method *funcx.Fn, 
methodName string, numMainIn m
return err
}
 
-   // Check that the first numMainIn inputs are not side inputs (Iters or
-   // ReIters). We aren't able to catch singleton side inputs here since
-   // they're indistinguishable from main inputs.
-   mainInputs := method.Param[pos : pos+int(numMainIn)]
-   for i, p := range mainInputs {
-   if p.Kind != funcx.FnValue {
-   err := errors.Errorf("expected main input parameter but 
found "+
-   "side input parameter in position %v",
-   pos+i)
-   err = errors.SetTopLevelMsgf(err,
-   "Method %v in DoFn %v should have all main 
inputs before side inputs, "+
-   "but a side input (as Iter or ReIter) 
appears as parameter %v when a "+
-   "main input was expected.",
-   methodName, fn.Name(), pos+i)
-   err = errors.WithContextf(err, "method %v", methodName)
-   return err
-   }
+   // Check that the first input is not an Iter or ReIter (those aren't 
valid
+   // as the first main input).
+   first := method.Param[pos].Kind
+   if first != funcx.FnValue {
+   err := errors.New("first main input parameter must be value 
type")
 
 Review comment:
   I'll just add it in real quick while squashing the commits.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] robertwb commented on a change in pull request #11165: [BEAM-9340] Populate requirements for Java.

2020-03-20 Thread GitBox
robertwb commented on a change in pull request #11165: [BEAM-9340] Populate 
requirements for Java.
URL: https://github.com/apache/beam/pull/11165#discussion_r395845584
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PTransformTranslation.java
 ##
 @@ -124,6 +124,15 @@
   public static final String 
SPLITTABLE_PROCESS_SIZED_ELEMENTS_AND_RESTRICTIONS_URN =
   "beam:transform:sdf_process_sized_element_and_restrictions:v1";
 
+  public static final String REQUIRES_STATEFUL_PROCESSING_URN =
+  getUrn(RunnerApi.StandardRequirements.Enum.REQUIRES_STATEFUL_PROCESSING);
 
 Review comment:
   That's unfortunate, but I see the pattern. Have to be vigilant to prevent 
bugs. (Unlikely that these'll be used in switch statements, but consistency is 
good.)


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] robertwb commented on issue #11165: [BEAM-9340] Populate requirements for Java.

2020-03-20 Thread GitBox
robertwb commented on issue #11165: [BEAM-9340] Populate requirements for Java.
URL: https://github.com/apache/beam/pull/11165#issuecomment-601872470
 
 
   I rebased on top of your PR.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] lostluck commented on a change in pull request #11179: [BEAM-3301] Bugfix in DoFn validation.

2020-03-20 Thread GitBox
lostluck commented on a change in pull request #11179: [BEAM-3301] Bugfix in 
DoFn validation.
URL: https://github.com/apache/beam/pull/11179#discussion_r395840270
 
 

 ##
 File path: sdks/go/pkg/beam/core/graph/fn.go
 ##
 @@ -446,23 +444,16 @@ func validateMainInputs(fn *Fn, method *funcx.Fn, 
methodName string, numMainIn m
return err
}
 
-   // Check that the first numMainIn inputs are not side inputs (Iters or
-   // ReIters). We aren't able to catch singleton side inputs here since
-   // they're indistinguishable from main inputs.
-   mainInputs := method.Param[pos : pos+int(numMainIn)]
-   for i, p := range mainInputs {
-   if p.Kind != funcx.FnValue {
-   err := errors.Errorf("expected main input parameter but 
found "+
-   "side input parameter in position %v",
-   pos+i)
-   err = errors.SetTopLevelMsgf(err,
-   "Method %v in DoFn %v should have all main 
inputs before side inputs, "+
-   "but a side input (as Iter or ReIter) 
appears as parameter %v when a "+
-   "main input was expected.",
-   methodName, fn.Name(), pos+i)
-   err = errors.WithContextf(err, "method %v", methodName)
-   return err
-   }
+   // Check that the first input is not an Iter or ReIter (those aren't 
valid
+   // as the first main input).
+   first := method.Param[pos].Kind
+   if first != funcx.FnValue {
+   err := errors.New("first main input parameter must be value 
type")
 
 Review comment:
   ...must be a value type..


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] lostluck commented on a change in pull request #11179: [BEAM-3301] Bugfix in DoFn validation.

2020-03-20 Thread GitBox
lostluck commented on a change in pull request #11179: [BEAM-3301] Bugfix in 
DoFn validation.
URL: https://github.com/apache/beam/pull/11179#discussion_r395838378
 
 

 ##
 File path: sdks/go/pkg/beam/pcollection.go
 ##
 @@ -60,6 +60,12 @@ func (p PCollection) Type() FullType {
return p.n.Type()
 }
 
+// OutputsKV returns whether the output of this PCollection are single value
+// elements or KV pairs.
+func (p PCollection) OutputsKV() bool {
 
 Review comment:
   That's my usual guideline. If I use it once, keep it in place; twice, copy 
it; three times, helper function.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] chamikaramj commented on issue #11183: [BEAM-8889]add experiment flag use_grpc_for_gcs

2020-03-20 Thread GitBox
chamikaramj commented on issue #11183: [BEAM-8889]add experiment flag 
use_grpc_for_gcs
URL: https://github.com/apache/beam/pull/11183#issuecomment-601864385
 
 
   Retest this please


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] chamikaramj commented on issue #11183: [BEAM-8889]add experiment flag use_grpc_for_gcs

2020-03-20 Thread GitBox
chamikaramj commented on issue #11183: [BEAM-8889]add experiment flag 
use_grpc_for_gcs
URL: https://github.com/apache/beam/pull/11183#issuecomment-601864474
 
 
   LGTM. Thanks.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] lukecwik commented on issue #11180: [BEAM-9563] Change ToListCombineFn access level to private

2020-03-20 Thread GitBox
lukecwik commented on issue #11180: [BEAM-9563] Change ToListCombineFn access 
level to private
URL: https://github.com/apache/beam/pull/11180#issuecomment-601864250
 
 
   Run Java PreCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] youngoli commented on a change in pull request #11179: [BEAM-3301] Bugfix in DoFn validation.

2020-03-20 Thread GitBox
youngoli commented on a change in pull request #11179: [BEAM-3301] Bugfix in 
DoFn validation.
URL: https://github.com/apache/beam/pull/11179#discussion_r395836071
 
 

 ##
 File path: sdks/go/pkg/beam/pcollection.go
 ##
 @@ -60,6 +60,12 @@ func (p PCollection) Type() FullType {
return p.n.Type()
 }
 
+// OutputsKV returns whether the output of this PCollection are single value
+// elements or KV pairs.
+func (p PCollection) OutputsKV() bool {
 
 Review comment:
   I was originally picturing this as a helper function for callers of NewDoFn. 
It seems easy for future callers to make a mistake and only check if the 
PCollection is a KV and forget to check for CoGBK, so I thought a helper method 
would be useful in the future.
   
   1. I missed that pardo.go is in the same package as pcollection.go. I'm also 
leaning to not expanding the user surface if it's not necessary.
   
   2 & 3. Yeah I was unsure about the name, since it's not technically checking 
for KVs, I just couldn't think of anything better. IsKeyed sounds good though.
   
   4. That's the other part I was debating. My goal was to make it easy to 
avoid the mistake in the future, but thinking about it... It seems unlikely 
that anyone else would even be using this code, and I would expect that if they 
were they were an advanced user doing something tricky.
   
   I think I'll go with just putting the conditional in pardo.go and adding a 
comment. We can always turn it into a helper later if it does get used in 
multiple places.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] chamikaramj commented on issue #11039: [BEAM-9383] Staging Dataflow artifacts from environment

2020-03-20 Thread GitBox
chamikaramj commented on issue #11039: [BEAM-9383] Staging Dataflow artifacts 
from environment
URL: https://github.com/apache/beam/pull/11039#issuecomment-601864064
 
 
   Retest this please


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[beam] branch master updated (c728d25 -> cd8a00c)

2020-03-20 Thread lcwik
This is an automated email from the ASF dual-hosted git repository.

lcwik pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from c728d25  Merge pull request #11177 from boyuanzz/timer_proto
 new 17b0216  [BEAM-9339, BEAM-2939] Drop splittable field from proto, add 
splittable dofn capability to Python SDK.
 new cd8a00c  Merge pull request #11162 from lukecwik/proto3

The 26111 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../pipeline/src/main/proto/beam_runner_api.proto  | 14 +++--
 .../core/construction/ParDoTranslation.java| 60 ++
 .../runners/core/construction/SplittableParDo.java |  5 --
 .../dataflow/PrimitiveParDoSingleFactory.java  |  5 --
 sdks/go/pkg/beam/core/runtime/graphx/translate.go  |  1 -
 .../runners/portability/fn_api_runner.py   |  4 ++
 .../portability/fn_api_runner_transforms.py|  2 +-
 .../apache_beam/runners/worker/bundle_processor.py |  4 +-
 sdks/python/apache_beam/transforms/core.py |  3 +-
 9 files changed, 69 insertions(+), 29 deletions(-)



  1   2   3   >