[jira] [Work logged] (BEAM-6097) Add NemoRunner

2019-01-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6097?focusedWorklogId=182208&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182208
 ]

ASF GitHub Bot logged work on BEAM-6097:


Author: ASF GitHub Bot
Created on: 08/Jan/19 07:29
Start Date: 08/Jan/19 07:29
Worklog Time Spent: 10m 
  Work Description: wonook commented on issue #7236: [BEAM-6097] NemoRunner
URL: https://github.com/apache/beam/pull/7236#issuecomment-452200892
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 182208)
Time Spent: 6h 40m  (was: 6.5h)

> Add NemoRunner
> --
>
> Key: BEAM-6097
> URL: https://issues.apache.org/jira/browse/BEAM-6097
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-ideas
>Reporter: Won Wook SONG
>Assignee: Won Wook SONG
>Priority: Major
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> Add NemoRunner (http://nemo.apache.org)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6097) Add NemoRunner

2019-01-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6097?focusedWorklogId=182207&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182207
 ]

ASF GitHub Bot logged work on BEAM-6097:


Author: ASF GitHub Bot
Created on: 08/Jan/19 07:29
Start Date: 08/Jan/19 07:29
Worklog Time Spent: 10m 
  Work Description: wonook commented on issue #7236: [BEAM-6097] NemoRunner
URL: https://github.com/apache/beam/pull/7236#issuecomment-452200875
 
 
   @mxm Thanks! 😄 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 182207)
Time Spent: 6.5h  (was: 6h 20m)

> Add NemoRunner
> --
>
> Key: BEAM-6097
> URL: https://issues.apache.org/jira/browse/BEAM-6097
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-ideas
>Reporter: Won Wook SONG
>Assignee: Won Wook SONG
>Priority: Major
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> Add NemoRunner (http://nemo.apache.org)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-6383) Your project apache/beam is using buggy third-party libraries [WARNING]

2019-01-07 Thread Kaifeng Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaifeng Huang closed BEAM-6383.
---
   Resolution: Incomplete
Fix Version/s: Not applicable

> Your project apache/beam is using buggy third-party libraries [WARNING]
> ---
>
> Key: BEAM-6383
> URL: https://issues.apache.org/jira/browse/BEAM-6383
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Kaifeng Huang
>Assignee: Luke Cwik
>Priority: Minor
> Fix For: Not applicable
>
>
> Hi there!
> We are a research team working on third-party library analysis. We have found
> that some widely used third-party libraries in your project have
> major/critical bugs, which can degrade the quality of your project. We
> strongly recommend that you update those libraries to newer versions.
> We have listed the affected third-party libraries and the corresponding JIRA
> issue links below for more detailed information.
>   1  org.apache.httpcomponents httpclient 
> (sdks/java/io/elasticsearch/build.gradle,sdks/java/io/solr/build.gradle,sdks/java/io/elasticsearch-tests/elasticsearch-tests-common/build.gradle,sdks/java/io/amazon-web-services/build.gradle)
>   version: 4.5.6
>   Jira issues:
>   Support relatively new HTTP 308 redirect - RFC7538
>   affectsVersions:3.1 (end of life),4.5.6
>   
> https://issues.apache.org/jira/projects/HTTPCLIENT/issues/HTTPCLIENT-1946?filter=allopenissues
>   2  commons-cli commons-cli (release/build.gradle)
>   version: 1.2
>   Jira issues:
>   Unable to select a pure long option in a group
>   affectsVersions:1.0;1.1;1.2
>   
> https://issues.apache.org/jira/projects/CLI/issues/CLI-182?filter=allopenissues
>   Clear the selection from the groups before parsing
>   affectsVersions:1.0;1.1;1.2
>   
> https://issues.apache.org/jira/projects/CLI/issues/CLI-183?filter=allopenissues
>   Commons CLI incorrectly stripping leading and trailing quotes
>   affectsVersions:1.1;1.2
>   
> https://issues.apache.org/jira/projects/CLI/issues/CLI-185?filter=allopenissues
>   Coding error: OptionGroup.setSelected causes 
> java.lang.NullPointerException
>   affectsVersions:1.2
>   
> https://issues.apache.org/jira/projects/CLI/issues/CLI-191?filter=allopenissues
>   StringIndexOutOfBoundsException in HelpFormatter.findWrapPos
>   affectsVersions:1.2
>   
> https://issues.apache.org/jira/projects/CLI/issues/CLI-193?filter=allopenissues
>   HelpFormatter strips leading whitespaces in the footer
>   affectsVersions:1.2
>   
> https://issues.apache.org/jira/projects/CLI/issues/CLI-207?filter=allopenissues
>   OptionBuilder only has static methods; yet many return an OptionBuilder 
> instance
>   affectsVersions:1.2
>   
> https://issues.apache.org/jira/projects/CLI/issues/CLI-224?filter=allopenissues
>   Unable to properly require options
>   affectsVersions:1.2
>   
> https://issues.apache.org/jira/projects/CLI/issues/CLI-230?filter=allopenissues
>   OptionValidator Implementation Does Not Agree With JavaDoc
>   affectsVersions:1.2
>   
> https://issues.apache.org/jira/projects/CLI/issues/CLI-241?filter=allopenissues
>   3  org.apache.logging.log4j log4j-core 
> (sdks/java/io/elasticsearch-tests/elasticsearch-tests-6/build.gradle,sdks/java/io/elasticsearch-tests/elasticsearch-tests-2/build.gradle,sdks/java/io/elasticsearch-tests/elasticsearch-tests-common/build.gradle,sdks/java/io/hadoop-input-format/build.gradle,sdks/java/io/hadoop-format/build.gradle)
>   version: 2.6.2
>   Jira issues:
>   Custom plugins are not loaded; URL protocol vfs is not supported
>   affectsVersions:2.5;2.6.2
>   
> https://issues.apache.org/jira/projects/LOG4J2/issues/LOG4J2-1320?filter=allopenissues
>   [OSGi] Missing import package
>   affectsVersions:2.6.2
>   
> https://issues.apache.org/jira/projects/LOG4J2/issues/LOG4J2-1467?filter=allopenissues
>   CronTriggeringPolicy raise exception and fail to rollover log file when 
> evaluateOnStartup is true.
>   affectsVersions:2.6.2
>   
> https://issues.apache.org/jira/projects/LOG4J2/issues/LOG4J2-1474?filter=allopenissues
>   Improper header in CsvParameterLayout
>   affectsVersions:2.6.2
>   
> https://issues.apache.org/jira/projects/LOG4J2/issues/LOG4J2-1482?filter=allopenissues
>   Merging configurations fail with an NPE when comparing Nodes with 
> different attributes
>   affectsVersions:2.6.2
>   
> https://issues.apache.org/jira/projects/LOG4J2/issues/LOG4J2-1500?filter=allopenissues
>   CsvParameterLayout and CsvLogEventLayout insert NUL character if data 
> starts with {; (; [ or "
>   a

[jira] [Created] (BEAM-6383) Your project apache/beam is using buggy third-party libraries [WARNING]

2019-01-07 Thread Kaifeng Huang (JIRA)
Kaifeng Huang created BEAM-6383:
---

 Summary: Your project apache/beam is using buggy third-party 
libraries [WARNING]
 Key: BEAM-6383
 URL: https://issues.apache.org/jira/browse/BEAM-6383
 Project: Beam
  Issue Type: Bug
  Components: build-system
Reporter: Kaifeng Huang
Assignee: Luke Cwik



Hi there!
We are a research team working on third-party library analysis. We have found
that some widely used third-party libraries in your project have major/critical
bugs, which can degrade the quality of your project. We strongly recommend
that you update those libraries to newer versions.
We have listed the affected third-party libraries and the corresponding JIRA
issue links below for more detailed information.
1  org.apache.httpcomponents httpclient 
(sdks/java/io/elasticsearch/build.gradle,sdks/java/io/solr/build.gradle,sdks/java/io/elasticsearch-tests/elasticsearch-tests-common/build.gradle,sdks/java/io/amazon-web-services/build.gradle)
version: 4.5.6

Jira issues:
Support relatively new HTTP 308 redirect - RFC7538
affectsVersions:3.1 (end of life),4.5.6

https://issues.apache.org/jira/projects/HTTPCLIENT/issues/HTTPCLIENT-1946?filter=allopenissues




2  commons-cli commons-cli (release/build.gradle)
version: 1.2

Jira issues:
Unable to select a pure long option in a group
affectsVersions:1.0;1.1;1.2

https://issues.apache.org/jira/projects/CLI/issues/CLI-182?filter=allopenissues
Clear the selection from the groups before parsing
affectsVersions:1.0;1.1;1.2

https://issues.apache.org/jira/projects/CLI/issues/CLI-183?filter=allopenissues
Commons CLI incorrectly stripping leading and trailing quotes
affectsVersions:1.1;1.2

https://issues.apache.org/jira/projects/CLI/issues/CLI-185?filter=allopenissues
Coding error: OptionGroup.setSelected causes 
java.lang.NullPointerException
affectsVersions:1.2

https://issues.apache.org/jira/projects/CLI/issues/CLI-191?filter=allopenissues
StringIndexOutOfBoundsException in HelpFormatter.findWrapPos
affectsVersions:1.2

https://issues.apache.org/jira/projects/CLI/issues/CLI-193?filter=allopenissues
HelpFormatter strips leading whitespaces in the footer
affectsVersions:1.2

https://issues.apache.org/jira/projects/CLI/issues/CLI-207?filter=allopenissues
OptionBuilder only has static methods; yet many return an OptionBuilder 
instance
affectsVersions:1.2

https://issues.apache.org/jira/projects/CLI/issues/CLI-224?filter=allopenissues
Unable to properly require options
affectsVersions:1.2

https://issues.apache.org/jira/projects/CLI/issues/CLI-230?filter=allopenissues
OptionValidator Implementation Does Not Agree With JavaDoc
affectsVersions:1.2

https://issues.apache.org/jira/projects/CLI/issues/CLI-241?filter=allopenissues




3  org.apache.logging.log4j log4j-core 
(sdks/java/io/elasticsearch-tests/elasticsearch-tests-6/build.gradle,sdks/java/io/elasticsearch-tests/elasticsearch-tests-2/build.gradle,sdks/java/io/elasticsearch-tests/elasticsearch-tests-common/build.gradle,sdks/java/io/hadoop-input-format/build.gradle,sdks/java/io/hadoop-format/build.gradle)
version: 2.6.2

Jira issues:
Custom plugins are not loaded; URL protocol vfs is not supported
affectsVersions:2.5;2.6.2

https://issues.apache.org/jira/projects/LOG4J2/issues/LOG4J2-1320?filter=allopenissues
[OSGi] Missing import package
affectsVersions:2.6.2

https://issues.apache.org/jira/projects/LOG4J2/issues/LOG4J2-1467?filter=allopenissues
CronTriggeringPolicy raise exception and fail to rollover log file when 
evaluateOnStartup is true.
affectsVersions:2.6.2

https://issues.apache.org/jira/projects/LOG4J2/issues/LOG4J2-1474?filter=allopenissues
Improper header in CsvParameterLayout
affectsVersions:2.6.2

https://issues.apache.org/jira/projects/LOG4J2/issues/LOG4J2-1482?filter=allopenissues
Merging configurations fail with an NPE when comparing Nodes with 
different attributes
affectsVersions:2.6.2

https://issues.apache.org/jira/projects/LOG4J2/issues/LOG4J2-1500?filter=allopenissues
CsvParameterLayout and CsvLogEventLayout insert NUL character if data 
starts with {; (; [ or "
affectsVersions:2.6.2

https://issues.apache.org/jira/projects/LOG4J2/issues/LOG4J2-1502?filter=allopenissues
Unregister JMX ignores log4j2.disable.jmx property
affectsVersions:2.6.2

https://issues.apache.org/jira/projects/LOG4J2/issues/LOG4J2-1506?filter=allopenissues
DynamicThresholdFilter filters incorrectly when params

[jira] [Work logged] (BEAM-6240) Allow users to annotate POJOs and JavaBeans for richer functionality

2019-01-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6240?focusedWorklogId=182183&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182183
 ]

ASF GitHub Bot logged work on BEAM-6240:


Author: ASF GitHub Bot
Created on: 08/Jan/19 06:02
Start Date: 08/Jan/19 06:02
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on pull request #7289: [BEAM-6240] 
Add a library of schema annotations for POJO and JavaBeans
URL: https://github.com/apache/beam/pull/7289#discussion_r245885263
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/ByteBuddyUtils.java
 ##
 @@ -595,4 +614,163 @@ protected StackManipulation 
convertDefault(TypeDescriptor type) {
   return readValue;
 }
   }
+
+  /**
+   * Invokes a constructor registered using SchemaCreate. As constructor 
parameters might not be in
+   * the same order as the schema fields, reorders the parameters as necessary 
before calling the
+   * constructor.
+   */
+  static class ConstructorCreateInstruction extends 
InvokeUserCreateInstruction {
+private final Constructor constructor;
+
+ConstructorCreateInstruction(
+List fields, Class targetClass, Constructor 
constructor) {
+  super(fields, targetClass, 
Lists.newArrayList(constructor.getParameters()));
+  this.constructor = constructor;
+}
+
+@Override
+public InstrumentedType prepare(InstrumentedType instrumentedType) {
+  return instrumentedType;
+}
+
+@Override
+protected StackManipulation beforePushingParameters() {
+  // Create the target class.
+  ForLoadedType loadedType = new ForLoadedType(targetClass);
+  return new StackManipulation.Compound(TypeCreation.of(loadedType), 
Duplication.SINGLE);
+}
+
+@Override
+protected StackManipulation afterPushingParameters() {
+  return MethodInvocation.invoke(new ForLoadedConstructor(constructor));
+}
+  }
+
+  /**
+   * Invokes a static factory method registered using SchemaCreate. As the 
method parameters might
+   * not be in the same order as the schema fields, reorders the parameters as 
necessary before
+   * calling the constructor.
+   */
+  static class StaticFactoryMethodInstruction extends 
InvokeUserCreateInstruction {
+private final Method creator;
+
+StaticFactoryMethodInstruction(
+List fields, Class targetClass, Method 
creator) {
+  super(fields, targetClass, Lists.newArrayList(creator.getParameters()));
+  if (!Modifier.isStatic(creator.getModifiers())) {
+throw new IllegalArgumentException("Method " + creator + " is not 
static");
+  }
+  this.creator = creator;
+}
+
+@Override
+public InstrumentedType prepare(InstrumentedType instrumentedType) {
+  return instrumentedType;
+}
+
+@Override
+protected StackManipulation afterPushingParameters() {
+  return MethodInvocation.invoke(new ForLoadedMethod(creator));
+}
+  }
+
+  static class InvokeUserCreateInstruction implements Implementation {
+protected final List fields;
+protected final Class targetClass;
+protected final List parameters;
+protected final Map fieldMapping;
+
+protected InvokeUserCreateInstruction(
+List fields, Class targetClass, 
List parameters) {
+  this.fields = fields;
+  this.targetClass = targetClass;
+  this.parameters = parameters;
+
+  // Method parameters might not be in the same order as the schema 
fields, and the input
+  // array to SchemaUserTypeCreator.create is in schema order. Examine the 
parameter names
+  // and compare against field names to calculate the mapping between the 
two lists.
+  Map fieldsByLogicalName = Maps.newHashMap();
+  Map fieldsByJavaClassMember = Maps.newHashMap();
+  for (int i = 0; i < fields.size(); ++i) {
+// Method parameters are allowed to either correspond to the schema 
field names or to the
+// actual Java field or method names.
+FieldValueTypeInformation fieldValue = checkNotNull(fields.get(i));
+fieldsByLogicalName.put(fieldValue.getName(), i);
+if (fieldValue.getField() != null) {
+  fieldsByJavaClassMember.put(fieldValue.getField().getName(), i);
+} else if (fieldValue.getMethod() != null) {
+  String name = 
ReflectUtils.stripPrefix(fieldValue.getMethod().getName(), "set");
+  fieldsByJavaClassMember.put(name, i);
+}
+  }
+
+  fieldMapping = Maps.newHashMap();
+  for (int i = 0; i < parameters.size(); ++i) {
+Parameter parameter = parameters.get(i);
+String paramName = parameter.getName();
+Integer index = fieldsByLogicalName.get(paramName);
+if (index == null) {
+  index = fieldsByJavaClassMember.get(paramName);
+}
+if (index == null) {
+  throw new RuntimeException(
+  

[jira] [Work logged] (BEAM-6240) Allow users to annotate POJOs and JavaBeans for richer functionality

2019-01-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6240?focusedWorklogId=182181&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182181
 ]

ASF GitHub Bot logged work on BEAM-6240:


Author: ASF GitHub Bot
Created on: 08/Jan/19 06:02
Start Date: 08/Jan/19 06:02
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on pull request #7289: [BEAM-6240] 
Add a library of schema annotations for POJO and JavaBeans
URL: https://github.com/apache/beam/pull/7289#discussion_r245885244
 
 

 ##
 File path: 
sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/utils/TestPOJOs.java
 ##
 @@ -101,6 +104,249 @@ public int hashCode() {
   public static final Schema NESTED_NULLABLE_SCHEMA =
   Schema.builder().addNullableField("nested", 
FieldType.row(NULLABLES_SCHEMA)).build();
 
+  /** A POJO for testing static factory methods. */
+  @DefaultSchema(JavaFieldSchema.class)
+  public static class StaticCreationSimplePojo {
+public final String str;
+public final byte aByte;
+public final short aShort;
+public final int anInt;
+public final long aLong;
+public final boolean aBoolean;
+public final DateTime dateTime;
+public final Instant instant;
+public final byte[] bytes;
+public final ByteBuffer byteBuffer;
+public final BigDecimal bigDecimal;
+public final StringBuilder stringBuilder;
+
+private StaticCreationSimplePojo(
+String str,
+byte aByte,
+short aShort,
+int anInt,
+long aLong,
+boolean aBoolean,
+DateTime dateTime,
+Instant instant,
+byte[] bytes,
+ByteBuffer byteBuffer,
+BigDecimal bigDecimal,
+StringBuilder stringBuilder) {
+  this.str = str;
+  this.aByte = aByte;
+  this.aShort = aShort;
+  this.anInt = anInt;
+  this.aLong = aLong;
+  this.aBoolean = aBoolean;
+  this.dateTime = dateTime;
+  this.instant = instant;
+  this.bytes = bytes;
+  this.byteBuffer = byteBuffer;
+  this.bigDecimal = bigDecimal;
+  this.stringBuilder = stringBuilder;
+}
+
+@SchemaCreate
+public static StaticCreationSimplePojo of(
 
 Review comment:
   done
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 182181)
Time Spent: 5h  (was: 4h 50m)

> Allow users to annotate POJOs and JavaBeans for richer functionality
> 
>
> Key: BEAM-6240
> URL: https://issues.apache.org/jira/browse/BEAM-6240
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: Major
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> Desired annotations:
>   * SchemaIgnore - ignore this field
>   * FieldName - allow the user to explicitly specify a field name
>   * SchemaCreate - register a function to be used to create an object (so 
> fields can be final, and no default constructor need be assumed).
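A minimal sketch of how these annotations might look on a POJO, modeled on the AnnotatedSimplePojo test class in PR #7289; the annotation package names below are assumed, not taken from the PR:

{code:java}
// Hypothetical example only; annotation package names are assumed.
import org.apache.beam.sdk.schemas.JavaFieldSchema;
import org.apache.beam.sdk.schemas.annotations.DefaultSchema;
import org.apache.beam.sdk.schemas.annotations.FieldName;
import org.apache.beam.sdk.schemas.annotations.SchemaCreate;
import org.apache.beam.sdk.schemas.annotations.SchemaIgnore;

@DefaultSchema(JavaFieldSchema.class)
public class Order {
  public final String id;

  // Exposed in the schema under the explicit name "amountCents".
  @FieldName("amountCents")
  public final long cents;

  // Excluded from the inferred schema entirely.
  @SchemaIgnore public String debugInfo;

  // Registered creator: fields can stay final and no default constructor is needed.
  @SchemaCreate
  public Order(String id, long cents) {
    this.id = id;
    this.cents = cents;
  }
}
{code}

The intent is that such a class gets a two-field schema (id, amountCents) inferred from its public fields, with debugInfo ignored and the annotated constructor used to build instances.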



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6240) Allow users to annotate POJOs and JavaBeans for richer functionality

2019-01-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6240?focusedWorklogId=182180&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182180
 ]

ASF GitHub Bot logged work on BEAM-6240:


Author: ASF GitHub Bot
Created on: 08/Jan/19 06:02
Start Date: 08/Jan/19 06:02
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on pull request #7289: [BEAM-6240] 
Add a library of schema annotations for POJO and JavaBeans
URL: https://github.com/apache/beam/pull/7289#discussion_r245885227
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/ReflectUtils.java
 ##
 @@ -77,6 +82,38 @@ public int hashCode() {
 });
   }
 
+  @Nullable
+  public static Constructor getAnnotatedConstructor(Class clazz) {
+return Arrays.stream(clazz.getDeclaredConstructors())
+.filter(m -> Modifier.isPublic(m.getModifiers()))
+.filter(m -> m.getAnnotation(SchemaCreate.class) != null)
+.findFirst()
+.orElse(null);
+  }
+
+  @Nullable
+  public static Method getAnnotatedCreateMethod(Class clazz) {
+return ANNOTATED_CONSTRUCTORS.computeIfAbsent(
+clazz,
+c -> {
+  Method method =
+  Arrays.stream(clazz.getDeclaredMethods())
+  .filter(m -> Modifier.isPublic(m.getModifiers()))
+  .filter(m -> Modifier.isStatic(m.getModifiers()))
+  .filter(m -> m.getAnnotation(SchemaCreate.class) != null)
+  .findFirst()
+  .orElse(null);
+  if (method != null && 
!clazz.isAssignableFrom(method.getReturnType())) {
 
 Review comment:
   Same as above. It's a bit weird to have two factory methods, but is there a 
reason to disallow it?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 182180)
Time Spent: 4h 50m  (was: 4h 40m)

> Allow users to annotate POJOs and JavaBeans for richer functionality
> 
>
> Key: BEAM-6240
> URL: https://issues.apache.org/jira/browse/BEAM-6240
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: Major
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> Desired annotations:
>   * SchemaIgnore - ignore this field
>   * FieldName - allow the user to explicitly specify a field name
>   * SchemaCreate - register a function to be used to create an object (so 
> fields can be final, and no default constructor need be assumed).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6240) Allow users to annotate POJOs and JavaBeans for richer functionality

2019-01-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6240?focusedWorklogId=182179&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182179
 ]

ASF GitHub Bot logged work on BEAM-6240:


Author: ASF GitHub Bot
Created on: 08/Jan/19 06:02
Start Date: 08/Jan/19 06:02
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on pull request #7289: [BEAM-6240] 
Add a library of schema annotations for POJO and JavaBeans
URL: https://github.com/apache/beam/pull/7289#discussion_r245885222
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/JavaBeanSchema.java
 ##
 @@ -47,70 +51,103 @@
   /** {@link FieldValueTypeSupplier} that's based on getter methods. */
   @VisibleForTesting
   public static class GetterTypeSupplier implements FieldValueTypeSupplier {
+public static final GetterTypeSupplier INSTANCE = new GetterTypeSupplier();
+
 @Override
-public List get(Class clazz, Schema schema) {
-  Map types =
+public List get(Class clazz, @Nullable 
Schema schema) {
+  List types =
   ReflectUtils.getMethods(clazz)
   .stream()
   .filter(ReflectUtils::isGetter)
+  .filter(m -> !m.isAnnotationPresent(SchemaIgnore.class))
   .map(FieldValueTypeInformation::forGetter)
-  .collect(Collectors.toMap(FieldValueTypeInformation::getName, 
Function.identity()));
-  // Return the list ordered by the schema fields.
-  return schema
-  .getFields()
-  .stream()
-  .map(f -> types.get(f.getName()))
-  .collect(Collectors.toList());
+  .map(
+  t -> {
+FieldName fieldName = 
t.getMethod().getAnnotation(FieldName.class);
+return (fieldName != null) ? t.withName(fieldName.value()) 
: t;
+  })
+  .collect(Collectors.toList());
+  return (schema != null) ? StaticSchemaInference.sortBySchema(types, 
schema) : types;
 }
   }
 
   /** {@link FieldValueTypeSupplier} that's based on setter methods. */
   @VisibleForTesting
   public static class SetterTypeSupplier implements FieldValueTypeSupplier {
+private static final SetterTypeSupplier INSTANCE = new 
SetterTypeSupplier();
+
 @Override
 public List get(Class clazz, Schema schema) {
-  Map types =
+  List types =
   ReflectUtils.getMethods(clazz)
   .stream()
   .filter(ReflectUtils::isSetter)
+  .filter(m -> !m.isAnnotationPresent(SchemaIgnore.class))
   .map(FieldValueTypeInformation::forSetter)
-  .collect(Collectors.toMap(FieldValueTypeInformation::getName, 
Function.identity()));
-  // Return the list ordered by the schema fields.
-  return schema
-  .getFields()
-  .stream()
-  .map(f -> types.get(f.getName()))
-  .collect(Collectors.toList());
+  .collect(Collectors.toList());
+
+  return (schema != null) ? StaticSchemaInference.sortBySchema(types, 
schema) : types;
 }
   }
 
   @Override
   public  Schema schemaFor(TypeDescriptor typeDescriptor) {
-return JavaBeanUtils.schemaFromJavaBeanClass(typeDescriptor.getRawType());
+Schema schema =
+JavaBeanUtils.schemaFromJavaBeanClass(
+typeDescriptor.getRawType(), GetterTypeSupplier.INSTANCE);
+
+// If there are no creator methods, then validate that we have setters for 
every field.
+// Otherwise, we will have no way of creating the class.
+if (ReflectUtils.getAnnotatedCreateMethod(typeDescriptor.getRawType()) == 
null
+&& ReflectUtils.getAnnotatedConstructor(typeDescriptor.getRawType()) 
== null) {
+  JavaBeanUtils.validateJavaBean(
+  GetterTypeSupplier.INSTANCE.get(typeDescriptor.getRawType(), schema),
+  SetterTypeSupplier.INSTANCE.get(typeDescriptor.getRawType(), 
schema));
+}
+return schema;
   }
 
   @Override
   public FieldValueGetterFactory fieldValueGetterFactory() {
 return (Class targetClass, Schema schema) ->
-JavaBeanUtils.getGetters(targetClass, schema, new 
GetterTypeSupplier());
+JavaBeanUtils.getGetters(targetClass, schema, 
GetterTypeSupplier.INSTANCE);
   }
 
   @Override
   UserTypeCreatorFactory schemaTypeCreatorFactory() {
-return new SetterBasedCreatorFactory(new JavaBeanSetterFactory());
+UserTypeCreatorFactory setterBasedFactory =
+new SetterBasedCreatorFactory(new JavaBeanSetterFactory());
+
+return (Class targetClass, Schema schema) -> {
+  // If a static method is marked with @SchemaCreate, use that.
+  Method annotated = ReflectUtils.getAnnotatedCreateMethod(targetClass);
+  if (annotated != null) {
+return JavaBeanUtils.getStaticCreator(
+targetClass, annotated, schema, GetterTypeSupplier.INSTANCE);
+  }
+
+  // 

[jira] [Work logged] (BEAM-6240) Allow users to annotate POJOs and JavaBeans for richer functionality

2019-01-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6240?focusedWorklogId=182182&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182182
 ]

ASF GitHub Bot logged work on BEAM-6240:


Author: ASF GitHub Bot
Created on: 08/Jan/19 06:02
Start Date: 08/Jan/19 06:02
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on pull request #7289: [BEAM-6240] 
Add a library of schema annotations for POJO and JavaBeans
URL: https://github.com/apache/beam/pull/7289#discussion_r245885252
 
 

 ##
 File path: 
sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/utils/TestPOJOs.java
 ##
 @@ -101,6 +104,249 @@ public int hashCode() {
   public static final Schema NESTED_NULLABLE_SCHEMA =
   Schema.builder().addNullableField("nested", 
FieldType.row(NULLABLES_SCHEMA)).build();
 
+  /** A POJO for testing static factory methods. */
+  @DefaultSchema(JavaFieldSchema.class)
+  public static class StaticCreationSimplePojo {
+public final String str;
+public final byte aByte;
+public final short aShort;
+public final int anInt;
+public final long aLong;
+public final boolean aBoolean;
+public final DateTime dateTime;
+public final Instant instant;
+public final byte[] bytes;
+public final ByteBuffer byteBuffer;
+public final BigDecimal bigDecimal;
+public final StringBuilder stringBuilder;
+
+private StaticCreationSimplePojo(
+String str,
+byte aByte,
+short aShort,
+int anInt,
+long aLong,
+boolean aBoolean,
+DateTime dateTime,
+Instant instant,
+byte[] bytes,
+ByteBuffer byteBuffer,
+BigDecimal bigDecimal,
+StringBuilder stringBuilder) {
+  this.str = str;
+  this.aByte = aByte;
+  this.aShort = aShort;
+  this.anInt = anInt;
+  this.aLong = aLong;
+  this.aBoolean = aBoolean;
+  this.dateTime = dateTime;
+  this.instant = instant;
+  this.bytes = bytes;
+  this.byteBuffer = byteBuffer;
+  this.bigDecimal = bigDecimal;
+  this.stringBuilder = stringBuilder;
+}
+
+@SchemaCreate
+public static StaticCreationSimplePojo of(
+String str,
+byte aByte,
+short aShort,
+int anInt,
+long aLong,
+boolean aBoolean,
+DateTime dateTime,
+Instant instant,
+byte[] bytes,
+ByteBuffer byteBuffer,
+BigDecimal bigDecimal,
+StringBuilder stringBuilder) {
+  return new StaticCreationSimplePojo(
+  str,
+  aByte,
+  aShort,
+  anInt,
+  aLong,
+  aBoolean,
+  dateTime,
+  instant,
+  bytes,
+  byteBuffer,
+  bigDecimal,
+  stringBuilder);
+}
+
+@Override
+public boolean equals(Object o) {
+  if (this == o) {
+return true;
+  }
+  if (!(o instanceof StaticCreationSimplePojo)) {
+return false;
+  }
+  StaticCreationSimplePojo that = (StaticCreationSimplePojo) o;
+  return aByte == that.aByte
+  && aShort == that.aShort
+  && anInt == that.anInt
+  && aLong == that.aLong
+  && aBoolean == that.aBoolean
+  && Objects.equals(str, that.str)
+  && Objects.equals(dateTime, that.dateTime)
+  && Objects.equals(instant, that.instant)
+  && Arrays.equals(bytes, that.bytes)
+  && Objects.equals(byteBuffer, that.byteBuffer)
+  && Objects.equals(bigDecimal, that.bigDecimal)
+  && Objects.equals(stringBuilder.toString(), 
that.stringBuilder.toString());
+}
+
+@Override
+public int hashCode() {
+  int result =
+  Objects.hash(
+  str,
+  aByte,
+  aShort,
+  anInt,
+  aLong,
+  aBoolean,
+  dateTime,
+  instant,
+  byteBuffer,
+  bigDecimal,
+  stringBuilder);
+  result = 31 * result + Arrays.hashCode(bytes);
+  return result;
+}
+  }
+
+  /** A POJO for testing annotations. */
+  @DefaultSchema(JavaFieldSchema.class)
+  public static class AnnotatedSimplePojo {
+public final String str;
+
+@FieldName("aByte")
+public final byte theByte;
+
+@FieldName("aShort")
+public final short theShort;
+
+public final int anInt;
+public final long aLong;
+public final boolean aBoolean;
+public final DateTime dateTime;
+public final Instant instant;
+public final byte[] bytes;
+public final ByteBuffer byteBuffer;
+public final BigDecimal bigDecimal;
+public final StringBuilder stringBuilder;
+@SchemaIgnore public final Integer pleaseIgnore;
+
+// Marked with SchemaCreate, so this will be called to construct instances.
+@SchemaCreate
+public AnnotatedSimplePo

[jira] [Work logged] (BEAM-6240) Allow users to annotate POJOs and JavaBeans for richer functionality

2019-01-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6240?focusedWorklogId=182184&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182184
 ]

ASF GitHub Bot logged work on BEAM-6240:


Author: ASF GitHub Bot
Created on: 08/Jan/19 06:02
Start Date: 08/Jan/19 06:02
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on pull request #7289: [BEAM-6240] 
Add a library of schema annotations for POJO and JavaBeans
URL: https://github.com/apache/beam/pull/7289#discussion_r245885269
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/FieldValueTypeSupplier.java
 ##
 @@ -28,8 +29,10 @@
  */
 public interface FieldValueTypeSupplier extends Serializable {
   /**
-   * Return all the FieldValueTypeInformations. The returned list must be in 
the same order as
+   * Return all the FieldValueTypeInformations.
+   *
+   * If the schema parameter is not null, then the returned list must be in 
the same order as
* fields in the schema.
*/
-  List get(Class clazz, Schema schema);
+  List get(Class clazz, @Nullable Schema schema);
 
 Review comment:
   Since every call site except one needed this, moving the sorting to all 
call sites seems undesirable. Instead, I split it into two separate methods, 
one taking a Schema and one not, so that we don't need a @Nullable parameter.
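   A rough sketch of the described split (import paths and exact signatures are assumed, not taken from the PR):

{code:java}
// Sketch only; the committed interface may differ.
import java.io.Serializable;
import java.util.List;
import org.apache.beam.sdk.schemas.FieldValueTypeInformation;
import org.apache.beam.sdk.schemas.Schema;

public interface FieldValueTypeSupplier extends Serializable {
  /** Returns type information for all fields, in no particular order. */
  List<FieldValueTypeInformation> get(Class<?> clazz);

  /** Returns type information ordered to match the fields of the given schema. */
  List<FieldValueTypeInformation> get(Class<?> clazz, Schema schema);
}
{code}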
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 182184)
Time Spent: 5.5h  (was: 5h 20m)

> Allow users to annotate POJOs and JavaBeans for richer functionality
> 
>
> Key: BEAM-6240
> URL: https://issues.apache.org/jira/browse/BEAM-6240
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: Major
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> Desired annotations:
>   * SchemaIgnore - ignore this field
>   * FieldName - allow the user to explicitly specify a field name
>   * SchemaCreate - register a function to be used to create an object (so 
> fields can be final, and no default constructor need be assumed).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6349) Exceptions (IllegalArgumentException or NoClassDefFoundError) when running tests on Dataflow runner

2019-01-07 Thread Boyuan Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736767#comment-16736767
 ] 

Boyuan Zhang commented on BEAM-6349:


Here are some instructions on how to run a non-portable pipeline with a 
customized worker jar:
 # First, build the worker jar with the latest changes by running 
./gradlew :beam-runners-google-cloud-dataflow-java-legacy-worker:shadowJar
 # Then launch your pipeline with the additional options:

          --dataflowWorkerJar=${the absolute path to the built jar} (usually 
it is under the legacy-worker build dir)

          --workerHarnessContainerImage=

> Exceptions (IllegalArgumentException or NoClassDefFoundError) when running 
> tests on Dataflow runner
> ---
>
> Key: BEAM-6349
> URL: https://issues.apache.org/jira/browse/BEAM-6349
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Lukasz Gajowy
>Assignee: Craig Chambers
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Running GroupByKeyLoadTest results in the following error on Dataflow runner:
>  
> {code:java}
> java.lang.ExceptionInInitializerError
>   at 
> org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory$2.typedApply(IntrinsicMapTaskExecutorFactory.java:344)
>   at 
> org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory$2.typedApply(IntrinsicMapTaskExecutorFactory.java:338)
>   at 
> org.apache.beam.runners.dataflow.worker.graph.Networks$TypeSafeNodeFunction.apply(Networks.java:63)
>   at 
> org.apache.beam.runners.dataflow.worker.graph.Networks$TypeSafeNodeFunction.apply(Networks.java:50)
>   at 
> org.apache.beam.runners.dataflow.worker.graph.Networks.replaceDirectedNetworkNodes(Networks.java:87)
>   at 
> org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory.create(IntrinsicMapTaskExecutorFactory.java:120)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:337)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:291)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:135)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:115)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:102)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IllegalArgumentException: Multiple entries with same 
> key: 
> kind:varint=org.apache.beam.runners.dataflow.util.CloudObjectTranslators$8@39b69c48
>  and 
> kind:varint=org.apache.beam.runners.dataflow.worker.RunnerHarnessCoderCloudObjectTranslatorRegistrar$1@7966f294
>   at 
> org.apache.beam.repackaged.beam_runners_google_cloud_dataflow_java.com.google.common.collect.ImmutableMap.checkNoConflict(ImmutableMap.java:136)
>   at 
> org.apache.beam.repackaged.beam_runners_google_cloud_dataflow_java.com.google.common.collect.RegularImmutableMap.checkNoConflictInKeyBucket(RegularImmutableMap.java:100)
>   at 
> org.apache.beam.repackaged.beam_runners_google_cloud_dataflow_java.com.google.common.collect.RegularImmutableMap.fromEntryArray(RegularImmutableMap.java:86)
>   at 
> org.apache.beam.repackaged.beam_runners_google_cloud_dataflow_java.com.google.common.collect.ImmutableMap$Builder.build(ImmutableMap.java:300)
>   at 
> org.apache.beam.runners.dataflow.util.CloudObjects.populateCloudObjectTranslators(CloudObjects.java:60)
>   at 
> org.apache.beam.runners.dataflow.util.CloudObjects.(CloudObjects.java:39)
>   ... 15 more
> {code}
>  
> Example command to run the tests (FWIW, it also runs the  "clean" task 
> although I don't know if it's necessary):
> {code:java}
> ./gradlew clean :beam-sdks-java-load-tests:run --info 
> -PloadTest.mainClass=org.apache.beam.sdk.loadtests.GroupByKeyLoadTest 
> -Prunner=:beam-runners-google-cloud-dataflow-java 
> '-PloadTest.args=--sourceOptions={"numRecords":1000,"splitPointFrequencyRecords":1,"keySizeBytes":1,"valueSizeBytes":9,"numHotKeys":0,"hotKeyFraction":0,"seed":123456,"bundleSizeDistribution":{"type":"const","const":42},"forceNumInitialBundles":100,"progressShape":"LINEAR","initializeDelayDistribution":{"type":"const","const":42}}
>  
> --stepOptions={"outputRe

[jira] [Work logged] (BEAM-6347) Add page for developing I/O connectors for Java

2019-01-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6347?focusedWorklogId=182139&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182139
 ]

ASF GitHub Bot logged work on BEAM-6347:


Author: ASF GitHub Bot
Created on: 08/Jan/19 02:43
Start Date: 08/Jan/19 02:43
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #7397: [BEAM-6347] Add 
website page for developing I/O connectors for Java
URL: https://github.com/apache/beam/pull/7397#issuecomment-452155511
 
 
   Thanks Melissa. This looks great.
   
   LGTM other than a few comments.
   
   Please feel free to merge after addressing these comments.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 182139)
Time Spent: 3h  (was: 2h 50m)

> Add page for developing I/O connectors for Java
> ---
>
> Key: BEAM-6347
> URL: https://issues.apache.org/jira/browse/BEAM-6347
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Melissa Pashniak
>Assignee: Melissa Pashniak
>Priority: Minor
>  Time Spent: 3h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6347) Add page for developing I/O connectors for Java

2019-01-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6347?focusedWorklogId=182143&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182143
 ]

ASF GitHub Bot logged work on BEAM-6347:


Author: ASF GitHub Bot
Created on: 08/Jan/19 02:43
Start Date: 08/Jan/19 02:43
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #7397: 
[BEAM-6347] Add website page for developing I/O connectors for Java
URL: https://github.com/apache/beam/pull/7397#discussion_r245861503
 
 

 ##
 File path: website/src/documentation/io/developing-io-java.md
 ##
 @@ -0,0 +1,372 @@
+---
+layout: section
+title: "Apache Beam: Developing I/O connectors for Java"
+section_menu: section-menu/documentation.html
+permalink: /documentation/io/developing-io-java/
+redirect_from: /documentation/io/authoring-java/
+---
+
+# Developing I/O connectors for Java
+
+To connect to a data store that isn’t supported by Beam’s existing I/O
+connectors, you must create a custom I/O connector that usually consists of a
+source and a sink. All Beam sources and sinks are composite transforms; 
however,
+the implementation of your custom I/O depends on your use case. Before you
+start, read the
+[new I/O connector overview]({{ site.baseurl 
}}/documentation/io/developing-io-overview/)
+for an overview of developing a new I/O connector, the available implementation
+options, and how to choose the right option for your use case.
+
+This guide covers using the `Source` and `FileBasedSink` interfaces using Java.
+The Python SDK offers the same functionality, but uses a slightly different 
API.
+See [Developing I/O connectors for Python]({{ site.baseurl 
}}/documentation/io/developing-io-python/)
+for information specific to the Python SDK.
+
+## Basic code requirements {#basic-code-reqs}
+
+Beam runners use the classes you provide to read and/or write data using
+multiple worker instances in parallel. As such, the code you provide for
+`Source` and `FileBasedSink` subclasses must meet some basic requirements:
+
+  1. **Serializability:** Your `Source` or `FileBasedSink` subclass, whether
+ bounded or unbounded, must be Serializable. A runner might create multiple
+ instances of your `Source` or `FileBasedSink` subclass to be sent to
+ multiple remote workers to facilitate reading or writing in parallel.  
+
+  1. **Immutability:**
+ Your `Source` or `FileBasedSink` subclass must be effectively immutable.
+ All private fields must be declared final, and all private variables of
+ collection type must be effectively immutable. If your class has setter
+ methods, those methods must return an independent copy of the object with
+ the relevant field modified.  
+
+ You should only use mutable state in your `Source` or `FileBasedSink`
+ subclass if you are using lazy evaluation of expensive computations that
+ you need to implement the source or sink; in that case, you must declare
+ all mutable instance variables transient.  
+
+  1. **Thread-Safety:** Your code must be thread-safe. If you build your source
+ to work with dynamic work rebalancing, it is critical that you make your
+ code thread-safe. The Beam SDK provides a helper class to make this 
easier.
+ See [Using Your BoundedSource with dynamic work 
rebalancing](#bounded-dynamic)
+ for more details.  
+
+  1. **Testability:** It is critical to exhaustively unit test all of your
+ `Source` and `FileBasedSink` subclasses, especially if you build your
+ classes to work with advanced features such as dynamic work rebalancing. A
+ minor implementation error can lead to data corruption or data loss (such
+ as skipping or duplicating records) that can be hard to detect.  
+
+ To assist in testing `BoundedSource` implementations, you can use the
+ SourceTestUtils class. `SourceTestUtils` contains utilities for 
automatically
+ verifying some of the properties of your `BoundedSource` implementation. 
You
+ can use `SourceTestUtils` to increase your implementation's test coverage
+ using a wide range of inputs with relatively few lines of code. For
+ examples that use `SourceTestUtils`, see the
+ 
[AvroSourceTest](https://github.com/apache/beam/blob/master/sdks/java/core/src/test/java/org/apache/beam/sdk/io/AvroSourceTest.java)
 and
+ 
[TextIOReadTest](https://github.com/apache/beam/blob/master/sdks/java/core/src/test/java/org/apache/beam/sdk/io/TextIOReadTest.java)
+ source code.
+
+In addition, see the [PTransform style guide]({{ site.baseurl 
}}/contribute/ptransform-style-guide/)
+for Beam's transform style guidance.
+
+## Implementing the Source interface
+
+To create a data source for your pipeline, you must provide the format-specific
+logic that tells a runner how to read data from your input source, and how to
+split your data source into m
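A minimal, hypothetical sketch of a BoundedSource that satisfies the requirements above (serializable, state kept in final fields, no real splitting); it reads a fixed in-memory list of strings and is illustrative only, not taken from the page under review:

{code:java}
import java.io.IOException;
import java.util.Collections;
import java.util.List;
import java.util.NoSuchElementException;
import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.io.BoundedSource;
import org.apache.beam.sdk.options.PipelineOptions;

public class InMemoryStringSource extends BoundedSource<String> {
  // Final, unmodifiable state keeps the source effectively immutable.
  private final List<String> data;

  public InMemoryStringSource(List<String> data) {
    this.data = Collections.unmodifiableList(data);
  }

  @Override
  public List<? extends BoundedSource<String>> split(
      long desiredBundleSizeBytes, PipelineOptions options) {
    // A real connector would partition its input here; this sketch does not split.
    return Collections.singletonList(this);
  }

  @Override
  public long getEstimatedSizeBytes(PipelineOptions options) {
    long size = 0;
    for (String s : data) {
      size += s.length();
    }
    return size;
  }

  @Override
  public Coder<String> getOutputCoder() {
    return StringUtf8Coder.of();
  }

  @Override
  public BoundedReader<String> createReader(PipelineOptions options) {
    return new BoundedReader<String>() {
      private int index = -1;

      @Override
      public boolean start() throws IOException {
        return advance();
      }

      @Override
      public boolean advance() throws IOException {
        index++;
        return index < data.size();
      }

      @Override
      public String getCurrent() throws NoSuchElementException {
        if (index < 0 || index >= data.size()) {
          throw new NoSuchElementException();
        }
        return data.get(index);
      }

      @Override
      public void close() throws IOException {}

      @Override
      public BoundedSource<String> getCurrentSource() {
        return InMemoryStringSource.this;
      }
    };
  }
}
{code}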

[jira] [Work logged] (BEAM-6347) Add page for developing I/O connectors for Java

2019-01-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6347?focusedWorklogId=182141&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182141
 ]

ASF GitHub Bot logged work on BEAM-6347:


Author: ASF GitHub Bot
Created on: 08/Jan/19 02:43
Start Date: 08/Jan/19 02:43
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #7397: 
[BEAM-6347] Add website page for developing I/O connectors for Java
URL: https://github.com/apache/beam/pull/7397#discussion_r245861647
 
 

 ##
 File path: website/src/documentation/io/developing-io-java.md
 ##
 @@ -0,0 +1,372 @@
+---
+layout: section
+title: "Apache Beam: Developing I/O connectors for Java"
+section_menu: section-menu/documentation.html
+permalink: /documentation/io/developing-io-java/
+redirect_from: /documentation/io/authoring-java/
+---
+
+# Developing I/O connectors for Java
+
+To connect to a data store that isn’t supported by Beam’s existing I/O
+connectors, you must create a custom I/O connector that usually consists of a
+source and a sink. All Beam sources and sinks are composite transforms; 
however,
+the implementation of your custom I/O depends on your use case. Before you
+start, read the
+[new I/O connector overview]({{ site.baseurl 
}}/documentation/io/developing-io-overview/)
+for an overview of developing a new I/O connector, the available implementation
+options, and how to choose the right option for your use case.
+
+This guide covers using the `Source` and `FileBasedSink` interfaces using Java.
+The Python SDK offers the same functionality, but uses a slightly different 
API.
+See [Developing I/O connectors for Python]({{ site.baseurl 
}}/documentation/io/developing-io-python/)
+for information specific to the Python SDK.
+
+## Basic code requirements {#basic-code-reqs}
+
+Beam runners use the classes you provide to read and/or write data using
+multiple worker instances in parallel. As such, the code you provide for
+`Source` and `FileBasedSink` subclasses must meet some basic requirements:
+
+  1. **Serializability:** Your `Source` or `FileBasedSink` subclass, whether
+ bounded or unbounded, must be Serializable. A runner might create multiple
+ instances of your `Source` or `FileBasedSink` subclass to be sent to
+ multiple remote workers to facilitate reading or writing in parallel.  
+
+  1. **Immutability:**
+ Your `Source` or `FileBasedSink` subclass must be effectively immutable.
+ All private fields must be declared final, and all private variables of
+ collection type must be effectively immutable. If your class has setter
+ methods, those methods must return an independent copy of the object with
+ the relevant field modified.  
+
+ You should only use mutable state in your `Source` or `FileBasedSink`
+ subclass if you are using lazy evaluation of expensive computations that
+ you need to implement the source or sink; in that case, you must declare
+ all mutable instance variables transient.  
+
+  1. **Thread-Safety:** Your code must be thread-safe. If you build your source
+ to work with dynamic work rebalancing, it is critical that you make your
+ code thread-safe. The Beam SDK provides a helper class to make this 
easier.
+ See [Using Your BoundedSource with dynamic work 
rebalancing](#bounded-dynamic)
+ for more details.  
+
+  1. **Testability:** It is critical to exhaustively unit test all of your
+ `Source` and `FileBasedSink` subclasses, especially if you build your
+ classes to work with advanced features such as dynamic work rebalancing. A
+ minor implementation error can lead to data corruption or data loss (such
+ as skipping or duplicating records) that can be hard to detect.  
+
+ To assist in testing `BoundedSource` implementations, you can use the
+ SourceTestUtils class. `SourceTestUtils` contains utilities for 
automatically
+ verifying some of the properties of your `BoundedSource` implementation. 
You
+ can use `SourceTestUtils` to increase your implementation's test coverage
+ using a wide range of inputs with relatively few lines of code. For
+ examples that use `SourceTestUtils`, see the
+ 
[AvroSourceTest](https://github.com/apache/beam/blob/master/sdks/java/core/src/test/java/org/apache/beam/sdk/io/AvroSourceTest.java)
 and
+ 
[TextIOReadTest](https://github.com/apache/beam/blob/master/sdks/java/core/src/test/java/org/apache/beam/sdk/io/TextIOReadTest.java)
+ source code.
+
+In addition, see the [PTransform style guide]({{ site.baseurl 
}}/contribute/ptransform-style-guide/)
+for Beam's transform style guidance.
+
+## Implementing the Source interface
+
+To create a data source for your pipeline, you must provide the format-specific
+logic that tells a runner how to read data from your input source, and how to
+split your data source into m

[jira] [Commented] (BEAM-6349) Exceptions (IllegalArgumentException or NoClassDefFoundError) when running tests on Dataflow runner

2019-01-07 Thread Boyuan Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736648#comment-16736648
 ] 

Boyuan Zhang commented on BEAM-6349:


Hey Lukasz,

The root cause of this problem is that Craig's PR changed the SDK and the 
Dataflow runner harness at the same time, but your test runs against the 
pre-built Dataflow runner image (which does not include Craig's changes) with 
the latest SDK (which does include Craig's SDK-side changes). This results in 
a version mismatch. As a temporary workaround, you can launch your job with a 
customized Dataflow worker jar.

> Exceptions (IllegalArgumentException or NoClassDefFoundError) when running 
> tests on Dataflow runner
> ---
>
> Key: BEAM-6349
> URL: https://issues.apache.org/jira/browse/BEAM-6349
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Lukasz Gajowy
>Assignee: Craig Chambers
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Running GroupByKeyLoadTest results in the following error on Dataflow runner:
>  
> {code:java}
> java.lang.ExceptionInInitializerError
>   at 
> org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory$2.typedApply(IntrinsicMapTaskExecutorFactory.java:344)
>   at 
> org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory$2.typedApply(IntrinsicMapTaskExecutorFactory.java:338)
>   at 
> org.apache.beam.runners.dataflow.worker.graph.Networks$TypeSafeNodeFunction.apply(Networks.java:63)
>   at 
> org.apache.beam.runners.dataflow.worker.graph.Networks$TypeSafeNodeFunction.apply(Networks.java:50)
>   at 
> org.apache.beam.runners.dataflow.worker.graph.Networks.replaceDirectedNetworkNodes(Networks.java:87)
>   at 
> org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory.create(IntrinsicMapTaskExecutorFactory.java:120)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:337)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:291)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:135)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:115)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:102)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IllegalArgumentException: Multiple entries with same 
> key: 
> kind:varint=org.apache.beam.runners.dataflow.util.CloudObjectTranslators$8@39b69c48
>  and 
> kind:varint=org.apache.beam.runners.dataflow.worker.RunnerHarnessCoderCloudObjectTranslatorRegistrar$1@7966f294
>   at 
> org.apache.beam.repackaged.beam_runners_google_cloud_dataflow_java.com.google.common.collect.ImmutableMap.checkNoConflict(ImmutableMap.java:136)
>   at 
> org.apache.beam.repackaged.beam_runners_google_cloud_dataflow_java.com.google.common.collect.RegularImmutableMap.checkNoConflictInKeyBucket(RegularImmutableMap.java:100)
>   at 
> org.apache.beam.repackaged.beam_runners_google_cloud_dataflow_java.com.google.common.collect.RegularImmutableMap.fromEntryArray(RegularImmutableMap.java:86)
>   at 
> org.apache.beam.repackaged.beam_runners_google_cloud_dataflow_java.com.google.common.collect.ImmutableMap$Builder.build(ImmutableMap.java:300)
>   at 
> org.apache.beam.runners.dataflow.util.CloudObjects.populateCloudObjectTranslators(CloudObjects.java:60)
>   at 
> org.apache.beam.runners.dataflow.util.CloudObjects.(CloudObjects.java:39)
>   ... 15 more
> {code}
>  
> Example command to run the tests (FWIW, it also runs the  "clean" task 
> although I don't know if it's necessary):
> {code:java}
> ./gradlew clean :beam-sdks-java-load-tests:run --info 
> -PloadTest.mainClass=org.apache.beam.sdk.loadtests.GroupByKeyLoadTest 
> -Prunner=:beam-runners-google-cloud-dataflow-java 
> '-PloadTest.args=--sourceOptions={"numRecords":1000,"splitPointFrequencyRecords":1,"keySizeBytes":1,"valueSizeBytes":9,"numHotKeys":0,"hotKeyFraction":0,"seed":123456,"bundleSizeDistribution":{"type":"const","const":42},"forceNumInitialBundles":100,"progressShape":"LINEAR","initializeDelayDistribution":{"type":"const","const":42}}
>  
> --stepOptions={"outputRecordsPerInputRecord":1,"pre

[jira] [Work logged] (BEAM-6347) Add page for developing I/O connectors for Java

2019-01-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6347?focusedWorklogId=182140&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182140
 ]

ASF GitHub Bot logged work on BEAM-6347:


Author: ASF GitHub Bot
Created on: 08/Jan/19 02:43
Start Date: 08/Jan/19 02:43
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #7397: 
[BEAM-6347] Add website page for developing I/O connectors for Java
URL: https://github.com/apache/beam/pull/7397#discussion_r245861334
 
 

 ##
 File path: website/src/documentation/io/developing-io-java.md
 ##
 @@ -0,0 +1,372 @@
+---
+layout: section
+title: "Apache Beam: Developing I/O connectors for Java"
+section_menu: section-menu/documentation.html
+permalink: /documentation/io/developing-io-java/
+redirect_from: /documentation/io/authoring-java/
+---
+
+# Developing I/O connectors for Java
+
+To connect to a data store that isn’t supported by Beam’s existing I/O
+connectors, you must create a custom I/O connector that usually consists of a
+source and a sink. All Beam sources and sinks are composite transforms; 
however,
+the implementation of your custom I/O depends on your use case. Before you
+start, read the
+[new I/O connector overview]({{ site.baseurl 
}}/documentation/io/developing-io-overview/)
+for an overview of developing a new I/O connector, the available implementation
+options, and how to choose the right option for your use case.
+
+This guide covers using the `Source` and `FileBasedSink` interfaces using Java.
+The Python SDK offers the same functionality, but uses a slightly different 
API.
+See [Developing I/O connectors for Python]({{ site.baseurl 
}}/documentation/io/developing-io-python/)
+for information specific to the Python SDK.
+
+## Basic code requirements {#basic-code-reqs}
+
+Beam runners use the classes you provide to read and/or write data using
+multiple worker instances in parallel. As such, the code you provide for
+`Source` and `FileBasedSink` subclasses must meet some basic requirements:
+
+  1. **Serializability:** Your `Source` or `FileBasedSink` subclass, whether
+ bounded or unbounded, must be Serializable. A runner might create multiple
+ instances of your `Source` or `FileBasedSink` subclass to be sent to
+ multiple remote workers to facilitate reading or writing in parallel.  
+
+  1. **Immutability:**
+ Your `Source` or `FileBasedSink` subclass must be effectively immutable.
+ All private fields must be declared final, and all private variables of
+ collection type must be effectively immutable. If your class has setter
+ methods, those methods must return an independent copy of the object with
+ the relevant field modified.  
+
+ You should only use mutable state in your `Source` or `FileBasedSink`
+ subclass if you are using lazy evaluation of expensive computations that
+ you need to implement the source or sink; in that case, you must declare
+ all mutable instance variables transient.  
+
+  1. **Thread-Safety:** Your code must be thread-safe. If you build your source
+ to work with dynamic work rebalancing, it is critical that you make your
+ code thread-safe. The Beam SDK provides a helper class to make this 
easier.
+ See [Using Your BoundedSource with dynamic work 
rebalancing](#bounded-dynamic)
+ for more details.  
+
+  1. **Testability:** It is critical to exhaustively unit test all of your
+ `Source` and `FileBasedSink` subclasses, especially if you build your
+ classes to work with advanced features such as dynamic work rebalancing. A
+ minor implementation error can lead to data corruption or data loss (such
+ as skipping or duplicating records) that can be hard to detect.  
+
+ To assist in testing `BoundedSource` implementations, you can use the
+ SourceTestUtils class. `SourceTestUtils` contains utilities for 
automatically
+ verifying some of the properties of your `BoundedSource` implementation. 
You
+ can use `SourceTestUtils` to increase your implementation's test coverage
+ using a wide range of inputs with relatively few lines of code. For
+ examples that use `SourceTestUtils`, see the
+ 
[AvroSourceTest](https://github.com/apache/beam/blob/master/sdks/java/core/src/test/java/org/apache/beam/sdk/io/AvroSourceTest.java)
 and
+ 
[TextIOReadTest](https://github.com/apache/beam/blob/master/sdks/java/core/src/test/java/org/apache/beam/sdk/io/TextIOReadTest.java)
+ source code.
+
+In addition, see the [PTransform style guide]({{ site.baseurl 
}}/contribute/ptransform-style-guide/)
+for Beam's transform style guidance.
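(Editorial aside, not part of the page under review: the serializability, immutability, and lazy-evaluation requirements above can be illustrated with a small sketch. The class name, fields, and the computeSplitsExpensively helper below are hypothetical.)

{code:java}
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical source-style configuration class illustrating the requirements above.
public class MyRecordSource implements Serializable {
  // Configuration fields are final, so the object is effectively immutable.
  private final String filePattern;
  private final List<String> columns;

  // Expensive, lazily computed state is transient and rebuilt on each worker.
  private transient List<String> cachedSplits;

  public MyRecordSource(String filePattern, List<String> columns) {
    this.filePattern = filePattern;
    // Defensive copy keeps the collection effectively immutable.
    this.columns = Collections.unmodifiableList(new ArrayList<>(columns));
  }

  // "Setter"-style methods return an independent copy with the field modified.
  public MyRecordSource withFilePattern(String newPattern) {
    return new MyRecordSource(newPattern, columns);
  }

  private List<String> splits() {
    if (cachedSplits == null) {
      cachedSplits = computeSplitsExpensively();
    }
    return cachedSplits;
  }

  private List<String> computeSplitsExpensively() {
    // Placeholder for an expensive computation derived from the final fields.
    return Collections.singletonList(filePattern);
  }
}
{code}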
+
+## Implementing the Source interface
+
+To create a data source for your pipeline, you must provide the format-specific
+logic that tells a runner how to read data from your input source, and how to
+split your data source into m

[jira] [Work logged] (BEAM-6347) Add page for developing I/O connectors for Java

2019-01-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6347?focusedWorklogId=182142&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182142
 ]

ASF GitHub Bot logged work on BEAM-6347:


Author: ASF GitHub Bot
Created on: 08/Jan/19 02:43
Start Date: 08/Jan/19 02:43
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #7397: 
[BEAM-6347] Add website page for developing I/O connectors for Java
URL: https://github.com/apache/beam/pull/7397#discussion_r245862633
 
 

 ##
 File path: website/src/documentation/io/developing-io-java.md
 ##
 @@ -0,0 +1,372 @@
+---
+layout: section
+title: "Apache Beam: Developing I/O connectors for Java"
+section_menu: section-menu/documentation.html
+permalink: /documentation/io/developing-io-java/
+redirect_from: /documentation/io/authoring-java/
+---
+
+# Developing I/O connectors for Java
+
+To connect to a data store that isn’t supported by Beam’s existing I/O
+connectors, you must create a custom I/O connector that usually consists of a
+source and a sink. All Beam sources and sinks are composite transforms; 
however,
+the implementation of your custom I/O depends on your use case. Before you
+start, read the
+[new I/O connector overview]({{ site.baseurl 
}}/documentation/io/developing-io-overview/)
+for an overview of developing a new I/O connector, the available implementation
+options, and how to choose the right option for your use case.
+
+This guide covers how to use the `Source` and `FileBasedSink` interfaces in Java.
+The Python SDK offers the same functionality, but uses a slightly different 
API.
+See [Developing I/O connectors for Python]({{ site.baseurl 
}}/documentation/io/developing-io-python/)
+for information specific to the Python SDK.
+
+## Basic code requirements {#basic-code-reqs}
+
+Beam runners use the classes you provide to read and/or write data using
+multiple worker instances in parallel. As such, the code you provide for
+`Source` and `FileBasedSink` subclasses must meet some basic requirements:
+
+  1. **Serializability:** Your `Source` or `FileBasedSink` subclass, whether
+ bounded or unbounded, must be Serializable. A runner might create multiple
+ instances of your `Source` or `FileBasedSink` subclass to be sent to
+ multiple remote workers to facilitate reading or writing in parallel.  
+
+  1. **Immutability:**
+ Your `Source` or `FileBasedSink` subclass must be effectively immutable.
+ All private fields must be declared final, and all private variables of
+ collection type must be effectively immutable. If your class has setter
+ methods, those methods must return an independent copy of the object with
+ the relevant field modified.  
+
+ You should only use mutable state in your `Source` or `FileBasedSink`
+ subclass if you are using lazy evaluation of expensive computations that
+ you need to implement the source or sink; in that case, you must declare
+ all mutable instance variables transient.  
+
+  1. **Thread-Safety:** Your code must be thread-safe. If you build your source
+ to work with dynamic work rebalancing, it is critical that you make your
+ code thread-safe. The Beam SDK provides a helper class to make this 
easier.
+ See [Using Your BoundedSource with dynamic work 
rebalancing](#bounded-dynamic)
+ for more details.  
+
+  1. **Testability:** It is critical to exhaustively unit test all of your
+ `Source` and `FileBasedSink` subclasses, especially if you build your
+ classes to work with advanced features such as dynamic work rebalancing. A
+ minor implementation error can lead to data corruption or data loss (such
+ as skipping or duplicating records) that can be hard to detect.  
+
+ To assist in testing `BoundedSource` implementations, you can use the
+ SourceTestUtils class. `SourceTestUtils` contains utilities for 
automatically
+ verifying some of the properties of your `BoundedSource` implementation. 
You
+ can use `SourceTestUtils` to increase your implementation's test coverage
+ using a wide range of inputs with relatively few lines of code. For
+ examples that use `SourceTestUtils`, see the
+ 
[AvroSourceTest](https://github.com/apache/beam/blob/master/sdks/java/core/src/test/java/org/apache/beam/sdk/io/AvroSourceTest.java)
 and
+ 
[TextIOReadTest](https://github.com/apache/beam/blob/master/sdks/java/core/src/test/java/org/apache/beam/sdk/io/TextIOReadTest.java)
+ source code.
+
+In addition, see the [PTransform style guide]({{ site.baseurl 
}}/contribute/ptransform-style-guide/)
+for Beam's transform style guidance.
+
+## Implementing the Source interface
+
+To create a data source for your pipeline, you must provide the format-specific
+logic that tells a runner how to read data from your input source, and how to
+split your data source into m

[jira] [Created] (BEAM-6382) SamzaRunner: add an option to read configs using a user-defined factory

2019-01-07 Thread Xinyu Liu (JIRA)
Xinyu Liu created BEAM-6382:
---

 Summary: SamzaRunner: add an option to read configs using a 
user-defined factory
 Key: BEAM-6382
 URL: https://issues.apache.org/jira/browse/BEAM-6382
 Project: Beam
  Issue Type: Improvement
  Components: runner-samza
Reporter: Xinyu Liu
Assignee: Xinyu Liu


We need an option to read configs from a user-defined factory, which is useful 
on YARN as well as for user-defined file formats. By default, this config 
factory reads a properties file.
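(For illustration only, not part of this ticket: such a factory might look roughly like the sketch below. The ConfigFactory interface, the getConfig(URI) signature, and PropertiesConfigFactory are assumptions made for the example rather than the actual Samza or Beam API.)

{code:java}
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical factory interface: given a config location, produce key/value config.
interface ConfigFactory {
  Map<String, String> getConfig(URI configUri) throws IOException;
}

// Hypothetical default implementation matching the ticket's description:
// read the configuration from a plain properties file.
class PropertiesConfigFactory implements ConfigFactory {
  @Override
  public Map<String, String> getConfig(URI configUri) throws IOException {
    Properties props = new Properties();
    try (InputStream in = new FileInputStream(configUri.getPath())) {
      props.load(in);
    }
    Map<String, String> config = new HashMap<>();
    for (String key : props.stringPropertyNames()) {
      config.put(key, props.getProperty(key));
    }
    return config;
  }
}
{code}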



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-4777) Python PortableRunner should support metrics

2019-01-07 Thread Ankur Goenka (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka reassigned BEAM-4777:
--

Assignee: Ryan Williams  (was: Ankur Goenka)

> Python PortableRunner should support metrics
> 
>
> Key: BEAM-4777
> URL: https://issues.apache.org/jira/browse/BEAM-4777
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Eugene Kirpichov
>Assignee: Ryan Williams
>Priority: Major
>
> BEAM-4775 concerns adding metrics to the JobService API; the current issue is 
> about making Python PortableRunner understand them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-4776) Java PortableRunner should support metrics

2019-01-07 Thread Ankur Goenka (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka reassigned BEAM-4776:
--

Assignee: Ryan Williams  (was: Ankur Goenka)

> Java PortableRunner should support metrics
> --
>
> Key: BEAM-4776
> URL: https://issues.apache.org/jira/browse/BEAM-4776
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Eugene Kirpichov
>Assignee: Ryan Williams
>Priority: Major
>
> BEAM-4775 concerns adding metrics to the JobService API; the current issue is 
> about making PortableRunner understand them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-4775) JobService should support returning metrics

2019-01-07 Thread Ankur Goenka (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka reassigned BEAM-4775:
--

Assignee: Ryan Williams  (was: Ankur Goenka)

> JobService should support returning metrics
> ---
>
> Key: BEAM-4775
> URL: https://issues.apache.org/jira/browse/BEAM-4775
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model
>Reporter: Eugene Kirpichov
>Assignee: Ryan Williams
>Priority: Major
>
> [https://github.com/apache/beam/blob/master/model/job-management/src/main/proto/beam_job_api.proto]
>  currently doesn't appear to have a way for JobService to return metrics to a 
> user, even though 
> [https://github.com/apache/beam/blob/master/model/fn-execution/src/main/proto/beam_fn_api.proto]
>  includes support for reporting SDK metrics to the runner harness.
>  
> Metrics are apparently necessary to run any ValidatesRunner tests because 
> PAssert needs to validate that the assertions succeeded. However, this 
> statement should be double-checked: perhaps it's possible to somehow work 
> with PAssert without metrics support.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-6362) reduce logging volume on Jenkins pre-commit and post-commit jobs

2019-01-07 Thread Udi Meiri (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udi Meiri reassigned BEAM-6362:
---

Assignee: Udi Meiri  (was: Luke Cwik)

> reduce logging volume on Jenkins pre-commit and post-commit jobs
> 
>
> Key: BEAM-6362
> URL: https://issues.apache.org/jira/browse/BEAM-6362
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system, testing
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Current idea is to remove the --info flags in Jenkins invocations of gradlew.
> This should significantly reduce log size, while still logging INFO and above 
> for failed tests.
> Discussion:
> https://lists.apache.org/thread.html/ad7cb431af7657522f844a815445ee770fcc7cfd70a2a9ffcd02ec76@%3Cdev.beam.apache.org%3E



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-6362) reduce logging volume on Jenkins pre-commit and post-commit jobs

2019-01-07 Thread Udi Meiri (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udi Meiri resolved BEAM-6362.
-
   Resolution: Fixed
Fix Version/s: Not applicable

> reduce logging volume on Jenkins pre-commit and post-commit jobs
> 
>
> Key: BEAM-6362
> URL: https://issues.apache.org/jira/browse/BEAM-6362
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system, testing
>Reporter: Udi Meiri
>Assignee: Luke Cwik
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Current idea is to remove the --info flags in Jenkins invocations of gradlew.
> This should significantly reduce log size, while still logging INFO and above 
> for failed tests.
> Discussion:
> https://lists.apache.org/thread.html/ad7cb431af7657522f844a815445ee770fcc7cfd70a2a9ffcd02ec76@%3Cdev.beam.apache.org%3E



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-6357) Datastore Python client: retry on ABORTED rpc responses

2019-01-07 Thread Udi Meiri (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udi Meiri updated BEAM-6357:

Summary: Datastore Python client: retry on ABORTED rpc responses  (was: 
Datastore clients: retry on ABORTED rpc responses)

> Datastore Python client: retry on ABORTED rpc responses
> ---
>
> Key: BEAM-6357
> URL: https://issues.apache.org/jira/browse/BEAM-6357
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
> Fix For: 2.11.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> It is recommended to retry on ABORTED error codes:
> https://cloud.google.com/datastore/docs/concepts/errors#error_codes
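(Editorial sketch, not the actual change under review: a retry-on-ABORTED loop, written here in Java for illustration even though the affected client is in the Python SDK. The commitFn callable, the RpcException type, and the backoff parameters are assumptions; gRPC's ABORTED status code is 10.)

{code:java}
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

public class AbortedRetryDemo {
  static final int ABORTED = 10;  // gRPC status code for ABORTED

  // Hypothetical RPC failure type exposing a numeric status code.
  static class RpcException extends RuntimeException {
    final int code;
    RpcException(int code) { this.code = code; }
  }

  static <T> T commitWithRetries(Callable<T> commitFn, int maxAttempts) throws Exception {
    for (int attempt = 1; ; attempt++) {
      try {
        return commitFn.call();
      } catch (RpcException e) {
        if (e.code != ABORTED || attempt >= maxAttempts) {
          throw e;  // only ABORTED responses are retried
        }
        // Exponential backoff with jitter before the next attempt.
        long sleepMillis = (long) (Math.min(1 << attempt, 32) * 1000
            * ThreadLocalRandom.current().nextDouble(0.5, 1.0));
        Thread.sleep(sleepMillis);
      }
    }
  }
}
{code}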



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-6357) Datastore clients: retry on ABORTED rpc responses

2019-01-07 Thread Udi Meiri (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udi Meiri resolved BEAM-6357.
-
   Resolution: Fixed
Fix Version/s: 2.11.0

> Datastore clients: retry on ABORTED rpc responses
> -
>
> Key: BEAM-6357
> URL: https://issues.apache.org/jira/browse/BEAM-6357
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
> Fix For: 2.11.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> It is recommended to retry on ABORTED error codes:
> https://cloud.google.com/datastore/docs/concepts/errors#error_codes



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6357) Datastore clients: retry on ABORTED rpc responses

2019-01-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6357?focusedWorklogId=182100&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182100
 ]

ASF GitHub Bot logged work on BEAM-6357:


Author: ASF GitHub Bot
Created on: 08/Jan/19 01:00
Start Date: 08/Jan/19 01:00
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on issue #7406: [BEAM-6357] 
Retry Datastore writes on ABORTED
URL: https://github.com/apache/beam/pull/7406#issuecomment-452137183
 
 
   Thanks, this LGTM.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 182100)
Time Spent: 1h  (was: 50m)

> Datastore clients: retry on ABORTED rpc responses
> -
>
> Key: BEAM-6357
> URL: https://issues.apache.org/jira/browse/BEAM-6357
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> It is recommended to retry on ABORTED error codes:
> https://cloud.google.com/datastore/docs/concepts/errors#error_codes



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6355) [beam_PostRelease_NightlySnapshot] [runQuickstartJavaDataflow] ExceptionInInitializerError

2019-01-07 Thread Daniel Oliveira (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736543#comment-16736543
 ] 

Daniel Oliveira commented on BEAM-6355:
---

Yes, I think you're right. Following the comments in that doc, I was led to 
BEAM-6349, which looks nearly identical to this issue.

> [beam_PostRelease_NightlySnapshot] [runQuickstartJavaDataflow] 
> ExceptionInInitializerError
> --
>
> Key: BEAM-6355
> URL: https://issues.apache.org/jira/browse/BEAM-6355
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, test-failures
>Reporter: Scott Wegner
>Assignee: Boyuan Zhang
>Priority: Major
>  Labels: currently-failing
>
> _Use this form to file an issue for test failure:_
>  * [Jenkins 
> Job|https://builds.apache.org/job/beam_PostRelease_NightlySnapshot/480/]
>  * [Dataflow 
> Job|https://console.cloud.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2019-01-03_03_07_12-9648275706628895259?project=apache-beam-testing&folder&organizationId=433637338589]
>  * [Test source 
> code|https://github.com/apache/beam/blob/master/release/src/main/groovy/quickstart-java-dataflow.groovy]
> Initial investigation:
> It appears the Dataflow job failed during worker initialization. From the 
> [stackdriver 
> logs|https://console.cloud.google.com/logs/viewer?resource=dataflow_step%2Fjob_id%2F2019-01-03_03_07_12-9648275706628895259&logName=projects%2Fapache-beam-testing%2Flogs%2Fdataflow.googleapis.com%252Fworker&interval=NO_LIMIT&project=apache-beam-testing&folder&organizationId=433637338589&minLogLevel=500&expandAll=false&timestamp=2019-01-03T16:52:14.64300Z&customFacets=&limitCustomFacetWidth=true&scrollTimestamp=2019-01-03T11:08:49.88200Z],
>  I see:
> {code}
> 019-01-03 03:08:27.770 PST
> Uncaught exception occurred during work unit execution. This will be retried.
> Expand all | Collapse all {
>  insertId:  "3832125194122580497:879173:0:62501"  
>  jsonPayload: {
>   exception:  "java.lang.ExceptionInInitializerError
>   at 
> org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory$2.typedApply(IntrinsicMapTaskExecutorFactory.java:344)
>   at 
> org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory$2.typedApply(IntrinsicMapTaskExecutorFactory.java:338)
>   at 
> org.apache.beam.runners.dataflow.worker.graph.Networks$TypeSafeNodeFunction.apply(Networks.java:63)
>   at 
> org.apache.beam.runners.dataflow.worker.graph.Networks$TypeSafeNodeFunction.apply(Networks.java:50)
>   at 
> org.apache.beam.runners.dataflow.worker.graph.Networks.replaceDirectedNetworkNodes(Networks.java:87)
>   at 
> org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory.create(IntrinsicMapTaskExecutorFactory.java:120)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:337)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:291)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:135)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:115)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:102)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IllegalArgumentException: Multiple entries with same 
> key: 
> kind:varint=org.apache.beam.runners.dataflow.util.CloudObjectTranslators$8@f4dd50c
>  and 
> kind:varint=org.apache.beam.runners.dataflow.worker.RunnerHarnessCoderCloudObjectTranslatorRegistrar$1@ae1551d
>   at 
> org.apache.beam.repackaged.beam_runners_google_cloud_dataflow_java.com.google.common.collect.ImmutableMap.checkNoConflict(ImmutableMap.java:136)
>   at 
> org.apache.beam.repackaged.beam_runners_google_cloud_dataflow_java.com.google.common.collect.RegularImmutableMap.checkNoConflictInKeyBucket(RegularImmutableMap.java:100)
>   at 
> org.apache.beam.repackaged.beam_runners_google_cloud_dataflow_java.com.google.common.collect.RegularImmutableMap.fromEntryArray(RegularImmutableMap.java:86)
>   at 
> org.apache.beam.repackaged.beam_runners_google_cloud_dataflow_java.com.google.common.collect.ImmutableMap$Builder.build(ImmutableMap.java:300)
>   at 
> org.apache.beam.runners.dataflow.util.CloudObjects.popu

[jira] [Work logged] (BEAM-6357) Datastore clients: retry on ABORTED rpc responses

2019-01-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6357?focusedWorklogId=182101&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182101
 ]

ASF GitHub Bot logged work on BEAM-6357:


Author: ASF GitHub Bot
Created on: 08/Jan/19 01:00
Start Date: 08/Jan/19 01:00
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on pull request #7406: 
[BEAM-6357] Retry Datastore writes on ABORTED
URL: https://github.com/apache/beam/pull/7406
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 182101)
Time Spent: 1h 10m  (was: 1h)

> Datastore clients: retry on ABORTED rpc responses
> -
>
> Key: BEAM-6357
> URL: https://issues.apache.org/jira/browse/BEAM-6357
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> It is recommended to retry on ABORTED error codes:
> https://cloud.google.com/datastore/docs/concepts/errors#error_codes



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6280) Failure in PortableRunnerTest.test_error_traceback_includes_user_code

2019-01-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6280?focusedWorklogId=182087&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182087
 ]

ASF GitHub Bot logged work on BEAM-6280:


Author: ASF GitHub Bot
Created on: 08/Jan/19 00:41
Start Date: 08/Jan/19 00:41
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #7433: 
[BEAM-6280] Refactors Python portability tests to be multi-threaded aware
URL: https://github.com/apache/beam/pull/7433
 
 
   The local_job_service.py module was too complex and had a number of 
multi-threading issues. One of the issues was that the Get(Message|State)Stream 
methods did not guarantee that the consumer would see all message/state 
transitions. Another issue was a data race between messages and state 
transitions: in the test, the PortableRunner would try to wait for any last 
messages, but there was a race between receiving the message and the 
PortableRunner waiting for a terminal state. The job might have transitioned 
into a terminal state before the consumer received the message, failing the test.
   
   Note: I wasn't able to repro the bug on my local machine but did my best to 
zero in on any multi-threading issues. There may be follow-up PRs to fix the 
issue if it pops up again.
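   (Editorial sketch of the guarantee described above, i.e. that a consumer first replays everything already logged and then follows new messages until the stream closes. It is written in Java for illustration, whereas the actual module is Python, and the class and method names are hypothetical.)

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical log that lets late consumers see every message/state transition.
class MessageLog<T> {
  private final List<T> messages = new ArrayList<>();
  private boolean closed = false;

  public synchronized void add(T message) {
    messages.add(message);
    notifyAll();  // wake any consumer blocked in forEachBlocking
  }

  public synchronized void close() {
    closed = true;
    notifyAll();
  }

  // Replays everything already logged, then follows new messages until closed.
  public synchronized void forEachBlocking(Consumer<T> consumer) throws InterruptedException {
    int next = 0;
    while (true) {
      while (next < messages.size()) {
        consumer.accept(messages.get(next++));
      }
      if (closed) {
        return;
      }
      wait();  // lock is released while waiting, so producers can still add messages
    }
  }
}
{code}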
   
   
   
   Follow this checklist to help us incorporate your contribution quickly and 
easily:
   
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   It will help us expedite review of your Pull Request if you tag someone 
(e.g. `@username`) to look at it.
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | --- | --- | --- | ---
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)
 | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)
  [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_Po

[jira] [Work logged] (BEAM-6357) Datastore clients: retry on ABORTED rpc responses

2019-01-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6357?focusedWorklogId=182094&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182094
 ]

ASF GitHub Bot logged work on BEAM-6357:


Author: ASF GitHub Bot
Created on: 08/Jan/19 00:46
Start Date: 08/Jan/19 00:46
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #7406: [BEAM-6357] Retry 
Datastore writes on ABORTED
URL: https://github.com/apache/beam/pull/7406#issuecomment-452134721
 
 
   merge plz: @charlesccychen 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 182094)
Time Spent: 50m  (was: 40m)

> Datastore clients: retry on ABORTED rpc responses
> -
>
> Key: BEAM-6357
> URL: https://issues.apache.org/jira/browse/BEAM-6357
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> It is recommended to retry on ABORTED error codes:
> https://cloud.google.com/datastore/docs/concepts/errors#error_codes



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6280) Failure in PortableRunnerTest.test_error_traceback_includes_user_code

2019-01-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6280?focusedWorklogId=182091&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182091
 ]

ASF GitHub Bot logged work on BEAM-6280:


Author: ASF GitHub Bot
Created on: 08/Jan/19 00:43
Start Date: 08/Jan/19 00:43
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on issue #7433: [BEAM-6280] 
Refactors Python portability tests to be multi-threaded aware
URL: https://github.com/apache/beam/pull/7433#issuecomment-452133990
 
 
   Gradle scans:
   
   [:beam-sdks-python:testPython2](https://scans.gradle.com/s/rdsusz6grxxsi/)
   [:beam-sdks-python:testPython3](https://scans.gradle.com/s/bcl7pvsnuejby)
   [:beam-sdks-python:preCommit](https://scans.gradle.com/s/iwkorwpzwwrfs)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 182091)
Time Spent: 20m  (was: 10m)

> Failure in PortableRunnerTest.test_error_traceback_includes_user_code
> -
>
> Key: BEAM-6280
> URL: https://issues.apache.org/jira/browse/BEAM-6280
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, test-failures
>Reporter: Kenneth Knowles
>Assignee: Sam Rohde
>Priority: Critical
>  Labels: flaky-test
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> [https://builds.apache.org/job/beam_PreCommit_Python_Cron/732/]
> [https://builds.apache.org/job/beam_PreCommit_Python_Cron/732/testReport/apache_beam.runners.portability.portable_runner_test/PortableRunnerTest/test_error_traceback_includes_user_code/]
> [https://scans.gradle.com/s/do3hjulee3gaa/console-log?task=:beam-sdks-python:testPython3]
> {code:java}
> 'second' not found in 'Traceback (most recent call last):\n  File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/apache_beam/runners/portability/fn_api_runner_test.py",
>  line 466, in test_error_traceback_includes_user_code\np | 
> beam.Create([0]) | beam.Map(first)  # pylint: 
> disable=expression-not-assigned\n  File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/apache_beam/pipeline.py",
>  line 425, in __exit__\nself.run().wait_until_finish()\n  File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/apache_beam/runners/portability/portable_runner.py",
>  line 314, in wait_until_finish\nself._job_id, self._state, 
> self._last_error_message()))\nRuntimeError: Pipeline 
> job-cdcefe6d-1caa-4487-9e63-e971f67ec68c failed in state FAILED: start 
>  coder=WindowedValueCoder[BytesCoder], len(consumers)=1]]>\n'{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6357) Datastore clients: retry on ABORTED rpc responses

2019-01-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6357?focusedWorklogId=182090&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182090
 ]

ASF GitHub Bot logged work on BEAM-6357:


Author: ASF GitHub Bot
Created on: 08/Jan/19 00:42
Start Date: 08/Jan/19 00:42
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #7406: [BEAM-6357] Retry 
Datastore writes on ABORTED
URL: https://github.com/apache/beam/pull/7406#issuecomment-452133947
 
 
   R: @tvalentyn 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 182090)
Time Spent: 40m  (was: 0.5h)

> Datastore clients: retry on ABORTED rpc responses
> -
>
> Key: BEAM-6357
> URL: https://issues.apache.org/jira/browse/BEAM-6357
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> It is recommended to retry on ABORTED error codes:
> https://cloud.google.com/datastore/docs/concepts/errors#error_codes



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-6352) Watch PTransform is broken

2019-01-07 Thread Udi Meiri (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udi Meiri updated BEAM-6352:

Affects Version/s: 2.9.0

> Watch PTransform is broken
> --
>
> Key: BEAM-6352
> URL: https://issues.apache.org/jira/browse/BEAM-6352
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.9.0
>Reporter: Gleb Kanterov
>Assignee: Kenneth Knowles
>Priority: Major
>
> List of affected tests:
> org.apache.beam.sdk.transforms.WatchTest > 
> testSinglePollMultipleInputsWithSideInput FAILED
> org.apache.beam.sdk.transforms.WatchTest > testMultiplePollsWithKeyExtractor 
> FAILED
> org.apache.beam.sdk.transforms.WatchTest > testSinglePollMultipleInputs FAILED
> org.apache.beam.sdk.transforms.WatchTest > 
> testMultiplePollsWithTerminationDueToTerminationCondition FAILED
> org.apache.beam.sdk.transforms.WatchTest > testMultiplePollsWithManyResults 
> FAILED
> org.apache.beam.sdk.transforms.WatchTest > testSinglePollWithManyResults 
> FAILED
> org.apache.beam.sdk.transforms.WatchTest > 
> testMultiplePollsStopAfterTimeSinceNewOutput 
> org.apache.beam.sdk.transforms.WatchTest > 
> testMultiplePollsWithTerminationBecauseOutputIsFinal FAILED
> org.apache.beam.sdk.io.AvroIOTest$NeedsRunnerTests > 
> testContinuouslyWriteAndReadMultipleFilepatterns[0: true] FAILED
> org.apache.beam.sdk.io.AvroIOTest$NeedsRunnerTests > 
> testContinuouslyWriteAndReadMultipleFilepatterns[1: false] FAILED
> org.apache.beam.sdk.io.FileIOTest > testMatchWatchForNewFiles FAILED
> org.apache.beam.sdk.io.TextIOReadTest$BasicIOTest > testReadWatchForNewFiles 
> FAILED
> {code}
> java.lang.IllegalArgumentException: 
> org.apache.beam.sdk.transforms.Watch$WatchGrowthFn, @ProcessElement 
> process(ProcessContext, GrowthTracker): Has tracker type 
> Watch.GrowthTracker, but the DoFn's tracker 
> type must be of type RestrictionTracker.
> {code}
> Relevant pull requests:
> - https://github.com/apache/beam/pull/6467
> - https://github.com/apache/beam/pull/7374
> Now tests are marked with @Ignore referencing this JIRA issue



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6352) Watch PTransform is broken

2019-01-07 Thread Udi Meiri (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736519#comment-16736519
 ] 

Udi Meiri commented on BEAM-6352:
-

I can recreate this in 2.9.0, using the Java quickstart instructions and 
modifying ./src/main/java/org/apache/beam/examples/WordCount.java:

```
import org.apache.beam.sdk.transforms.Watch;
import org.joda.time.Duration;
import org.apache.beam.sdk.io.FileIO;
```

```
  static void runWordCount(WordCountOptions options) {
    Pipeline.create(options)
        .apply(
            FileIO.match()
                .filepattern(options.getInputFile())
                .continuously(Duration.standardMinutes(1), Watch.Growth.never()));
  }
```

```
$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount 
 -Dexec.args="--inputFile=pom.xml --output=counts" -Pdirect-runner
...
java.lang.IllegalArgumentException: 
org.apache.beam.sdk.transforms.Watch$WatchGrowthFn, @ProcessElement 
process(ProcessContext, GrowthTracker): Has tracker type 
Watch.GrowthTracker, but the DoFn's tracker 
type must be of type RestrictionTracker.
at 
org.apache.beam.sdk.transforms.reflect.DoFnSignatures$ErrorReporter.throwIllegalArgument
 (DoFnSignatures.java:1507)
at 
org.apache.beam.sdk.transforms.reflect.DoFnSignatures$ErrorReporter.checkArgument
 (DoFnSignatures.java:1512)
at 
org.apache.beam.sdk.transforms.reflect.DoFnSignatures.verifySplittableMethods 
(DoFnSignatures.java:593)
at org.apache.beam.sdk.transforms.reflect.DoFnSignatures.parseSignature 
(DoFnSignatures.java:472)
at 
org.apache.beam.sdk.transforms.reflect.DoFnSignatures.lambda$getSignature$0 
(DoFnSignatures.java:140)
at java.util.HashMap.computeIfAbsent (HashMap.java:1128)
at org.apache.beam.sdk.transforms.reflect.DoFnSignatures.getSignature 
(DoFnSignatures.java:140)
at org.apache.beam.sdk.transforms.ParDo.validate (ParDo.java:546)
at org.apache.beam.sdk.transforms.ParDo.of (ParDo.java:393)
at org.apache.beam.sdk.transforms.Watch$Growth.expand (Watch.java:689)
at org.apache.beam.sdk.transforms.Watch$Growth.expand (Watch.java:157)
at org.apache.beam.sdk.Pipeline.applyInternal (Pipeline.java:537)
at org.apache.beam.sdk.Pipeline.applyTransform (Pipeline.java:488)
at org.apache.beam.sdk.values.PCollection.apply (PCollection.java:370)
at org.apache.beam.sdk.io.FileIO$MatchAll.expand (FileIO.java:614)
at org.apache.beam.sdk.io.FileIO$MatchAll.expand (FileIO.java:572)
at org.apache.beam.sdk.Pipeline.applyInternal (Pipeline.java:537)
at org.apache.beam.sdk.Pipeline.applyTransform (Pipeline.java:488)
at org.apache.beam.sdk.values.PCollection.apply (PCollection.java:370)
at org.apache.beam.sdk.io.FileIO$Match.expand (FileIO.java:567)
at org.apache.beam.sdk.io.FileIO$Match.expand (FileIO.java:514)
at org.apache.beam.sdk.Pipeline.applyInternal (Pipeline.java:537)
at org.apache.beam.sdk.Pipeline.applyTransform (Pipeline.java:471)
at org.apache.beam.sdk.values.PBegin.apply (PBegin.java:44)
at org.apache.beam.sdk.Pipeline.apply (Pipeline.java:167)
at org.apache.beam.examples.WordCount.runWordCount (WordCount.java:178)
at org.apache.beam.examples.WordCount.main (WordCount.java:188)
at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke 
(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke 
(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke (Method.java:498)
at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:282)
at java.lang.Thread.run (Thread.java:748)
```

> Watch PTransform is broken
> --
>
> Key: BEAM-6352
> URL: https://issues.apache.org/jira/browse/BEAM-6352
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Gleb Kanterov
>Assignee: Kenneth Knowles
>Priority: Major
>
> List of affected tests:
> org.apache.beam.sdk.transforms.WatchTest > 
> testSinglePollMultipleInputsWithSideInput FAILED
> org.apache.beam.sdk.transforms.WatchTest > testMultiplePollsWithKeyExtractor 
> FAILED
> org.apache.beam.sdk.transforms.WatchTest > testSinglePollMultipleInputs FAILED
> org.apache.beam.sdk.transforms.WatchTest > 
> testMultiplePollsWithTerminationDueToTerminationCondition FAILED
> org.apache.beam.sdk.transforms.WatchTest > testMultiplePollsWithManyResults 
> FAILED
> org.apache.beam.sdk.transforms.WatchTest > testSinglePollWithManyResults 
> FAILED
> org.apache.beam.sdk.transforms.WatchTest > 
> testMultiplePollsStopAfterTimeSinceNewOutput 
> org.apache.beam.sdk.transforms.WatchTest > 
> testMultiplePollsWithTerminationBecauseOutputIsFinal FAILED
> org.apache.beam.sdk.io.AvroIOTest$NeedsRunnerTests > 
> testContinuouslyWriteAndReadMultipleF

[jira] [Work logged] (BEAM-6240) Allow users to annotate POJOs and JavaBeans for richer functionality

2019-01-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6240?focusedWorklogId=182073&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182073
 ]

ASF GitHub Bot logged work on BEAM-6240:


Author: ASF GitHub Bot
Created on: 07/Jan/19 22:52
Start Date: 07/Jan/19 22:52
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on pull request #7289: [BEAM-6240] 
Add a library of schema annotations for POJO and JavaBeans
URL: https://github.com/apache/beam/pull/7289#discussion_r245825968
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/annotations/FieldName.java
 ##
 @@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.schemas.annotations;
+
+import java.lang.annotation.Documented;
+import java.lang.annotation.ElementType;
+import java.lang.annotation.Retention;
+import java.lang.annotation.RetentionPolicy;
+import java.lang.annotation.Target;
+import javax.annotation.CheckForNull;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.annotations.Experimental.Kind;
+
+/**
+ * When used on a POJO field or a JavaBean getter, the specified name is used 
for the generated
+ * schema field.
+ *
+ * For example, a Java POJO with a field that we want in our schema but 
under a different name.
+ *
+ * 
+ *   {@literal @}DefaultSchema(JavaBeanSchema.class)
+ *   class MyClass {
+ * public String user;
+ * {@literal @}SchemaName("age")
 
 Review comment:
   Went with SchemaFieldName.
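   (Editorial note: with the rename above, the usage example from the Javadoc would read roughly as below; the import paths assume the annotations land under org.apache.beam.sdk.schemas.annotations as in this PR.)

{code:java}
import org.apache.beam.sdk.schemas.JavaBeanSchema;
import org.apache.beam.sdk.schemas.annotations.DefaultSchema;
import org.apache.beam.sdk.schemas.annotations.SchemaFieldName;

@DefaultSchema(JavaBeanSchema.class)
class MyClass {
  public String user;

  @SchemaFieldName("age")
  public int ageInYears;  // exposed in the generated schema as "age"
}
{code}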
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 182073)
Time Spent: 4.5h  (was: 4h 20m)

> Allow users to annotate POJOs and JavaBeans for richer functionality
> 
>
> Key: BEAM-6240
> URL: https://issues.apache.org/jira/browse/BEAM-6240
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: Major
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Desired annotations:
>   * SchemaIgnore - ignore this field
>   * FieldName - allow the user to explicitly specify a field name
>   * SchemaCreate - register a function to be used to create an object (so 
> fields can be final, and no default constructor need be assumed).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6240) Allow users to annotate POJOs and JavaBeans for richer functionality

2019-01-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6240?focusedWorklogId=182072&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182072
 ]

ASF GitHub Bot logged work on BEAM-6240:


Author: ASF GitHub Bot
Created on: 07/Jan/19 22:51
Start Date: 07/Jan/19 22:51
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on pull request #7289: [BEAM-6240] 
Add a library of schema annotations for POJO and JavaBeans
URL: https://github.com/apache/beam/pull/7289#discussion_r245825698
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/annotations/SchemaCreate.java
 ##
 @@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.schemas.annotations;
+
+import java.lang.annotation.Documented;
+import java.lang.annotation.ElementType;
+import java.lang.annotation.Retention;
+import java.lang.annotation.RetentionPolicy;
+import java.lang.annotation.Target;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.annotations.Experimental.Kind;
+
+/**
+ * Can be put on a constructor or a static method, in which case that 
constructor or method will be
+ * used to create instances of the class by Beam's schema code.
+ *
+ * For example, the following Java POJO.
+ *
+ * 
+ *   {@literal @}DefaultSchema(JavaBeanSchema.class)
+ *  class MyClass {
+ *public final String user;
+ *public final int age;
+ *
+ *{@literal @}SchemaCreate
+ *public MyClass(String user, int age) {
+ *  this.user = user;
+ *  this.age = age;
+ *}
+ *  }
+ * 
+ *
+ * This tells Beam that this constructor can be used to construct 
instances. Beam will match up
+ * the names of the constructor arguments to schema fields in order to decide 
how to create the
+ * class from a Row.
+ *
+ * This can also be used to annotate a static factory method on the class. 
For example:
+ *
+ * 
+ * {@literal @}DefaultSchema(JavaBeanSchema.class)
+ * class MyClass {
+ *   public final String user;
+ *   public final int age;
+ *
+ *   private MyClass(String user, int age) { this.user = user; this.age = age; 
}
+ *
+ *   {@literal @}SchemaCreate
+ *   public static MyClass create(String user, int age) {
 
 Review comment:
   done
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 182072)
Time Spent: 4h 20m  (was: 4h 10m)

> Allow users to annotate POJOs and JavaBeans for richer functionality
> 
>
> Key: BEAM-6240
> URL: https://issues.apache.org/jira/browse/BEAM-6240
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: Major
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Desired annotations:
>   * SchemaIgnore - ignore this field
>   * FieldName - allow the user to explicitly specify a field name
>   * SchemaCreate - register a function to be used to create an object (so 
> fields can be final, and no default constructor need be assumed).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6240) Allow users to annotate POJOs and JavaBeans for richer functionality

2019-01-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6240?focusedWorklogId=182071&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182071
 ]

ASF GitHub Bot logged work on BEAM-6240:


Author: ASF GitHub Bot
Created on: 07/Jan/19 22:50
Start Date: 07/Jan/19 22:50
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on pull request #7289: [BEAM-6240] 
Add a library of schema annotations for POJO and JavaBeans
URL: https://github.com/apache/beam/pull/7289#discussion_r245825617
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/annotations/FieldName.java
 ##
 @@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.schemas.annotations;
+
+import java.lang.annotation.Documented;
+import java.lang.annotation.ElementType;
+import java.lang.annotation.Retention;
+import java.lang.annotation.RetentionPolicy;
+import java.lang.annotation.Target;
+import javax.annotation.CheckForNull;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.annotations.Experimental.Kind;
+
+/**
+ * When used on a POJO field or a JavaBean getter, the specified name is used 
for the generated
+ * schema field.
+ *
+ * For example, a Java POJO with a field that we want in our schema but 
under a different name.
+ *
+ * 
+ *   {@literal @}DefaultSchema(JavaBeanSchema.class)
+ *   class MyClass {
+ * public String user;
+ * {@literal @}SchemaName("age")
+ * public int ageInYears;
+ *   }
+ * 
+ *
+ * The resulting schema will have fields named "user" and "age."
+ */
+@Documented
+@Retention(RetentionPolicy.RUNTIME)
+@Target({ElementType.FIELD, ElementType.METHOD})
+@SuppressWarnings("rawtypes")
+@Experimental(Kind.SCHEMAS)
+public @interface FieldName {
+  @CheckForNull
 
 Review comment:
   Good point. Our build assumes everything is @Nonnull unless explicitly 
marked @Nullable, so I'll just delete the annotation.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 182071)
Time Spent: 4h 10m  (was: 4h)

> Allow users to annotate POJOs and JavaBeans for richer functionality
> 
>
> Key: BEAM-6240
> URL: https://issues.apache.org/jira/browse/BEAM-6240
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: Major
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> Desired annotations:
>   * SchemaIgnore - ignore this field
>   * FieldName - allow the user to explicitly specify a field name
>   * SchemaCreate - register a function to be used to create an object (so 
> fields can be final, and no default constructor need be assumed).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6240) Allow users to annotate POJOs and JavaBeans for richer functionality

2019-01-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6240?focusedWorklogId=182070&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182070
 ]

ASF GitHub Bot logged work on BEAM-6240:


Author: ASF GitHub Bot
Created on: 07/Jan/19 22:49
Start Date: 07/Jan/19 22:49
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on pull request #7289: [BEAM-6240] 
Add a library of schema annotations for POJO and JavaBeans
URL: https://github.com/apache/beam/pull/7289#discussion_r245825310
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/annotations/DefaultSchema.java
 ##
 @@ -97,7 +100,7 @@ private SchemaProvider getSchemaProvider(TypeDescriptor 
typeDescriptor) {
   + " specified as the default SchemaProvider for type "
   + type
 
 Review comment:
   This is a TypeDescriptor which doesn't have a getName. Are you suggesting 
type.getRawType().getName()?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 182070)
Time Spent: 4h  (was: 3h 50m)

> Allow users to annotate POJOs and JavaBeans for richer functionality
> 
>
> Key: BEAM-6240
> URL: https://issues.apache.org/jira/browse/BEAM-6240
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: Major
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> Desired annotations:
>   * SchemaIgnore - ignore this field
>   * FieldName - allow the user to explicitly specify a field name
>   * SchemaCreate - register a function to be used to create an object (so 
> fields can be final, and no default constructor need be assumed).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6241) MongoDbIO - Add Limit and Aggregates Support

2019-01-07 Thread Ahmed El.Hussaini (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736449#comment-16736449
 ] 

Ahmed El.Hussaini commented on BEAM-6241:
-

[~kenn] appreciate your immediate response and help. Will do what I can on my 
end in the meantime.

> MongoDbIO - Add Limit and Aggregates Support
> 
>
> Key: BEAM-6241
> URL: https://issues.apache.org/jira/browse/BEAM-6241
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-mongodb
>Affects Versions: 2.9.0
>Reporter: Ahmed El.Hussaini
>Assignee: Ahmed El.Hussaini
>Priority: Major
>  Labels: easyfix
> Fix For: 2.10.0
>
>
> h2. Adds Support to Limit Results
>  
> {code:java}
> MongoDbIO.read()
>     .withUri("mongodb://localhost:" + port)
>     .withDatabase(DATABASE)
>     .withCollection(COLLECTION)
>     .withFilter("{\"scientist\":\"Einstein\"}")
>     .withLimit(5);{code}
> h2. Adds Support to Use Aggregates
>  
> {code:java}
> List<BsonDocument> aggregates = new ArrayList<>();
> aggregates.add(
>     new BsonDocument(
>         "$match",
>         new BsonDocument("country", new BsonDocument("$eq", new BsonString("England")))));
> PCollection<Document> output =
>     pipeline.apply(
>         MongoDbIO.read()
>             .withUri("mongodb://localhost:" + port)
>             .withDatabase(DATABASE)
>             .withCollection(COLLECTION)
>             .withAggregate(aggregates));
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6347) Add page for developing I/O connectors for Java

2019-01-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6347?focusedWorklogId=182066&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182066
 ]

ASF GitHub Bot logged work on BEAM-6347:


Author: ASF GitHub Bot
Created on: 07/Jan/19 22:41
Start Date: 07/Jan/19 22:41
Worklog Time Spent: 10m 
  Work Description: melap commented on pull request #7397: [BEAM-6347] Add 
website page for developing I/O connectors for Java
URL: https://github.com/apache/beam/pull/7397#discussion_r245524376
 
 

 ##
 File path: website/src/documentation/io/developing-io-java.md
 ##
 @@ -0,0 +1,419 @@
+---
+layout: section
+title: "Apache Beam: Developing I/O connectors for Java"
+section_menu: section-menu/documentation.html
+permalink: /documentation/io/developing-io-java/
+redirect_from: /documentation/io/authoring-java/
+---
+
+# Developing I/O connectors for Java
+
+To connect to a data store that isn’t supported by Beam’s existing I/O
+connectors, you must create a custom I/O connector that usually consists of a
+source and a sink. All Beam sources and sinks are composite transforms; 
however,
+the implementation of your custom I/O depends on your use case.  See the [new
+I/O connector overview]({{ site.baseurl 
}}/documentation/io/developing-io-overview/)
+for a general overview of developing a new I/O connector.
+
+This page describes implementation details for developing sources and sinks
+using Java. The Python SDK offers the same functionality, but uses a slightly
+different API. See [Developing I/O connectors for Python]({{ site.baseurl 
}}/documentation/io/developing-io-python/)
+for information specific to the Python SDK.
+
+## Implementation options
+
+**Sources**
+
+For bounded (batch) sources, there are currently two options for creating a 
Beam
+source:
+
+1. Use `ParDo` and `GroupByKey`.
+2. Use the `Source` interface and extend the `BoundedSource` abstract subclass.
+
+`ParDo` is the recommended option, as implementing a `Source` can be tricky.
+The [developing I/O connectors overview]({{ site.baseurl 
}}/documentation/io/developing-io-overview/)
+covers using `ParDo`, and lists some use cases where you might want to use a
+Source (such as [dynamic work rebalancing]({{ site.baseurl 
}}/blog/2016/05/18/splitAtFraction-method.html)).
+
+For unbounded (streaming) sources, you must use the `Source` interface and 
extend
+the `UnboundedSource` abstract subclass. `UnboundedSource` supports features 
that
+are useful for streaming pipelines such as checkpointing.
+
+Splittable DoFn is a new sources framework that is under development and will
+replace the other options for developing bounded and unbounded sources. For 
more
+information, see the
+[roadmap for multi-SDK connector efforts]({{ site.baseurl 
}}/roadmap/connectors-multi-sdk/).
+
+**Sinks**
+
+To create a Beam sink, we recommend that you use a single `ParDo` that writes 
the
 
 Review comment:
   Worked this into start of FilebasedSink section
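
   For readers following the thread, a minimal sketch of the "single ParDo that 
writes" style of sink referenced in the hunk above (MyClient, MyRecord, and the 
endpoint are assumptions, not part of the page under review or of the Beam API):

{code:java}
// Sketch only: MyClient, MyRecord, and the endpoint are assumed stand-ins.
class WriteToStoreFn extends DoFn<MyRecord, Void> {
  private transient MyClient client;

  @Setup
  public void setup() {
    // One connection per DoFn instance, reused across bundles.
    client = MyClient.connect("datastore.example.com:1234");
  }

  @ProcessElement
  public void process(@Element MyRecord record) {
    client.write(record);  // batching and retries omitted for brevity
  }

  @Teardown
  public void teardown() {
    client.close();
  }
}

// Usage: records.apply("WriteToStore", ParDo.of(new WriteToStoreFn()));
{code}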
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 182066)

> Add page for developing I/O connectors for Java
> ---
>
> Key: BEAM-6347
> URL: https://issues.apache.org/jira/browse/BEAM-6347
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Melissa Pashniak
>Assignee: Melissa Pashniak
>Priority: Minor
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6347) Add page for developing I/O connectors for Java

2019-01-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6347?focusedWorklogId=182065&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182065
 ]

ASF GitHub Bot logged work on BEAM-6347:


Author: ASF GitHub Bot
Created on: 07/Jan/19 22:41
Start Date: 07/Jan/19 22:41
Worklog Time Spent: 10m 
  Work Description: melap commented on pull request #7397: [BEAM-6347] Add 
website page for developing I/O connectors for Java
URL: https://github.com/apache/beam/pull/7397#discussion_r245524353
 
 

 ##
 File path: website/src/documentation/io/developing-io-java.md
 ##
 @@ -0,0 +1,419 @@
+---
+layout: section
+title: "Apache Beam: Developing I/O connectors for Java"
+section_menu: section-menu/documentation.html
+permalink: /documentation/io/developing-io-java/
+redirect_from: /documentation/io/authoring-java/
+---
+
+# Developing I/O connectors for Java
+
+To connect to a data store that isn’t supported by Beam’s existing I/O
+connectors, you must create a custom I/O connector that usually consists of a
+source and a sink. All Beam sources and sinks are composite transforms; 
however,
+the implementation of your custom I/O depends on your use case.  See the [new
+I/O connector overview]({{ site.baseurl 
}}/documentation/io/developing-io-overview/)
+for a general overview of developing a new I/O connector.
+
+This page describes implementation details for developing sources and sinks
+using Java. The Python SDK offers the same functionality, but uses a slightly
+different API. See [Developing I/O connectors for Python]({{ site.baseurl 
}}/documentation/io/developing-io-python/)
+for information specific to the Python SDK.
+
+## Implementation options
+
+**Sources**
+
+For bounded (batch) sources, there are currently two options for creating a 
Beam
+source:
+
+1. Use `ParDo` and `GroupByKey`.
 
 Review comment:
   Done - I worked a redirect sentence into the intro and removed this whole 
Implementation options section
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 182065)
Time Spent: 2h 40m  (was: 2.5h)

> Add page for developing I/O connectors for Java
> ---
>
> Key: BEAM-6347
> URL: https://issues.apache.org/jira/browse/BEAM-6347
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Melissa Pashniak
>Assignee: Melissa Pashniak
>Priority: Minor
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6347) Add page for developing I/O connectors for Java

2019-01-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6347?focusedWorklogId=182067&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182067
 ]

ASF GitHub Bot logged work on BEAM-6347:


Author: ASF GitHub Bot
Created on: 07/Jan/19 22:41
Start Date: 07/Jan/19 22:41
Worklog Time Spent: 10m 
  Work Description: melap commented on pull request #7397: [BEAM-6347] Add 
website page for developing I/O connectors for Java
URL: https://github.com/apache/beam/pull/7397#discussion_r245788401
 
 

 ##
 File path: website/src/documentation/io/developing-io-overview.md
 ##
 @@ -0,0 +1,171 @@
+---
+layout: section
+title: "Overview: Developing a new I/O connector"
+section_menu: section-menu/documentation.html
+permalink: /documentation/io/developing-io-overview/
+redirect_from: /documentation/io/authoring-overview/
+---
+
+
+[Pipeline I/O Table of Contents]({{site.baseurl}}/documentation/io/io-toc/)
+
+# Overview: Developing a new I/O connector
+
+_A guide for users who need to connect to a data store that isn't supported by
+the [Built-in I/O connectors]({{site.baseurl }}/documentation/io/built-in/)_
+
+To connect to a data store that isn’t supported by Beam’s existing I/O
+connectors, you must create a custom I/O connector. A connector usually 
consists
+of a source and a sink. All Beam sources and sinks are composite transforms;
+however, the implementation of your custom I/O depends on your use case. Here
+are the recommended steps to get started:
+
+1. Read this overview and choose your implementation. You can email the
+   [Beam dev mailing list]({{ site.baseurl }}/get-started/support) with any
+   questions you might have. In addition, you can check if anyone else is
+   working on the same I/O connector.  
+
+1. If you plan to contribute your I/O connector to the Beam community, see the
+   [Apache Beam contribution guide]({{ site.baseurl 
}}/contribute/contribution-guide/).  
+
+1. Read the [PTransform style guide]({{ site.baseurl 
}}/contribute/ptransform-style-guide/)
+   for additional style guide recommendations.
+
+
+## Implementation options
+
+### Sources
+
+For bounded (batch) sources, there are currently two options for creating a 
Beam
+source:
+
+1. Use `ParDo` and `GroupByKey`.  
+
+1. Use the `Source` interface and extend the `BoundedSource` abstract subclass.
+
+`ParDo` is the recommended option, as implementing a `Source` can be tricky. 
See
+[When to use the Source interface](#when-to-use-source) for a list of some use
+cases where you might want to use a `Source` (such as
+[dynamic work rebalancing]({{ site.baseurl 
}}/blog/2016/05/18/splitAtFraction-method.html)).
+
+(Java only) For unbounded (streaming) sources, you must use the `Source`
+interface and extend the `UnboundedSource` abstract subclass. `UnboundedSource`
+supports features that are useful for streaming pipelines, such as
+checkpointing.
+
+Splittable DoFn is a new sources framework that is under development and will
+replace the other options for developing bounded and unbounded sources. For 
more
+information, see the
+[roadmap for multi-SDK connector efforts]({{ site.baseurl 
}}/roadmap/connectors-multi-sdk/).
+
+### When to use the Source interface {#when-to-use-source}
 
 Review comment:
   Rearranged; the Implementation options section seems unneeded now, so I 
removed it. The sections under Sink were so small that I rewrote that part to 
match the style of the Source section (For X, use Y) and incorporated the text 
from your next comment.
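
   For context, a rough sketch of the ParDo-based option for bounded sources 
listed in the hunk above (Query, MyRecord, tableName, splitQueries(), and 
readRowsFor() are assumed helpers standing in for the external data store, not 
Beam APIs):

{code:java}
// Sketch only: the split/read helpers and element types are assumptions.
PCollection<MyRecord> records =
    pipeline
        // One element per split of the external store, e.g. a key range.
        .apply(Create.of(splitQueries(tableName)))
        // Spread the splits across workers so reads run in parallel.
        .apply(Reshuffle.viaRandomKey())
        // Read each split with an ordinary DoFn.
        .apply(ParDo.of(
            new DoFn<Query, MyRecord>() {
              @ProcessElement
              public void process(@Element Query query, OutputReceiver<MyRecord> out) {
                for (MyRecord row : readRowsFor(query)) {
                  out.output(row);
                }
              }
            }));
{code}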
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 182067)
Time Spent: 2h 50m  (was: 2h 40m)

> Add page for developing I/O connectors for Java
> ---
>
> Key: BEAM-6347
> URL: https://issues.apache.org/jira/browse/BEAM-6347
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Melissa Pashniak
>Assignee: Melissa Pashniak
>Priority: Minor
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6254) Add a "Learning Resources" page

2019-01-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6254?focusedWorklogId=182064&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182064
 ]

ASF GitHub Bot logged work on BEAM-6254:


Author: ASF GitHub Bot
Created on: 07/Jan/19 22:39
Start Date: 07/Jan/19 22:39
Worklog Time Spent: 10m 
  Work Description: melap commented on issue #7303: [BEAM-6254] Add a 
Learning Resources page to the website
URL: https://github.com/apache/beam/pull/7303#issuecomment-452107648
 
 
   Run Website_Stage_GCS PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 182064)
Time Spent: 1.5h  (was: 1h 20m)

> Add a "Learning Resources" page
> ---
>
> Key: BEAM-6254
> URL: https://issues.apache.org/jira/browse/BEAM-6254
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: David Cavazos
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6231) Triage test failures introduced by use_executable_stage_bundle_execution

2019-01-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6231?focusedWorklogId=182059&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182059
 ]

ASF GitHub Bot logged work on BEAM-6231:


Author: ASF GitHub Bot
Created on: 07/Jan/19 22:09
Start Date: 07/Jan/19 22:09
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on issue #7356: [BEAM-6231] Make 
Dataflow runner harness work with FixedWindow
URL: https://github.com/apache/beam/pull/7356#issuecomment-452099745
 
 
   Run Seed Job
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 182059)
Time Spent: 4h  (was: 3h 50m)

> Triage test failures introduced by use_executable_stage_bundle_execution
> 
>
> Key: BEAM-6231
> URL: https://issues.apache.org/jira/browse/BEAM-6231
> Project: Beam
>  Issue Type: Test
>  Components: runner-dataflow
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 4h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6241) MongoDbIO - Add Limit and Aggregates Support

2019-01-07 Thread Kenneth Knowles (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736422#comment-16736422
 ] 

Kenneth Knowles commented on BEAM-6241:
---

Sorry for the delay. Mentioning someone on the PR is good if you can see who 
might be interested (for example, via {{git blame}}); another option is to ask on 
d...@beam.apache.org. But generally we hope to get to PRs so they are not left 
sitting without review.

In this case, I am going to send mail to the list about all issues targeting 
2.10.0 where there is a PR that is sitting for review.

> MongoDbIO - Add Limit and Aggregates Support
> 
>
> Key: BEAM-6241
> URL: https://issues.apache.org/jira/browse/BEAM-6241
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-mongodb
>Affects Versions: 2.9.0
>Reporter: Ahmed El.Hussaini
>Assignee: Ahmed El.Hussaini
>Priority: Major
>  Labels: easyfix
> Fix For: 2.10.0
>
>
> h2. Adds Support to Limit Results
>  
> {code:java}
> MongoDbIO.read()
> .withUri("mongodb://localhost:" + port)
> .withDatabase(DATABASE)
> .withCollection(COLLECTION)
> .withFilter("{\"scientist\":\"Einstein\"}")
> .withLimit(5));{code}
> h2. Adds Support to Use Aggregates
>  
> {code:java}
> List<BsonDocument> aggregates = new ArrayList<>();
>   aggregates.add(
> new BsonDocument(
>   "$match",
>   new BsonDocument("country", new BsonDocument("$eq", new 
> BsonString("England")))));
> PCollection<Document> output =
>   pipeline.apply(
> MongoDbIO.read()
>   .withUri("mongodb://localhost:" + port)
>   .withDatabase(DATABASE)
>   .withCollection(COLLECTION)
>   .withAggregate(aggregates));
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6363) Upgrade Gearpump to 0.9.0

2019-01-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6363?focusedWorklogId=182049&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182049
 ]

ASF GitHub Bot logged work on BEAM-6363:


Author: ASF GitHub Bot
Created on: 07/Jan/19 21:47
Start Date: 07/Jan/19 21:47
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #7425: 
[BEAM-6363] Upgrade Gearpump to 0.9.0
URL: https://github.com/apache/beam/pull/7425
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 182049)
Time Spent: 0.5h  (was: 20m)

> Upgrade Gearpump to 0.9.0
> -
>
> Key: BEAM-6363
> URL: https://issues.apache.org/jira/browse/BEAM-6363
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-gearpump
>Affects Versions: 2.9.0
>Reporter: Manu Zhang
>Assignee: Manu Zhang
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This is the first release after Gearpump retired from Apache.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6241) MongoDbIO - Add Limit and Aggregates Support

2019-01-07 Thread Ahmed El.Hussaini (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736411#comment-16736411
 ] 

Ahmed El.Hussaini commented on BEAM-6241:
-

[~kenn] what can I do here or who should I mention to get this issue, and the 
associated PR reviewed?

> MongoDbIO - Add Limit and Aggregates Support
> 
>
> Key: BEAM-6241
> URL: https://issues.apache.org/jira/browse/BEAM-6241
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-mongodb
>Affects Versions: 2.9.0
>Reporter: Ahmed El.Hussaini
>Assignee: Ahmed El.Hussaini
>Priority: Major
>  Labels: easyfix
> Fix For: 2.10.0
>
>
> h2. Adds Support to Limit Results
>  
> {code:java}
> MongoDbIO.read()
> .withUri("mongodb://localhost:" + port)
> .withDatabase(DATABASE)
> .withCollection(COLLECTION)
> .withFilter("{\"scientist\":\"Einstein\"}")
> .withLimit(5));{code}
> h2. Adds Support to Use Aggregates
>  
> {code:java}
> List<BsonDocument> aggregates = new ArrayList<>();
>   aggregates.add(
> new BsonDocument(
>   "$match",
>   new BsonDocument("country", new BsonDocument("$eq", new 
> BsonString("England")))));
> PCollection<Document> output =
>   pipeline.apply(
> MongoDbIO.read()
>   .withUri("mongodb://localhost:" + port)
>   .withDatabase(DATABASE)
>   .withCollection(COLLECTION)
>   .withAggregate(aggregates));
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5964) Add ClickHouseIO.Write

2019-01-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5964?focusedWorklogId=182048&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182048
 ]

ASF GitHub Bot logged work on BEAM-5964:


Author: ASF GitHub Bot
Created on: 07/Jan/19 21:44
Start Date: 07/Jan/19 21:44
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #7006: [BEAM-5964] Add 
ClickHouseIO.Write
URL: https://github.com/apache/beam/pull/7006#issuecomment-452092826
 
 
   @kanterov Yes, feel free to propose a cherrypick PR. This was clearly pretty 
much done when the release proposal thread started.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 182048)
Time Spent: 8h 40m  (was: 8.5h)

> Add ClickHouseIO.Write
> --
>
> Key: BEAM-5964
> URL: https://issues.apache.org/jira/browse/BEAM-5964
> Project: Beam
>  Issue Type: New Feature
>  Components: io-ideas
>Reporter: Gleb Kanterov
>Assignee: Gleb Kanterov
>Priority: Major
>  Time Spent: 8h 40m
>  Remaining Estimate: 0h
>
> h3. Motivation
> ClickHouse is an open-source columnar DBMS for OLAP. It allows analysis of 
> data that is updated in real time. The project was released as open-source 
> software under the Apache 2 license in June 2016.
> h3. Design and implementation
> 1. Do only writes, reads aren't useful because ClickHouse is designed for 
> OLAP queries
> 2. For writes, do write in batches and rely on idempotent and atomic inserts 
> supported by replicated tables in ClickHouse
> 3. Implement ClickHouseIO.Write as PTransform<PCollection<Row>, PDone>
> 4. Rely on having logic for casting rows between schemas in BEAM-5918, and 
> don't put it in ClickHouseIO.Write
> h3. References
> [1] 
> http://highscalability.com/blog/2017/9/18/evolution-of-data-structures-in-yandexmetrica.html
> [2] 
> https://blog.cloudflare.com/how-cloudflare-analyzes-1m-dns-queries-per-second/
> [3] 
> https://blog.cloudflare.com/http-analytics-for-6m-requests-per-second-using-clickhouse/
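
A hypothetical usage sketch of the write-only transform described above (the 
factory method signature, JDBC URL, and table name are assumptions, not 
necessarily the merged API):

{code:java}
// Hypothetical sketch; ClickHouseIO.write(jdbcUrl, table), the URL, and the
// table name are assumptions, not necessarily the merged API.
PCollection<Row> rows = ...;   // rows whose schema matches the target table
rows.apply(
    ClickHouseIO.<Row>write(
        "jdbc:clickhouse://localhost:8123/default",  // ClickHouse JDBC endpoint
        "events"));                                  // target table
{code}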



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-6381) :beam-test-infra-metrics:test org.apache.beam.testinfra.metrics.ProberTests > CheckGrafanaStalenessAlerts failed

2019-01-07 Thread Boyuan Zhang (JIRA)
Boyuan Zhang created BEAM-6381:
--

 Summary: :beam-test-infra-metrics:test 
org.apache.beam.testinfra.metrics.ProberTests > CheckGrafanaStalenessAlerts 
failed
 Key: BEAM-6381
 URL: https://issues.apache.org/jira/browse/BEAM-6381
 Project: Beam
  Issue Type: Test
  Components: test-failures
Reporter: Boyuan Zhang
Assignee: Scott Wegner


https://builds.apache.org/job/beam_Release_NightlySnapshot/295/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6355) [beam_PostRelease_NightlySnapshot] [runQuickstartJavaDataflow] ExceptionInInitializerError

2019-01-07 Thread Boyuan Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736373#comment-16736373
 ] 

Boyuan Zhang commented on BEAM-6355:


Feel like it's related to [https://github.com/apache/beam/pull/7351]

> [beam_PostRelease_NightlySnapshot] [runQuickstartJavaDataflow] 
> ExceptionInInitializerError
> --
>
> Key: BEAM-6355
> URL: https://issues.apache.org/jira/browse/BEAM-6355
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, test-failures
>Reporter: Scott Wegner
>Assignee: Boyuan Zhang
>Priority: Major
>  Labels: currently-failing
>
> _Use this form to file an issue for test failure:_
>  * [Jenkins 
> Job|https://builds.apache.org/job/beam_PostRelease_NightlySnapshot/480/]
>  * [Dataflow 
> Job|https://console.cloud.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2019-01-03_03_07_12-9648275706628895259?project=apache-beam-testing&folder&organizationId=433637338589]
>  * [Test source 
> code|https://github.com/apache/beam/blob/master/release/src/main/groovy/quickstart-java-dataflow.groovy]
> Initial investigation:
> It appears the Dataflow job failed during worker initialization. From the 
> [stackdriver 
> logs|https://console.cloud.google.com/logs/viewer?resource=dataflow_step%2Fjob_id%2F2019-01-03_03_07_12-9648275706628895259&logName=projects%2Fapache-beam-testing%2Flogs%2Fdataflow.googleapis.com%252Fworker&interval=NO_LIMIT&project=apache-beam-testing&folder&organizationId=433637338589&minLogLevel=500&expandAll=false&timestamp=2019-01-03T16:52:14.64300Z&customFacets=&limitCustomFacetWidth=true&scrollTimestamp=2019-01-03T11:08:49.88200Z],
>  I see:
> {code}
> 2019-01-03 03:08:27.770 PST
> Uncaught exception occurred during work unit execution. This will be retried.
> {
>  insertId:  "3832125194122580497:879173:0:62501"  
>  jsonPayload: {
>   exception:  "java.lang.ExceptionInInitializerError
>   at 
> org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory$2.typedApply(IntrinsicMapTaskExecutorFactory.java:344)
>   at 
> org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory$2.typedApply(IntrinsicMapTaskExecutorFactory.java:338)
>   at 
> org.apache.beam.runners.dataflow.worker.graph.Networks$TypeSafeNodeFunction.apply(Networks.java:63)
>   at 
> org.apache.beam.runners.dataflow.worker.graph.Networks$TypeSafeNodeFunction.apply(Networks.java:50)
>   at 
> org.apache.beam.runners.dataflow.worker.graph.Networks.replaceDirectedNetworkNodes(Networks.java:87)
>   at 
> org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory.create(IntrinsicMapTaskExecutorFactory.java:120)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:337)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:291)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:135)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:115)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:102)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IllegalArgumentException: Multiple entries with same 
> key: 
> kind:varint=org.apache.beam.runners.dataflow.util.CloudObjectTranslators$8@f4dd50c
>  and 
> kind:varint=org.apache.beam.runners.dataflow.worker.RunnerHarnessCoderCloudObjectTranslatorRegistrar$1@ae1551d
>   at 
> org.apache.beam.repackaged.beam_runners_google_cloud_dataflow_java.com.google.common.collect.ImmutableMap.checkNoConflict(ImmutableMap.java:136)
>   at 
> org.apache.beam.repackaged.beam_runners_google_cloud_dataflow_java.com.google.common.collect.RegularImmutableMap.checkNoConflictInKeyBucket(RegularImmutableMap.java:100)
>   at 
> org.apache.beam.repackaged.beam_runners_google_cloud_dataflow_java.com.google.common.collect.RegularImmutableMap.fromEntryArray(RegularImmutableMap.java:86)
>   at 
> org.apache.beam.repackaged.beam_runners_google_cloud_dataflow_java.com.google.common.collect.ImmutableMap$Builder.build(ImmutableMap.java:300)
>   at 
> org.apache.beam.runners.dataflow.util.CloudObjects.populateCloudObjectTranslators(CloudObjects.java:60)
>   at 
> org.

[jira] [Commented] (BEAM-6380) apache_beam.examples.wordcount_it_test.WordCountIT with DirectRunner failed

2019-01-07 Thread Ahmet Altay (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736360#comment-16736360
 ] 

Ahmet Altay commented on BEAM-6380:
---

Error is:
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/io/gcp/gcsio.py", line 581, in _start_upload
    self._client.objects.Insert(self._insert_request, upload=self._upload)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/io/gcp/internal/clients/storage/storage_v1_client.py", line 1154, in Insert
    upload=upload, upload_config=upload_config)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/build/gradleenv/1327086738/local/lib/python2.7/site-packages/apitools/base/py/base_api.py", line 715, in _RunMethod
    http_request, client=self.client)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/build/gradleenv/1327086738/local/lib/python2.7/site-packages/apitools/base/py/transfer.py", line 876, in InitializeUpload
    return self.StreamInChunks()
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/build/gradleenv/1327086738/local/lib/python2.7/site-packages/apitools/base/py/transfer.py", line 988, in StreamInChunks
    additional_headers=additional_headers)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/build/gradleenv/1327086738/local/lib/python2.7/site-packages/apitools/base/py/transfer.py", line 939, in __StreamMedia
    self.RefreshResumableUploadState()
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/build/gradleenv/1327086738/local/lib/python2.7/site-packages/apitools/base/py/transfer.py", line 841, in RefreshResumableUploadState
    self.stream.seek(self.progress)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/io/filesystemio.py", line 264, in seek
    raise NotImplementedError
NotImplementedError
 

It seems like this is a missing feature in Beam.

When an upload needs to resume and this fails, apitools tries to seek to an 
earlier point in the uploadable file 
(https://github.com/google/apitools/blob/effc2e576427d8c876b1d64c79edcd98ab433074/apitools/base/py/transfer.py#L850);
 however, the Beam filesystem does not support this 
(https://github.com/apache/beam/blob/d02b859cb37e3c9f565785e93384905f1078b409/sdks/python/apache_beam/io/filesystemio.py#L264).

We can either update filesystemio to support this case, or enable bundle retry 
capability for the direct runner.

> apache_beam.examples.wordcount_it_test.WordCountIT with DirectRunner failed
> ---
>
> Key: BEAM-6380
> URL: https://issues.apache.org/jira/browse/BEAM-6380
> Project: Beam
>  Issue Type: Test
>  Components: test-failures
>Reporter: Boyuan Zhang
>Assignee: Ahmet Altay
>Priority: Major
>
> wordcount test in :pythonPostCommit failed owing to RuntimeError: 
> NotImplementedError [while running 'write/Write/WriteImpl/WriteBundles']
>  
>  https://builds.apache.org/job/beam_PostCommit_Python_Verify/7001/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6326) Fix test failures in streaming mode of PortableValidatesRunner

2019-01-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6326?focusedWorklogId=182036&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182036
 ]

ASF GitHub Bot logged work on BEAM-6326:


Author: ASF GitHub Bot
Created on: 07/Jan/19 20:52
Start Date: 07/Jan/19 20:52
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #7432: [BEAM-6326] Relax test 
assumption for PAssertTest#testWindowedIsEqualTo
URL: https://github.com/apache/beam/pull/7432#issuecomment-452077732
 
 
   This doesn't seem to be the right way, at least the DirectRunner behaves 
differently. Will have to materialize this correctly for the side input then.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 182036)
Time Spent: 0.5h  (was: 20m)

> Fix test failures in streaming mode of PortableValidatesRunner
> --
>
> Key: BEAM-6326
> URL: https://issues.apache.org/jira/browse/BEAM-6326
> Project: Beam
>  Issue Type: Test
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> As of BEAM-6009, the tests are run separately for batch and streaming. This 
> has revealed issues with a couple of tests which need to be addressed.
> The Gradle task is: {{validatesPortableRunnerStreaming}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-6380) apache_beam.examples.wordcount_it_test.WordCountIT with DirectRunner failed

2019-01-07 Thread Boyuan Zhang (JIRA)
Boyuan Zhang created BEAM-6380:
--

 Summary: apache_beam.examples.wordcount_it_test.WordCountIT with 
DirectRunner failed
 Key: BEAM-6380
 URL: https://issues.apache.org/jira/browse/BEAM-6380
 Project: Beam
  Issue Type: Test
  Components: test-failures
Reporter: Boyuan Zhang
Assignee: Ahmet Altay


wordcount test in :pythonPostCommit failed owing to RuntimeError: 
NotImplementedError [while running 'write/Write/WriteImpl/WriteBundles']

 
 https://builds.apache.org/job/beam_PostCommit_Python_Verify/7001/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5958) [Flake][beam_PostCommit_Java_ValidatesRunner_PortabilityApi_Dataflow_Gradle] VR tests flaking with stuck tests.

2019-01-07 Thread Daniel Oliveira (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736326#comment-16736326
 ] 

Daniel Oliveira commented on BEAM-5958:
---

I disagree on this issue being a duplicate. That said it should still be 
closed, since the original issue doesn't seem to appear in the updated test 
target. None of the recent failures I checked exhibit this error. 
[https://builds.apache.org/view/All/job/beam_PostCommit_Java_ValidatesRunner_PortabilityApi_Dataflow/]

> [Flake][beam_PostCommit_Java_ValidatesRunner_PortabilityApi_Dataflow_Gradle] 
> VR tests flaking with stuck tests.
> ---
>
> Key: BEAM-5958
> URL: https://issues.apache.org/jira/browse/BEAM-5958
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Daniel Oliveira
>Priority: Minor
> Fix For: Not applicable
>
>
> Some of the VR tests in this target occasionally get stuck and run for over 
> an hour, and end up killed due to timing out.
> Recent run: 
> [https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_PortabilityApi_Dataflow_Gradle/20/]
> The line indicating the failure looks like this:
> {noformat}
> java.lang.RuntimeException: 
> Workflow failed. Causes: The Dataflow job appears to be stuck because no 
> worker activity has been seen in the last 1h. You can get help with Cloud 
> Dataflow at https://cloud.google.com/dataflow/support.{noformat}
> [https://scans.gradle.com/s/jtbvh7kerp3eq/tests/wc3tcgdeth5bc-i6d7ckdlc55ws]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6341) Python VR failure: test_flatten_multiple_pcollections_having_multiple_consumers

2019-01-07 Thread Daniel Oliveira (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736318#comment-16736318
 ] 

Daniel Oliveira commented on BEAM-6341:
---

The out of memory error doesn't seem to have popped up again recently. I think 
that one's unrelated.

The timeout flakes are common though, and happen in multiple different tests 
not just test_flatten_multiple_pcollections_having_multiple_consumers. I'll 
keep looking into it.

> Python VR failure: 
> test_flatten_multiple_pcollections_having_multiple_consumers
> ---
>
> Key: BEAM-6341
> URL: https://issues.apache.org/jira/browse/BEAM-6341
> Project: Beam
>  Issue Type: Test
>  Components: test-failures
>Reporter: Boyuan Zhang
>Assignee: Daniel Oliveira
>Priority: Major
>
> [https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/2154/]
> Workflow failed. Causes: The Dataflow job appears to be stuck because no 
> worker activity has been seen in the last 1h.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6345) BigtableWriteIT. testE2EBigtableWrite got stuck with portable dataflow worker

2019-01-07 Thread Boyuan Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736302#comment-16736302
 ] 

Boyuan Zhang commented on BEAM-6345:


https://builds.apache.org/job/beam_PostCommit_Java_PortabilityApi/605/

> BigtableWriteIT. testE2EBigtableWrite got stuck with portable dataflow worker
> -
>
> Key: BEAM-6345
> URL: https://issues.apache.org/jira/browse/BEAM-6345
> Project: Beam
>  Issue Type: Test
>  Components: test-failures
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>
> https://builds.apache.org/job/beam_PostCommit_Java_PortabilityApi/576/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6355) [beam_PostRelease_NightlySnapshot] [runQuickstartJavaDataflow] ExceptionInInitializerError

2019-01-07 Thread Daniel Oliveira (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736301#comment-16736301
 ] 

Daniel Oliveira commented on BEAM-6355:
---

 I once saw a similar issue caused by shading, where a class was being vendored 
but the calling code wasn't using the vendored version. In that case the bug 
occurred because some artifacts weren't rebuilt after the shading change was 
pulled in, so the fix was as simple as cleaning the entire build environment and 
rebuilding from scratch.

I don't know how this PostRelease test is built, but if it tries to reuse 
artifacts instead of doing a clean build then that might be related.

> [beam_PostRelease_NightlySnapshot] [runQuickstartJavaDataflow] 
> ExceptionInInitializerError
> --
>
> Key: BEAM-6355
> URL: https://issues.apache.org/jira/browse/BEAM-6355
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, test-failures
>Reporter: Scott Wegner
>Assignee: Boyuan Zhang
>Priority: Major
>  Labels: currently-failing
>
> _Use this form to file an issue for test failure:_
>  * [Jenkins 
> Job|https://builds.apache.org/job/beam_PostRelease_NightlySnapshot/480/]
>  * [Dataflow 
> Job|https://console.cloud.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2019-01-03_03_07_12-9648275706628895259?project=apache-beam-testing&folder&organizationId=433637338589]
>  * [Test source 
> code|https://github.com/apache/beam/blob/master/release/src/main/groovy/quickstart-java-dataflow.groovy]
> Initial investigation:
> It appears the Dataflow job failed during worker initialization. From the 
> [stackdriver 
> logs|https://console.cloud.google.com/logs/viewer?resource=dataflow_step%2Fjob_id%2F2019-01-03_03_07_12-9648275706628895259&logName=projects%2Fapache-beam-testing%2Flogs%2Fdataflow.googleapis.com%252Fworker&interval=NO_LIMIT&project=apache-beam-testing&folder&organizationId=433637338589&minLogLevel=500&expandAll=false&timestamp=2019-01-03T16:52:14.64300Z&customFacets=&limitCustomFacetWidth=true&scrollTimestamp=2019-01-03T11:08:49.88200Z],
>  I see:
> {code}
> 2019-01-03 03:08:27.770 PST
> Uncaught exception occurred during work unit execution. This will be retried.
> {
>  insertId:  "3832125194122580497:879173:0:62501"  
>  jsonPayload: {
>   exception:  "java.lang.ExceptionInInitializerError
>   at 
> org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory$2.typedApply(IntrinsicMapTaskExecutorFactory.java:344)
>   at 
> org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory$2.typedApply(IntrinsicMapTaskExecutorFactory.java:338)
>   at 
> org.apache.beam.runners.dataflow.worker.graph.Networks$TypeSafeNodeFunction.apply(Networks.java:63)
>   at 
> org.apache.beam.runners.dataflow.worker.graph.Networks$TypeSafeNodeFunction.apply(Networks.java:50)
>   at 
> org.apache.beam.runners.dataflow.worker.graph.Networks.replaceDirectedNetworkNodes(Networks.java:87)
>   at 
> org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory.create(IntrinsicMapTaskExecutorFactory.java:120)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:337)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:291)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:135)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:115)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:102)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IllegalArgumentException: Multiple entries with same 
> key: 
> kind:varint=org.apache.beam.runners.dataflow.util.CloudObjectTranslators$8@f4dd50c
>  and 
> kind:varint=org.apache.beam.runners.dataflow.worker.RunnerHarnessCoderCloudObjectTranslatorRegistrar$1@ae1551d
>   at 
> org.apache.beam.repackaged.beam_runners_google_cloud_dataflow_java.com.google.common.collect.ImmutableMap.checkNoConflict(ImmutableMap.java:136)
>   at 
> org.apache.beam.repackaged.beam_runners_google_cloud_dataflow_java.com.google.common.collect.RegularImmutableMap.checkNoConflictInKeyBucket(RegularImmutableMap.java:100)
>   

[jira] [Work logged] (BEAM-6326) Fix test failures in streaming mode of PortableValidatesRunner

2019-01-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6326?focusedWorklogId=182018&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182018
 ]

ASF GitHub Bot logged work on BEAM-6326:


Author: ASF GitHub Bot
Created on: 07/Jan/19 20:02
Start Date: 07/Jan/19 20:02
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #7432: [BEAM-6326] Relax test 
assumption for PAssertTest#testWindowedIsEqualTo
URL: https://github.com/apache/beam/pull/7432#issuecomment-452063257
 
 
   Run Java Flink PortableValidatesRunner Streaming
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 182018)
Time Spent: 20m  (was: 10m)

> Fix test failures in streaming mode of PortableValidatesRunner
> --
>
> Key: BEAM-6326
> URL: https://issues.apache.org/jira/browse/BEAM-6326
> Project: Beam
>  Issue Type: Test
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> As of BEAM-6009, the tests are run separately for batch and streaming. This 
> has revealed issues with a couple of tests which need to be addressed.
> The Gradle task is: {{validatesPortableRunnerStreaming}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-5975) [beam_PostRelease_NightlySnapshot] Nightly has been failing due to errors relating to runMobileGamingJavaDirect

2019-01-07 Thread Daniel Oliveira (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Oliveira resolved BEAM-5975.
---
   Resolution: Cannot Reproduce
Fix Version/s: Not applicable

> [beam_PostRelease_NightlySnapshot] Nightly has been failing due to errors 
> relating to runMobileGamingJavaDirect
> ---
>
> Key: BEAM-5975
> URL: https://issues.apache.org/jira/browse/BEAM-5975
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Daniel Oliveira
>Assignee: Daniel Oliveira
>Priority: Major
> Fix For: Not applicable
>
>
> beam_PostRelease_NightlySnapshot has failed two nights in a row due to issues 
> relating to runMobileGamingJavaDirect. It's possible that these are two 
> separate bugs since the failures are different, but I am grouping them 
> together because they happened on consecutive nights and very closely to each 
> other in the test logs, so it seems likely they're related.
>  
> [https://builds.apache.org/job/beam_PostRelease_NightlySnapshot/421/]
> In the first failure the test times out while executing 
> runMobileGamingJavaDirect:
> {noformat}
> 04:40:03 Waiting on bqjob_r2bb92817cbd78f51_0166debcb7da_1 ... (0s) 
> Current status: DONE   
> 04:40:03 +---+
> 04:40:03 | table_id  |
> 04:40:03 +---+
> 04:40:03 | hourly_team_score_python_dataflow |
> 04:40:03 | hourly_team_score_python_direct   |
> 04:40:03 | leaderboard_DataflowRunner_team   |
> 04:40:03 | leaderboard_DataflowRunner_user   |
> 04:40:03 | leaderboard_DirectRunner_team |
> 04:40:03 | leaderboard_DirectRunner_user |
> 04:40:03 +---+
> 04:40:06 bq query SELECT table_id FROM 
> beam_postrelease_mobile_gaming.__TABLES_SUMMARY__
> 04:40:06 Build timed out (after 100 minutes). Marking the build as aborted.
> 04:40:06 Build was aborted
> 04:40:06 
> 04:40:08 Finished: ABORTED{noformat}
>  
> [https://builds.apache.org/job/beam_PostRelease_NightlySnapshot/422/]
> In the next day's run the test fails due to an issue building a class while 
> building  runMobileGamingJavaDirect:
> {noformat}
> [INFO] --- maven-compiler-plugin:3.7.0:compile (default-compile) @ 
> word-count-beam ---
> [INFO] Changes detected - recompiling the module!
> [WARNING] File encoding has not been set, using platform encoding UTF-8, i.e. 
> build is platform dependent!
> [INFO] Compiling 31 source files to 
> /tmp/groovy-generated-7468741840914398994-tmpdir/word-count-beam/target/classes
> [ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.6.0:java 
> (default-cli) on project word-count-beam: An exception occured while 
> executing the Java class. org.apache.beam.examples.complete.game.LeaderBoard 
> -> [Help 1]
> [ERROR]{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5975) [beam_PostRelease_NightlySnapshot] Nightly has been failing due to errors relating to runMobileGamingJavaDirect

2019-01-07 Thread Daniel Oliveira (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736279#comment-16736279
 ] 

Daniel Oliveira commented on BEAM-5975:
---

I looked through the available build history for 
beam_PostRelease_NightlySnapshot, and it doesn't seem to be having this error 
anywhere anymore. There is an error in runMobileGamingJavaDataflow, but it 
seems completely unrelated, so I'll create a new bug for that if one doesn't 
exist.

> [beam_PostRelease_NightlySnapshot] Nightly has been failing due to errors 
> relating to runMobileGamingJavaDirect
> ---
>
> Key: BEAM-5975
> URL: https://issues.apache.org/jira/browse/BEAM-5975
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Daniel Oliveira
>Assignee: Daniel Oliveira
>Priority: Major
>
> beam_PostRelease_NightlySnapshot has failed two nights in a row due to issues 
> relating to runMobileGamingJavaDirect. It's possible that these are two 
> separate bugs since the failures are different, but I am grouping them 
> together because they happened on consecutive nights and very closely to each 
> other in the test logs, so it seems likely they're related.
>  
> [https://builds.apache.org/job/beam_PostRelease_NightlySnapshot/421/]
> In the first failure the test times out while executing 
> runMobileGamingJavaDirect:
> {noformat}
> 04:40:03 Waiting on bqjob_r2bb92817cbd78f51_0166debcb7da_1 ... (0s) 
> Current status: DONE   
> 04:40:03 +---+
> 04:40:03 | table_id  |
> 04:40:03 +---+
> 04:40:03 | hourly_team_score_python_dataflow |
> 04:40:03 | hourly_team_score_python_direct   |
> 04:40:03 | leaderboard_DataflowRunner_team   |
> 04:40:03 | leaderboard_DataflowRunner_user   |
> 04:40:03 | leaderboard_DirectRunner_team |
> 04:40:03 | leaderboard_DirectRunner_user |
> 04:40:03 +---+
> 04:40:06 bq query SELECT table_id FROM 
> beam_postrelease_mobile_gaming.__TABLES_SUMMARY__
> 04:40:06 Build timed out (after 100 minutes). Marking the build as aborted.
> 04:40:06 Build was aborted
> 04:40:06 
> 04:40:08 Finished: ABORTED{noformat}
>  
> [https://builds.apache.org/job/beam_PostRelease_NightlySnapshot/422/]
> In the next day's run the test fails due to an issue building a class while 
> building  runMobileGamingJavaDirect:
> {noformat}
> [INFO] --- maven-compiler-plugin:3.7.0:compile (default-compile) @ 
> word-count-beam ---
> [INFO] Changes detected - recompiling the module!
> [WARNING] File encoding has not been set, using platform encoding UTF-8, i.e. 
> build is platform dependent!
> [INFO] Compiling 31 source files to 
> /tmp/groovy-generated-7468741840914398994-tmpdir/word-count-beam/target/classes
> [ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.6.0:java 
> (default-cli) on project word-count-beam: An exception occured while 
> executing the Java class. org.apache.beam.examples.complete.game.LeaderBoard 
> -> [Help 1]
> [ERROR]{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5315) Finish Python 3 porting for io module

2019-01-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5315?focusedWorklogId=182017&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182017
 ]

ASF GitHub Bot logged work on BEAM-5315:


Author: ASF GitHub Bot
Created on: 07/Jan/19 20:02
Start Date: 07/Jan/19 20:02
Worklog Time Spent: 10m 
  Work Description: RobbeSneyders commented on pull request #7394: 
[BEAM-5315] Python 3 port io.textio module
URL: https://github.com/apache/beam/pull/7394#discussion_r245780402
 
 

 ##
 File path: sdks/python/apache_beam/io/filebasedsource_test.py
 ##
 @@ -87,6 +87,22 @@ class EOL(object):
 def write_data(
 num_lines, no_data=False, directory=None, prefix=tempfile.template,
 eol=EOL.LF):
+  """Writes test data to a temporary file.
+
+  Args:
+num_lines (int): The number of lines to write.
+no_data (bool): If :data:`True`, empty lines will be written, otherwise
+  each line will contain a concatenation of b'line' and the line number.
+directory (str): The name of the directory to create the temporary file in.
+prefix (str): The prefix to use for the temporary file.
+eol (int): The line ending to use when writing.
+  :class:`~apache_beam.io.textio_test.EOL` exposes attributes that can be
+  used here to define the eol.
+
+  Returns:
+Tuple[str, List[str]): A tuple of the filename and a list of the written
 
 Review comment:
   You're right. Fixed it.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 182017)
Time Spent: 12h 10m  (was: 12h)

> Finish Python 3 porting for io module
> -
>
> Key: BEAM-5315
> URL: https://issues.apache.org/jira/browse/BEAM-5315
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Robbe
>Priority: Major
>  Time Spent: 12h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6326) Fix test failures in streaming mode of PortableValidatesRunner

2019-01-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6326?focusedWorklogId=182016&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182016
 ]

ASF GitHub Bot logged work on BEAM-6326:


Author: ASF GitHub Bot
Created on: 07/Jan/19 20:02
Start Date: 07/Jan/19 20:02
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #7432: [BEAM-6326] Relax 
test assumption for PAssertTest#testWindowedIsEqualTo
URL: https://github.com/apache/beam/pull/7432
 
 
   The Flink Runner returns a Pane which is marked as early because it just 
   assigns the Windows to the test values. At this point, without further 
   transformations (e.g. GroupByKey), it does not know that this is the only Pane.
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | --- | --- | --- | ---
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)
 | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)
  [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/)
 | --- | --- | ---
   
   
   
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 182016)
Time Spent: 10m
Remaining Estimate: 0h

> Fix test failures in streaming mode of PortableValidatesRunner
> --
>
> Key: BEAM-6326
> URL: https://issues.apache.org/jira/browse/BEAM-6326
> Project: Beam
>  Issue Type: Test
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimat

[jira] [Work logged] (BEAM-6258) Data channel failing after some time for 1G data input

2019-01-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6258?focusedWorklogId=181997&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-181997
 ]

ASF GitHub Bot logged work on BEAM-6258:


Author: ASF GitHub Bot
Created on: 07/Jan/19 19:48
Start Date: 07/Jan/19 19:48
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #7415: [BEAM-6258] Set 
grpc keep alive on server creation
URL: https://github.com/apache/beam/pull/7415#discussion_r245776335
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/ServerFactory.java
 ##
 @@ -144,7 +145,9 @@ private static Server createServer(List 
services, InetSocketAdd
   NettyServerBuilder.forPort(socket.getPort())
   // Set the message size to max value here. The actual size is 
governed by the
   // buffer size in the layers above.
-  .maxMessageSize(Integer.MAX_VALUE);
+  .maxMessageSize(Integer.MAX_VALUE)
+  .permitKeepAliveTime(1, TimeUnit.SECONDS)
+  .permitKeepAliveWithoutCalls(true);
 
 Review comment:
   What if we set the keep alive time explicitly on both ends? Just wondering 
whether it made sense to set the values instead of using this workaround.
   
   https://github.com/grpc/grpc/blob/master/doc/keepalive.md#defaults-values
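   
   For illustration, a sketch of setting the keep-alive explicitly on both ends 
with grpc-java (the interval and port are example values only; imports from 
io.grpc and io.grpc.netty are omitted):
   
{code:java}
// Example values only; the point is to pair client and server settings explicitly.
// Client side: send keepalive pings at a known interval.
ManagedChannel channel =
    NettyChannelBuilder.forAddress("localhost", 50051)
        .keepAliveTime(30, TimeUnit.SECONDS)
        .keepAliveWithoutCalls(true)
        .usePlaintext()
        .build();

// Server side: permit pings at least that frequent so they are not rejected
// as abusive with GOAWAY/too_many_pings.
Server server =
    NettyServerBuilder.forPort(50051)
        .permitKeepAliveTime(30, TimeUnit.SECONDS)
        .permitKeepAliveWithoutCalls(true)
        .build();
{code}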
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 181997)
Time Spent: 1h  (was: 50m)

> Data channel failing after some time for 1G data input
> --
>
> Key: BEAM-6258
> URL: https://issues.apache.org/jira/browse/BEAM-6258
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Data channel and logging channel are failing after some time with 1GB input 
> data for chicago taxi.
>  
> E1218 02:44:02.837680206 72 chttp2_transport.cc:1148] Received a GOAWAY with 
> error code ENHANCE_YOUR_CALM and debug data equal to "too_many_pings"
> Exception in thread read_grpc_client_inputs:
> Traceback (most recent call last):
>  File "/usr/local/lib/python2.7/threading.py", line 801, in __bootstrap_inner
>  self.run()
>  File "/usr/local/lib/python2.7/threading.py", line 754, in run
>  self.__target(*self.__args, **self.__kwargs)
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/data_plane.py",
>  line 273, in 
>  target=lambda: self._read_inputs(elements_iterator),
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/data_plane.py",
>  line 260, in _read_inputs
>  for elements in elements_iterator:
>  File "/usr/local/lib/python2.7/site-packages/grpc/_channel.py", line 347, in 
> next
>  return self._next()
>  File "/usr/local/lib/python2.7/site-packages/grpc/_channel.py", line 338, in 
> _next
>  raise self
> _Rendezvous: <_Rendezvous of RPC that terminated with 
> (StatusCode.RESOURCE_EXHAUSTED, GOAWAY received)>
> Traceback (most recent call last):
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 145, in _execute
>  response = task()
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 180, in 
>  self._execute(lambda: worker.do_instruction(work), work)
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 253, in do_instruction
>  request.instruction_id)
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 269, in process_bundle
>  bundle_processor.process_bundle(instruction_id)
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/bundle_processor.py",
>  line 481, in process_bundle
>  instruction_id, expected_targets):
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/data_plane.py",
>  line 209, in input_elements
>  raise_(t, v, tb)
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/data_plane.py",
>  line 260, in _read_inputs
>  for elements in elements_iterator:
>  File "/usr/local/lib/python2.7/site-packages/grpc/_channel.py", line 347, in 
> next
>  return self._next()
>  File "/usr/local/lib/python2.7/site-packages/grpc/_channel.py", line 338, in 
> _next
>  raise self
> _Rendezvous: <_Rendezvous of RPC that terminated with 
> (

[jira] [Work logged] (BEAM-6346) Portable Flink Job hangs if the sdk harness fails to start

2019-01-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6346?focusedWorklogId=181991&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-181991
 ]

ASF GitHub Bot logged work on BEAM-6346:


Author: ASF GitHub Bot
Created on: 07/Jan/19 19:44
Start Date: 07/Jan/19 19:44
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #7395: [BEAM-6346] 
Inspect the docker container state
URL: https://github.com/apache/beam/pull/7395#discussion_r245775105
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/DockerEnvironmentFactory.java
 ##
 @@ -152,8 +152,10 @@ public RemoteEnvironment createEnvironment(Environment 
environment) throws Excep
   LOG.debug("Created Docker Container with Container ID {}", containerId);
   // Wait on a client from the gRPC server.
   while (instructionHandler == null) {
+Preconditions.checkArgument(
+docker.isContainerRunning(containerId), "No container running for 
id " + containerId);
 
 Review comment:
   The check is too strict then. We won't get `RUNNING` directly after starting 
the container. We need something like a repeated check which times out after 
some max startup time.
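   A rough sketch of the repeated check with an overall startup timeout being
suggested here; the DockerClient interface is only a stand-in for Beam's docker
wrapper (just the isContainerRunning call from the diff above is assumed), and
the timing values are illustrative:

{code:java}
import java.time.Duration;
import java.time.Instant;

class ContainerStartupCheckSketch {

  /** Stand-in for the docker wrapper used in the diff above. */
  interface DockerClient {
    boolean isContainerRunning(String containerId);
  }

  /** Polls the container state until it is running, failing once a startup deadline passes. */
  static void waitUntilRunning(DockerClient docker, String containerId, Duration maxStartup)
      throws InterruptedException {
    Instant deadline = Instant.now().plus(maxStartup);
    while (!docker.isContainerRunning(containerId)) {
      if (Instant.now().isAfter(deadline)) {
        throw new IllegalStateException("No container running for id " + containerId);
      }
      Thread.sleep(500); // back off briefly before re-checking
    }
  }
}
{code}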
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 181991)
Time Spent: 1h  (was: 50m)

> Portable Flink Job hangs if the sdk harness fails to start
> --
>
> Key: BEAM-6346
> URL: https://issues.apache.org/jira/browse/BEAM-6346
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Portable beam job on Flink fails if the sdk harness container fails to start 
> as its started in detached mode.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6248) Add Flink 1.7.x build target to Flink Runner

2019-01-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6248?focusedWorklogId=181983&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-181983
 ]

ASF GitHub Bot logged work on BEAM-6248:


Author: ASF GitHub Bot
Created on: 07/Jan/19 19:34
Start Date: 07/Jan/19 19:34
Worklog Time Spent: 10m 
  Work Description: tweise commented on pull request #7300: [BEAM-6248] Add 
Flink v1.7 build target to Flink Runner 
URL: https://github.com/apache/beam/pull/7300#discussion_r245772111
 
 

 ##
 File path: 
runners/flink/src/test/java/org/apache/beam/runners/flink/PortableStateExecutionTest.java
 ##
 @@ -86,6 +86,7 @@ public void testExecution() throws Exception {
 options.setRunner(CrashingRunner.class);
 options.as(FlinkPipelineOptions.class).setFlinkMaster("[local]");
 options.as(FlinkPipelineOptions.class).setStreaming(isStreaming);
+options.as(FlinkPipelineOptions.class).setParallelism(4);
 
 Review comment:
   To give a meaningful signal, the test should probably be different/specialized 
and assert something about the parallel execution? If not, then we can probably 
just set it to 1? As it stands, we are operating on hope :) 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 181983)
Time Spent: 4h  (was: 3h 50m)

> Add Flink 1.7.x build target to Flink Runner
> 
>
> Key: BEAM-6248
> URL: https://issues.apache.org/jira/browse/BEAM-6248
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
> Fix For: 2.10.0
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> With BEAM-5419 we can add a Flink 1.7.x build target.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3580) Do not use Go BytesCoder to encode string

2019-01-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3580?focusedWorklogId=181982&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-181982
 ]

ASF GitHub Bot logged work on BEAM-3580:


Author: ASF GitHub Bot
Created on: 07/Jan/19 19:30
Start Date: 07/Jan/19 19:30
Worklog Time Spent: 10m 
  Work Description: htyleo commented on issue #7422: [BEAM-3580] Use a 
separate coder for string
URL: https://github.com/apache/beam/pull/7422#issuecomment-452053094
 
 
   > These coders represent "standard" beam coders as defined by the FnAPI, so 
we can't simply add to them. An alternative would be to have the String Coder 
be a custom coder instead, that we "install" by default, similar to how we're 
currently handling Protocol Buffers or JSON.
   > 
   > This will need to be tested against the Universal Local Runner, to verify 
the semantics.
   
   Thanks @lostluck for correcting me. I'm going to close this PR and file 
another one, which should implement a custom coder (instead of a built-in 
coder) for string. I will try testing it against the Universal Local Runner.
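
   For illustration, the length-prefixed layout the issue describes (a string
coder that shares the bytes coder's wire format) can be sketched without any
Beam API, assuming a uvarint length prefix:

{code}
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// reader combines the two interfaces needed for decoding.
type reader interface {
	io.Reader
	io.ByteReader
}

// encodeString writes a string with the same layout as length-prefixed bytes:
// a uvarint length followed by the raw bytes.
func encodeString(w io.Writer, s string) error {
	var lenBuf [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(lenBuf[:], uint64(len(s)))
	if _, err := w.Write(lenBuf[:n]); err != nil {
		return err
	}
	_, err := io.WriteString(w, s)
	return err
}

// decodeString reads a uvarint length and then that many bytes.
func decodeString(r reader) (string, error) {
	length, err := binary.ReadUvarint(r)
	if err != nil {
		return "", err
	}
	buf := make([]byte, length)
	if _, err := io.ReadFull(r, buf); err != nil {
		return "", err
	}
	return string(buf), nil
}

func main() {
	var b bytes.Buffer
	if err := encodeString(&b, "hello"); err != nil {
		panic(err)
	}
	s, err := decodeString(&b)
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // prints: hello
}
{code}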
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 181982)
Time Spent: 1h 10m  (was: 1h)

> Do not use Go BytesCoder to encode string
> -
>
> Key: BEAM-3580
> URL: https://issues.apache.org/jira/browse/BEAM-3580
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Reporter: Henning Rohde
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> We should not use the same built-in coder for two different types. It creates 
> the need for conversions at inopportune times in the runtime.
>  
> One option would be to add a custom coder that shares encoding with bytes, given 
> that bytes are length-prefixed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3580) Do not use Go BytesCoder to encode string

2019-01-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3580?focusedWorklogId=181981&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-181981
 ]

ASF GitHub Bot logged work on BEAM-3580:


Author: ASF GitHub Bot
Created on: 07/Jan/19 19:30
Start Date: 07/Jan/19 19:30
Worklog Time Spent: 10m 
  Work Description: htyleo commented on pull request #7422: [BEAM-3580] Use 
a separate coder for string
URL: https://github.com/apache/beam/pull/7422
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 181981)
Time Spent: 1h  (was: 50m)

> Do not use Go BytesCoder to encode string
> -
>
> Key: BEAM-3580
> URL: https://issues.apache.org/jira/browse/BEAM-3580
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Reporter: Henning Rohde
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> We should not use the same built-in coder for two different types. It creates 
> the need for conversions at inopportune times in the runtime.
>  
> One option would be to add a custom coder that shares encoding with bytes, given 
> that bytes are length-prefixed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5315) Finish Python 3 porting for io module

2019-01-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5315?focusedWorklogId=181973&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-181973
 ]

ASF GitHub Bot logged work on BEAM-5315:


Author: ASF GitHub Bot
Created on: 07/Jan/19 19:18
Start Date: 07/Jan/19 19:18
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #7394: [BEAM-5315] 
Python 3 port io.textio module
URL: https://github.com/apache/beam/pull/7394#discussion_r245767467
 
 

 ##
 File path: sdks/python/apache_beam/io/filebasedsource_test.py
 ##
 @@ -87,6 +87,22 @@ class EOL(object):
 def write_data(
 num_lines, no_data=False, directory=None, prefix=tempfile.template,
 eol=EOL.LF):
+  """Writes test data to a temporary file.
+
+  Args:
+num_lines (int): The number of lines to write.
+no_data (bool): If :data:`True`, empty lines will be written, otherwise
+  each line will contain a concatenation of b'line' and the line number.
+directory (str): The name of the directory to create the temporary file in.
+prefix (str): The prefix to use for the temporary file.
+eol (int): The line ending to use when writing.
+  :class:`~apache_beam.io.textio_test.EOL` exposes attributes that can be
+  used here to define the eol.
+
+  Returns:
+Tuple[str, List[str]): A tuple of the filename and a list of the written
 
 Review comment:
   Thanks. I wish we had more docstrings like this one throughout SDK internals.
   
   Tuple[str, List[bytes]] in Line 103?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 181973)
Time Spent: 12h  (was: 11h 50m)

> Finish Python 3 porting for io module
> -
>
> Key: BEAM-5315
> URL: https://issues.apache.org/jira/browse/BEAM-5315
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Robbe
>Priority: Major
>  Time Spent: 12h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6346) Portable Flink Job hangs if the sdk harness fails to start

2019-01-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6346?focusedWorklogId=181969&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-181969
 ]

ASF GitHub Bot logged work on BEAM-6346:


Author: ASF GitHub Bot
Created on: 07/Jan/19 19:10
Start Date: 07/Jan/19 19:10
Worklog Time Spent: 10m 
  Work Description: angoenka commented on pull request #7395: [BEAM-6346] 
Inspect the docker container state
URL: https://github.com/apache/beam/pull/7395#discussion_r245765042
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/DockerEnvironmentFactory.java
 ##
 @@ -152,8 +152,10 @@ public RemoteEnvironment createEnvironment(Environment 
environment) throws Excep
   LOG.debug("Created Docker Container with Container ID {}", containerId);
   // Wait on a client from the gRPC server.
   while (instructionHandler == null) {
+Preconditions.checkArgument(
+docker.isContainerRunning(containerId), "No container running for 
id " + containerId);
 
 Review comment:
   The main issue that I observed here was when the container failed to start. 
As it is started in detached mode, we don't know the state of the container and 
simply wait on the control client to connect.
   
   Synchronous container start will not work and will be very difficult to 
orchestrate.
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 181969)
Time Spent: 50m  (was: 40m)

> Portable Flink Job hangs if the sdk harness fails to start
> --
>
> Key: BEAM-6346
> URL: https://issues.apache.org/jira/browse/BEAM-6346
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Portable beam job on Flink fails if the sdk harness container fails to start 
> as its started in detached mode.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6258) Data channel failing after some time for 1G data input

2019-01-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6258?focusedWorklogId=181971&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-181971
 ]

ASF GitHub Bot logged work on BEAM-6258:


Author: ASF GitHub Bot
Created on: 07/Jan/19 19:13
Start Date: 07/Jan/19 19:13
Worklog Time Spent: 10m 
  Work Description: angoenka commented on pull request #7415: [BEAM-6258] 
Set grpc keep alive on server creation
URL: https://github.com/apache/beam/pull/7415#discussion_r245763524
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/ServerFactory.java
 ##
 @@ -144,7 +145,9 @@ private static Server createServer(List 
services, InetSocketAdd
   NettyServerBuilder.forPort(socket.getPort())
   // Set the message size to max value here. The actual size is 
governed by the
   // buffer size in the layers above.
-  .maxMessageSize(Integer.MAX_VALUE);
+  .maxMessageSize(Integer.MAX_VALUE)
+  .permitKeepAliveTime(1, TimeUnit.SECONDS)
+  .permitKeepAliveWithoutCalls(true);
 
 Review comment:
   We are not setting the client configuration anywhere. 
   However, I also feel that this might be a server/client configuration issue.
   This is just a workaround till we find the root cause.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 181971)
Time Spent: 50m  (was: 40m)

> Data channel failing after some time for 1G data input
> --
>
> Key: BEAM-6258
> URL: https://issues.apache.org/jira/browse/BEAM-6258
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Data channel and logging channel are failing after some time with 1GB input 
> data for chicago taxi.
>  
> E1218 02:44:02.837680206 72 chttp2_transport.cc:1148] Received a GOAWAY with 
> error code ENHANCE_YOUR_CALM and debug data equal to "too_many_pings"
> Exception in thread read_grpc_client_inputs:
> Traceback (most recent call last):
>  File "/usr/local/lib/python2.7/threading.py", line 801, in __bootstrap_inner
>  self.run()
>  File "/usr/local/lib/python2.7/threading.py", line 754, in run
>  self.__target(*self.__args, **self.__kwargs)
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/data_plane.py",
>  line 273, in 
>  target=lambda: self._read_inputs(elements_iterator),
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/data_plane.py",
>  line 260, in _read_inputs
>  for elements in elements_iterator:
>  File "/usr/local/lib/python2.7/site-packages/grpc/_channel.py", line 347, in 
> next
>  return self._next()
>  File "/usr/local/lib/python2.7/site-packages/grpc/_channel.py", line 338, in 
> _next
>  raise self
> _Rendezvous: <_Rendezvous of RPC that terminated with 
> (StatusCode.RESOURCE_EXHAUSTED, GOAWAY received)>
> Traceback (most recent call last):
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 145, in _execute
>  response = task()
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 180, in 
>  self._execute(lambda: worker.do_instruction(work), work)
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 253, in do_instruction
>  request.instruction_id)
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 269, in process_bundle
>  bundle_processor.process_bundle(instruction_id)
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/bundle_processor.py",
>  line 481, in process_bundle
>  instruction_id, expected_targets):
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/data_plane.py",
>  line 209, in input_elements
>  raise_(t, v, tb)
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/data_plane.py",
>  line 260, in _read_inputs
>  for elements in elements_iterator:
>  File "/usr/local/lib/python2.7/site-packages/grpc/_channel.py", line 347, in 
> next
>  return self._next()
>  File "/usr/local/lib/python2.7/site-packages/grpc/_channel.py", line 338, in 
> _next
>  raise self
> _Rendezvous: <_Rendezvous of RPC that terminated with 
> (StatusCode.RESOURCE_EXHAUSTED, GOAWAY

[jira] [Work logged] (BEAM-6258) Data channel failing after some time for 1G data input

2019-01-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6258?focusedWorklogId=181966&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-181966
 ]

ASF GitHub Bot logged work on BEAM-6258:


Author: ASF GitHub Bot
Created on: 07/Jan/19 19:05
Start Date: 07/Jan/19 19:05
Worklog Time Spent: 10m 
  Work Description: angoenka commented on pull request #7415: [BEAM-6258] 
Set grpc keep alive on server creation
URL: https://github.com/apache/beam/pull/7415#discussion_r245763524
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/ServerFactory.java
 ##
 @@ -144,7 +145,9 @@ private static Server createServer(List 
services, InetSocketAdd
   NettyServerBuilder.forPort(socket.getPort())
   // Set the message size to max value here. The actual size is 
governed by the
   // buffer size in the layers above.
-  .maxMessageSize(Integer.MAX_VALUE);
+  .maxMessageSize(Integer.MAX_VALUE)
+  .permitKeepAliveTime(1, TimeUnit.SECONDS)
+  .permitKeepAliveWithoutCalls(true);
 
 Review comment:
   We are not checking the client configuration anywhere. 
   However, I also feel that this might be a server/client configuration issue.
   This is just a workaround till we find the root cause.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 181966)
Time Spent: 40m  (was: 0.5h)

> Data channel failing after some time for 1G data input
> --
>
> Key: BEAM-6258
> URL: https://issues.apache.org/jira/browse/BEAM-6258
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Data channel and logging channel are failing after some time with 1GB input 
> data for chicago taxi.
>  
> E1218 02:44:02.837680206 72 chttp2_transport.cc:1148] Received a GOAWAY with 
> error code ENHANCE_YOUR_CALM and debug data equal to "too_many_pings"
> Exception in thread read_grpc_client_inputs:
> Traceback (most recent call last):
>  File "/usr/local/lib/python2.7/threading.py", line 801, in __bootstrap_inner
>  self.run()
>  File "/usr/local/lib/python2.7/threading.py", line 754, in run
>  self.__target(*self.__args, **self.__kwargs)
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/data_plane.py",
>  line 273, in 
>  target=lambda: self._read_inputs(elements_iterator),
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/data_plane.py",
>  line 260, in _read_inputs
>  for elements in elements_iterator:
>  File "/usr/local/lib/python2.7/site-packages/grpc/_channel.py", line 347, in 
> next
>  return self._next()
>  File "/usr/local/lib/python2.7/site-packages/grpc/_channel.py", line 338, in 
> _next
>  raise self
> _Rendezvous: <_Rendezvous of RPC that terminated with 
> (StatusCode.RESOURCE_EXHAUSTED, GOAWAY received)>
> Traceback (most recent call last):
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 145, in _execute
>  response = task()
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 180, in 
>  self._execute(lambda: worker.do_instruction(work), work)
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 253, in do_instruction
>  request.instruction_id)
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 269, in process_bundle
>  bundle_processor.process_bundle(instruction_id)
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/bundle_processor.py",
>  line 481, in process_bundle
>  instruction_id, expected_targets):
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/data_plane.py",
>  line 209, in input_elements
>  raise_(t, v, tb)
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/data_plane.py",
>  line 260, in _read_inputs
>  for elements in elements_iterator:
>  File "/usr/local/lib/python2.7/site-packages/grpc/_channel.py", line 347, in 
> next
>  return self._next()
>  File "/usr/local/lib/python2.7/site-packages/grpc/_channel.py", line 338, in 
> _next
>  raise self
> _Rendezvous: <_Rendezvous of RPC that terminated with 
> (StatusCode.RESOURCE_EXHAUSTED, GOAW

[jira] [Commented] (BEAM-6372) Direct Runner should marshal data in a similar way to Dataflow runner

2019-01-07 Thread Robert Burke (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736201#comment-16736201
 ] 

Robert Burke commented on BEAM-6372:


Off the top of my head, I think I agree with the assessment; the error message 
would ideally be very clear about what's missing and possibly why. One reason 
it hasn't been done this way is that for testing with ptest and `go test`, 
it's inconvenient to have the registration scaffolding everywhere. In 
particular, we'd probably want to maintain that ease but still have the 
semantics check everything correctly, likely with a --test flag that's used by 
default in the testing harness.
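
For context, registering a type in the Go SDK is typically a one-liner in an
init() function. A minimal sketch, assuming the pre-v2 import path and using the
HistogramResult type named in this issue as the example:

{code}
package pipeline

import (
	"reflect"

	"github.com/apache/beam/sdks/go/pkg/beam"
)

// HistogramResult stands in for the user-defined type named in this issue.
type HistogramResult struct {
	Key   string   `json:"key"`
	Files []string `json:"files"`
}

func init() {
	// Registering the type up front lets Beam encode and decode it when the
	// pipeline runs on a distributed runner such as Dataflow.
	beam.RegisterType(reflect.TypeOf((*HistogramResult)(nil)).Elem())
}
{code}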

> Direct Runner should marshal data in a similar way to Dataflow runner
> -
>
> Key: BEAM-6372
> URL: https://issues.apache.org/jira/browse/BEAM-6372
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-direct, sdk-go
>Reporter: Andrew Brampton
>Assignee: Robert Burke
>Priority: Major
>
> I would test my pipeline using the direct runner, and it would happily run on 
> a sample. When I ran it on the Dataflow runner, it would run for an hour, then 
> get to a stage that would crash like so:
>  
> {quote}java.util.concurrent.ExecutionException: java.lang.RuntimeException: 
> Error received from SDK harness for instruction -224: execute failed: panic: 
> reflect: Call using main.HistogramResult as type struct \{ Key string 
> "json:\"key\""; Files []string "json:\"files\""; Histogram 
> palette.ColorHistogram "json:\"histogram,omitempty\""; Palette []struct { R 
> uint8; G uint8; B uint8; A uint8 } "json:\"palette\"" } goroutine 70 
> [running]:{quote}
> This was because I forgot to register my HistogramResult type.
> It would be useful if the direct runner tried to marshal and unmarshal all 
> types, to help expose issues like this earlier.
> Also, when running on Dataflow, the value of flags, and captured variables, 
> would be the empty/default value. It would be good if direct also caused this 
> behaviour. For example:
> {code}
> prefix := "X"
> s = s.Scope("Prefix " + prefix)
> c = beam.ParDo(s, func(value string) string {
>   return prefix + value
> }, c)
> {code}
> Will prefix values with "X" on the Direct runner, but will prefix with "" on Dataflow. 
> Subtle behaviour, but I suspect the direct runner could expose this if it 
> marshalled and unmarshalled the func like the dataflow runner.
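
For what it's worth, the usual way to avoid the captured-variable pitfall shown
in the snippet above is to carry the value in an exported field of a structural
DoFn, so it is serialized along with the transform. A hedged sketch (names are
illustrative, and this is not a statement of how the issue will be resolved):

{code}
package pipeline

import (
	"reflect"

	"github.com/apache/beam/sdks/go/pkg/beam"
)

// prefixFn carries the prefix in an exported field, so it is serialized with
// the transform instead of being dropped like a closure-captured variable.
type prefixFn struct {
	Prefix string `json:"prefix"`
}

func (f *prefixFn) ProcessElement(value string) string {
	return f.Prefix + value
}

func init() {
	beam.RegisterType(reflect.TypeOf((*prefixFn)(nil)).Elem())
}

// addPrefix applies the DoFn to a PCollection of strings.
func addPrefix(s beam.Scope, prefix string, col beam.PCollection) beam.PCollection {
	return beam.ParDo(s, &prefixFn{Prefix: prefix}, col)
}
{code}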



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3580) Do not use Go BytesCoder to encode string

2019-01-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3580?focusedWorklogId=181970&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-181970
 ]

ASF GitHub Bot logged work on BEAM-3580:


Author: ASF GitHub Bot
Created on: 07/Jan/19 19:10
Start Date: 07/Jan/19 19:10
Worklog Time Spent: 10m 
  Work Description: htyleo commented on pull request #7422: [BEAM-3580] Use 
a separate coder for string
URL: https://github.com/apache/beam/pull/7422#discussion_r245765155
 
 

 ##
 File path: sdks/go/pkg/beam/core/graph/coder/coder.go
 ##
 @@ -122,6 +122,7 @@ type Kind string
 const (
CustomKind = "Custom" // Implicitly length-prefixed
Bytes Kind = "bytes"  // Implicitly length-prefixed as part of 
the encoding
+   StringKind = "string" // Implicitly length-prefixed as part of 
the encoding
 
 Review comment:
   Oh, thanks for pointing that out! I misunderstood the meaning of "custom 
coder" in the bug description.
   
   I'm going to close this PR and file another one.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 181970)
Time Spent: 50m  (was: 40m)

> Do not use Go BytesCoder to encode string
> -
>
> Key: BEAM-3580
> URL: https://issues.apache.org/jira/browse/BEAM-3580
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Reporter: Henning Rohde
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> We should not use the same built-in coder for two different types. It creates 
> the need for conversions at inopportune times in the runtime.
>  
> One option would be to add a custom coder that shares encoding with bytes, given 
> that bytes are length-prefixed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6346) Portable Flink Job hangs if the sdk harness fails to start

2019-01-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6346?focusedWorklogId=181968&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-181968
 ]

ASF GitHub Bot logged work on BEAM-6346:


Author: ASF GitHub Bot
Created on: 07/Jan/19 19:10
Start Date: 07/Jan/19 19:10
Worklog Time Spent: 10m 
  Work Description: angoenka commented on pull request #7395: [BEAM-6346] 
Inspect the docker container state
URL: https://github.com/apache/beam/pull/7395#discussion_r245765030
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/DockerEnvironmentFactory.java
 ##
 @@ -152,8 +152,10 @@ public RemoteEnvironment createEnvironment(Environment 
environment) throws Excep
   LOG.debug("Created Docker Container with Container ID {}", containerId);
   // Wait on a client from the gRPC server.
   while (instructionHandler == null) {
+Preconditions.checkArgument(
+docker.isContainerRunning(containerId), "No container running for 
id " + containerId);
 try {
-  instructionHandler = clientSource.take(workerId, 
Duration.ofMinutes(2));
+  instructionHandler = clientSource.take(workerId, 
Duration.ofSeconds(30));
 
 Review comment:
   1 Min sounds good to me.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 181968)
Time Spent: 40m  (was: 0.5h)

> Portable Flink Job hangs if the sdk harness fails to start
> --
>
> Key: BEAM-6346
> URL: https://issues.apache.org/jira/browse/BEAM-6346
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Portable beam job on Flink fails if the sdk harness container fails to start 
> as its started in detached mode.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-3306) Consider: Go coder registry

2019-01-07 Thread Robert Burke (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke reassigned BEAM-3306:
--

Assignee: Robert Burke

> Consider: Go coder registry
> ---
>
> Key: BEAM-3306
> URL: https://issues.apache.org/jira/browse/BEAM-3306
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Henning Rohde
>Assignee: Robert Burke
>Priority: Minor
>
> Add coder registry to allow easier overwrite of default coders. We may also 
> allow otherwise un-encodable types, but that would require that function 
> analysis depends on it.
> If we're hardcoding support for proto/avro, then there may be little need for 
> such a feature. Conversely, this may be how we implement such support.
>  
> Proposal Doc: 
> [https://docs.google.com/document/d/1kQwx4Ah6PzG8z2ZMuNsNEXkGsLXm6gADOZaIO7reUOg/edit#|https://docs.google.com/document/d/1kQwx4Ah6PzG8z2ZMuNsNEXkGsLXm6gADOZaIO7reUOg/edit]
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6280) Failure in PortableRunnerTest.test_error_traceback_includes_user_code

2019-01-07 Thread Boyuan Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736178#comment-16736178
 ] 

Boyuan Zhang commented on BEAM-6280:


This feels like it's related to Python 3, since all the failing tests are in the 
python3 project. [~tvalentyn] could you please help take a look at this one?

> Failure in PortableRunnerTest.test_error_traceback_includes_user_code
> -
>
> Key: BEAM-6280
> URL: https://issues.apache.org/jira/browse/BEAM-6280
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, test-failures
>Reporter: Kenneth Knowles
>Assignee: Sam Rohde
>Priority: Critical
>  Labels: flaky-test
>
> [https://builds.apache.org/job/beam_PreCommit_Python_Cron/732/]
> [https://builds.apache.org/job/beam_PreCommit_Python_Cron/732/testReport/apache_beam.runners.portability.portable_runner_test/PortableRunnerTest/test_error_traceback_includes_user_code/]
> [https://scans.gradle.com/s/do3hjulee3gaa/console-log?task=:beam-sdks-python:testPython3]
> {code:java}
> 'second' not found in 'Traceback (most recent call last):\n  File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/apache_beam/runners/portability/fn_api_runner_test.py",
>  line 466, in test_error_traceback_includes_user_code\np | 
> beam.Create([0]) | beam.Map(first)  # pylint: 
> disable=expression-not-assigned\n  File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/apache_beam/pipeline.py",
>  line 425, in __exit__\nself.run().wait_until_finish()\n  File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/apache_beam/runners/portability/portable_runner.py",
>  line 314, in wait_until_finish\nself._job_id, self._state, 
> self._last_error_message()))\nRuntimeError: Pipeline 
> job-cdcefe6d-1caa-4487-9e63-e971f67ec68c failed in state FAILED: start 
>  coder=WindowedValueCoder[BytesCoder], len(consumers)=1]]>\n'{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6345) BigtableWriteIT. testE2EBigtableWrite got stuck with portable dataflow worker

2019-01-07 Thread Boyuan Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736172#comment-16736172
 ] 

Boyuan Zhang commented on BEAM-6345:


https://builds.apache.org/job/beam_PreCommit_JavaPortabilityApi_Cron/253/

> BigtableWriteIT. testE2EBigtableWrite got stuck with portable dataflow worker
> -
>
> Key: BEAM-6345
> URL: https://issues.apache.org/jira/browse/BEAM-6345
> Project: Beam
>  Issue Type: Test
>  Components: test-failures
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>
> https://builds.apache.org/job/beam_PostCommit_Java_PortabilityApi/576/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6280) Failure in PortableRunnerTest.test_error_traceback_includes_user_code

2019-01-07 Thread Sam Rohde (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736180#comment-16736180
 ] 

Sam Rohde commented on BEAM-6280:
-

[~boyuanz] I appreciate the help, but I've narrowed the bug down to a data race 
in the local_job_service. A fix is forthcoming.

> Failure in PortableRunnerTest.test_error_traceback_includes_user_code
> -
>
> Key: BEAM-6280
> URL: https://issues.apache.org/jira/browse/BEAM-6280
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, test-failures
>Reporter: Kenneth Knowles
>Assignee: Sam Rohde
>Priority: Critical
>  Labels: flaky-test
>
> [https://builds.apache.org/job/beam_PreCommit_Python_Cron/732/]
> [https://builds.apache.org/job/beam_PreCommit_Python_Cron/732/testReport/apache_beam.runners.portability.portable_runner_test/PortableRunnerTest/test_error_traceback_includes_user_code/]
> [https://scans.gradle.com/s/do3hjulee3gaa/console-log?task=:beam-sdks-python:testPython3]
> {code:java}
> 'second' not found in 'Traceback (most recent call last):\n  File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/apache_beam/runners/portability/fn_api_runner_test.py",
>  line 466, in test_error_traceback_includes_user_code\np | 
> beam.Create([0]) | beam.Map(first)  # pylint: 
> disable=expression-not-assigned\n  File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/apache_beam/pipeline.py",
>  line 425, in __exit__\nself.run().wait_until_finish()\n  File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/apache_beam/runners/portability/portable_runner.py",
>  line 314, in wait_until_finish\nself._job_id, self._state, 
> self._last_error_message()))\nRuntimeError: Pipeline 
> job-cdcefe6d-1caa-4487-9e63-e971f67ec68c failed in state FAILED: start 
>  coder=WindowedValueCoder[BytesCoder], len(consumers)=1]]>\n'{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6280) Failure in PortableRunnerTest.test_error_traceback_includes_user_code

2019-01-07 Thread Boyuan Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736176#comment-16736176
 ] 

Boyuan Zhang commented on BEAM-6280:


https://builds.apache.org/job/beam_PreCommit_Python_Cron/794/

> Failure in PortableRunnerTest.test_error_traceback_includes_user_code
> -
>
> Key: BEAM-6280
> URL: https://issues.apache.org/jira/browse/BEAM-6280
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, test-failures
>Reporter: Kenneth Knowles
>Assignee: Sam Rohde
>Priority: Critical
>  Labels: flaky-test
>
> [https://builds.apache.org/job/beam_PreCommit_Python_Cron/732/]
> [https://builds.apache.org/job/beam_PreCommit_Python_Cron/732/testReport/apache_beam.runners.portability.portable_runner_test/PortableRunnerTest/test_error_traceback_includes_user_code/]
> [https://scans.gradle.com/s/do3hjulee3gaa/console-log?task=:beam-sdks-python:testPython3]
> {code:java}
> 'second' not found in 'Traceback (most recent call last):\n  File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/apache_beam/runners/portability/fn_api_runner_test.py",
>  line 466, in test_error_traceback_includes_user_code\np | 
> beam.Create([0]) | beam.Map(first)  # pylint: 
> disable=expression-not-assigned\n  File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/apache_beam/pipeline.py",
>  line 425, in __exit__\nself.run().wait_until_finish()\n  File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/apache_beam/runners/portability/portable_runner.py",
>  line 314, in wait_until_finish\nself._job_id, self._state, 
> self._last_error_message()))\nRuntimeError: Pipeline 
> job-cdcefe6d-1caa-4487-9e63-e971f67ec68c failed in state FAILED: start 
>  coder=WindowedValueCoder[BytesCoder], len(consumers)=1]]>\n'{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6161) Add ElementCount MonitoringInfos for the Java SDK

2019-01-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6161?focusedWorklogId=181962&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-181962
 ]

ASF GitHub Bot logged work on BEAM-6161:


Author: ASF GitHub Bot
Created on: 07/Jan/19 18:49
Start Date: 07/Jan/19 18:49
Worklog Time Spent: 10m 
  Work Description: ajamato commented on issue #7272: [BEAM-6161] Introduce 
PCollectionConsumerRegistry and add ElementCoun…
URL: https://github.com/apache/beam/pull/7272#issuecomment-452039844
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 181962)
Time Spent: 4h 50m  (was: 4h 40m)

> Add ElementCount MonitoringInfos for the Java SDK
> -
>
> Key: BEAM-6161
> URL: https://issues.apache.org/jira/browse/BEAM-6161
> Project: Beam
>  Issue Type: New Feature
>  Components: java-fn-execution, sdk-java-harness
>Reporter: Alex Amato
>Assignee: Alex Amato
>Priority: Major
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6161) Add ElementCount MonitoringInfos for the Java SDK

2019-01-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6161?focusedWorklogId=181960&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-181960
 ]

ASF GitHub Bot logged work on BEAM-6161:


Author: ASF GitHub Bot
Created on: 07/Jan/19 18:48
Start Date: 07/Jan/19 18:48
Worklog Time Spent: 10m 
  Work Description: ajamato commented on issue #7272: [BEAM-6161] Introduce 
PCollectionConsumerRegistry and add ElementCoun…
URL: https://github.com/apache/beam/pull/7272#issuecomment-452039639
 
 
   Seems to be an issue starting up the VMs, due to not getting the built 
dataflow container.
   
   E  Handler for GET 
/v1.27/images/us.gcr.io/apache-beam-testing/java-postcommit-it/java:20190105011654/json
 returned error: No such image: 
us.gcr.io/apache-beam-testing/java-postcommit-it/java:20190105011654 
 undefined
   E  Error syncing pod ab3a90f13f7e8f68b0508d2fbee851d2 
("dataflow-testpipeline-jenkins-0105-01041719-00vb-harness-18hs_default(ab3a90f13f7e8f68b0508d2fbee851d2)"),
 skipping: failed to "StartContainer" for "sdk" with ImagePullBackOff: 
"Back-off pulling image 
\"us.gcr.io/apache-beam-testing/java-postcommit-it/java:20190105011654\"" 
 undefined
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 181960)
Time Spent: 4.5h  (was: 4h 20m)

> Add ElementCount MonitoringInfos for the Java SDK
> -
>
> Key: BEAM-6161
> URL: https://issues.apache.org/jira/browse/BEAM-6161
> Project: Beam
>  Issue Type: New Feature
>  Components: java-fn-execution, sdk-java-harness
>Reporter: Alex Amato
>Assignee: Alex Amato
>Priority: Major
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6161) Add ElementCount MonitoringInfos for the Java SDK

2019-01-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6161?focusedWorklogId=181961&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-181961
 ]

ASF GitHub Bot logged work on BEAM-6161:


Author: ASF GitHub Bot
Created on: 07/Jan/19 18:48
Start Date: 07/Jan/19 18:48
Worklog Time Spent: 10m 
  Work Description: ajamato commented on issue #7272: [BEAM-6161] Introduce 
PCollectionConsumerRegistry and add ElementCoun…
URL: https://github.com/apache/beam/pull/7272#issuecomment-452039766
 
 
   
   
   
https://pantheon.corp.google.com/logs/viewer?resource=dataflow_step%2Fjob_id%2F2019-01-04_17_19_42-763666818691945485&interval=NO_LIMIT&project=apache-beam-testing&e=13803970%2C13803205%2C13804191%2C13804201%2C13804391%2C13807341&organizationId=433637338589&minLogLevel=500&expandAll=false×tamp=2019-01-07T18%3A46%3A56.72400Z&customFacets&limitCustomFacetWidth=true&scrollTimestamp=2019-01-05T02%3A31%3A53.793816000Z&advancedFilter=resource.type%3D%22dataflow_step%22%0Aresource.labels.region%3D%22us-central1%22%0Aresource.labels.job_name%3D%22testpipeline-jenkins-0105011930-95e3af97%22%0Aresource.labels.job_id%3D%222019-01-04_17_19_42-763666818691945485%22%0Aresource.labels.project_id%3D%22apache-beam-testing%22%0Atimestamp%3D%222019-01-05T02%3A31%3A53.792919067Z%22%0AinsertId%3D%22s%3D0c43aa38f1dc473b84d3c4a5aae74b06%3Bi%3Db7f%3Bb%3D21484f6efe6c4a7598aaf3520458d553%3Bm%3Dd4c0a671%3Bt%3D57eaccc7d520d%3Bx%3De456feb0b9171076%22
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 181961)
Time Spent: 4h 40m  (was: 4.5h)

> Add ElementCount MonitoringInfos for the Java SDK
> -
>
> Key: BEAM-6161
> URL: https://issues.apache.org/jira/browse/BEAM-6161
> Project: Beam
>  Issue Type: New Feature
>  Components: java-fn-execution, sdk-java-harness
>Reporter: Alex Amato
>Assignee: Alex Amato
>Priority: Major
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-3306) Consider: Go coder registry

2019-01-07 Thread Robert Burke (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke updated BEAM-3306:
---
Description: 
Add coder registry to allow easier overwrite of default coders. We may also 
allow otherwise un-encodable types, but that would require that function 
analysis depends on it.

If we're hardcoding support for proto/avro, then there may be little need for 
such a feature. Conversely, this may be how we implement such support.

 

Proposal Doc: 
[https://docs.google.com/document/d/1kQwx4Ah6PzG8z2ZMuNsNEXkGsLXm6gADOZaIO7reUOg/edit#|https://docs.google.com/document/d/1kQwx4Ah6PzG8z2ZMuNsNEXkGsLXm6gADOZaIO7reUOg/edit]
 

  was:
Add coder registry to allow easier overwrite of default coders. We may also 
allow otherwise un-encodable types, but that would require that function 
analysis depends on it.

If we're hardcoding support for proto/avro, then there may be little need for 
such a feature. Conversely, this may be how we implement such support.


> Consider: Go coder registry
> ---
>
> Key: BEAM-3306
> URL: https://issues.apache.org/jira/browse/BEAM-3306
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Henning Rohde
>Priority: Minor
>
> Add coder registry to allow easier overwrite of default coders. We may also 
> allow otherwise un-encodable types, but that would require that function 
> analysis depends on it.
> If we're hardcoding support for proto/avro, then there may be little need for 
> such a feature. Conversely, this may be how we implement such support.
>  
> Proposal Doc: 
> [https://docs.google.com/document/d/1kQwx4Ah6PzG8z2ZMuNsNEXkGsLXm6gADOZaIO7reUOg/edit#|https://docs.google.com/document/d/1kQwx4Ah6PzG8z2ZMuNsNEXkGsLXm6gADOZaIO7reUOg/edit]
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-3306) Consider: Go coder registry

2019-01-07 Thread Robert Burke (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-3306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736154#comment-16736154
 ] 

Robert Burke commented on BEAM-3306:


Added proposal doc link to description. 
https://docs.google.com/document/d/1kQwx4Ah6PzG8z2ZMuNsNEXkGsLXm6gADOZaIO7reUOg/edit#

> Consider: Go coder registry
> ---
>
> Key: BEAM-3306
> URL: https://issues.apache.org/jira/browse/BEAM-3306
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Henning Rohde
>Priority: Minor
>
> Add coder registry to allow easier overwrite of default coders. We may also 
> allow otherwise un-encodable types, but that would require that function 
> analysis depends on it.
> If we're hardcoding support for proto/avro, then there may be little need for 
> such a feature. Conversely, this may be how we implement such support.
>  
> Proposal Doc: 
> [https://docs.google.com/document/d/1kQwx4Ah6PzG8z2ZMuNsNEXkGsLXm6gADOZaIO7reUOg/edit#|https://docs.google.com/document/d/1kQwx4Ah6PzG8z2ZMuNsNEXkGsLXm6gADOZaIO7reUOg/edit]
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6379) Research and introduce thread sanitizer code into the java SDK, and java runner harnesses

2019-01-07 Thread Alex Amato (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736143#comment-16736143
 ] 

Alex Amato commented on BEAM-6379:
--

One open-source thread sanitizer is available, though there is little 
documentation and its last commits are from several years ago.

https://github.com/google/java-thread-sanitizer

> Research and introduce thread sanitizer code into the java SDK, and java 
> runner harnesses
> -
>
> Key: BEAM-6379
> URL: https://issues.apache.org/jira/browse/BEAM-6379
> Project: Beam
>  Issue Type: New Feature
>  Components: java-fn-execution
>Reporter: Alex Amato
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6376) Java PreCommit failed continually owing to OOM on beam6

2019-01-07 Thread Boyuan Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736122#comment-16736122
 ] 

Boyuan Zhang commented on BEAM-6376:


[https://builds.apache.org/job/beam_PreCommit_Java_Cron/794/]

https://builds.apache.org/job/beam_PreCommit_Java_Cron/795/

> Java PreCommit failed continually owing to OOM on beam6
> ---
>
> Key: BEAM-6376
> URL: https://issues.apache.org/jira/browse/BEAM-6376
> Project: Beam
>  Issue Type: Test
>  Components: test-failures
>Reporter: Boyuan Zhang
>Assignee: Alan Myrvold
>Priority: Major
>
> [https://builds.apache.org/job/beam_PreCommit_Java_Cron/788/]
> [https://builds.apache.org/job/beam_PreCommit_Java_Cron/789/]
> [https://builds.apache.org/job/beam_PreCommit_Java_Cron/790/]
> [https://builds.apache.org/job/beam_PreCommit_Java_Cron/791/]
> [https://builds.apache.org/job/beam_PreCommit_Java_Cron/792/]
> https://builds.apache.org/job/beam_PreCommit_Java_Cron/793/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-6379) Research and introduce thread sanitizer code into the java SDK, and java runner harnesses

2019-01-07 Thread Alex Amato (JIRA)
Alex Amato created BEAM-6379:


 Summary: Research and introduce thread sanitizer code into the 
java SDK, and java runner harnesses
 Key: BEAM-6379
 URL: https://issues.apache.org/jira/browse/BEAM-6379
 Project: Beam
  Issue Type: New Feature
  Components: java-fn-execution
Reporter: Alex Amato






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6375) Walltime seems incorrect with the Go SDK and Dataflow

2019-01-07 Thread Alex Amato (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736128#comment-16736128
 ] 

Alex Amato commented on BEAM-6375:
--

I will be visiting the metrics work in the SDKs and making sure the metrics are 
calculated properly.

That said, we will be focusing on Java and Python in the current quarter.

Some of these metrics require clever techniques to measure correctly.

> Walltime seems incorrect with the Go SDK and Dataflow
> -
>
> Key: BEAM-6375
> URL: https://issues.apache.org/jira/browse/BEAM-6375
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-go
>Reporter: Andrew Brampton
>Assignee: Robert Burke
>Priority: Major
>
> The pipeline takes hours to run, but the walltime for each step is in 
> seconds. For example, this pipeline took 22 hours, but no stage took more than 
> 3 seconds: https://pasteboard.co/HVf8PCp.png



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6374) "elements added" for input and output collections is always empty

2019-01-07 Thread Alex Amato (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736129#comment-16736129
 ] 

Alex Amato commented on BEAM-6374:
--

I will be visiting the metrics work in the SDKs and making sure the metrics are 
calculated properly.

That said, we will be focusing on Java and Python in the current quarter.

Some of these metrics require clever techniques to measure correctly.

> "elements added" for input and output collections is always empty
> -
>
> Key: BEAM-6374
> URL: https://issues.apache.org/jira/browse/BEAM-6374
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-go
>Reporter: Andrew Brampton
>Assignee: Robert Burke
>Priority: Major
>
> The field for "Elements added" and "Estimated size" is always blank when 
> running a Go binary on Dataflow. For example, when running the word count 
> example: https://pasteboard.co/HVf80BU.png



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-6372) Direct Runner should marshal data in a similar way to Dataflow runner

2019-01-07 Thread Robert Burke (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke reassigned BEAM-6372:
--

Assignee: Robert Burke

> Direct Runner should marshal data in a similar way to Dataflow runner
> -
>
> Key: BEAM-6372
> URL: https://issues.apache.org/jira/browse/BEAM-6372
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-direct, sdk-go
>Reporter: Andrew Brampton
>Assignee: Robert Burke
>Priority: Major
>
> I would test my pipeline using the direct runner, and it would happily run on 
> a sample. When I ran it on the Dataflow runner, it would run for an hour, then 
> get to a stage that would crash like so:
>  
> {quote}java.util.concurrent.ExecutionException: java.lang.RuntimeException: 
> Error received from SDK harness for instruction -224: execute failed: panic: 
> reflect: Call using main.HistogramResult as type struct \{ Key string 
> "json:\"key\""; Files []string "json:\"files\""; Histogram 
> palette.ColorHistogram "json:\"histogram,omitempty\""; Palette []struct { R 
> uint8; G uint8; B uint8; A uint8 } "json:\"palette\"" } goroutine 70 
> [running]:{quote}
> This was because I forgot to register my HistogramResult type.
> It would be useful if the direct runner tried to marshal and unmarshal all 
> types, to help expose issues like this earlier.
> Also, when running on Dataflow, the value of flags, and captured variables, 
> would be the empty/default value. It would be good if direct also caused this 
> behaviour. For example:
> {code}
> prefix := "X"
> s = s.Scope("Prefix " + prefix)
> c = beam.ParDo(s, func(value string) string {
>   return prefix + value
> }, c)
> {code}
> Will prefix values with "X" on the Direct runner, but will prefix with "" on Dataflow. 
> Subtle behaviour, but I suspect the direct runner could expose this if it 
> marshalled and unmarshalled the func like the dataflow runner.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6240) Allow users to annotate POJOs and JavaBeans for richer functionality

2019-01-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6240?focusedWorklogId=181928&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-181928
 ]

ASF GitHub Bot logged work on BEAM-6240:


Author: ASF GitHub Bot
Created on: 07/Jan/19 17:45
Start Date: 07/Jan/19 17:45
Worklog Time Spent: 10m 
  Work Description: kanterov commented on issue #7289: [BEAM-6240] Add a 
library of schema annotations for POJO and JavaBeans
URL: https://github.com/apache/beam/pull/7289#issuecomment-452019113
 
 
   There is similar code working with reflection in Jackson, didn't have time 
to look deep into it. 
https://github.com/FasterXML/jackson-databind/blob/master/src/main/java/com/fasterxml/jackson/databind/introspect/JacksonAnnotationIntrospector.java#L291
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 181928)
Time Spent: 3h 50m  (was: 3h 40m)

> Allow users to annotate POJOs and JavaBeans for richer functionality
> 
>
> Key: BEAM-6240
> URL: https://issues.apache.org/jira/browse/BEAM-6240
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: Major
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Desired annotations:
>   * SchemaIgnore - ignore this field
>   * FieldName - allow the user to explicitly specify a field name
>   * SchemaCreate - register a function to be used to create an object (so 
> fields can be final, and no default constructor need be assumed).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6373) Make Go SDK on Dataflow multi-threaded

2019-01-07 Thread Robert Burke (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736082#comment-16736082
 ] 

Robert Burke commented on BEAM-6373:


Without SplittableDoFn (SDF), Go SDK jobs are unable to scale and divide work, 
which includes any kind of parallelism. This is being worked on this quarter. 
See [https://beam.apache.org/roadmap/go-sdk] for additional information.
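
The SplittableDoFn API mentioned here was not yet available in the Go SDK when this 
comment was written. As a rough sketch of the shape the API later took (method names, 
packages, and signatures are taken from a newer SDK, so treat the details as 
assumptions in this context), an SDF is a DoFn struct with restriction-handling methods 
that let the runner split and redistribute work:

{code}
package example

import (
	"github.com/apache/beam/sdks/v2/go/pkg/beam/core/sdf"
	"github.com/apache/beam/sdks/v2/go/pkg/beam/io/rtrackers/offsetrange"
)

// splitFn processes an integer range per element and lets the runner split it.
type splitFn struct{}

// CreateInitialRestriction declares the work attached to one element.
func (fn *splitFn) CreateInitialRestriction(n int64) offsetrange.Restriction {
	return offsetrange.Restriction{Start: 0, End: n}
}

// SplitRestriction gives the runner initial splits so work can be distributed.
func (fn *splitFn) SplitRestriction(n int64, r offsetrange.Restriction) []offsetrange.Restriction {
	return r.EvenSplits(4)
}

// RestrictionSize reports how much work remains, which drives scaling decisions.
func (fn *splitFn) RestrictionSize(n int64, r offsetrange.Restriction) float64 {
	return r.Size()
}

// CreateTracker wraps the restriction in a thread-safe tracker.
func (fn *splitFn) CreateTracker(r offsetrange.Restriction) *sdf.LockRTracker {
	return sdf.NewLockRTracker(offsetrange.NewTracker(r))
}

// ProcessElement claims positions one at a time; unclaimed work can be split off.
func (fn *splitFn) ProcessElement(rt *sdf.LockRTracker, n int64, emit func(int64)) {
	for i := rt.GetRestriction().(offsetrange.Restriction).Start; rt.TryClaim(i); i++ {
		emit(i)
	}
}
{code}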

> Make Go SDK on Dataflow multi-threaded
> --
>
> Key: BEAM-6373
> URL: https://issues.apache.org/jira/browse/BEAM-6373
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow, sdk-go
>Reporter: Andrew Brampton
>Assignee: Robert Burke
>Priority: Major
>
> When running on Dataflow with a machine that has more than 1 vCPU, the worker 
> will never use more than 1 CPU core. I presume this is because the 
> Dataflow runner does not allow the worker to run multiple steps concurrently.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-6373) Make Go SDK on Dataflow multi-threaded

2019-01-07 Thread Robert Burke (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke reassigned BEAM-6373:
--

Assignee: Tyler Akidau  (was: Robert Burke)

> Make Go SDK on Dataflow multi-threaded
> --
>
> Key: BEAM-6373
> URL: https://issues.apache.org/jira/browse/BEAM-6373
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow, sdk-go
>Reporter: Andrew Brampton
>Assignee: Tyler Akidau
>Priority: Major
>
> When running on Dataflow with a machine that has more than 1 vCPU, the worker 
> will never use more than 1 CPU core. I presume this is because the 
> Dataflow runner does not allow the worker to run multiple steps concurrently.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6374) "elements added" for input and output collections is always empty

2019-01-07 Thread Robert Burke (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736080#comment-16736080
 ] 

Robert Burke commented on BEAM-6374:


As with https://issues.apache.org/jira/browse/BEAM-6375, this is because the 
SDK doesn't currently record and communicate these values to the runner. 
[~ajam...@google.com] will be working on this in the next few months.

Running the Go SDK on Dataflow is unofficial and unsupported at this time, due 
to this and the missing SplittableDoFn support.
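
The counts in question are runner-populated PCollection metrics, but user-declared 
metrics are reported by the SDK and presumably ride the same Fn API metrics path that 
the element counts will need. A minimal sketch of declaring one with beam.NewCounter 
(the namespace and DoFn names are illustrative, not from this issue):

{code}
package example

import (
	"context"

	"github.com/apache/beam/sdks/go/pkg/beam"
)

// lineCount is a user-defined counter; "example" is the metric namespace.
var lineCount = beam.NewCounter("example", "lines")

// countFn increments the counter for every element it sees, then re-emits it.
func countFn(ctx context.Context, line string, emit func(string)) {
	lineCount.Inc(ctx, 1)
	emit(line)
}
{code}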

> "elements added" for input and output collections is always empty
> -
>
> Key: BEAM-6374
> URL: https://issues.apache.org/jira/browse/BEAM-6374
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-go
>Reporter: Andrew Brampton
>Assignee: Robert Burke
>Priority: Major
>
> The "Elements added" and "Estimated size" fields are always blank when 
> running a Go binary on Dataflow. For example, when running the word count 
> example: https://pasteboard.co/HVf80BU.png



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-6373) Make Go SDK on Dataflow multi-threaded

2019-01-07 Thread Robert Burke (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke reassigned BEAM-6373:
--

Assignee: Robert Burke  (was: Tyler Akidau)

> Make Go SDK on Dataflow multi-threaded
> --
>
> Key: BEAM-6373
> URL: https://issues.apache.org/jira/browse/BEAM-6373
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow, sdk-go
>Reporter: Andrew Brampton
>Assignee: Robert Burke
>Priority: Major
>
> When running on Dataflow with a machine that has more than 1 vCPU, the worker 
> will never use more than 1 CPU core. I presume this is because the 
> Dataflow runner does not allow the worker to run multiple steps concurrently.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-6372) Direct Runner should marshal data in a similar way to Dataflow runner

2019-01-07 Thread Robert Burke (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke reassigned BEAM-6372:
--

Assignee: (was: Robert Burke)

> Direct Runner should marshal data in a similar way to Dataflow runner
> -
>
> Key: BEAM-6372
> URL: https://issues.apache.org/jira/browse/BEAM-6372
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-direct, sdk-go
>Reporter: Andrew Brampton
>Priority: Major
>
> I would test my pipeline using the direct runner, and it would happily run on 
> a sample. When I ran it on the Dataflow runner, it would run for an hour, then 
> get to a stage that would crash like so:
>  
> {quote}java.util.concurrent.ExecutionException: java.lang.RuntimeException: 
> Error received from SDK harness for instruction -224: execute failed: panic: 
> reflect: Call using main.HistogramResult as type struct \{ Key string 
> "json:\"key\""; Files []string "json:\"files\""; Histogram 
> palette.ColorHistogram "json:\"histogram,omitempty\""; Palette []struct { R 
> uint8; G uint8; B uint8; A uint8 } "json:\"palette\"" } goroutine 70 
> [running]:{quote}
> This was because I forgot to register my HistogramResult type.
> It would be useful if the direct runner tried to marshal and unmarshal all 
> types, to help expose issues like this earlier.
> Also, when running on Dataflow, the values of flags and captured variables 
> would be the empty/default value. It would be good if the direct runner also 
> caused this behaviour. For example:
> {code}
> prefix := "X"
> s = s.Scope("Prefix " + prefix)
> c = beam.ParDo(s, func(value string) string {
>   return prefix + value
> }, c)
> {code}
> This will prefix values with "X" on the direct runner, but with "" on Dataflow. 
> Subtle behaviour, but I suspect the direct runner could expose this if it 
> marshalled and unmarshalled the func the way the Dataflow runner does.
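
The captured-variable surprise in the description above can be sidestepped by carrying 
configuration in an exported field of a named DoFn struct, which the SDK serializes 
along with the function. A minimal sketch, assuming the Go SDK import path of this era; 
prefixFn is an illustrative name, not taken from the report:

{code}
package main

import (
	"reflect"

	"github.com/apache/beam/sdks/go/pkg/beam"
)

// prefixFn carries its configuration in an exported, JSON-encoded field, so the
// value survives the marshal/unmarshal round trip that Dataflow performs.
type prefixFn struct {
	Prefix string `json:"prefix"`
}

func (f *prefixFn) ProcessElement(value string) string {
	return f.Prefix + value
}

func init() {
	// Register the DoFn type so the runner can reconstruct it by name.
	beam.RegisterType(reflect.TypeOf((*prefixFn)(nil)).Elem())
}

func main() {
	// With a scope s and a PCollection of strings c, the DoFn is applied as:
	//   c = beam.ParDo(s, &prefixFn{Prefix: "X"}, c)
}
{code}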



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6375) Walltime seems incorrect with the Go SDK and Dataflow

2019-01-07 Thread Robert Burke (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736071#comment-16736071
 ] 

Robert Burke commented on BEAM-6375:


I suspect this has to do with the fact that ParDos don't currently communicate 
work estimates and similar data to the runner. IIRC [~ajam...@google.com] is 
intending to work on that in the next few months as part of updating Portable 
Metrics handling.

> Walltime seems incorrect with the Go SDK and Dataflow
> -
>
> Key: BEAM-6375
> URL: https://issues.apache.org/jira/browse/BEAM-6375
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-go
>Reporter: Andrew Brampton
>Assignee: Robert Burke
>Priority: Major
>
> The pipeline takes hours to run, but the walltime for each step is in 
> seconds. For example, this pipeline took 22 hours, but no stage took more than 
> 3 seconds: https://pasteboard.co/HVf8PCp.png



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-6374) "elements added" for input and output collections is always empty

2019-01-07 Thread Robert Burke (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke reassigned BEAM-6374:
--

Assignee: Robert Burke  (was: Tyler Akidau)

> "elements added" for input and output collections is always empty
> -
>
> Key: BEAM-6374
> URL: https://issues.apache.org/jira/browse/BEAM-6374
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-go
>Reporter: Andrew Brampton
>Assignee: Robert Burke
>Priority: Major
>
> The "Elements added" and "Estimated size" fields are always blank when 
> running a Go binary on Dataflow. For example, when running the word count 
> example: https://pasteboard.co/HVf80BU.png



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-6375) Walltime seems incorrect with the Go SDK and Dataflow

2019-01-07 Thread Tyler Akidau (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tyler Akidau reassigned BEAM-6375:
--

Assignee: Robert Burke  (was: Tyler Akidau)

> Walltime seems incorrect with the Go SDK and Dataflow
> -
>
> Key: BEAM-6375
> URL: https://issues.apache.org/jira/browse/BEAM-6375
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-go
>Reporter: Andrew Brampton
>Assignee: Robert Burke
>Priority: Major
>
> The pipeline takes hours to run, but the walltime for each step is in 
> seconds. For example, this pipeline took 22 hours, but no stage took more than 
> 3 seconds: https://pasteboard.co/HVf8PCp.png



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3612) Make it easy to generate type-specialized Go SDK reflectx.Funcs

2019-01-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3612?focusedWorklogId=181922&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-181922
 ]

ASF GitHub Bot logged work on BEAM-3612:


Author: ASF GitHub Bot
Created on: 07/Jan/19 17:15
Start Date: 07/Jan/19 17:15
Worklog Time Spent: 10m 
  Work Description: lostluck commented on issue #7361: [BEAM-3612] Generate 
type assertion shims for beam.
URL: https://github.com/apache/beam/pull/7361#issuecomment-452009410
 
 
   R: @aaltay 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 181922)
Time Spent: 8h 50m  (was: 8h 40m)

> Make it easy to generate type-specialized Go SDK reflectx.Funcs
> ---
>
> Key: BEAM-3612
> URL: https://issues.apache.org/jira/browse/BEAM-3612
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Henning Rohde
>Assignee: Robert Burke
>Priority: Major
>  Time Spent: 8h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

