[jira] [Work logged] (BEAM-6097) Add NemoRunner
[ https://issues.apache.org/jira/browse/BEAM-6097?focusedWorklogId=182208&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182208 ] ASF GitHub Bot logged work on BEAM-6097: Author: ASF GitHub Bot Created on: 08/Jan/19 07:29 Start Date: 08/Jan/19 07:29 Worklog Time Spent: 10m Work Description: wonook commented on issue #7236: [BEAM-6097] NemoRunner URL: https://github.com/apache/beam/pull/7236#issuecomment-452200892 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 182208) Time Spent: 6h 40m (was: 6.5h) > Add NemoRunner > -- > > Key: BEAM-6097 > URL: https://issues.apache.org/jira/browse/BEAM-6097 > Project: Beam > Issue Type: New Feature > Components: runner-ideas >Reporter: Won Wook SONG >Assignee: Won Wook SONG >Priority: Major > Time Spent: 6h 40m > Remaining Estimate: 0h > > Add NemoRunner (http://nemo.apache.org) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6097) Add NemoRunner
[ https://issues.apache.org/jira/browse/BEAM-6097?focusedWorklogId=182207&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182207 ] ASF GitHub Bot logged work on BEAM-6097: Author: ASF GitHub Bot Created on: 08/Jan/19 07:29 Start Date: 08/Jan/19 07:29 Worklog Time Spent: 10m Work Description: wonook commented on issue #7236: [BEAM-6097] NemoRunner URL: https://github.com/apache/beam/pull/7236#issuecomment-452200875 @mxm Thanks! 😄 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 182207) Time Spent: 6.5h (was: 6h 20m) > Add NemoRunner > -- > > Key: BEAM-6097 > URL: https://issues.apache.org/jira/browse/BEAM-6097 > Project: Beam > Issue Type: New Feature > Components: runner-ideas >Reporter: Won Wook SONG >Assignee: Won Wook SONG >Priority: Major > Time Spent: 6.5h > Remaining Estimate: 0h > > Add NemoRunner (http://nemo.apache.org) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (BEAM-6383) Your project apache/beam is using buggy third-party libraries [WARNING]
[ https://issues.apache.org/jira/browse/BEAM-6383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kaifeng Huang closed BEAM-6383. --- Resolution: Incomplete Fix Version/s: Not applicable > Your project apache/beam is using buggy third-party libraries [WARNING] > --- > > Key: BEAM-6383 > URL: https://issues.apache.org/jira/browse/BEAM-6383 > Project: Beam > Issue Type: Bug > Components: build-system >Reporter: Kaifeng Huang >Assignee: Luke Cwik >Priority: Minor > Fix For: Not applicable > > > Hi, there! > We are a research team working on third-party library analysis. We have found > that some widely-used third-party libraries in your project have > major/critical bugs, which will degrade the quality of your project. We > highly recommend you to update those libraries to new versions. > We have attached the buggy third-party libraries and corresponding jira issue > links below for you to have more detailed information. > 1 org.apache.httpcomponents httpclient > (sdks/java/io/elasticsearch/build.gradle,sdks/java/io/solr/build.gradle,sdks/java/io/elasticsearch-tests/elasticsearch-tests-common/build.gradle,sdks/java/io/amazon-web-services/build.gradle) > version: 4.5.6 > Jira issues: > Support relatively new HTTP 308 redirect - RFC7538 > affectsVersions:3.1 (end of life),4.5.6 > > https://issues.apache.org/jira/projects/HTTPCLIENT/issues/HTTPCLIENT-1946?filter=allopenissues > 2 commons-cli commons-cli (release/build.gradle) > version: 1.2 > Jira issues: > Unable to select a pure long option in a group > affectsVersions:1.0;1.1;1.2 > > https://issues.apache.org/jira/projects/CLI/issues/CLI-182?filter=allopenissues > Clear the selection from the groups before parsing > affectsVersions:1.0;1.1;1.2 > > https://issues.apache.org/jira/projects/CLI/issues/CLI-183?filter=allopenissues > Commons CLI incorrectly stripping leading and trailing quotes > affectsVersions:1.1;1.2 > > https://issues.apache.org/jira/projects/CLI/issues/CLI-185?filter=allopenissues > Coding error: OptionGroup.setSelected causes > java.lang.NullPointerException > affectsVersions:1.2 > > https://issues.apache.org/jira/projects/CLI/issues/CLI-191?filter=allopenissues > StringIndexOutOfBoundsException in HelpFormatter.findWrapPos > affectsVersions:1.2 > > https://issues.apache.org/jira/projects/CLI/issues/CLI-193?filter=allopenissues > HelpFormatter strips leading whitespaces in the footer > affectsVersions:1.2 > > https://issues.apache.org/jira/projects/CLI/issues/CLI-207?filter=allopenissues > OptionBuilder only has static methods; yet many return an OptionBuilder > instance > affectsVersions:1.2 > > https://issues.apache.org/jira/projects/CLI/issues/CLI-224?filter=allopenissues > Unable to properly require options > affectsVersions:1.2 > > https://issues.apache.org/jira/projects/CLI/issues/CLI-230?filter=allopenissues > OptionValidator Implementation Does Not Agree With JavaDoc > affectsVersions:1.2 > > https://issues.apache.org/jira/projects/CLI/issues/CLI-241?filter=allopenissues > 3 org.apache.logging.log4j log4j-core > (sdks/java/io/elasticsearch-tests/elasticsearch-tests-6/build.gradle,sdks/java/io/elasticsearch-tests/elasticsearch-tests-2/build.gradle,sdks/java/io/elasticsearch-tests/elasticsearch-tests-common/build.gradle,sdks/java/io/hadoop-input-format/build.gradle,sdks/java/io/hadoop-format/build.gradle) > version: 2.6.2 > Jira issues: > Custom plugins are not loaded; URL protocol vfs is not supported > affectsVersions:2.5;2.6.2 > > 
https://issues.apache.org/jira/projects/LOG4J2/issues/LOG4J2-1320?filter=allopenissues > [OSGi] Missing import package > affectsVersions:2.6.2 > > https://issues.apache.org/jira/projects/LOG4J2/issues/LOG4J2-1467?filter=allopenissues > CronTriggeringPolicy raise exception and fail to rollover log file when > evaluateOnStartup is true. > affectsVersions:2.6.2 > > https://issues.apache.org/jira/projects/LOG4J2/issues/LOG4J2-1474?filter=allopenissues > Improper header in CsvParameterLayout > affectsVersions:2.6.2 > > https://issues.apache.org/jira/projects/LOG4J2/issues/LOG4J2-1482?filter=allopenissues > Merging configurations fail with an NPE when comparing Nodes with > different attributes > affectsVersions:2.6.2 > > https://issues.apache.org/jira/projects/LOG4J2/issues/LOG4J2-1500?filter=allopenissues > CsvParameterLayout and CsvLogEventLayout insert NUL character if data > starts with {; (; [ or " > a
[jira] [Created] (BEAM-6383) Your project apache/beam is using buggy third-party libraries [WARNING]
Kaifeng Huang created BEAM-6383: --- Summary: Your project apache/beam is using buggy third-party libraries [WARNING] Key: BEAM-6383 URL: https://issues.apache.org/jira/browse/BEAM-6383 Project: Beam Issue Type: Bug Components: build-system Reporter: Kaifeng Huang Assignee: Luke Cwik Hi, there! We are a research team working on third-party library analysis. We have found that some widely-used third-party libraries in your project have major/critical bugs, which will degrade the quality of your project. We highly recommend you to update those libraries to new versions. We have attached the buggy third-party libraries and corresponding jira issue links below for you to have more detailed information. 1 org.apache.httpcomponents httpclient (sdks/java/io/elasticsearch/build.gradle,sdks/java/io/solr/build.gradle,sdks/java/io/elasticsearch-tests/elasticsearch-tests-common/build.gradle,sdks/java/io/amazon-web-services/build.gradle) version: 4.5.6 Jira issues: Support relatively new HTTP 308 redirect - RFC7538 affectsVersions:3.1 (end of life),4.5.6 https://issues.apache.org/jira/projects/HTTPCLIENT/issues/HTTPCLIENT-1946?filter=allopenissues 2 commons-cli commons-cli (release/build.gradle) version: 1.2 Jira issues: Unable to select a pure long option in a group affectsVersions:1.0;1.1;1.2 https://issues.apache.org/jira/projects/CLI/issues/CLI-182?filter=allopenissues Clear the selection from the groups before parsing affectsVersions:1.0;1.1;1.2 https://issues.apache.org/jira/projects/CLI/issues/CLI-183?filter=allopenissues Commons CLI incorrectly stripping leading and trailing quotes affectsVersions:1.1;1.2 https://issues.apache.org/jira/projects/CLI/issues/CLI-185?filter=allopenissues Coding error: OptionGroup.setSelected causes java.lang.NullPointerException affectsVersions:1.2 https://issues.apache.org/jira/projects/CLI/issues/CLI-191?filter=allopenissues StringIndexOutOfBoundsException in HelpFormatter.findWrapPos affectsVersions:1.2 https://issues.apache.org/jira/projects/CLI/issues/CLI-193?filter=allopenissues HelpFormatter strips leading whitespaces in the footer affectsVersions:1.2 https://issues.apache.org/jira/projects/CLI/issues/CLI-207?filter=allopenissues OptionBuilder only has static methods; yet many return an OptionBuilder instance affectsVersions:1.2 https://issues.apache.org/jira/projects/CLI/issues/CLI-224?filter=allopenissues Unable to properly require options affectsVersions:1.2 https://issues.apache.org/jira/projects/CLI/issues/CLI-230?filter=allopenissues OptionValidator Implementation Does Not Agree With JavaDoc affectsVersions:1.2 https://issues.apache.org/jira/projects/CLI/issues/CLI-241?filter=allopenissues 3 org.apache.logging.log4j log4j-core (sdks/java/io/elasticsearch-tests/elasticsearch-tests-6/build.gradle,sdks/java/io/elasticsearch-tests/elasticsearch-tests-2/build.gradle,sdks/java/io/elasticsearch-tests/elasticsearch-tests-common/build.gradle,sdks/java/io/hadoop-input-format/build.gradle,sdks/java/io/hadoop-format/build.gradle) version: 2.6.2 Jira issues: Custom plugins are not loaded; URL protocol vfs is not supported affectsVersions:2.5;2.6.2 https://issues.apache.org/jira/projects/LOG4J2/issues/LOG4J2-1320?filter=allopenissues [OSGi] Missing import package affectsVersions:2.6.2 https://issues.apache.org/jira/projects/LOG4J2/issues/LOG4J2-1467?filter=allopenissues CronTriggeringPolicy raise exception and fail to rollover log file when evaluateOnStartup is true. 
affectsVersions:2.6.2 https://issues.apache.org/jira/projects/LOG4J2/issues/LOG4J2-1474?filter=allopenissues Improper header in CsvParameterLayout affectsVersions:2.6.2 https://issues.apache.org/jira/projects/LOG4J2/issues/LOG4J2-1482?filter=allopenissues Merging configurations fail with an NPE when comparing Nodes with different attributes affectsVersions:2.6.2 https://issues.apache.org/jira/projects/LOG4J2/issues/LOG4J2-1500?filter=allopenissues CsvParameterLayout and CsvLogEventLayout insert NUL character if data starts with {; (; [ or " affectsVersions:2.6.2 https://issues.apache.org/jira/projects/LOG4J2/issues/LOG4J2-1502?filter=allopenissues Unregister JMX ignores log4j2.disable.jmx property affectsVersions:2.6.2 https://issues.apache.org/jira/projects/LOG4J2/issues/LOG4J2-1506?filter=allopenissues DynamicThresholdFilter filters incorrectly when params
[jira] [Work logged] (BEAM-6240) Allow users to annotate POJOs and JavaBeans for richer functionality
[ https://issues.apache.org/jira/browse/BEAM-6240?focusedWorklogId=182183&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182183 ] ASF GitHub Bot logged work on BEAM-6240: Author: ASF GitHub Bot Created on: 08/Jan/19 06:02 Start Date: 08/Jan/19 06:02 Worklog Time Spent: 10m Work Description: reuvenlax commented on pull request #7289: [BEAM-6240] Add a library of schema annotations for POJO and JavaBeans URL: https://github.com/apache/beam/pull/7289#discussion_r245885263 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/ByteBuddyUtils.java ## @@ -595,4 +614,163 @@ protected StackManipulation convertDefault(TypeDescriptor type) { return readValue; } } + + /** + * Invokes a constructor registered using SchemaCreate. As constructor parameters might not be in + * the same order as the schema fields, reorders the parameters as necessary before calling the + * constructor. + */ + static class ConstructorCreateInstruction extends InvokeUserCreateInstruction { +private final Constructor constructor; + +ConstructorCreateInstruction( +List fields, Class targetClass, Constructor constructor) { + super(fields, targetClass, Lists.newArrayList(constructor.getParameters())); + this.constructor = constructor; +} + +@Override +public InstrumentedType prepare(InstrumentedType instrumentedType) { + return instrumentedType; +} + +@Override +protected StackManipulation beforePushingParameters() { + // Create the target class. + ForLoadedType loadedType = new ForLoadedType(targetClass); + return new StackManipulation.Compound(TypeCreation.of(loadedType), Duplication.SINGLE); +} + +@Override +protected StackManipulation afterPushingParameters() { + return MethodInvocation.invoke(new ForLoadedConstructor(constructor)); +} + } + + /** + * Invokes a static factory method registered using SchemaCreate. As the method parameters might + * not be in the same order as the schema fields, reorders the parameters as necessary before + * calling the constructor. + */ + static class StaticFactoryMethodInstruction extends InvokeUserCreateInstruction { +private final Method creator; + +StaticFactoryMethodInstruction( +List fields, Class targetClass, Method creator) { + super(fields, targetClass, Lists.newArrayList(creator.getParameters())); + if (!Modifier.isStatic(creator.getModifiers())) { +throw new IllegalArgumentException("Method " + creator + " is not static"); + } + this.creator = creator; +} + +@Override +public InstrumentedType prepare(InstrumentedType instrumentedType) { + return instrumentedType; +} + +@Override +protected StackManipulation afterPushingParameters() { + return MethodInvocation.invoke(new ForLoadedMethod(creator)); +} + } + + static class InvokeUserCreateInstruction implements Implementation { +protected final List fields; +protected final Class targetClass; +protected final List parameters; +protected final Map fieldMapping; + +protected InvokeUserCreateInstruction( +List fields, Class targetClass, List parameters) { + this.fields = fields; + this.targetClass = targetClass; + this.parameters = parameters; + + // Method parameters might not be in the same order as the schema fields, and the input + // array to SchemaUserTypeCreator.create is in schema order. Examine the parameter names + // and compare against field names to calculate the mapping between the two lists. 
+ Map fieldsByLogicalName = Maps.newHashMap(); + Map fieldsByJavaClassMember = Maps.newHashMap(); + for (int i = 0; i < fields.size(); ++i) { +// Method parameters are allowed to either correspond to the schema field names or to the +// actual Java field or method names. +FieldValueTypeInformation fieldValue = checkNotNull(fields.get(i)); +fieldsByLogicalName.put(fieldValue.getName(), i); +if (fieldValue.getField() != null) { + fieldsByJavaClassMember.put(fieldValue.getField().getName(), i); +} else if (fieldValue.getMethod() != null) { + String name = ReflectUtils.stripPrefix(fieldValue.getMethod().getName(), "set"); + fieldsByJavaClassMember.put(name, i); +} + } + + fieldMapping = Maps.newHashMap(); + for (int i = 0; i < parameters.size(); ++i) { +Parameter parameter = parameters.get(i); +String paramName = parameter.getName(); +Integer index = fieldsByLogicalName.get(paramName); +if (index == null) { + index = fieldsByJavaClassMember.get(paramName); +} +if (index == null) { + throw new RuntimeException( +
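Editor's note: the diff above is truncated mid-method. As a hedged restatement of the reordering it describes (creator parameters may arrive in any order, so each parameter name is matched against either the schema field name or the underlying Java member name to find its schema index), a plain-Java sketch follows. The class and method names here are illustrative only and are not part of the PR.

{code:java}
// Illustrative sketch only -- restates the mapping computed by
// InvokeUserCreateInstruction above, without the ByteBuddy plumbing.
import java.lang.reflect.Parameter;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CreatorParameterMapping {
  /**
   * For each creator parameter, finds the index of the schema field it supplies,
   * matching either the schema (logical) field name or the Java member name.
   * Note: Parameter.getName() only yields real names when compiled with -parameters.
   */
  static Map<Integer, Integer> map(
      List<String> schemaFieldNames,   // field names, in schema order
      List<String> javaMemberNames,    // corresponding Java field / stripped setter names
      Parameter[] creatorParameters) {
    Map<String, Integer> byLogicalName = new HashMap<>();
    Map<String, Integer> byJavaMember = new HashMap<>();
    for (int i = 0; i < schemaFieldNames.size(); ++i) {
      byLogicalName.put(schemaFieldNames.get(i), i);
      byJavaMember.put(javaMemberNames.get(i), i);
    }
    Map<Integer, Integer> parameterToField = new HashMap<>();
    for (int i = 0; i < creatorParameters.length; ++i) {
      String name = creatorParameters[i].getName();
      Integer index = byLogicalName.get(name);
      if (index == null) {
        index = byJavaMember.get(name);
      }
      if (index == null) {
        throw new IllegalArgumentException(
            "Creator parameter " + name + " does not correspond to any schema field");
      }
      parameterToField.put(i, index);
    }
    return parameterToField;
  }
}
{code}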
[jira] [Work logged] (BEAM-6240) Allow users to annotate POJOs and JavaBeans for richer functionality
[ https://issues.apache.org/jira/browse/BEAM-6240?focusedWorklogId=182181&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182181 ] ASF GitHub Bot logged work on BEAM-6240: Author: ASF GitHub Bot Created on: 08/Jan/19 06:02 Start Date: 08/Jan/19 06:02 Worklog Time Spent: 10m Work Description: reuvenlax commented on pull request #7289: [BEAM-6240] Add a library of schema annotations for POJO and JavaBeans URL: https://github.com/apache/beam/pull/7289#discussion_r245885244 ## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/utils/TestPOJOs.java ## @@ -101,6 +104,249 @@ public int hashCode() { public static final Schema NESTED_NULLABLE_SCHEMA = Schema.builder().addNullableField("nested", FieldType.row(NULLABLES_SCHEMA)).build(); + /** A POJO for testing static factory methods. */ + @DefaultSchema(JavaFieldSchema.class) + public static class StaticCreationSimplePojo { +public final String str; +public final byte aByte; +public final short aShort; +public final int anInt; +public final long aLong; +public final boolean aBoolean; +public final DateTime dateTime; +public final Instant instant; +public final byte[] bytes; +public final ByteBuffer byteBuffer; +public final BigDecimal bigDecimal; +public final StringBuilder stringBuilder; + +private StaticCreationSimplePojo( +String str, +byte aByte, +short aShort, +int anInt, +long aLong, +boolean aBoolean, +DateTime dateTime, +Instant instant, +byte[] bytes, +ByteBuffer byteBuffer, +BigDecimal bigDecimal, +StringBuilder stringBuilder) { + this.str = str; + this.aByte = aByte; + this.aShort = aShort; + this.anInt = anInt; + this.aLong = aLong; + this.aBoolean = aBoolean; + this.dateTime = dateTime; + this.instant = instant; + this.bytes = bytes; + this.byteBuffer = byteBuffer; + this.bigDecimal = bigDecimal; + this.stringBuilder = stringBuilder; +} + +@SchemaCreate +public static StaticCreationSimplePojo of( Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 182181) Time Spent: 5h (was: 4h 50m) > Allow users to annotate POJOs and JavaBeans for richer functionality > > > Key: BEAM-6240 > URL: https://issues.apache.org/jira/browse/BEAM-6240 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Reuven Lax >Assignee: Reuven Lax >Priority: Major > Time Spent: 5h > Remaining Estimate: 0h > > Desired annotations: > * SchemaIgnore - ignore this field > * FieldName - allow the user to explicitly specify a field name > * SchemaCreate - register a function to be used to create an object (so > fields can be final, and no default constructor need be assumed). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6240) Allow users to annotate POJOs and JavaBeans for richer functionality
[ https://issues.apache.org/jira/browse/BEAM-6240?focusedWorklogId=182180&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182180 ] ASF GitHub Bot logged work on BEAM-6240: Author: ASF GitHub Bot Created on: 08/Jan/19 06:02 Start Date: 08/Jan/19 06:02 Worklog Time Spent: 10m Work Description: reuvenlax commented on pull request #7289: [BEAM-6240] Add a library of schema annotations for POJO and JavaBeans URL: https://github.com/apache/beam/pull/7289#discussion_r245885227 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/ReflectUtils.java ## @@ -77,6 +82,38 @@ public int hashCode() { }); } + @Nullable + public static Constructor getAnnotatedConstructor(Class clazz) { +return Arrays.stream(clazz.getDeclaredConstructors()) +.filter(m -> Modifier.isPublic(m.getModifiers())) +.filter(m -> m.getAnnotation(SchemaCreate.class) != null) +.findFirst() +.orElse(null); + } + + @Nullable + public static Method getAnnotatedCreateMethod(Class clazz) { +return ANNOTATED_CONSTRUCTORS.computeIfAbsent( +clazz, +c -> { + Method method = + Arrays.stream(clazz.getDeclaredMethods()) + .filter(m -> Modifier.isPublic(m.getModifiers())) + .filter(m -> Modifier.isStatic(m.getModifiers())) + .filter(m -> m.getAnnotation(SchemaCreate.class) != null) + .findFirst() + .orElse(null); + if (method != null && !clazz.isAssignableFrom(method.getReturnType())) { Review comment: same as above. it's a bit weird to have two factory methods, but is there a reason to disallow it? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 182180) Time Spent: 4h 50m (was: 4h 40m) > Allow users to annotate POJOs and JavaBeans for richer functionality > > > Key: BEAM-6240 > URL: https://issues.apache.org/jira/browse/BEAM-6240 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Reuven Lax >Assignee: Reuven Lax >Priority: Major > Time Spent: 4h 50m > Remaining Estimate: 0h > > Desired annotations: > * SchemaIgnore - ignore this field > * FieldName - allow the user to explicitly specify a field name > * SchemaCreate - register a function to be used to create an object (so > fields can be final, and no default constructor need be assumed). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
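Editor's note: to make the three annotations listed in the issue description concrete, here is a small hypothetical POJO using them as this PR describes (@FieldName on a field, @SchemaIgnore to drop a field, @SchemaCreate on a constructor). The type itself is invented for illustration, and the annotation import paths are assumed from the PR's package layout, so they may differ in the final code.

{code:java}
// Hypothetical user type -- not from the PR. Annotation packages are assumed.
import org.apache.beam.sdk.schemas.JavaFieldSchema;
import org.apache.beam.sdk.schemas.annotations.DefaultSchema;
import org.apache.beam.sdk.schemas.annotations.FieldName;
import org.apache.beam.sdk.schemas.annotations.SchemaCreate;
import org.apache.beam.sdk.schemas.annotations.SchemaIgnore;

@DefaultSchema(JavaFieldSchema.class)
public class User {
  // Exposed in the schema as "user_id" instead of the Java field name "id".
  @FieldName("user_id")
  public final long id;

  public final String name;

  // Left out of the inferred schema entirely.
  @SchemaIgnore
  public final String internalNotes;

  // Registered as the creation function, so the fields can stay final and no
  // default constructor is required. Parameter names may match either the
  // schema field names or the Java member names (see the mapping logic
  // discussed earlier in this review).
  @SchemaCreate
  public User(long id, String name) {
    this.id = id;
    this.name = name;
    this.internalNotes = "";
  }
}
{code}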
[jira] [Work logged] (BEAM-6240) Allow users to annotate POJOs and JavaBeans for richer functionality
[ https://issues.apache.org/jira/browse/BEAM-6240?focusedWorklogId=182179&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182179 ] ASF GitHub Bot logged work on BEAM-6240: Author: ASF GitHub Bot Created on: 08/Jan/19 06:02 Start Date: 08/Jan/19 06:02 Worklog Time Spent: 10m Work Description: reuvenlax commented on pull request #7289: [BEAM-6240] Add a library of schema annotations for POJO and JavaBeans URL: https://github.com/apache/beam/pull/7289#discussion_r245885222 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/JavaBeanSchema.java ## @@ -47,70 +51,103 @@ /** {@link FieldValueTypeSupplier} that's based on getter methods. */ @VisibleForTesting public static class GetterTypeSupplier implements FieldValueTypeSupplier { +public static final GetterTypeSupplier INSTANCE = new GetterTypeSupplier(); + @Override -public List get(Class clazz, Schema schema) { - Map types = +public List get(Class clazz, @Nullable Schema schema) { + List types = ReflectUtils.getMethods(clazz) .stream() .filter(ReflectUtils::isGetter) + .filter(m -> !m.isAnnotationPresent(SchemaIgnore.class)) .map(FieldValueTypeInformation::forGetter) - .collect(Collectors.toMap(FieldValueTypeInformation::getName, Function.identity())); - // Return the list ordered by the schema fields. - return schema - .getFields() - .stream() - .map(f -> types.get(f.getName())) - .collect(Collectors.toList()); + .map( + t -> { +FieldName fieldName = t.getMethod().getAnnotation(FieldName.class); +return (fieldName != null) ? t.withName(fieldName.value()) : t; + }) + .collect(Collectors.toList()); + return (schema != null) ? StaticSchemaInference.sortBySchema(types, schema) : types; } } /** {@link FieldValueTypeSupplier} that's based on setter methods. */ @VisibleForTesting public static class SetterTypeSupplier implements FieldValueTypeSupplier { +private static final SetterTypeSupplier INSTANCE = new SetterTypeSupplier(); + @Override public List get(Class clazz, Schema schema) { - Map types = + List types = ReflectUtils.getMethods(clazz) .stream() .filter(ReflectUtils::isSetter) + .filter(m -> !m.isAnnotationPresent(SchemaIgnore.class)) .map(FieldValueTypeInformation::forSetter) - .collect(Collectors.toMap(FieldValueTypeInformation::getName, Function.identity())); - // Return the list ordered by the schema fields. - return schema - .getFields() - .stream() - .map(f -> types.get(f.getName())) - .collect(Collectors.toList()); + .collect(Collectors.toList()); + + return (schema != null) ? StaticSchemaInference.sortBySchema(types, schema) : types; } } @Override public Schema schemaFor(TypeDescriptor typeDescriptor) { -return JavaBeanUtils.schemaFromJavaBeanClass(typeDescriptor.getRawType()); +Schema schema = +JavaBeanUtils.schemaFromJavaBeanClass( +typeDescriptor.getRawType(), GetterTypeSupplier.INSTANCE); + +// If there are no creator methods, then validate that we have setters for every field. +// Otherwise, we will have not way of creating the class. 
+if (ReflectUtils.getAnnotatedCreateMethod(typeDescriptor.getRawType()) == null +&& ReflectUtils.getAnnotatedConstructor(typeDescriptor.getRawType()) == null) { + JavaBeanUtils.validateJavaBean( + GetterTypeSupplier.INSTANCE.get(typeDescriptor.getRawType(), schema), + SetterTypeSupplier.INSTANCE.get(typeDescriptor.getRawType(), schema)); +} +return schema; } @Override public FieldValueGetterFactory fieldValueGetterFactory() { return (Class targetClass, Schema schema) -> -JavaBeanUtils.getGetters(targetClass, schema, new GetterTypeSupplier()); +JavaBeanUtils.getGetters(targetClass, schema, GetterTypeSupplier.INSTANCE); } @Override UserTypeCreatorFactory schemaTypeCreatorFactory() { -return new SetterBasedCreatorFactory(new JavaBeanSetterFactory()); +UserTypeCreatorFactory setterBasedFactory = +new SetterBasedCreatorFactory(new JavaBeanSetterFactory()); + +return (Class targetClass, Schema schema) -> { + // If a static method is marked with @SchemaCreate, use that. + Method annotated = ReflectUtils.getAnnotatedCreateMethod(targetClass); + if (annotated != null) { +return JavaBeanUtils.getStaticCreator( +targetClass, annotated, schema, GetterTypeSupplier.INSTANCE); + } + + //
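Editor's note: the creator factory in the diff above is cut off after the static-method branch. Below is a hedged reconstruction of the selection order it appears to implement (and that the schemaFor validation above assumes): a @SchemaCreate static factory first, then a @SchemaCreate constructor, then the setter-based creator. This is not the PR's actual code; the JavaBeanUtils calls it would delegate to are only hinted at in comments, and the sketch simply reports which branch would be taken.

{code:java}
// Reconstruction of the creator-selection order, for illustration only.
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;
import org.apache.beam.sdk.schemas.utils.ReflectUtils;

final class CreatorSelectionSketch {
  static String describeCreator(Class<?> targetClass) {
    // 1. A static factory method annotated with @SchemaCreate wins.
    Method createMethod = ReflectUtils.getAnnotatedCreateMethod(targetClass);
    if (createMethod != null) {
      return "static factory: " + createMethod.getName();
    }
    // 2. Otherwise, a constructor annotated with @SchemaCreate.
    Constructor<?> constructor = ReflectUtils.getAnnotatedConstructor(targetClass);
    if (constructor != null) {
      return "annotated constructor";
    }
    // 3. Otherwise, fall back to the setter-based creator factory, which is why
    //    schemaFor() validates that setters exist when neither annotation is present.
    return "setter-based creator";
  }
}
{code}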
[jira] [Work logged] (BEAM-6240) Allow users to annotate POJOs and JavaBeans for richer functionality
[ https://issues.apache.org/jira/browse/BEAM-6240?focusedWorklogId=182182&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182182 ] ASF GitHub Bot logged work on BEAM-6240: Author: ASF GitHub Bot Created on: 08/Jan/19 06:02 Start Date: 08/Jan/19 06:02 Worklog Time Spent: 10m Work Description: reuvenlax commented on pull request #7289: [BEAM-6240] Add a library of schema annotations for POJO and JavaBeans URL: https://github.com/apache/beam/pull/7289#discussion_r245885252 ## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/utils/TestPOJOs.java ## @@ -101,6 +104,249 @@ public int hashCode() { public static final Schema NESTED_NULLABLE_SCHEMA = Schema.builder().addNullableField("nested", FieldType.row(NULLABLES_SCHEMA)).build(); + /** A POJO for testing static factory methods. */ + @DefaultSchema(JavaFieldSchema.class) + public static class StaticCreationSimplePojo { +public final String str; +public final byte aByte; +public final short aShort; +public final int anInt; +public final long aLong; +public final boolean aBoolean; +public final DateTime dateTime; +public final Instant instant; +public final byte[] bytes; +public final ByteBuffer byteBuffer; +public final BigDecimal bigDecimal; +public final StringBuilder stringBuilder; + +private StaticCreationSimplePojo( +String str, +byte aByte, +short aShort, +int anInt, +long aLong, +boolean aBoolean, +DateTime dateTime, +Instant instant, +byte[] bytes, +ByteBuffer byteBuffer, +BigDecimal bigDecimal, +StringBuilder stringBuilder) { + this.str = str; + this.aByte = aByte; + this.aShort = aShort; + this.anInt = anInt; + this.aLong = aLong; + this.aBoolean = aBoolean; + this.dateTime = dateTime; + this.instant = instant; + this.bytes = bytes; + this.byteBuffer = byteBuffer; + this.bigDecimal = bigDecimal; + this.stringBuilder = stringBuilder; +} + +@SchemaCreate +public static StaticCreationSimplePojo of( +String str, +byte aByte, +short aShort, +int anInt, +long aLong, +boolean aBoolean, +DateTime dateTime, +Instant instant, +byte[] bytes, +ByteBuffer byteBuffer, +BigDecimal bigDecimal, +StringBuilder stringBuilder) { + return new StaticCreationSimplePojo( + str, + aByte, + aShort, + anInt, + aLong, + aBoolean, + dateTime, + instant, + bytes, + byteBuffer, + bigDecimal, + stringBuilder); +} + +@Override +public boolean equals(Object o) { + if (this == o) { +return true; + } + if (!(o instanceof StaticCreationSimplePojo)) { +return false; + } + StaticCreationSimplePojo that = (StaticCreationSimplePojo) o; + return aByte == that.aByte + && aShort == that.aShort + && anInt == that.anInt + && aLong == that.aLong + && aBoolean == that.aBoolean + && Objects.equals(str, that.str) + && Objects.equals(dateTime, that.dateTime) + && Objects.equals(instant, that.instant) + && Arrays.equals(bytes, that.bytes) + && Objects.equals(byteBuffer, that.byteBuffer) + && Objects.equals(bigDecimal, that.bigDecimal) + && Objects.equals(stringBuilder.toString(), that.stringBuilder.toString()); +} + +@Override +public int hashCode() { + int result = + Objects.hash( + str, + aByte, + aShort, + anInt, + aLong, + aBoolean, + dateTime, + instant, + byteBuffer, + bigDecimal, + stringBuilder); + result = 31 * result + Arrays.hashCode(bytes); + return result; +} + } + + /** A POJO for testing annotations. 
*/ + @DefaultSchema(JavaFieldSchema.class) + public static class AnnotatedSimplePojo { +public final String str; + +@FieldName("aByte") +public final byte theByte; + +@FieldName("aShort") +public final short theShort; + +public final int anInt; +public final long aLong; +public final boolean aBoolean; +public final DateTime dateTime; +public final Instant instant; +public final byte[] bytes; +public final ByteBuffer byteBuffer; +public final BigDecimal bigDecimal; +public final StringBuilder stringBuilder; +@SchemaIgnore public final Integer pleaseIgnore; + +// Marked with SchemaCreate, so this will be called to construct instances. +@SchemaCreate +public AnnotatedSimplePo
[jira] [Work logged] (BEAM-6240) Allow users to annotate POJOs and JavaBeans for richer functionality
[ https://issues.apache.org/jira/browse/BEAM-6240?focusedWorklogId=182184&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182184 ] ASF GitHub Bot logged work on BEAM-6240: Author: ASF GitHub Bot Created on: 08/Jan/19 06:02 Start Date: 08/Jan/19 06:02 Worklog Time Spent: 10m Work Description: reuvenlax commented on pull request #7289: [BEAM-6240] Add a library of schema annotations for POJO and JavaBeans URL: https://github.com/apache/beam/pull/7289#discussion_r245885269 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/FieldValueTypeSupplier.java ## @@ -28,8 +29,10 @@ */ public interface FieldValueTypeSupplier extends Serializable { /** - * Return all the FieldValueTypeInformations. The returned list must be in the same order as + * Return all the FieldValueTypeInformations. + * + * If the schema parameter is not null, then the returned list must be in the same order as * fields in the schema. */ - List get(Class clazz, Schema schema); + List get(Class clazz, @Nullable Schema schema); Review comment: Since every single call site except for one needed this, moving the sorting to all call sites seems undesirable. Instead I split into two separate methods, one taking Schema one not, so that we don't need a @Nullable parameter This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 182184) Time Spent: 5.5h (was: 5h 20m) > Allow users to annotate POJOs and JavaBeans for richer functionality > > > Key: BEAM-6240 > URL: https://issues.apache.org/jira/browse/BEAM-6240 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Reuven Lax >Assignee: Reuven Lax >Priority: Major > Time Spent: 5.5h > Remaining Estimate: 0h > > Desired annotations: > * SchemaIgnore - ignore this field > * FieldName - allow the user to explicitly specify a field name > * SchemaCreate - register a function to be used to create an object (so > fields can be final, and no default constructor need be assumed). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
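Editor's note: a rough sketch of the two-method split described in the comment above, so the @Nullable Schema parameter disappears. This is a guess at the shape of the change rather than the PR's final signatures; package locations are approximate, and the StaticSchemaInference.sortBySchema call is the one referenced elsewhere in this review.

{code:java}
// Sketch only -- approximate packages and signatures.
import java.io.Serializable;
import java.util.List;
import org.apache.beam.sdk.schemas.FieldValueTypeInformation;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.schemas.utils.StaticSchemaInference;

public interface FieldValueTypeSupplier extends Serializable {
  /** Returns type information for the class, in no particular order. */
  List<FieldValueTypeInformation> get(Class<?> clazz);

  /** Returns type information ordered to match the fields of the given schema. */
  default List<FieldValueTypeInformation> get(Class<?> clazz, Schema schema) {
    return StaticSchemaInference.sortBySchema(get(clazz), schema);
  }
}
{code}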
[jira] [Commented] (BEAM-6349) Exceptions (IllegalArgumentException or NoClassDefFoundError) when running tests on Dataflow runner
[ https://issues.apache.org/jira/browse/BEAM-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736767#comment-16736767 ] Boyuan Zhang commented on BEAM-6349: Here are some instructions about how to run non-portable pipeline with customized worker jar: # First, you need to build the worker jar with latest changes by using ./gradlew :beam-runners-google-cloud-dataflow-java-legacy-worker:shadowJar # Launch your pipeline with additional options: --dataflowWorkerJar=${the absolute path to the built jar}(usually, it's under the legacy-worker built dir) --workerHarnessContainerImage= > Exceptions (IllegalArgumentException or NoClassDefFoundError) when running > tests on Dataflow runner > --- > > Key: BEAM-6349 > URL: https://issues.apache.org/jira/browse/BEAM-6349 > Project: Beam > Issue Type: Improvement > Components: testing >Reporter: Lukasz Gajowy >Assignee: Craig Chambers >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > Running GroupByKeyLoadTest results in the following error on Dataflow runner: > > {code:java} > java.lang.ExceptionInInitializerError > at > org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory$2.typedApply(IntrinsicMapTaskExecutorFactory.java:344) > at > org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory$2.typedApply(IntrinsicMapTaskExecutorFactory.java:338) > at > org.apache.beam.runners.dataflow.worker.graph.Networks$TypeSafeNodeFunction.apply(Networks.java:63) > at > org.apache.beam.runners.dataflow.worker.graph.Networks$TypeSafeNodeFunction.apply(Networks.java:50) > at > org.apache.beam.runners.dataflow.worker.graph.Networks.replaceDirectedNetworkNodes(Networks.java:87) > at > org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory.create(IntrinsicMapTaskExecutorFactory.java:120) > at > org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:337) > at > org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:291) > at > org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:135) > at > org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:115) > at > org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:102) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.IllegalArgumentException: Multiple entries with same > key: > kind:varint=org.apache.beam.runners.dataflow.util.CloudObjectTranslators$8@39b69c48 > and > kind:varint=org.apache.beam.runners.dataflow.worker.RunnerHarnessCoderCloudObjectTranslatorRegistrar$1@7966f294 > at > org.apache.beam.repackaged.beam_runners_google_cloud_dataflow_java.com.google.common.collect.ImmutableMap.checkNoConflict(ImmutableMap.java:136) > at > org.apache.beam.repackaged.beam_runners_google_cloud_dataflow_java.com.google.common.collect.RegularImmutableMap.checkNoConflictInKeyBucket(RegularImmutableMap.java:100) > at > org.apache.beam.repackaged.beam_runners_google_cloud_dataflow_java.com.google.common.collect.RegularImmutableMap.fromEntryArray(RegularImmutableMap.java:86) > at > 
org.apache.beam.repackaged.beam_runners_google_cloud_dataflow_java.com.google.common.collect.ImmutableMap$Builder.build(ImmutableMap.java:300) > at > org.apache.beam.runners.dataflow.util.CloudObjects.populateCloudObjectTranslators(CloudObjects.java:60) > at > org.apache.beam.runners.dataflow.util.CloudObjects.(CloudObjects.java:39) > ... 15 more > {code} > > Example command to run the tests (FWIW, it also runs the "clean" task > although I don't know if it's necessary): > {code:java} > ./gradlew clean :beam-sdks-java-load-tests:run --info > -PloadTest.mainClass=org.apache.beam.sdk.loadtests.GroupByKeyLoadTest > -Prunner=:beam-runners-google-cloud-dataflow-java > '-PloadTest.args=--sourceOptions={"numRecords":1000,"splitPointFrequencyRecords":1,"keySizeBytes":1,"valueSizeBytes":9,"numHotKeys":0,"hotKeyFraction":0,"seed":123456,"bundleSizeDistribution":{"type":"const","const":42},"forceNumInitialBundles":100,"progressShape":"LINEAR","initializeDelayDistribution":{"type":"const","const":42}} > > --stepOptions={"outputRe
[jira] [Work logged] (BEAM-6347) Add page for developing I/O connectors for Java
[ https://issues.apache.org/jira/browse/BEAM-6347?focusedWorklogId=182139&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182139 ] ASF GitHub Bot logged work on BEAM-6347: Author: ASF GitHub Bot Created on: 08/Jan/19 02:43 Start Date: 08/Jan/19 02:43 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #7397: [BEAM-6347] Add website page for developing I/O connectors for Java URL: https://github.com/apache/beam/pull/7397#issuecomment-452155511 Thanks Melissa. This looks great. LGTM other than a few comments. Please feel free to merge after addressing these comments. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 182139) Time Spent: 3h (was: 2h 50m) > Add page for developing I/O connectors for Java > --- > > Key: BEAM-6347 > URL: https://issues.apache.org/jira/browse/BEAM-6347 > Project: Beam > Issue Type: Bug > Components: website >Reporter: Melissa Pashniak >Assignee: Melissa Pashniak >Priority: Minor > Time Spent: 3h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6347) Add page for developing I/O connectors for Java
[ https://issues.apache.org/jira/browse/BEAM-6347?focusedWorklogId=182143&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182143 ] ASF GitHub Bot logged work on BEAM-6347: Author: ASF GitHub Bot Created on: 08/Jan/19 02:43 Start Date: 08/Jan/19 02:43 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #7397: [BEAM-6347] Add website page for developing I/O connectors for Java URL: https://github.com/apache/beam/pull/7397#discussion_r245861503 ## File path: website/src/documentation/io/developing-io-java.md ## @@ -0,0 +1,372 @@ +--- +layout: section +title: "Apache Beam: Developing I/O connectors for Java" +section_menu: section-menu/documentation.html +permalink: /documentation/io/developing-io-java/ +redirect_from: /documentation/io/authoring-java/ +--- + +# Developing I/O connectors for Java + +To connect to a data store that isn’t supported by Beam’s existing I/O +connectors, you must create a custom I/O connector that usually consist of a +source and a sink. All Beam sources and sinks are composite transforms; however, +the implementation of your custom I/O depends on your use case. Before you +start, read the +[new I/O connector overview]({{ site.baseurl }}/documentation/io/developing-io-overview/) +for an overview of developing a new I/O connector, the available implementation +options, and how to choose the right option for your use case. + +This guide covers using the `Source` and `FileBasedSink` interfaces using Java. +The Python SDK offers the same functionality, but uses a slightly different API. +See [Developing I/O connectors for Python]({{ site.baseurl }}/documentation/io/developing-io-python/) +for information specific to the Python SDK. + +## Basic code requirements {#basic-code-reqs} + +Beam runners use the classes you provide to read and/or write data using +multiple worker instances in parallel. As such, the code you provide for +`Source` and `FileBasedSink` subclasses must meet some basic requirements: + + 1. **Serializability:** Your `Source` or `FileBasedSink` subclass, whether + bounded or unbounded, must be Serializable. A runner might create multiple + instances of your `Source` or `FileBasedSink` subclass to be sent to + multiple remote workers to facilitate reading or writing in parallel. + + 1. **Immutability:** + Your `Source` or `FileBasedSink` subclass must be effectively immutable. + All private fields must be declared final, and all private variables of + collection type must be effectively immutable. If your class has setter + methods, those methods must return an independent copy of the object with + the relevant field modified. + + You should only use mutable state in your `Source` or `FileBasedSink` + subclass if you are using lazy evaluation of expensive computations that + you need to implement the source or sink; in that case, you must declare + all mutable instance variables transient. + + 1. **Thread-Safety:** Your code must be thread-safe. If you build your source + to work with dynamic work rebalancing, it is critical that you make your + code thread-safe. The Beam SDK provides a helper class to make this easier. + See [Using Your BoundedSource with dynamic work rebalancing](#bounded-dynamic) + for more details. + + 1. **Testability:** It is critical to exhaustively unit test all of your + `Source` and `FileBasedSink` subclasses, especially if you build your + classes to work with advanced features such as dynamic work rebalancing. 
A + minor implementation error can lead to data corruption or data loss (such + as skipping or duplicating records) that can be hard to detect. + + To assist in testing `BoundedSource` implementations, you can use the + SourceTestUtils class. `SourceTestUtils` contains utilities for automatically + verifying some of the properties of your `BoundedSource` implementation. You + can use `SourceTestUtils` to increase your implementation's test coverage + using a wide range of inputs with relatively few lines of code. For + examples that use `SourceTestUtils`, see the + [AvroSourceTest](https://github.com/apache/beam/blob/master/sdks/java/core/src/test/java/org/apache/beam/sdk/io/AvroSourceTest.java) and + [TextIOReadTest](https://github.com/apache/beam/blob/master/sdks/java/core/src/test/java/org/apache/beam/sdk/io/TextIOReadTest.java) + source code. + +In addition, see the [PTransform style guide]({{ site.baseurl }}/contribute/ptransform-style-guide/) +for Beam's transform style guidance. + +## Implementing the Source interface + +To create a data source for your pipeline, you must provide the format-specific +logic that tells a runner how to read data from your input source, and how to +split your data source into m
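Editor's note: the page under review describes the Source interface requirements in prose; to make them concrete, here is a minimal, hypothetical BoundedSource that emits the longs in [start, end). RangeSource is not part of Beam or of this PR, and the SDK method signatures are quoted from memory, so treat this as an illustration of the shape of a source rather than a reference implementation.

{code:java}
// Minimal illustrative BoundedSource (hypothetical). Real sources must also honor
// the serializability, immutability, and thread-safety requirements listed above.
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;
import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.coders.VarLongCoder;
import org.apache.beam.sdk.io.BoundedSource;
import org.apache.beam.sdk.options.PipelineOptions;

public class RangeSource extends BoundedSource<Long> {
  private final long start; // inclusive
  private final long end;   // exclusive

  public RangeSource(long start, long end) {
    this.start = start;
    this.end = end;
  }

  @Override
  public void validate() {
    // Construction-time checks, run before the pipeline executes.
    if (start > end) {
      throw new IllegalArgumentException("start must be <= end");
    }
  }

  @Override
  public List<? extends BoundedSource<Long>> split(
      long desiredBundleSizeBytes, PipelineOptions options) {
    // Convert the byte budget into a record count, assuming ~8 bytes per element.
    long recordsPerBundle = Math.max(1, desiredBundleSizeBytes / 8);
    List<RangeSource> bundles = new ArrayList<>();
    for (long s = start; s < end; s += recordsPerBundle) {
      bundles.add(new RangeSource(s, Math.min(end, s + recordsPerBundle)));
    }
    return bundles;
  }

  @Override
  public long getEstimatedSizeBytes(PipelineOptions options) {
    return (end - start) * 8;
  }

  @Override
  public Coder<Long> getOutputCoder() {
    return VarLongCoder.of();
  }

  @Override
  public BoundedReader<Long> createReader(PipelineOptions options) {
    return new RangeReader(this);
  }

  /** Reads the range sequentially; one reader instance is used per bundle. */
  private static class RangeReader extends BoundedReader<Long> {
    private final RangeSource source;
    private long current;

    RangeReader(RangeSource source) {
      this.source = source;
      this.current = source.start - 1;
    }

    @Override
    public boolean start() {
      return advance();
    }

    @Override
    public boolean advance() {
      current++;
      return current < source.end;
    }

    @Override
    public Long getCurrent() throws NoSuchElementException {
      if (current < source.start || current >= source.end) {
        throw new NoSuchElementException();
      }
      return current;
    }

    @Override
    public void close() {}

    @Override
    public BoundedSource<Long> getCurrentSource() {
      return source;
    }
  }
}
{code}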
[jira] [Work logged] (BEAM-6347) Add page for developing I/O connectors for Java
[ https://issues.apache.org/jira/browse/BEAM-6347?focusedWorklogId=182141&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182141 ] ASF GitHub Bot logged work on BEAM-6347: Author: ASF GitHub Bot Created on: 08/Jan/19 02:43 Start Date: 08/Jan/19 02:43 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #7397: [BEAM-6347] Add website page for developing I/O connectors for Java URL: https://github.com/apache/beam/pull/7397#discussion_r245861647 ## File path: website/src/documentation/io/developing-io-java.md ## @@ -0,0 +1,372 @@ +--- +layout: section +title: "Apache Beam: Developing I/O connectors for Java" +section_menu: section-menu/documentation.html +permalink: /documentation/io/developing-io-java/ +redirect_from: /documentation/io/authoring-java/ +--- + +# Developing I/O connectors for Java + +To connect to a data store that isn’t supported by Beam’s existing I/O +connectors, you must create a custom I/O connector that usually consist of a +source and a sink. All Beam sources and sinks are composite transforms; however, +the implementation of your custom I/O depends on your use case. Before you +start, read the +[new I/O connector overview]({{ site.baseurl }}/documentation/io/developing-io-overview/) +for an overview of developing a new I/O connector, the available implementation +options, and how to choose the right option for your use case. + +This guide covers using the `Source` and `FileBasedSink` interfaces using Java. +The Python SDK offers the same functionality, but uses a slightly different API. +See [Developing I/O connectors for Python]({{ site.baseurl }}/documentation/io/developing-io-python/) +for information specific to the Python SDK. + +## Basic code requirements {#basic-code-reqs} + +Beam runners use the classes you provide to read and/or write data using +multiple worker instances in parallel. As such, the code you provide for +`Source` and `FileBasedSink` subclasses must meet some basic requirements: + + 1. **Serializability:** Your `Source` or `FileBasedSink` subclass, whether + bounded or unbounded, must be Serializable. A runner might create multiple + instances of your `Source` or `FileBasedSink` subclass to be sent to + multiple remote workers to facilitate reading or writing in parallel. + + 1. **Immutability:** + Your `Source` or `FileBasedSink` subclass must be effectively immutable. + All private fields must be declared final, and all private variables of + collection type must be effectively immutable. If your class has setter + methods, those methods must return an independent copy of the object with + the relevant field modified. + + You should only use mutable state in your `Source` or `FileBasedSink` + subclass if you are using lazy evaluation of expensive computations that + you need to implement the source or sink; in that case, you must declare + all mutable instance variables transient. + + 1. **Thread-Safety:** Your code must be thread-safe. If you build your source + to work with dynamic work rebalancing, it is critical that you make your + code thread-safe. The Beam SDK provides a helper class to make this easier. + See [Using Your BoundedSource with dynamic work rebalancing](#bounded-dynamic) + for more details. + + 1. **Testability:** It is critical to exhaustively unit test all of your + `Source` and `FileBasedSink` subclasses, especially if you build your + classes to work with advanced features such as dynamic work rebalancing. 
A + minor implementation error can lead to data corruption or data loss (such + as skipping or duplicating records) that can be hard to detect. + + To assist in testing `BoundedSource` implementations, you can use the + SourceTestUtils class. `SourceTestUtils` contains utilities for automatically + verifying some of the properties of your `BoundedSource` implementation. You + can use `SourceTestUtils` to increase your implementation's test coverage + using a wide range of inputs with relatively few lines of code. For + examples that use `SourceTestUtils`, see the + [AvroSourceTest](https://github.com/apache/beam/blob/master/sdks/java/core/src/test/java/org/apache/beam/sdk/io/AvroSourceTest.java) and + [TextIOReadTest](https://github.com/apache/beam/blob/master/sdks/java/core/src/test/java/org/apache/beam/sdk/io/TextIOReadTest.java) + source code. + +In addition, see the [PTransform style guide]({{ site.baseurl }}/contribute/ptransform-style-guide/) +for Beam's transform style guidance. + +## Implementing the Source interface + +To create a data source for your pipeline, you must provide the format-specific +logic that tells a runner how to read data from your input source, and how to +split your data source into m
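Editor's note: the Testability paragraph quoted above points at SourceTestUtils; below is a short, hedged example of how it might be exercised against the hypothetical RangeSource from the previous sketch. The three utility methods are the ones the Beam SDK is understood to provide, but double-check their exact signatures before relying on this.

{code:java}
// Illustrative smoke test for the hypothetical RangeSource using SourceTestUtils.
import java.util.List;
import org.apache.beam.sdk.io.BoundedSource;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.testing.SourceTestUtils;

public class RangeSourceSmokeTest {
  public static void main(String[] args) throws Exception {
    PipelineOptions options = PipelineOptionsFactory.create();
    BoundedSource<Long> source = new RangeSource(0, 1000);

    // Read every record directly through the reader.
    List<Long> all = SourceTestUtils.readFromSource(source, options);
    if (all.size() != 1000) {
      throw new AssertionError("expected 1000 records, read " + all.size());
    }

    // Splitting must neither drop nor duplicate records.
    List<? extends BoundedSource<Long>> splits = source.split(64, options);
    SourceTestUtils.assertSourcesEqualReferenceSource(source, splits, options);

    // Exhaustively exercise splitAtFraction behavior; the simple reader above
    // always declines to split, so this mainly checks the decline path.
    SourceTestUtils.assertSplitAtFractionExhaustive(splits.get(0), options);
  }
}
{code}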
[jira] [Commented] (BEAM-6349) Exceptions (IllegalArgumentException or NoClassDefFoundError) when running tests on Dataflow runner
[ https://issues.apache.org/jira/browse/BEAM-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736648#comment-16736648 ] Boyuan Zhang commented on BEAM-6349: Hey Lukasz, The root cause of this problem is because Craig's PR changed the sdk and dataflow runner harness at the same time but your test is running against the pre-build dataflow runner image(which does not include Craig's changes) with the latest sdk(which includes Craig's sdk part changes). This results in a mismatch issue. For a temporary workaround, you can launch your job with customized dataflow worker jar. > Exceptions (IllegalArgumentException or NoClassDefFoundError) when running > tests on Dataflow runner > --- > > Key: BEAM-6349 > URL: https://issues.apache.org/jira/browse/BEAM-6349 > Project: Beam > Issue Type: Improvement > Components: testing >Reporter: Lukasz Gajowy >Assignee: Craig Chambers >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > Running GroupByKeyLoadTest results in the following error on Dataflow runner: > > {code:java} > java.lang.ExceptionInInitializerError > at > org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory$2.typedApply(IntrinsicMapTaskExecutorFactory.java:344) > at > org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory$2.typedApply(IntrinsicMapTaskExecutorFactory.java:338) > at > org.apache.beam.runners.dataflow.worker.graph.Networks$TypeSafeNodeFunction.apply(Networks.java:63) > at > org.apache.beam.runners.dataflow.worker.graph.Networks$TypeSafeNodeFunction.apply(Networks.java:50) > at > org.apache.beam.runners.dataflow.worker.graph.Networks.replaceDirectedNetworkNodes(Networks.java:87) > at > org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory.create(IntrinsicMapTaskExecutorFactory.java:120) > at > org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:337) > at > org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:291) > at > org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:135) > at > org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:115) > at > org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:102) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.IllegalArgumentException: Multiple entries with same > key: > kind:varint=org.apache.beam.runners.dataflow.util.CloudObjectTranslators$8@39b69c48 > and > kind:varint=org.apache.beam.runners.dataflow.worker.RunnerHarnessCoderCloudObjectTranslatorRegistrar$1@7966f294 > at > org.apache.beam.repackaged.beam_runners_google_cloud_dataflow_java.com.google.common.collect.ImmutableMap.checkNoConflict(ImmutableMap.java:136) > at > org.apache.beam.repackaged.beam_runners_google_cloud_dataflow_java.com.google.common.collect.RegularImmutableMap.checkNoConflictInKeyBucket(RegularImmutableMap.java:100) > at > org.apache.beam.repackaged.beam_runners_google_cloud_dataflow_java.com.google.common.collect.RegularImmutableMap.fromEntryArray(RegularImmutableMap.java:86) > at > 
org.apache.beam.repackaged.beam_runners_google_cloud_dataflow_java.com.google.common.collect.ImmutableMap$Builder.build(ImmutableMap.java:300) > at > org.apache.beam.runners.dataflow.util.CloudObjects.populateCloudObjectTranslators(CloudObjects.java:60) > at > org.apache.beam.runners.dataflow.util.CloudObjects.(CloudObjects.java:39) > ... 15 more > {code} > > Example command to run the tests (FWIW, it also runs the "clean" task > although I don't know if it's necessary): > {code:java} > ./gradlew clean :beam-sdks-java-load-tests:run --info > -PloadTest.mainClass=org.apache.beam.sdk.loadtests.GroupByKeyLoadTest > -Prunner=:beam-runners-google-cloud-dataflow-java > '-PloadTest.args=--sourceOptions={"numRecords":1000,"splitPointFrequencyRecords":1,"keySizeBytes":1,"valueSizeBytes":9,"numHotKeys":0,"hotKeyFraction":0,"seed":123456,"bundleSizeDistribution":{"type":"const","const":42},"forceNumInitialBundles":100,"progressShape":"LINEAR","initializeDelayDistribution":{"type":"const","const":42}} > > --stepOptions={"outputRecordsPerInputRecord":1,"pre
[jira] [Work logged] (BEAM-6347) Add page for developing I/O connectors for Java
[ https://issues.apache.org/jira/browse/BEAM-6347?focusedWorklogId=182140&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182140 ] ASF GitHub Bot logged work on BEAM-6347: Author: ASF GitHub Bot Created on: 08/Jan/19 02:43 Start Date: 08/Jan/19 02:43 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #7397: [BEAM-6347] Add website page for developing I/O connectors for Java URL: https://github.com/apache/beam/pull/7397#discussion_r245861334 ## File path: website/src/documentation/io/developing-io-java.md ## @@ -0,0 +1,372 @@ +--- +layout: section +title: "Apache Beam: Developing I/O connectors for Java" +section_menu: section-menu/documentation.html +permalink: /documentation/io/developing-io-java/ +redirect_from: /documentation/io/authoring-java/ +--- + +# Developing I/O connectors for Java + +To connect to a data store that isn’t supported by Beam’s existing I/O +connectors, you must create a custom I/O connector that usually consist of a +source and a sink. All Beam sources and sinks are composite transforms; however, +the implementation of your custom I/O depends on your use case. Before you +start, read the +[new I/O connector overview]({{ site.baseurl }}/documentation/io/developing-io-overview/) +for an overview of developing a new I/O connector, the available implementation +options, and how to choose the right option for your use case. + +This guide covers using the `Source` and `FileBasedSink` interfaces using Java. +The Python SDK offers the same functionality, but uses a slightly different API. +See [Developing I/O connectors for Python]({{ site.baseurl }}/documentation/io/developing-io-python/) +for information specific to the Python SDK. + +## Basic code requirements {#basic-code-reqs} + +Beam runners use the classes you provide to read and/or write data using +multiple worker instances in parallel. As such, the code you provide for +`Source` and `FileBasedSink` subclasses must meet some basic requirements: + + 1. **Serializability:** Your `Source` or `FileBasedSink` subclass, whether + bounded or unbounded, must be Serializable. A runner might create multiple + instances of your `Source` or `FileBasedSink` subclass to be sent to + multiple remote workers to facilitate reading or writing in parallel. + + 1. **Immutability:** + Your `Source` or `FileBasedSink` subclass must be effectively immutable. + All private fields must be declared final, and all private variables of + collection type must be effectively immutable. If your class has setter + methods, those methods must return an independent copy of the object with + the relevant field modified. + + You should only use mutable state in your `Source` or `FileBasedSink` + subclass if you are using lazy evaluation of expensive computations that + you need to implement the source or sink; in that case, you must declare + all mutable instance variables transient. + + 1. **Thread-Safety:** Your code must be thread-safe. If you build your source + to work with dynamic work rebalancing, it is critical that you make your + code thread-safe. The Beam SDK provides a helper class to make this easier. + See [Using Your BoundedSource with dynamic work rebalancing](#bounded-dynamic) + for more details. + + 1. **Testability:** It is critical to exhaustively unit test all of your + `Source` and `FileBasedSink` subclasses, especially if you build your + classes to work with advanced features such as dynamic work rebalancing. 
A + minor implementation error can lead to data corruption or data loss (such + as skipping or duplicating records) that can be hard to detect. + + To assist in testing `BoundedSource` implementations, you can use the + SourceTestUtils class. `SourceTestUtils` contains utilities for automatically + verifying some of the properties of your `BoundedSource` implementation. You + can use `SourceTestUtils` to increase your implementation's test coverage + using a wide range of inputs with relatively few lines of code. For + examples that use `SourceTestUtils`, see the + [AvroSourceTest](https://github.com/apache/beam/blob/master/sdks/java/core/src/test/java/org/apache/beam/sdk/io/AvroSourceTest.java) and + [TextIOReadTest](https://github.com/apache/beam/blob/master/sdks/java/core/src/test/java/org/apache/beam/sdk/io/TextIOReadTest.java) + source code. + +In addition, see the [PTransform style guide]({{ site.baseurl }}/contribute/ptransform-style-guide/) +for Beam's transform style guidance. + +## Implementing the Source interface + +To create a data source for your pipeline, you must provide the format-specific +logic that tells a runner how to read data from your input source, and how to +split your data source into m
[jira] [Work logged] (BEAM-6347) Add page for developing I/O connectors for Java
[ https://issues.apache.org/jira/browse/BEAM-6347?focusedWorklogId=182142&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182142 ] ASF GitHub Bot logged work on BEAM-6347: Author: ASF GitHub Bot Created on: 08/Jan/19 02:43 Start Date: 08/Jan/19 02:43 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #7397: [BEAM-6347] Add website page for developing I/O connectors for Java URL: https://github.com/apache/beam/pull/7397#discussion_r245862633 ## File path: website/src/documentation/io/developing-io-java.md ## @@ -0,0 +1,372 @@ +--- +layout: section +title: "Apache Beam: Developing I/O connectors for Java" +section_menu: section-menu/documentation.html +permalink: /documentation/io/developing-io-java/ +redirect_from: /documentation/io/authoring-java/ +--- + +# Developing I/O connectors for Java + +To connect to a data store that isn’t supported by Beam’s existing I/O +connectors, you must create a custom I/O connector that usually consist of a +source and a sink. All Beam sources and sinks are composite transforms; however, +the implementation of your custom I/O depends on your use case. Before you +start, read the +[new I/O connector overview]({{ site.baseurl }}/documentation/io/developing-io-overview/) +for an overview of developing a new I/O connector, the available implementation +options, and how to choose the right option for your use case. + +This guide covers using the `Source` and `FileBasedSink` interfaces using Java. +The Python SDK offers the same functionality, but uses a slightly different API. +See [Developing I/O connectors for Python]({{ site.baseurl }}/documentation/io/developing-io-python/) +for information specific to the Python SDK. + +## Basic code requirements {#basic-code-reqs} + +Beam runners use the classes you provide to read and/or write data using +multiple worker instances in parallel. As such, the code you provide for +`Source` and `FileBasedSink` subclasses must meet some basic requirements: + + 1. **Serializability:** Your `Source` or `FileBasedSink` subclass, whether + bounded or unbounded, must be Serializable. A runner might create multiple + instances of your `Source` or `FileBasedSink` subclass to be sent to + multiple remote workers to facilitate reading or writing in parallel. + + 1. **Immutability:** + Your `Source` or `FileBasedSink` subclass must be effectively immutable. + All private fields must be declared final, and all private variables of + collection type must be effectively immutable. If your class has setter + methods, those methods must return an independent copy of the object with + the relevant field modified. + + You should only use mutable state in your `Source` or `FileBasedSink` + subclass if you are using lazy evaluation of expensive computations that + you need to implement the source or sink; in that case, you must declare + all mutable instance variables transient. + + 1. **Thread-Safety:** Your code must be thread-safe. If you build your source + to work with dynamic work rebalancing, it is critical that you make your + code thread-safe. The Beam SDK provides a helper class to make this easier. + See [Using Your BoundedSource with dynamic work rebalancing](#bounded-dynamic) + for more details. + + 1. **Testability:** It is critical to exhaustively unit test all of your + `Source` and `FileBasedSink` subclasses, especially if you build your + classes to work with advanced features such as dynamic work rebalancing. 
A + minor implementation error can lead to data corruption or data loss (such + as skipping or duplicating records) that can be hard to detect. + + To assist in testing `BoundedSource` implementations, you can use the + SourceTestUtils class. `SourceTestUtils` contains utilities for automatically + verifying some of the properties of your `BoundedSource` implementation. You + can use `SourceTestUtils` to increase your implementation's test coverage + using a wide range of inputs with relatively few lines of code. For + examples that use `SourceTestUtils`, see the + [AvroSourceTest](https://github.com/apache/beam/blob/master/sdks/java/core/src/test/java/org/apache/beam/sdk/io/AvroSourceTest.java) and + [TextIOReadTest](https://github.com/apache/beam/blob/master/sdks/java/core/src/test/java/org/apache/beam/sdk/io/TextIOReadTest.java) + source code. + +In addition, see the [PTransform style guide]({{ site.baseurl }}/contribute/ptransform-style-guide/) +for Beam's transform style guidance. + +## Implementing the Source interface + +To create a data source for your pipeline, you must provide the format-specific +logic that tells a runner how to read data from your input source, and how to +split your data source into m
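The same quoted section points to `SourceTestUtils` for testing `BoundedSource` implementations. A minimal unit-test sketch, assuming a hypothetical `MyBoundedSource` that yields 100 string records, might look like this (check the current `SourceTestUtils` javadoc for the exact method set):
{code:java}
import static org.junit.Assert.assertEquals;

import java.util.List;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.testing.SourceTestUtils;
import org.junit.Test;

public class MyBoundedSourceTest {

  @Test
  public void readsAllRecordsAndSplitsConsistently() throws Exception {
    PipelineOptions options = PipelineOptionsFactory.create();
    MyBoundedSource source = new MyBoundedSource("/tmp/input-*");  // hypothetical source

    // Read directly from the source, without a runner, and verify the contents.
    List<String> actual = SourceTestUtils.readFromSource(source, options);
    assertEquals(100, actual.size());  // expected count is illustrative only

    // Exhaustively check that splitAtFraction() behaves consistently at every
    // position and fraction, which is the property dynamic work rebalancing relies on.
    SourceTestUtils.assertSplitAtFractionExhaustive(source, options);
  }
}
{code}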
[jira] [Created] (BEAM-6382) SamzaRunner: add an option to read configs using a user-defined factory
Xinyu Liu created BEAM-6382: --- Summary: SamzaRunner: add an option to read configs using a user-defined factory Key: BEAM-6382 URL: https://issues.apache.org/jira/browse/BEAM-6382 Project: Beam Issue Type: Improvement Components: runner-samza Reporter: Xinyu Liu Assignee: Xinyu Liu We need an option to read configs from a user-defined factory, which is useful on YARN as well as for user-defined file formats. By default, this config factory reads a property file. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
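A minimal sketch of the proposal, assuming a hypothetical `ConfigFactory` interface (this is not the actual SamzaRunner API), with a default implementation that reads a property file:
{code:java}
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical pluggable factory: the runner could ask a user-supplied
// implementation for the job config instead of hard-coding a file format.
public interface ConfigFactory {
  Map<String, String> readConfig(String location) throws IOException;
}

// Hypothetical default implementation that reads a Java property file,
// matching the "by default this config factory reads a property file" idea.
class PropertiesConfigFactory implements ConfigFactory {
  @Override
  public Map<String, String> readConfig(String location) throws IOException {
    Properties props = new Properties();
    try (InputStream in = Files.newInputStream(Paths.get(location))) {
      props.load(in);
    }
    Map<String, String> config = new HashMap<>();
    for (String key : props.stringPropertyNames()) {
      config.put(key, props.getProperty(key));
    }
    return config;
  }
}
{code}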
[jira] [Assigned] (BEAM-4777) Python PortableRunner should support metrics
[ https://issues.apache.org/jira/browse/BEAM-4777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankur Goenka reassigned BEAM-4777: -- Assignee: Ryan Williams (was: Ankur Goenka) > Python PortableRunner should support metrics > > > Key: BEAM-4777 > URL: https://issues.apache.org/jira/browse/BEAM-4777 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Eugene Kirpichov >Assignee: Ryan Williams >Priority: Major > > BEAM-4775 concerns adding metrics to the JobService API; the current issue is > about making Python PortableRunner understand them. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-4776) Java PortableRunner should support metrics
[ https://issues.apache.org/jira/browse/BEAM-4776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankur Goenka reassigned BEAM-4776: -- Assignee: Ryan Williams (was: Ankur Goenka) > Java PortableRunner should support metrics > -- > > Key: BEAM-4776 > URL: https://issues.apache.org/jira/browse/BEAM-4776 > Project: Beam > Issue Type: Bug > Components: runner-core >Reporter: Eugene Kirpichov >Assignee: Ryan Williams >Priority: Major > > BEAM-4775 concerns adding metrics to the JobService API; the current issue is > about making PortableRunner understand them. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-4775) JobService should support returning metrics
[ https://issues.apache.org/jira/browse/BEAM-4775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankur Goenka reassigned BEAM-4775: -- Assignee: Ryan Williams (was: Ankur Goenka) > JobService should support returning metrics > --- > > Key: BEAM-4775 > URL: https://issues.apache.org/jira/browse/BEAM-4775 > Project: Beam > Issue Type: Bug > Components: beam-model >Reporter: Eugene Kirpichov >Assignee: Ryan Williams >Priority: Major > > [https://github.com/apache/beam/blob/master/model/job-management/src/main/proto/beam_job_api.proto] > currently doesn't appear to have a way for JobService to return metrics to a > user, even though > [https://github.com/apache/beam/blob/master/model/fn-execution/src/main/proto/beam_fn_api.proto] > includes support for reporting SDK metrics to the runner harness. > > Metrics are apparently necessary to run any ValidatesRunner tests because > PAssert needs to validate that the assertions succeeded. However, this > statement should be double-checked: perhaps it's possible to somehow work > with PAssert without metrics support. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-6362) reduce logging volume on Jenkins pre-commit and post-commit jobs
[ https://issues.apache.org/jira/browse/BEAM-6362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udi Meiri reassigned BEAM-6362: --- Assignee: Udi Meiri (was: Luke Cwik) > reduce logging volume on Jenkins pre-commit and post-commit jobs > > > Key: BEAM-6362 > URL: https://issues.apache.org/jira/browse/BEAM-6362 > Project: Beam > Issue Type: Improvement > Components: build-system, testing >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > Fix For: Not applicable > > Time Spent: 3h > Remaining Estimate: 0h > > Current idea is to remove the --info flags in Jenkins invocations of gradlew. > This should significantly reduce log size, while still logging INFO and above > for failed tests. > Discussion: > https://lists.apache.org/thread.html/ad7cb431af7657522f844a815445ee770fcc7cfd70a2a9ffcd02ec76@%3Cdev.beam.apache.org%3E -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-6362) reduce logging volume on Jenkins pre-commit and post-commit jobs
[ https://issues.apache.org/jira/browse/BEAM-6362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udi Meiri resolved BEAM-6362. - Resolution: Fixed Fix Version/s: Not applicable > reduce logging volume on Jenkins pre-commit and post-commit jobs > > > Key: BEAM-6362 > URL: https://issues.apache.org/jira/browse/BEAM-6362 > Project: Beam > Issue Type: Improvement > Components: build-system, testing >Reporter: Udi Meiri >Assignee: Luke Cwik >Priority: Major > Fix For: Not applicable > > Time Spent: 3h > Remaining Estimate: 0h > > Current idea is to remove the --info flags in Jenkins invocations of gradlew. > This should significantly reduce log size, while still logging INFO and above > for failed tests. > Discussion: > https://lists.apache.org/thread.html/ad7cb431af7657522f844a815445ee770fcc7cfd70a2a9ffcd02ec76@%3Cdev.beam.apache.org%3E -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-6357) Datastore Python client: retry on ABORTED rpc responses
[ https://issues.apache.org/jira/browse/BEAM-6357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udi Meiri updated BEAM-6357: Summary: Datastore Python client: retry on ABORTED rpc responses (was: Datastore clients: retry on ABORTED rpc responses) > Datastore Python client: retry on ABORTED rpc responses > --- > > Key: BEAM-6357 > URL: https://issues.apache.org/jira/browse/BEAM-6357 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > Fix For: 2.11.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > It is recommended to retry on ABORTED error codes: > https://cloud.google.com/datastore/docs/concepts/errors#error_codes -- This message was sent by Atlassian JIRA (v7.6.3#76005)
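The fix targets the Python SDK, but the recommended policy of retrying ABORTED responses is language-agnostic. Below is a generic Java sketch of the pattern, with a hypothetical `RpcException` standing in for whatever exception the real client library throws:
{code:java}
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

public final class AbortedRetry {

  /** Hypothetical exception type carrying an RPC status code such as "ABORTED". */
  public static class RpcException extends Exception {
    private final String statusCode;

    public RpcException(String statusCode, String message) {
      super(message);
      this.statusCode = statusCode;
    }

    public String getStatusCode() {
      return statusCode;
    }
  }

  /** Retries the given call on ABORTED, with exponential backoff and jitter. */
  public static <T> T retryOnAborted(Callable<T> rpc, int maxAttempts) throws Exception {
    long backoffMillis = 100;
    for (int attempt = 1; ; attempt++) {
      try {
        return rpc.call();
      } catch (RpcException e) {
        if (!"ABORTED".equals(e.getStatusCode()) || attempt >= maxAttempts) {
          throw e;  // not retryable, or retries exhausted
        }
        // Exponential backoff with jitter before the next attempt.
        Thread.sleep(backoffMillis + ThreadLocalRandom.current().nextLong(backoffMillis));
        backoffMillis *= 2;
      }
    }
  }

  private AbortedRetry() {}
}
{code}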
[jira] [Resolved] (BEAM-6357) Datastore clients: retry on ABORTED rpc responses
[ https://issues.apache.org/jira/browse/BEAM-6357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udi Meiri resolved BEAM-6357. - Resolution: Fixed Fix Version/s: 2.11.0 > Datastore clients: retry on ABORTED rpc responses > - > > Key: BEAM-6357 > URL: https://issues.apache.org/jira/browse/BEAM-6357 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > Fix For: 2.11.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > It is recommended to retry on ABORTED error codes: > https://cloud.google.com/datastore/docs/concepts/errors#error_codes -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6357) Datastore clients: retry on ABORTED rpc responses
[ https://issues.apache.org/jira/browse/BEAM-6357?focusedWorklogId=182100&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182100 ] ASF GitHub Bot logged work on BEAM-6357: Author: ASF GitHub Bot Created on: 08/Jan/19 01:00 Start Date: 08/Jan/19 01:00 Worklog Time Spent: 10m Work Description: charlesccychen commented on issue #7406: [BEAM-6357] Retry Datastore writes on ABORTED URL: https://github.com/apache/beam/pull/7406#issuecomment-452137183 Thanks, this LGTM. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 182100) Time Spent: 1h (was: 50m) > Datastore clients: retry on ABORTED rpc responses > - > > Key: BEAM-6357 > URL: https://issues.apache.org/jira/browse/BEAM-6357 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > It is recommended to retry on ABORTED error codes: > https://cloud.google.com/datastore/docs/concepts/errors#error_codes -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6355) [beam_PostRelease_NightlySnapshot] [runQuickstartJavaDataflow] ExceptionInInitializerError
[ https://issues.apache.org/jira/browse/BEAM-6355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736543#comment-16736543 ] Daniel Oliveira commented on BEAM-6355: --- Yes, I think you're right. Following the comments in that doc I was led to BEAM-6349 which looks nearly identical to this issue. > [beam_PostRelease_NightlySnapshot] [runQuickstartJavaDataflow] > ExceptionInInitializerError > -- > > Key: BEAM-6355 > URL: https://issues.apache.org/jira/browse/BEAM-6355 > Project: Beam > Issue Type: Bug > Components: runner-dataflow, test-failures >Reporter: Scott Wegner >Assignee: Boyuan Zhang >Priority: Major > Labels: currently-failing > > _Use this form to file an issue for test failure:_ > * [Jenkins > Job|https://builds.apache.org/job/beam_PostRelease_NightlySnapshot/480/] > * [Dataflow > Job|https://console.cloud.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2019-01-03_03_07_12-9648275706628895259?project=apache-beam-testing&folder&organizationId=433637338589] > * [Test source > code|https://github.com/apache/beam/blob/master/release/src/main/groovy/quickstart-java-dataflow.groovy] > Initial investigation: > It appears the Dataflow job failed during worker initialization. From the > [stackdriver > logs|https://console.cloud.google.com/logs/viewer?resource=dataflow_step%2Fjob_id%2F2019-01-03_03_07_12-9648275706628895259&logName=projects%2Fapache-beam-testing%2Flogs%2Fdataflow.googleapis.com%252Fworker&interval=NO_LIMIT&project=apache-beam-testing&folder&organizationId=433637338589&minLogLevel=500&expandAll=false×tamp=2019-01-03T16:52:14.64300Z&customFacets=&limitCustomFacetWidth=true&scrollTimestamp=2019-01-03T11:08:49.88200Z], > I see: > {code} > 019-01-03 03:08:27.770 PST > Uncaught exception occurred during work unit execution. This will be retried. 
> Expand all | Collapse all { > insertId: "3832125194122580497:879173:0:62501" > jsonPayload: { > exception: "java.lang.ExceptionInInitializerError > at > org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory$2.typedApply(IntrinsicMapTaskExecutorFactory.java:344) > at > org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory$2.typedApply(IntrinsicMapTaskExecutorFactory.java:338) > at > org.apache.beam.runners.dataflow.worker.graph.Networks$TypeSafeNodeFunction.apply(Networks.java:63) > at > org.apache.beam.runners.dataflow.worker.graph.Networks$TypeSafeNodeFunction.apply(Networks.java:50) > at > org.apache.beam.runners.dataflow.worker.graph.Networks.replaceDirectedNetworkNodes(Networks.java:87) > at > org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory.create(IntrinsicMapTaskExecutorFactory.java:120) > at > org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:337) > at > org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:291) > at > org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:135) > at > org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:115) > at > org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:102) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.IllegalArgumentException: Multiple entries with same > key: > kind:varint=org.apache.beam.runners.dataflow.util.CloudObjectTranslators$8@f4dd50c > and > kind:varint=org.apache.beam.runners.dataflow.worker.RunnerHarnessCoderCloudObjectTranslatorRegistrar$1@ae1551d > at > org.apache.beam.repackaged.beam_runners_google_cloud_dataflow_java.com.google.common.collect.ImmutableMap.checkNoConflict(ImmutableMap.java:136) > at > org.apache.beam.repackaged.beam_runners_google_cloud_dataflow_java.com.google.common.collect.RegularImmutableMap.checkNoConflictInKeyBucket(RegularImmutableMap.java:100) > at > org.apache.beam.repackaged.beam_runners_google_cloud_dataflow_java.com.google.common.collect.RegularImmutableMap.fromEntryArray(RegularImmutableMap.java:86) > at > org.apache.beam.repackaged.beam_runners_google_cloud_dataflow_java.com.google.common.collect.ImmutableMap$Builder.build(ImmutableMap.java:300) > at > org.apache.beam.runners.dataflow.util.CloudObjects.popu
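The immediate cause in the stack trace is a duplicate registration of the `kind:varint` translator key: Guava's `ImmutableMap.Builder` rejects duplicate keys at `build()` time, as this minimal reproduction (with made-up translator names) shows.
{code:java}
import com.google.common.collect.ImmutableMap;

public class DuplicateKeyDemo {
  public static void main(String[] args) {
    ImmutableMap.Builder<String, String> builder = ImmutableMap.builder();
    builder.put("kind:varint", "CloudObjectTranslators$8");            // first registration
    builder.put("kind:varint", "RunnerHarnessCoderCloudObjectTranslatorRegistrar$1");  // same key again
    // Throws java.lang.IllegalArgumentException: Multiple entries with same key
    builder.build();
  }
}
{code}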
[jira] [Work logged] (BEAM-6357) Datastore clients: retry on ABORTED rpc responses
[ https://issues.apache.org/jira/browse/BEAM-6357?focusedWorklogId=182101&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182101 ] ASF GitHub Bot logged work on BEAM-6357: Author: ASF GitHub Bot Created on: 08/Jan/19 01:00 Start Date: 08/Jan/19 01:00 Worklog Time Spent: 10m Work Description: charlesccychen commented on pull request #7406: [BEAM-6357] Retry Datastore writes on ABORTED URL: https://github.com/apache/beam/pull/7406 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 182101) Time Spent: 1h 10m (was: 1h) > Datastore clients: retry on ABORTED rpc responses > - > > Key: BEAM-6357 > URL: https://issues.apache.org/jira/browse/BEAM-6357 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > It is recommended to retry on ABORTED error codes: > https://cloud.google.com/datastore/docs/concepts/errors#error_codes -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6280) Failure in PortableRunnerTest.test_error_traceback_includes_user_code
[ https://issues.apache.org/jira/browse/BEAM-6280?focusedWorklogId=182087&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182087 ] ASF GitHub Bot logged work on BEAM-6280: Author: ASF GitHub Bot Created on: 08/Jan/19 00:41 Start Date: 08/Jan/19 00:41 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #7433: [BEAM-6280] Refactors Python portability tests to be multi-threaded aware URL: https://github.com/apache/beam/pull/7433 The local_job_service.py module was too complex and had a number of multi-threaded issues. One of the issue was that the Get(Message|State)Stream methods did not guarantee the consumer would see all message/state transitions. Another issue was a data race between messages and state transitions. In the test the PortableRunner would try and wait for any last messages. However, there was a race between receiving the message and the PortableRunner waiting for a terminal state. What might have happened was a job transition into a terminal state and the consumer would not receive the message, failing the test. Note: I wasn't able to repro the bug on my local machine but did my best to zero in on any multi-threading issues. There may be follow-up PRs to fix the issue if it pops up again. Follow this checklist to help us incorporate your contribution quickly and easily: - [x] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). It will help us expedite review of your Pull Request if you tag someone (e.g. `@username`) to look at it. 
Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | --- | --- | --- | --- Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_Po
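The race described here, where a consumer can observe the terminal state before receiving the final messages, is not specific to Python. One generic way to avoid it is to publish the terminal state through the same ordered stream as the messages; the Java sketch below is purely an illustration of that idea, not the actual fix in local_job_service.py.
{code:java}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class JobMessageStream {
  private static final String TERMINAL = "STATE:DONE";
  private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

  /** Producer side: messages are enqueued as they happen. */
  public void emitMessage(String message) throws InterruptedException {
    queue.put("MSG:" + message);
  }

  /** The terminal state goes through the same queue, after all earlier messages. */
  public void emitTerminalState() throws InterruptedException {
    queue.put(TERMINAL);
  }

  /** Consumer side: every message is delivered before the terminal state is observed. */
  public void consume() throws InterruptedException {
    while (true) {
      String item = queue.take();
      if (TERMINAL.equals(item)) {
        return;  // all earlier messages were already delivered in order
      }
      System.out.println(item);
    }
  }
}
{code}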
[jira] [Work logged] (BEAM-6357) Datastore clients: retry on ABORTED rpc responses
[ https://issues.apache.org/jira/browse/BEAM-6357?focusedWorklogId=182094&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182094 ] ASF GitHub Bot logged work on BEAM-6357: Author: ASF GitHub Bot Created on: 08/Jan/19 00:46 Start Date: 08/Jan/19 00:46 Worklog Time Spent: 10m Work Description: udim commented on issue #7406: [BEAM-6357] Retry Datastore writes on ABORTED URL: https://github.com/apache/beam/pull/7406#issuecomment-452134721 merge plz: @charlesccychen This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 182094) Time Spent: 50m (was: 40m) > Datastore clients: retry on ABORTED rpc responses > - > > Key: BEAM-6357 > URL: https://issues.apache.org/jira/browse/BEAM-6357 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > It is recommended to retry on ABORTED error codes: > https://cloud.google.com/datastore/docs/concepts/errors#error_codes -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6280) Failure in PortableRunnerTest.test_error_traceback_includes_user_code
[ https://issues.apache.org/jira/browse/BEAM-6280?focusedWorklogId=182091&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182091 ] ASF GitHub Bot logged work on BEAM-6280: Author: ASF GitHub Bot Created on: 08/Jan/19 00:43 Start Date: 08/Jan/19 00:43 Worklog Time Spent: 10m Work Description: rohdesamuel commented on issue #7433: [BEAM-6280] Refactors Python portability tests to be multi-threaded aware URL: https://github.com/apache/beam/pull/7433#issuecomment-452133990 Gradle scans: [:beam-sdks-python:testPython2](https://scans.gradle.com/s/rdsusz6grxxsi/) [:beam-sdks-python:testPython3](https://scans.gradle.com/s/bcl7pvsnuejby) [:beam-sdks-python:preCommit](https://scans.gradle.com/s/iwkorwpzwwrfs) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 182091) Time Spent: 20m (was: 10m) > Failure in PortableRunnerTest.test_error_traceback_includes_user_code > - > > Key: BEAM-6280 > URL: https://issues.apache.org/jira/browse/BEAM-6280 > Project: Beam > Issue Type: Bug > Components: sdk-py-core, test-failures >Reporter: Kenneth Knowles >Assignee: Sam Rohde >Priority: Critical > Labels: flaky-test > Time Spent: 20m > Remaining Estimate: 0h > > [https://builds.apache.org/job/beam_PreCommit_Python_Cron/732/] > [https://builds.apache.org/job/beam_PreCommit_Python_Cron/732/testReport/apache_beam.runners.portability.portable_runner_test/PortableRunnerTest/test_error_traceback_includes_user_code/] > [https://scans.gradle.com/s/do3hjulee3gaa/console-log?task=:beam-sdks-python:testPython3] > {code:java} > 'second' not found in 'Traceback (most recent call last):\n File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/apache_beam/runners/portability/fn_api_runner_test.py", > line 466, in test_error_traceback_includes_user_code\np | > beam.Create([0]) | beam.Map(first) # pylint: > disable=expression-not-assigned\n File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/apache_beam/pipeline.py", > line 425, in __exit__\nself.run().wait_until_finish()\n File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/apache_beam/runners/portability/portable_runner.py", > line 314, in wait_until_finish\nself._job_id, self._state, > self._last_error_message()))\nRuntimeError: Pipeline > job-cdcefe6d-1caa-4487-9e63-e971f67ec68c failed in state FAILED: start > coder=WindowedValueCoder[BytesCoder], len(consumers)=1]]>\n'{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6357) Datastore clients: retry on ABORTED rpc responses
[ https://issues.apache.org/jira/browse/BEAM-6357?focusedWorklogId=182090&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182090 ] ASF GitHub Bot logged work on BEAM-6357: Author: ASF GitHub Bot Created on: 08/Jan/19 00:42 Start Date: 08/Jan/19 00:42 Worklog Time Spent: 10m Work Description: udim commented on issue #7406: [BEAM-6357] Retry Datastore writes on ABORTED URL: https://github.com/apache/beam/pull/7406#issuecomment-452133947 R: @tvalentyn This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 182090) Time Spent: 40m (was: 0.5h) > Datastore clients: retry on ABORTED rpc responses > - > > Key: BEAM-6357 > URL: https://issues.apache.org/jira/browse/BEAM-6357 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > It is recommended to retry on ABORTED error codes: > https://cloud.google.com/datastore/docs/concepts/errors#error_codes -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-6352) Watch PTransform is broken
[ https://issues.apache.org/jira/browse/BEAM-6352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udi Meiri updated BEAM-6352: Affects Version/s: 2.9.0 > Watch PTransform is broken > -- > > Key: BEAM-6352 > URL: https://issues.apache.org/jira/browse/BEAM-6352 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Affects Versions: 2.9.0 >Reporter: Gleb Kanterov >Assignee: Kenneth Knowles >Priority: Major > > List of affected tests: > org.apache.beam.sdk.transforms.WatchTest > > testSinglePollMultipleInputsWithSideInput FAILED > org.apache.beam.sdk.transforms.WatchTest > testMultiplePollsWithKeyExtractor > FAILED > org.apache.beam.sdk.transforms.WatchTest > testSinglePollMultipleInputs FAILED > org.apache.beam.sdk.transforms.WatchTest > > testMultiplePollsWithTerminationDueToTerminationCondition FAILED > org.apache.beam.sdk.transforms.WatchTest > testMultiplePollsWithManyResults > FAILED > org.apache.beam.sdk.transforms.WatchTest > testSinglePollWithManyResults > FAILED > org.apache.beam.sdk.transforms.WatchTest > > testMultiplePollsStopAfterTimeSinceNewOutput > org.apache.beam.sdk.transforms.WatchTest > > testMultiplePollsWithTerminationBecauseOutputIsFinal FAILED > org.apache.beam.sdk.io.AvroIOTest$NeedsRunnerTests > > testContinuouslyWriteAndReadMultipleFilepatterns[0: true] FAILED > org.apache.beam.sdk.io.AvroIOTest$NeedsRunnerTests > > testContinuouslyWriteAndReadMultipleFilepatterns[1: false] FAILED > org.apache.beam.sdk.io.FileIOTest > testMatchWatchForNewFiles FAILED > org.apache.beam.sdk.io.TextIOReadTest$BasicIOTest > testReadWatchForNewFiles > FAILED > {code} > java.lang.IllegalArgumentException: > org.apache.beam.sdk.transforms.Watch$WatchGrowthFn, @ProcessElement > process(ProcessContext, GrowthTracker): Has tracker type > Watch.GrowthTracker, but the DoFn's tracker > type must be of type RestrictionTracker. > {code} > Relevant pull requests: > - https://github.com/apache/beam/pull/6467 > - https://github.com/apache/beam/pull/7374 > Now tests are marked with @Ignore referencing this JIRA issue -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6352) Watch PTransform is broken
[ https://issues.apache.org/jira/browse/BEAM-6352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736519#comment-16736519 ] Udi Meiri commented on BEAM-6352: - I can recreate this in 2.9.0, using the Java quickstart instructions and modifying ./src/main/java/org/apache/beam/examples/WordCount.java: ``` import org.apache.beam.sdk.transforms.Watch; import org.joda.time.Duration; import org.apache.beam.sdk.io.FileIO; ``` ``` static void runWordCount(WordCountOptions options) { Pipeline.create(options) .apply( FileIO.match() .filepattern(options.getInputFile()) .continuously(Duration.standardMinutes(1), Watch.Growth.never())); } ``` ``` $ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount -Dexec.args="--inputFile=pom.xml --output=counts" -Pdirect-runner ... java.lang.IllegalArgumentException: org.apache.beam.sdk.transforms.Watch$WatchGrowthFn, @ProcessElement process(ProcessContext, GrowthTracker): Has tracker type Watch.GrowthTracker, but the DoFn's tracker type must be of type RestrictionTracker. at org.apache.beam.sdk.transforms.reflect.DoFnSignatures$ErrorReporter.throwIllegalArgument (DoFnSignatures.java:1507) at org.apache.beam.sdk.transforms.reflect.DoFnSignatures$ErrorReporter.checkArgument (DoFnSignatures.java:1512) at org.apache.beam.sdk.transforms.reflect.DoFnSignatures.verifySplittableMethods (DoFnSignatures.java:593) at org.apache.beam.sdk.transforms.reflect.DoFnSignatures.parseSignature (DoFnSignatures.java:472) at org.apache.beam.sdk.transforms.reflect.DoFnSignatures.lambda$getSignature$0 (DoFnSignatures.java:140) at java.util.HashMap.computeIfAbsent (HashMap.java:1128) at org.apache.beam.sdk.transforms.reflect.DoFnSignatures.getSignature (DoFnSignatures.java:140) at org.apache.beam.sdk.transforms.ParDo.validate (ParDo.java:546) at org.apache.beam.sdk.transforms.ParDo.of (ParDo.java:393) at org.apache.beam.sdk.transforms.Watch$Growth.expand (Watch.java:689) at org.apache.beam.sdk.transforms.Watch$Growth.expand (Watch.java:157) at org.apache.beam.sdk.Pipeline.applyInternal (Pipeline.java:537) at org.apache.beam.sdk.Pipeline.applyTransform (Pipeline.java:488) at org.apache.beam.sdk.values.PCollection.apply (PCollection.java:370) at org.apache.beam.sdk.io.FileIO$MatchAll.expand (FileIO.java:614) at org.apache.beam.sdk.io.FileIO$MatchAll.expand (FileIO.java:572) at org.apache.beam.sdk.Pipeline.applyInternal (Pipeline.java:537) at org.apache.beam.sdk.Pipeline.applyTransform (Pipeline.java:488) at org.apache.beam.sdk.values.PCollection.apply (PCollection.java:370) at org.apache.beam.sdk.io.FileIO$Match.expand (FileIO.java:567) at org.apache.beam.sdk.io.FileIO$Match.expand (FileIO.java:514) at org.apache.beam.sdk.Pipeline.applyInternal (Pipeline.java:537) at org.apache.beam.sdk.Pipeline.applyTransform (Pipeline.java:471) at org.apache.beam.sdk.values.PBegin.apply (PBegin.java:44) at org.apache.beam.sdk.Pipeline.apply (Pipeline.java:167) at org.apache.beam.examples.WordCount.runWordCount (WordCount.java:178) at org.apache.beam.examples.WordCount.main (WordCount.java:188) at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke (Method.java:498) at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:282) at java.lang.Thread.run (Thread.java:748) ``` > Watch PTransform is broken > -- > > Key: BEAM-6352 > 
URL: https://issues.apache.org/jira/browse/BEAM-6352 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Gleb Kanterov >Assignee: Kenneth Knowles >Priority: Major > > List of affected tests: > org.apache.beam.sdk.transforms.WatchTest > > testSinglePollMultipleInputsWithSideInput FAILED > org.apache.beam.sdk.transforms.WatchTest > testMultiplePollsWithKeyExtractor > FAILED > org.apache.beam.sdk.transforms.WatchTest > testSinglePollMultipleInputs FAILED > org.apache.beam.sdk.transforms.WatchTest > > testMultiplePollsWithTerminationDueToTerminationCondition FAILED > org.apache.beam.sdk.transforms.WatchTest > testMultiplePollsWithManyResults > FAILED > org.apache.beam.sdk.transforms.WatchTest > testSinglePollWithManyResults > FAILED > org.apache.beam.sdk.transforms.WatchTest > > testMultiplePollsStopAfterTimeSinceNewOutput > org.apache.beam.sdk.transforms.WatchTest > > testMultiplePollsWithTerminationBecauseOutputIsFinal FAILED > org.apache.beam.sdk.io.AvroIOTest$NeedsRunnerTests > > testContinuouslyWriteAndReadMultipleF
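For contrast with the failing `Watch.GrowthTracker` signature, a splittable DoFn that passes the check takes a `RestrictionTracker` subtype such as `OffsetRangeTracker` in its `@ProcessElement` method. The sketch below is a trivial counter written against the Beam 2.9-era API and is illustrative only:
{code:java}
import org.apache.beam.sdk.io.range.OffsetRange;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.splittabledofn.OffsetRangeTracker;

// Emits 0..element-1 for each input element; the tracker parameter is an
// OffsetRangeTracker, which extends RestrictionTracker and therefore passes
// the DoFnSignatures check that Watch$WatchGrowthFn fails above.
class EmitRangeFn extends DoFn<Long, Long> {

  @ProcessElement
  public void process(ProcessContext c, OffsetRangeTracker tracker) {
    for (long i = tracker.currentRestriction().getFrom(); tracker.tryClaim(i); i++) {
      c.output(i);
    }
  }

  @GetInitialRestriction
  public OffsetRange getInitialRestriction(Long element) {
    return new OffsetRange(0, element);
  }
}
{code}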
[jira] [Work logged] (BEAM-6240) Allow users to annotate POJOs and JavaBeans for richer functionality
[ https://issues.apache.org/jira/browse/BEAM-6240?focusedWorklogId=182073&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182073 ] ASF GitHub Bot logged work on BEAM-6240: Author: ASF GitHub Bot Created on: 07/Jan/19 22:52 Start Date: 07/Jan/19 22:52 Worklog Time Spent: 10m Work Description: reuvenlax commented on pull request #7289: [BEAM-6240] Add a library of schema annotations for POJO and JavaBeans URL: https://github.com/apache/beam/pull/7289#discussion_r245825968 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/annotations/FieldName.java ## @@ -0,0 +1,54 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.schemas.annotations; + +import java.lang.annotation.Documented; +import java.lang.annotation.ElementType; +import java.lang.annotation.Retention; +import java.lang.annotation.RetentionPolicy; +import java.lang.annotation.Target; +import javax.annotation.CheckForNull; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.annotations.Experimental.Kind; + +/** + * When used on a POJO field or a JavaBean getter, the specified name is used for the generated + * schema field. + * + * For example, a Java POJO with a field that we want in our schema but under a different name. + * + * + * {@literal @}DefaultSchema(JavaBeanSchema.class) + * class MyClass { + * public String user; + * {@literal @}SchemaName("age") Review comment: Went with SchemaFieldName. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 182073) Time Spent: 4.5h (was: 4h 20m) > Allow users to annotate POJOs and JavaBeans for richer functionality > > > Key: BEAM-6240 > URL: https://issues.apache.org/jira/browse/BEAM-6240 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Reuven Lax >Assignee: Reuven Lax >Priority: Major > Time Spent: 4.5h > Remaining Estimate: 0h > > Desired annotations: > * SchemaIgnore - ignore this field > * FieldName - allow the user to explicitly specify a field name > * SchemaCreate - register a function to be used to create an object (so > fields can be final, and no default constructor need be assumed). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6240) Allow users to annotate POJOs and JavaBeans for richer functionality
[ https://issues.apache.org/jira/browse/BEAM-6240?focusedWorklogId=182072&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182072 ] ASF GitHub Bot logged work on BEAM-6240: Author: ASF GitHub Bot Created on: 07/Jan/19 22:51 Start Date: 07/Jan/19 22:51 Worklog Time Spent: 10m Work Description: reuvenlax commented on pull request #7289: [BEAM-6240] Add a library of schema annotations for POJO and JavaBeans URL: https://github.com/apache/beam/pull/7289#discussion_r245825698 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/annotations/SchemaCreate.java ## @@ -0,0 +1,74 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.schemas.annotations; + +import java.lang.annotation.Documented; +import java.lang.annotation.ElementType; +import java.lang.annotation.Retention; +import java.lang.annotation.RetentionPolicy; +import java.lang.annotation.Target; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.annotations.Experimental.Kind; + +/** + * Can be put on a constructor or a static method, in which case that constructor or method will be + * used to created instance of the class by Beam's schema code. + * + * For example, the following Java POJO. + * + * + * {@literal @}DefaultSchema(JavaBeanSchema.class) + * class MyClass { + *public final String user; + *public final int age; + * + *{@literal @}SchemaCreate + *public MyClass(String user, int age) { + * this.user = user; + * this.age = age; + *} + * } + * + * + * This tells Beam that this constructor can be used to construct instances. Beam will match up + * the names of the constructor arguments to schema fields in order to decide how to create the + * class from a Row. + * + * This can also be used to annotate a static factory method on the class. For example: + * + * + * {@literal @}DefaultSchema(JavaBeanSchema.class) + * class MyClass { + * public final String user; + * public final int age; + * + * private MyClass(String user, int age) { this.user = user; this.age = age; } + * + * {@literal @}SchemaCreate + * public MyClass create(String user, int age) { Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 182072) Time Spent: 4h 20m (was: 4h 10m) > Allow users to annotate POJOs and JavaBeans for richer functionality > > > Key: BEAM-6240 > URL: https://issues.apache.org/jira/browse/BEAM-6240 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Reuven Lax >Assignee: Reuven Lax >Priority: Major > Time Spent: 4h 20m > Remaining Estimate: 0h > > Desired annotations: > * SchemaIgnore - ignore this field > * FieldName - allow the user to explicitly specify a field name > * SchemaCreate - register a function to be used to create an object (so > fields can be final, and no default constructor need be assumed). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6240) Allow users to annotate POJOs and JavaBeans for richer functionality
[ https://issues.apache.org/jira/browse/BEAM-6240?focusedWorklogId=182071&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182071 ] ASF GitHub Bot logged work on BEAM-6240: Author: ASF GitHub Bot Created on: 07/Jan/19 22:50 Start Date: 07/Jan/19 22:50 Worklog Time Spent: 10m Work Description: reuvenlax commented on pull request #7289: [BEAM-6240] Add a library of schema annotations for POJO and JavaBeans URL: https://github.com/apache/beam/pull/7289#discussion_r245825617 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/annotations/FieldName.java ## @@ -0,0 +1,54 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.schemas.annotations; + +import java.lang.annotation.Documented; +import java.lang.annotation.ElementType; +import java.lang.annotation.Retention; +import java.lang.annotation.RetentionPolicy; +import java.lang.annotation.Target; +import javax.annotation.CheckForNull; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.annotations.Experimental.Kind; + +/** + * When used on a POJO field or a JavaBean getter, the specified name is used for the generated + * schema field. + * + * For example, a Java POJO with a field that we want in our schema but under a different name. + * + * + * {@literal @}DefaultSchema(JavaBeanSchema.class) + * class MyClass { + * public String user; + * {@literal @}SchemaName("age") + * public int ageInYears; + * } + * + * + * The resulting schema will have fields named "user" and "age." + */ +@Documented +@Retention(RetentionPolicy.RUNTIME) +@Target({ElementType.FIELD, ElementType.METHOD}) +@SuppressWarnings("rawtypes") +@Experimental(Kind.SCHEMAS) +public @interface FieldName { + @CheckForNull Review comment: Good point. Our build assumes everything is @Nonnull unless explicitly marked @Nullable, so I'll just delete the annotation. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 182071) Time Spent: 4h 10m (was: 4h) > Allow users to annotate POJOs and JavaBeans for richer functionality > > > Key: BEAM-6240 > URL: https://issues.apache.org/jira/browse/BEAM-6240 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Reuven Lax >Assignee: Reuven Lax >Priority: Major > Time Spent: 4h 10m > Remaining Estimate: 0h > > Desired annotations: > * SchemaIgnore - ignore this field > * FieldName - allow the user to explicitly specify a field name > * SchemaCreate - register a function to be used to create an object (so > fields can be final, and no default constructor need be assumed). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6240) Allow users to annotate POJOs and JavaBeans for richer functionality
[ https://issues.apache.org/jira/browse/BEAM-6240?focusedWorklogId=182070&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182070 ] ASF GitHub Bot logged work on BEAM-6240: Author: ASF GitHub Bot Created on: 07/Jan/19 22:49 Start Date: 07/Jan/19 22:49 Worklog Time Spent: 10m Work Description: reuvenlax commented on pull request #7289: [BEAM-6240] Add a library of schema annotations for POJO and JavaBeans URL: https://github.com/apache/beam/pull/7289#discussion_r245825310 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/annotations/DefaultSchema.java ## @@ -97,7 +100,7 @@ private SchemaProvider getSchemaProvider(TypeDescriptor typeDescriptor) { + " specified as the default SchemaProvider for type " + type Review comment: This is a TypeDescriptor which doesn't have a getName. Are you suggesting type.getRawType().getName()? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 182070) Time Spent: 4h (was: 3h 50m) > Allow users to annotate POJOs and JavaBeans for richer functionality > > > Key: BEAM-6240 > URL: https://issues.apache.org/jira/browse/BEAM-6240 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Reuven Lax >Assignee: Reuven Lax >Priority: Major > Time Spent: 4h > Remaining Estimate: 0h > > Desired annotations: > * SchemaIgnore - ignore this field > * FieldName - allow the user to explicitly specify a field name > * SchemaCreate - register a function to be used to create an object (so > fields can be final, and no default constructor need be assumed). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
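Taken together, the annotations discussed in this issue apply to a single POJO roughly as follows. The sketch uses the names settled on during review (`SchemaFieldName` rather than `FieldName`); the exact packages and the field-matching rules for `@SchemaCreate` should be checked against the released SDK javadoc rather than taken from this example:
{code:java}
import java.io.Serializable;
import org.apache.beam.sdk.schemas.JavaFieldSchema;
import org.apache.beam.sdk.schemas.annotations.DefaultSchema;
import org.apache.beam.sdk.schemas.annotations.SchemaCreate;
import org.apache.beam.sdk.schemas.annotations.SchemaFieldName;
import org.apache.beam.sdk.schemas.annotations.SchemaIgnore;

@DefaultSchema(JavaFieldSchema.class)
public class User implements Serializable {

  @SchemaFieldName("user_name")  // exposed in the schema under a different name
  public final String userName;

  public final int age;

  @SchemaIgnore                  // excluded from the inferred schema
  public String cachedDisplayName;

  @SchemaCreate                  // lets the fields stay final: Beam builds instances here
  public User(String userName, int age) {
    this.userName = userName;
    this.age = age;
  }
}
{code}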
[jira] [Commented] (BEAM-6241) MongoDbIO - Add Limit and Aggregates Support
[ https://issues.apache.org/jira/browse/BEAM-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736449#comment-16736449 ] Ahmed El.Hussaini commented on BEAM-6241: - [~kenn] I appreciate your immediate response and help. Will do what I can on my end in the meantime. > MongoDbIO - Add Limit and Aggregates Support > > > Key: BEAM-6241 > URL: https://issues.apache.org/jira/browse/BEAM-6241 > Project: Beam > Issue Type: Improvement > Components: io-java-mongodb > Affects Versions: 2.9.0 > Reporter: Ahmed El.Hussaini > Assignee: Ahmed El.Hussaini > Priority: Major > Labels: easyfix > Fix For: 2.10.0 >
> h2. Adds Support to Limit Results
>
> {code:java}
> pipeline.apply(
>     MongoDbIO.read()
>         .withUri("mongodb://localhost:" + port)
>         .withDatabase(DATABASE)
>         .withCollection(COLLECTION)
>         .withFilter("{\"scientist\":\"Einstein\"}")
>         .withLimit(5));
> {code}
>
> h2. Adds Support to Use Aggregates
>
> {code:java}
> List<BsonDocument> aggregates = new ArrayList<>();
> aggregates.add(
>     new BsonDocument(
>         "$match",
>         new BsonDocument("country", new BsonDocument("$eq", new BsonString("England")))));
>
> PCollection<Document> output =
>     pipeline.apply(
>         MongoDbIO.read()
>             .withUri("mongodb://localhost:" + port)
>             .withDatabase(DATABASE)
>             .withCollection(COLLECTION)
>             .withAggregate(aggregates));
> {code}
>
> -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6347) Add page for developing I/O connectors for Java
[ https://issues.apache.org/jira/browse/BEAM-6347?focusedWorklogId=182066&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182066 ] ASF GitHub Bot logged work on BEAM-6347: Author: ASF GitHub Bot Created on: 07/Jan/19 22:41 Start Date: 07/Jan/19 22:41 Worklog Time Spent: 10m Work Description: melap commented on pull request #7397: [BEAM-6347] Add website page for developing I/O connectors for Java URL: https://github.com/apache/beam/pull/7397#discussion_r245524376 ## File path: website/src/documentation/io/developing-io-java.md ## @@ -0,0 +1,419 @@ +--- +layout: section +title: "Apache Beam: Developing I/O connectors for Java" +section_menu: section-menu/documentation.html +permalink: /documentation/io/developing-io-java/ +redirect_from: /documentation/io/authoring-java/ +--- + +# Developing I/O connectors for Java + +To connect to a data store that isn’t supported by Beam’s existing I/O +connectors, you must create a custom I/O connector that usually consist of a +source and a sink. All Beam sources and sinks are composite transforms; however, +the implementation of your custom I/O depends on your use case. See the [new +I/O connector overview]({{ site.baseurl }}/documentation/io/developing-io-overview/) +for a general overview of developing a new I/O connector. + +This page describes implementation details for developing sources and sinks +using Java. The Python SDK offers the same functionality, but uses a slightly +different API. See [Developing I/O connectors for Python]({{ site.baseurl }}/documentation/io/developing-io-python/) +for information specific to the Python SDK. + +## Implementation options + +**Sources** + +For bounded (batch) sources, there are currently two options for creating a Beam +source: + +1. Use `ParDo` and `GroupByKey`. +2. Use the `Source` interface and extend the `BoundedSource` abstract subclass. + +`ParDo` is the recommended option, as implementing a `Source` can be tricky. +The [developing I/O connectors overview]({{ site.baseurl }}/documentation/io/developing-io-overview/) +covers using `ParDo`, and lists some use cases where you might want to use a +Source (such as [dynamic work rebalancing]({{ site.baseurl }}/blog/2016/05/18/splitAtFraction-method.html)). + +For unbounded (streaming) sources, you must use the `Source` interface and extend +the `UnboundedSource` abstract subclass. `UnboundedSource` supports features that +are useful for streaming pipelines such as checkpointing. + +Splittable DoFn is a new sources framework that is under development and will +replace the other options for developing bounded and unbounded sources. For more +information, see the +[roadmap for multi-SDK connector efforts]({{ site.baseurl }}/roadmap/connectors-multi-sdk/). + +**Sinks** + +To create a Beam sink, we recommend that you use a single `ParDo` that writes the Review comment: Worked this into start of FilebasedSink section This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 182066) > Add page for developing I/O connectors for Java > --- > > Key: BEAM-6347 > URL: https://issues.apache.org/jira/browse/BEAM-6347 > Project: Beam > Issue Type: Bug > Components: website >Reporter: Melissa Pashniak >Assignee: Melissa Pashniak >Priority: Minor > Time Spent: 2h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
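The "single ParDo that writes" recommendation quoted above usually reduces to a DoFn that buffers elements and flushes them per bundle. In the sketch below the data-store client call is a hypothetical placeholder, not a real Beam or client API:
{code:java}
import java.util.ArrayList;
import java.util.List;
import org.apache.beam.sdk.transforms.DoFn;

class WriteToMyStoreFn extends DoFn<String, Void> {

  private final String endpoint;         // immutable configuration
  private transient List<String> batch;  // per-bundle buffer, never serialized

  WriteToMyStoreFn(String endpoint) {
    this.endpoint = endpoint;
  }

  @StartBundle
  public void startBundle() {
    batch = new ArrayList<>();
  }

  @ProcessElement
  public void processElement(ProcessContext c) {
    batch.add(c.element());
    if (batch.size() >= 500) {
      flush();  // write in reasonably sized batches
    }
  }

  @FinishBundle
  public void finishBundle() {
    flush();  // make sure nothing is left unwritten when the bundle ends
  }

  private void flush() {
    if (batch.isEmpty()) {
      return;
    }
    // Hypothetical client call; a real sink would add retries and error handling.
    System.out.printf("Writing %d elements to %s%n", batch.size(), endpoint);
    batch.clear();
  }
}
{code}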
[jira] [Work logged] (BEAM-6347) Add page for developing I/O connectors for Java
[ https://issues.apache.org/jira/browse/BEAM-6347?focusedWorklogId=182065&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182065 ] ASF GitHub Bot logged work on BEAM-6347: Author: ASF GitHub Bot Created on: 07/Jan/19 22:41 Start Date: 07/Jan/19 22:41 Worklog Time Spent: 10m Work Description: melap commented on pull request #7397: [BEAM-6347] Add website page for developing I/O connectors for Java URL: https://github.com/apache/beam/pull/7397#discussion_r245524353 ## File path: website/src/documentation/io/developing-io-java.md ## @@ -0,0 +1,419 @@ +--- +layout: section +title: "Apache Beam: Developing I/O connectors for Java" +section_menu: section-menu/documentation.html +permalink: /documentation/io/developing-io-java/ +redirect_from: /documentation/io/authoring-java/ +--- + +# Developing I/O connectors for Java + +To connect to a data store that isn’t supported by Beam’s existing I/O +connectors, you must create a custom I/O connector that usually consist of a +source and a sink. All Beam sources and sinks are composite transforms; however, +the implementation of your custom I/O depends on your use case. See the [new +I/O connector overview]({{ site.baseurl }}/documentation/io/developing-io-overview/) +for a general overview of developing a new I/O connector. + +This page describes implementation details for developing sources and sinks +using Java. The Python SDK offers the same functionality, but uses a slightly +different API. See [Developing I/O connectors for Python]({{ site.baseurl }}/documentation/io/developing-io-python/) +for information specific to the Python SDK. + +## Implementation options + +**Sources** + +For bounded (batch) sources, there are currently two options for creating a Beam +source: + +1. Use `ParDo` and `GroupByKey`. Review comment: Done - I worked in a redirect sentence into the intro and removed this whole Implementation options section This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 182065) Time Spent: 2h 40m (was: 2.5h) > Add page for developing I/O connectors for Java > --- > > Key: BEAM-6347 > URL: https://issues.apache.org/jira/browse/BEAM-6347 > Project: Beam > Issue Type: Bug > Components: website >Reporter: Melissa Pashniak >Assignee: Melissa Pashniak >Priority: Minor > Time Spent: 2h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6347) Add page for developing I/O connectors for Java
[ https://issues.apache.org/jira/browse/BEAM-6347?focusedWorklogId=182067&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182067 ] ASF GitHub Bot logged work on BEAM-6347: Author: ASF GitHub Bot Created on: 07/Jan/19 22:41 Start Date: 07/Jan/19 22:41 Worklog Time Spent: 10m Work Description: melap commented on pull request #7397: [BEAM-6347] Add website page for developing I/O connectors for Java URL: https://github.com/apache/beam/pull/7397#discussion_r245788401 ## File path: website/src/documentation/io/developing-io-overview.md ## @@ -0,0 +1,171 @@ +--- +layout: section +title: "Overview: Developing a new I/O connector" +section_menu: section-menu/documentation.html +permalink: /documentation/io/developing-io-overview/ +redirect_from: /documentation/io/authoring-overview/ +--- + + +[Pipeline I/O Table of Contents]({{site.baseurl}}/documentation/io/io-toc/) + +# Overview: Developing a new I/O connector + +_A guide for users who need to connect to a data store that isn't supported by +the [Built-in I/O connectors]({{site.baseurl }}/documentation/io/built-in/)_ + +To connect to a data store that isn’t supported by Beam’s existing I/O +connectors, you must create a custom I/O connector. A connector usually consists +of a source and a sink. All Beam sources and sinks are composite transforms; +however, the implementation of your custom I/O depends on your use case. Here +are the recommended steps to get started: + +1. Read this overview and choose your implementation. You can email the + [Beam dev mailing list]({{ site.baseurl }}/get-started/support) with any + questions you might have. In addition, you can check if anyone else is + working on the same I/O connector. + +1. If you plan to contribute your I/O connector to the Beam community, see the + [Apache Beam contribution guide]({{ site.baseurl }}/contribute/contribution-guide/). + +1. Read the [PTransform style guide]({{ site.baseurl }}/contribute/ptransform-style-guide/) + for additional style guide recommendations. + + +## Implementation options + +### Sources + +For bounded (batch) sources, there are currently two options for creating a Beam +source: + +1. Use `ParDo` and `GroupByKey`. + +1. Use the `Source` interface and extend the `BoundedSource` abstract subclass. + +`ParDo` is the recommended option, as implementing a `Source` can be tricky. See +[When to use the Source interface](#when-to-use-source) for a list of some use +cases where you might want to use a `Source` (such as +[dynamic work rebalancing]({{ site.baseurl }}/blog/2016/05/18/splitAtFraction-method.html)). + +(Java only) For unbounded (streaming) sources, you must use the `Source` +interface and extend the `UnboundedSource` abstract subclass. `UnboundedSource` +supports features that are useful for streaming pipelines, such as +checkpointing. + +Splittable DoFn is a new sources framework that is under development and will +replace the other options for developing bounded and unbounded sources. For more +information, see the +[roadmap for multi-SDK connector efforts]({{ site.baseurl }}/roadmap/connectors-multi-sdk/). + +### When to use the Source interface {#when-to-use-source} Review comment: Rearranged; the Implementation options section seems unneeded now, so I removed it. The sections under Sink were so small that I rewrote them to match the style of the Source section (For X, use Y) and incorporated the text from your next comment. This is an automated message from the Apache Git Service. 
To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 182067) Time Spent: 2h 50m (was: 2h 40m) > Add page for developing I/O connectors for Java > --- > > Key: BEAM-6347 > URL: https://issues.apache.org/jira/browse/BEAM-6347 > Project: Beam > Issue Type: Bug > Components: website >Reporter: Melissa Pashniak >Assignee: Melissa Pashniak >Priority: Minor > Time Spent: 2h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
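On the sink side, the "single `ParDo` that writes" style these pages describe can be sketched as a batching `DoFn`. Again this is a hypothetical illustration, not code from the PR: `MyClient` and `MyRecord` stand in for a real client library, the batch size is arbitrary, and the sketch assumes the store's writes are idempotent so that retried bundles do not duplicate data.

{code:java}
import java.util.ArrayList;
import java.util.List;
import org.apache.beam.sdk.transforms.DoFn;

// Hypothetical ParDo-based sink: buffer elements per bundle and flush them in batches.
public class WriteToMyStoreFn extends DoFn<WriteToMyStoreFn.MyRecord, Void> {

  private static final int MAX_BATCH_SIZE = 1000; // arbitrary

  private final String connectionString;
  private transient MyClient client;
  private transient List<MyRecord> batch;

  public WriteToMyStoreFn(String connectionString) {
    this.connectionString = connectionString;
  }

  @Setup
  public void setup() {
    client = MyClient.connect(connectionString);
  }

  @StartBundle
  public void startBundle() {
    batch = new ArrayList<>();
  }

  @ProcessElement
  public void processElement(@Element MyRecord record) {
    batch.add(record);
    if (batch.size() >= MAX_BATCH_SIZE) {
      flush();
    }
  }

  @FinishBundle
  public void finishBundle() {
    flush(); // write whatever is left when the bundle ends
  }

  @Teardown
  public void teardown() {
    if (client != null) {
      client.close();
    }
  }

  private void flush() {
    if (!batch.isEmpty()) {
      client.writeBatch(batch); // assumed idempotent, so bundle retries are safe
      batch.clear();
    }
  }

  // Minimal stand-in types so the sketch is self-contained.
  public static class MyRecord implements java.io.Serializable {}
  public static class MyClient {
    public static MyClient connect(String connectionString) { return new MyClient(); }
    public void writeBatch(List<MyRecord> records) {}
    public void close() {}
  }
}
{code}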
[jira] [Work logged] (BEAM-6254) Add a "Learning Resources" page
[ https://issues.apache.org/jira/browse/BEAM-6254?focusedWorklogId=182064&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182064 ] ASF GitHub Bot logged work on BEAM-6254: Author: ASF GitHub Bot Created on: 07/Jan/19 22:39 Start Date: 07/Jan/19 22:39 Worklog Time Spent: 10m Work Description: melap commented on issue #7303: [BEAM-6254] Add a Learning Resources page to the website URL: https://github.com/apache/beam/pull/7303#issuecomment-452107648 Run Website_Stage_GCS PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 182064) Time Spent: 1.5h (was: 1h 20m) > Add a "Learning Resources" page > --- > > Key: BEAM-6254 > URL: https://issues.apache.org/jira/browse/BEAM-6254 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: David Cavazos >Assignee: David Cavazos >Priority: Minor > Time Spent: 1.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6231) Triage test failures introduced by use_executable_stage_bundle_execution
[ https://issues.apache.org/jira/browse/BEAM-6231?focusedWorklogId=182059&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182059 ] ASF GitHub Bot logged work on BEAM-6231: Author: ASF GitHub Bot Created on: 07/Jan/19 22:09 Start Date: 07/Jan/19 22:09 Worklog Time Spent: 10m Work Description: boyuanzz commented on issue #7356: [BEAM-6231] Make Dataflow runner harness work with FixedWindow URL: https://github.com/apache/beam/pull/7356#issuecomment-452099745 Run Seed Job This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 182059) Time Spent: 4h (was: 3h 50m) > Triage test failures introduced by use_executable_stage_bundle_execution > > > Key: BEAM-6231 > URL: https://issues.apache.org/jira/browse/BEAM-6231 > Project: Beam > Issue Type: Test > Components: runner-dataflow >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 4h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6241) MongoDbIO - Add Limit and Aggregates Support
[ https://issues.apache.org/jira/browse/BEAM-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736422#comment-16736422 ] Kenneth Knowles commented on BEAM-6241: --- Sorry for the delay. Mentioning someone on the PR is good if you can see who might be interested (for example, through {{git blame}}); another way is to ask on d...@beam.apache.org. But generally we hope to get to PRs so they are not left sitting without review. In this case, I am going to send mail to the list about all issues targeting 2.10.0 where there is a PR that is sitting for review. > MongoDbIO - Add Limit and Aggregates Support > > > Key: BEAM-6241 > URL: https://issues.apache.org/jira/browse/BEAM-6241 > Project: Beam > Issue Type: Improvement > Components: io-java-mongodb >Affects Versions: 2.9.0 >Reporter: Ahmed El.Hussaini >Assignee: Ahmed El.Hussaini >Priority: Major > Labels: easyfix > Fix For: 2.10.0 > > > h2. Adds Support to Limit Results > > {code:java} > MongoDbIO.read() > .withUri("mongodb://localhost:" + port) > .withDatabase(DATABASE) > .withCollection(COLLECTION) > .withFilter("{\"scientist\":\"Einstein\"}") > .withLimit(5);{code} > h2. Adds Support to Use Aggregates > > {code:java} > List<BsonDocument> aggregates = new ArrayList<>(); > aggregates.add( > new BsonDocument( > "$match", > new BsonDocument("country", new BsonDocument("$eq", new > BsonString("England"))))); > PCollection<Document> output = > pipeline.apply( > MongoDbIO.read() > .withUri("mongodb://localhost:" + port) > .withDatabase(DATABASE) > .withCollection(COLLECTION) > .withAggregate(aggregates)); > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6363) Upgrade Gearpump to 0.9.0
[ https://issues.apache.org/jira/browse/BEAM-6363?focusedWorklogId=182049&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182049 ] ASF GitHub Bot logged work on BEAM-6363: Author: ASF GitHub Bot Created on: 07/Jan/19 21:47 Start Date: 07/Jan/19 21:47 Worklog Time Spent: 10m Work Description: kennknowles commented on pull request #7425: [BEAM-6363] Upgrade Gearpump to 0.9.0 URL: https://github.com/apache/beam/pull/7425 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 182049) Time Spent: 0.5h (was: 20m) > Upgrade Gearpump to 0.9.0 > - > > Key: BEAM-6363 > URL: https://issues.apache.org/jira/browse/BEAM-6363 > Project: Beam > Issue Type: Improvement > Components: runner-gearpump >Affects Versions: 2.9.0 >Reporter: Manu Zhang >Assignee: Manu Zhang >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > This is the first release after Gearpump retired from Apache. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6241) MongoDbIO - Add Limit and Aggregates Support
[ https://issues.apache.org/jira/browse/BEAM-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736411#comment-16736411 ] Ahmed El.Hussaini commented on BEAM-6241: - [~kenn] What can I do here, or who should I mention, to get this issue and the associated PR reviewed? > MongoDbIO - Add Limit and Aggregates Support > > > Key: BEAM-6241 > URL: https://issues.apache.org/jira/browse/BEAM-6241 > Project: Beam > Issue Type: Improvement > Components: io-java-mongodb >Affects Versions: 2.9.0 >Reporter: Ahmed El.Hussaini >Assignee: Ahmed El.Hussaini >Priority: Major > Labels: easyfix > Fix For: 2.10.0 > > > h2. Adds Support to Limit Results > > {code:java} > MongoDbIO.read() > .withUri("mongodb://localhost:" + port) > .withDatabase(DATABASE) > .withCollection(COLLECTION) > .withFilter("{\"scientist\":\"Einstein\"}") > .withLimit(5);{code} > h2. Adds Support to Use Aggregates > > {code:java} > List<BsonDocument> aggregates = new ArrayList<>(); > aggregates.add( > new BsonDocument( > "$match", > new BsonDocument("country", new BsonDocument("$eq", new > BsonString("England"))))); > PCollection<Document> output = > pipeline.apply( > MongoDbIO.read() > .withUri("mongodb://localhost:" + port) > .withDatabase(DATABASE) > .withCollection(COLLECTION) > .withAggregate(aggregates)); > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5964) Add ClickHouseIO.Write
[ https://issues.apache.org/jira/browse/BEAM-5964?focusedWorklogId=182048&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182048 ] ASF GitHub Bot logged work on BEAM-5964: Author: ASF GitHub Bot Created on: 07/Jan/19 21:44 Start Date: 07/Jan/19 21:44 Worklog Time Spent: 10m Work Description: kennknowles commented on issue #7006: [BEAM-5964] Add ClickHouseIO.Write URL: https://github.com/apache/beam/pull/7006#issuecomment-452092826 @kanterov Yes, feel free to propose a cherrypick PR. This was clearly pretty much done when the release proposal thread started. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 182048) Time Spent: 8h 40m (was: 8.5h) > Add ClickHouseIO.Write > -- > > Key: BEAM-5964 > URL: https://issues.apache.org/jira/browse/BEAM-5964 > Project: Beam > Issue Type: New Feature > Components: io-ideas >Reporter: Gleb Kanterov >Assignee: Gleb Kanterov >Priority: Major > Time Spent: 8h 40m > Remaining Estimate: 0h > > h3. Motivation > ClickHouse is open-source columnar DBMS for OLAP. It allows analysis of data > that is updated in real time. The project was released as open-source > software under the Apache 2 license in June 2016. > h3. Design and implementation > 1. Do only writes, reads aren't useful because ClickHouse is designed for > OLAP queries > 2. For writes, do write in batches and rely on idempotent and atomic inserts > supported by replicated tables in ClickHouse > 3. Implement ClickHouseIO.Write as PTransform, PDone> > 4. Rely on having logic for casting rows between schemas in BEAM-5918, and > don't put it in ClickHouseIO.Write > h3. References > [1] > http://highscalability.com/blog/2017/9/18/evolution-of-data-structures-in-yandexmetrica.html > [2] > https://blog.cloudflare.com/how-cloudflare-analyzes-1m-dns-queries-per-second/ > [3] > https://blog.cloudflare.com/http-analytics-for-6m-requests-per-second-using-clickhouse/ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-6381) :beam-test-infra-metrics:test org.apache.beam.testinfra.metrics.ProberTests > CheckGrafanaStalenessAlerts failed
Boyuan Zhang created BEAM-6381: -- Summary: :beam-test-infra-metrics:test org.apache.beam.testinfra.metrics.ProberTests > CheckGrafanaStalenessAlerts failed Key: BEAM-6381 URL: https://issues.apache.org/jira/browse/BEAM-6381 Project: Beam Issue Type: Test Components: test-failures Reporter: Boyuan Zhang Assignee: Scott Wegner https://builds.apache.org/job/beam_Release_NightlySnapshot/295/ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6355) [beam_PostRelease_NightlySnapshot] [runQuickstartJavaDataflow] ExceptionInInitializerError
[ https://issues.apache.org/jira/browse/BEAM-6355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736373#comment-16736373 ] Boyuan Zhang commented on BEAM-6355: Feel like it's related to [https://github.com/apache/beam/pull/7351] > [beam_PostRelease_NightlySnapshot] [runQuickstartJavaDataflow] > ExceptionInInitializerError > -- > > Key: BEAM-6355 > URL: https://issues.apache.org/jira/browse/BEAM-6355 > Project: Beam > Issue Type: Bug > Components: runner-dataflow, test-failures >Reporter: Scott Wegner >Assignee: Boyuan Zhang >Priority: Major > Labels: currently-failing > > _Use this form to file an issue for test failure:_ > * [Jenkins > Job|https://builds.apache.org/job/beam_PostRelease_NightlySnapshot/480/] > * [Dataflow > Job|https://console.cloud.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2019-01-03_03_07_12-9648275706628895259?project=apache-beam-testing&folder&organizationId=433637338589] > * [Test source > code|https://github.com/apache/beam/blob/master/release/src/main/groovy/quickstart-java-dataflow.groovy] > Initial investigation: > It appears the Dataflow job failed during worker initialization. From the > [stackdriver > logs|https://console.cloud.google.com/logs/viewer?resource=dataflow_step%2Fjob_id%2F2019-01-03_03_07_12-9648275706628895259&logName=projects%2Fapache-beam-testing%2Flogs%2Fdataflow.googleapis.com%252Fworker&interval=NO_LIMIT&project=apache-beam-testing&folder&organizationId=433637338589&minLogLevel=500&expandAll=false×tamp=2019-01-03T16:52:14.64300Z&customFacets=&limitCustomFacetWidth=true&scrollTimestamp=2019-01-03T11:08:49.88200Z], > I see: > {code} > 019-01-03 03:08:27.770 PST > Uncaught exception occurred during work unit execution. This will be retried. > Expand all | Collapse all { > insertId: "3832125194122580497:879173:0:62501" > jsonPayload: { > exception: "java.lang.ExceptionInInitializerError > at > org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory$2.typedApply(IntrinsicMapTaskExecutorFactory.java:344) > at > org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory$2.typedApply(IntrinsicMapTaskExecutorFactory.java:338) > at > org.apache.beam.runners.dataflow.worker.graph.Networks$TypeSafeNodeFunction.apply(Networks.java:63) > at > org.apache.beam.runners.dataflow.worker.graph.Networks$TypeSafeNodeFunction.apply(Networks.java:50) > at > org.apache.beam.runners.dataflow.worker.graph.Networks.replaceDirectedNetworkNodes(Networks.java:87) > at > org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory.create(IntrinsicMapTaskExecutorFactory.java:120) > at > org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:337) > at > org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:291) > at > org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:135) > at > org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:115) > at > org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:102) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: 
java.lang.IllegalArgumentException: Multiple entries with same > key: > kind:varint=org.apache.beam.runners.dataflow.util.CloudObjectTranslators$8@f4dd50c > and > kind:varint=org.apache.beam.runners.dataflow.worker.RunnerHarnessCoderCloudObjectTranslatorRegistrar$1@ae1551d > at > org.apache.beam.repackaged.beam_runners_google_cloud_dataflow_java.com.google.common.collect.ImmutableMap.checkNoConflict(ImmutableMap.java:136) > at > org.apache.beam.repackaged.beam_runners_google_cloud_dataflow_java.com.google.common.collect.RegularImmutableMap.checkNoConflictInKeyBucket(RegularImmutableMap.java:100) > at > org.apache.beam.repackaged.beam_runners_google_cloud_dataflow_java.com.google.common.collect.RegularImmutableMap.fromEntryArray(RegularImmutableMap.java:86) > at > org.apache.beam.repackaged.beam_runners_google_cloud_dataflow_java.com.google.common.collect.ImmutableMap$Builder.build(ImmutableMap.java:300) > at > org.apache.beam.runners.dataflow.util.CloudObjects.populateCloudObjectTranslators(CloudObjects.java:60) > at > org.
[jira] [Commented] (BEAM-6380) apache_beam.examples.wordcount_it_test.WordCountIT with DirectRunner failed
[ https://issues.apache.org/jira/browse/BEAM-6380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736360#comment-16736360 ] Ahmet Altay commented on BEAM-6380: --- Error is: *22:01:22* File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/io/gcp/gcsio.py", line 581, in _start_upload*22:01:22* self._client.objects.Insert(self._insert_request, upload=self._upload)*22:01:22* File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/io/gcp/internal/clients/storage/storage_v1_client.py", line 1154, in Insert*22:01:22* upload=upload, upload_config=upload_config)*22:01:22* File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/build/gradleenv/1327086738/local/lib/python2.7/site-packages/apitools/base/py/base_api.py", line 715, in _RunMethod*22:01:22* http_request, client=self.client)*22:01:22* File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/build/gradleenv/1327086738/local/lib/python2.7/site-packages/apitools/base/py/transfer.py", line 876, in InitializeUpload*22:01:22* return self.StreamInChunks()*22:01:22* File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/build/gradleenv/1327086738/local/lib/python2.7/site-packages/apitools/base/py/transfer.py", line 988, in StreamInChunks*22:01:22* additional_headers=additional_headers)*22:01:22* File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/build/gradleenv/1327086738/local/lib/python2.7/site-packages/apitools/base/py/transfer.py", line 939, in __StreamMedia*22:01:22* self.RefreshResumableUploadState()*22:01:22* File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/build/gradleenv/1327086738/local/lib/python2.7/site-packages/apitools/base/py/transfer.py", line 841, in RefreshResumableUploadState*22:01:22* self.stream.seek(self.progress)*22:01:22* File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/io/filesystemio.py", line 264, in seek*22:01:22* raise NotImplementedError*22:01:22* NotImplementedError It seems like this is a missing feature in Beam. When an upload needs to resume and this is fails, apitools tries to seek to an earlier point in the uploadable file (https://github.com/google/apitools/blob/effc2e576427d8c876b1d64c79edcd98ab433074/apitools/base/py/transfer.py#L850), however Beam filesystem does not support this (https://github.com/apache/beam/blob/d02b859cb37e3c9f565785e93384905f1078b409/sdks/python/apache_beam/io/filesystemio.py#L264). We can either update filesystemio to support this case, or add enable bundle retrying capability for the direct runner. > apache_beam.examples.wordcount_it_test.WordCountIT with DirectRunner failed > --- > > Key: BEAM-6380 > URL: https://issues.apache.org/jira/browse/BEAM-6380 > Project: Beam > Issue Type: Test > Components: test-failures >Reporter: Boyuan Zhang >Assignee: Ahmet Altay >Priority: Major > > wordcount test in :pythonPostCommit failed owing to RuntimeError: > NotImplementedError [while running 'write/Write/WriteImpl/WriteBundles'] > > https://builds.apache.org/job/beam_PostCommit_Python_Verify/7001/ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6326) Fix test failures in streaming mode of PortableValidatesRunner
[ https://issues.apache.org/jira/browse/BEAM-6326?focusedWorklogId=182036&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182036 ] ASF GitHub Bot logged work on BEAM-6326: Author: ASF GitHub Bot Created on: 07/Jan/19 20:52 Start Date: 07/Jan/19 20:52 Worklog Time Spent: 10m Work Description: mxm commented on issue #7432: [BEAM-6326] Relax test assumption for PAssertTest#testWindowedIsEqualTo URL: https://github.com/apache/beam/pull/7432#issuecomment-452077732 This doesn't seem to be the right way, at least the DirectRunner behaves differently. Will have to materialize this correctly for the side input then. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 182036) Time Spent: 0.5h (was: 20m) > Fix test failures in streaming mode of PortableValidatesRunner > -- > > Key: BEAM-6326 > URL: https://issues.apache.org/jira/browse/BEAM-6326 > Project: Beam > Issue Type: Test > Components: runner-flink >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > As of BEAM-6009, the tests are run separately for batch and streaming. This > has revealed issues with a couple of tests which need to be addressed. > The Gradle task is: {{validatesPortableRunnerStreaming}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-6380) apache_beam.examples.wordcount_it_test.WordCountIT with DirectRunner failed
Boyuan Zhang created BEAM-6380: -- Summary: apache_beam.examples.wordcount_it_test.WordCountIT with DirectRunner failed Key: BEAM-6380 URL: https://issues.apache.org/jira/browse/BEAM-6380 Project: Beam Issue Type: Test Components: test-failures Reporter: Boyuan Zhang Assignee: Ahmet Altay wordcount test in :pythonPostCommit failed owing to RuntimeError: NotImplementedError [while running 'write/Write/WriteImpl/WriteBundles'] https://builds.apache.org/job/beam_PostCommit_Python_Verify/7001/ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5958) [Flake][beam_PostCommit_Java_ValidatesRunner_PortabilityApi_Dataflow_Gradle] VR tests flaking with stuck tests.
[ https://issues.apache.org/jira/browse/BEAM-5958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736326#comment-16736326 ] Daniel Oliveira commented on BEAM-5958: --- I disagree on this issue being a duplicate. That said it should still be closed, since the original issue doesn't seem to appear in the updated test target. None of the recent failures I checked exhibit this error. [https://builds.apache.org/view/All/job/beam_PostCommit_Java_ValidatesRunner_PortabilityApi_Dataflow/] > [Flake][beam_PostCommit_Java_ValidatesRunner_PortabilityApi_Dataflow_Gradle] > VR tests flaking with stuck tests. > --- > > Key: BEAM-5958 > URL: https://issues.apache.org/jira/browse/BEAM-5958 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Daniel Oliveira >Priority: Minor > Fix For: Not applicable > > > Some of the VR tests in this target occasionally get stuck and run for over > an hour, and end up killed due to timing out. > Recent run: > [https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_PortabilityApi_Dataflow_Gradle/20/] > The line indicating the failure looks like this: > {noformat} > java.lang.RuntimeException: > Workflow failed. Causes: The Dataflow job appears to be stuck because no > worker activity has been seen in the last 1h. You can get help with Cloud > Dataflow at https://cloud.google.com/dataflow/support.{noformat} > [https://scans.gradle.com/s/jtbvh7kerp3eq/tests/wc3tcgdeth5bc-i6d7ckdlc55ws] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6341) Python VR failure: test_flatten_multiple_pcollections_having_multiple_consumers
[ https://issues.apache.org/jira/browse/BEAM-6341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736318#comment-16736318 ] Daniel Oliveira commented on BEAM-6341: --- The out of memory error doesn't seem to have popped up again recently. I think that one's unrelated. The timeout flakes are common though, and happen in multiple different tests not just test_flatten_multiple_pcollections_having_multiple_consumers. I'll keep looking into it. > Python VR failure: > test_flatten_multiple_pcollections_having_multiple_consumers > --- > > Key: BEAM-6341 > URL: https://issues.apache.org/jira/browse/BEAM-6341 > Project: Beam > Issue Type: Test > Components: test-failures >Reporter: Boyuan Zhang >Assignee: Daniel Oliveira >Priority: Major > > [https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/2154/] > Workflow failed. Causes: The Dataflow job appears to be stuck because no > worker activity has been seen in the last 1h. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6345) BigtableWriteIT. testE2EBigtableWrite got stuck with portable dataflow worker
[ https://issues.apache.org/jira/browse/BEAM-6345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736302#comment-16736302 ] Boyuan Zhang commented on BEAM-6345: https://builds.apache.org/job/beam_PostCommit_Java_PortabilityApi/605/ > BigtableWriteIT. testE2EBigtableWrite got stuck with portable dataflow worker > - > > Key: BEAM-6345 > URL: https://issues.apache.org/jira/browse/BEAM-6345 > Project: Beam > Issue Type: Test > Components: test-failures >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > > https://builds.apache.org/job/beam_PostCommit_Java_PortabilityApi/576/ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6355) [beam_PostRelease_NightlySnapshot] [runQuickstartJavaDataflow] ExceptionInInitializerError
[ https://issues.apache.org/jira/browse/BEAM-6355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736301#comment-16736301 ] Daniel Oliveira commented on BEAM-6355: --- I once saw a similar issue that was caused by a shading issue, where a class was being vendored but the code calling it wasn't calling the vendored version. That time the bug was caused because some of the artifacts built weren't rebuilt after the shading change was pulled in, so the fix was as simple as cleaning the entire build environment and rebuilding from scratch. I don't know how this PostRelease test is built, but if it tries to reuse artifacts instead of doing a clean build then that might be related. > [beam_PostRelease_NightlySnapshot] [runQuickstartJavaDataflow] > ExceptionInInitializerError > -- > > Key: BEAM-6355 > URL: https://issues.apache.org/jira/browse/BEAM-6355 > Project: Beam > Issue Type: Bug > Components: runner-dataflow, test-failures >Reporter: Scott Wegner >Assignee: Boyuan Zhang >Priority: Major > Labels: currently-failing > > _Use this form to file an issue for test failure:_ > * [Jenkins > Job|https://builds.apache.org/job/beam_PostRelease_NightlySnapshot/480/] > * [Dataflow > Job|https://console.cloud.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2019-01-03_03_07_12-9648275706628895259?project=apache-beam-testing&folder&organizationId=433637338589] > * [Test source > code|https://github.com/apache/beam/blob/master/release/src/main/groovy/quickstart-java-dataflow.groovy] > Initial investigation: > It appears the Dataflow job failed during worker initialization. From the > [stackdriver > logs|https://console.cloud.google.com/logs/viewer?resource=dataflow_step%2Fjob_id%2F2019-01-03_03_07_12-9648275706628895259&logName=projects%2Fapache-beam-testing%2Flogs%2Fdataflow.googleapis.com%252Fworker&interval=NO_LIMIT&project=apache-beam-testing&folder&organizationId=433637338589&minLogLevel=500&expandAll=false×tamp=2019-01-03T16:52:14.64300Z&customFacets=&limitCustomFacetWidth=true&scrollTimestamp=2019-01-03T11:08:49.88200Z], > I see: > {code} > 019-01-03 03:08:27.770 PST > Uncaught exception occurred during work unit execution. This will be retried. 
> Expand all | Collapse all { > insertId: "3832125194122580497:879173:0:62501" > jsonPayload: { > exception: "java.lang.ExceptionInInitializerError > at > org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory$2.typedApply(IntrinsicMapTaskExecutorFactory.java:344) > at > org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory$2.typedApply(IntrinsicMapTaskExecutorFactory.java:338) > at > org.apache.beam.runners.dataflow.worker.graph.Networks$TypeSafeNodeFunction.apply(Networks.java:63) > at > org.apache.beam.runners.dataflow.worker.graph.Networks$TypeSafeNodeFunction.apply(Networks.java:50) > at > org.apache.beam.runners.dataflow.worker.graph.Networks.replaceDirectedNetworkNodes(Networks.java:87) > at > org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory.create(IntrinsicMapTaskExecutorFactory.java:120) > at > org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:337) > at > org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:291) > at > org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:135) > at > org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:115) > at > org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:102) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.IllegalArgumentException: Multiple entries with same > key: > kind:varint=org.apache.beam.runners.dataflow.util.CloudObjectTranslators$8@f4dd50c > and > kind:varint=org.apache.beam.runners.dataflow.worker.RunnerHarnessCoderCloudObjectTranslatorRegistrar$1@ae1551d > at > org.apache.beam.repackaged.beam_runners_google_cloud_dataflow_java.com.google.common.collect.ImmutableMap.checkNoConflict(ImmutableMap.java:136) > at > org.apache.beam.repackaged.beam_runners_google_cloud_dataflow_java.com.google.common.collect.RegularImmutableMap.checkNoConflictInKeyBucket(RegularImmutableMap.java:100) >
[jira] [Work logged] (BEAM-6326) Fix test failures in streaming mode of PortableValidatesRunner
[ https://issues.apache.org/jira/browse/BEAM-6326?focusedWorklogId=182018&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182018 ] ASF GitHub Bot logged work on BEAM-6326: Author: ASF GitHub Bot Created on: 07/Jan/19 20:02 Start Date: 07/Jan/19 20:02 Worklog Time Spent: 10m Work Description: mxm commented on issue #7432: [BEAM-6326] Relax test assumption for PAssertTest#testWindowedIsEqualTo URL: https://github.com/apache/beam/pull/7432#issuecomment-452063257 Run Java Flink PortableValidatesRunner Streaming This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 182018) Time Spent: 20m (was: 10m) > Fix test failures in streaming mode of PortableValidatesRunner > -- > > Key: BEAM-6326 > URL: https://issues.apache.org/jira/browse/BEAM-6326 > Project: Beam > Issue Type: Test > Components: runner-flink >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > As of BEAM-6009, the tests are run separately for batch and streaming. This > has revealed issues with a couple of tests which need to be addressed. > The Gradle task is: {{validatesPortableRunnerStreaming}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-5975) [beam_PostRelease_NightlySnapshot] Nightly has been failing due to errors relating to runMobileGamingJavaDirect
[ https://issues.apache.org/jira/browse/BEAM-5975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Oliveira resolved BEAM-5975. --- Resolution: Cannot Reproduce Fix Version/s: Not applicable > [beam_PostRelease_NightlySnapshot] Nightly has been failing due to errors > relating to runMobileGamingJavaDirect > --- > > Key: BEAM-5975 > URL: https://issues.apache.org/jira/browse/BEAM-5975 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Daniel Oliveira >Assignee: Daniel Oliveira >Priority: Major > Fix For: Not applicable > > > beam_PostRelease_NightlySnapshot has failed two nights in a row due to issues > relating to runMobileGamingJavaDirect. It's possible that these are two > separate bugs since the failures are different, but I am grouping them > together because they happened on consecutive nights and very closely to each > other in the test logs, so it seems likely they're related. > > [https://builds.apache.org/job/beam_PostRelease_NightlySnapshot/421/] > In the first failure the test times out while executing > runMobileGamingJavaDirect: > {noformat} > 04:40:03 Waiting on bqjob_r2bb92817cbd78f51_0166debcb7da_1 ... (0s) > Current status: DONE > 04:40:03 +---+ > 04:40:03 | table_id | > 04:40:03 +---+ > 04:40:03 | hourly_team_score_python_dataflow | > 04:40:03 | hourly_team_score_python_direct | > 04:40:03 | leaderboard_DataflowRunner_team | > 04:40:03 | leaderboard_DataflowRunner_user | > 04:40:03 | leaderboard_DirectRunner_team | > 04:40:03 | leaderboard_DirectRunner_user | > 04:40:03 +---+ > 04:40:06 bq query SELECT table_id FROM > beam_postrelease_mobile_gaming.__TABLES_SUMMARY__ > 04:40:06 Build timed out (after 100 minutes). Marking the build as aborted. > 04:40:06 Build was aborted > 04:40:06 > 04:40:08 Finished: ABORTED{noformat} > > [https://builds.apache.org/job/beam_PostRelease_NightlySnapshot/422/] > In the next day's run the test fails due to an issue building a class while > building runMobileGamingJavaDirect: > {noformat} > [INFO] --- maven-compiler-plugin:3.7.0:compile (default-compile) @ > word-count-beam --- > [INFO] Changes detected - recompiling the module! > [WARNING] File encoding has not been set, using platform encoding UTF-8, i.e. > build is platform dependent! > [INFO] Compiling 31 source files to > /tmp/groovy-generated-7468741840914398994-tmpdir/word-count-beam/target/classes > [ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.6.0:java > (default-cli) on project word-count-beam: An exception occured while > executing the Java class. org.apache.beam.examples.complete.game.LeaderBoard > -> [Help 1] > [ERROR]{noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5975) [beam_PostRelease_NightlySnapshot] Nightly has been failing due to errors relating to runMobileGamingJavaDirect
[ https://issues.apache.org/jira/browse/BEAM-5975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736279#comment-16736279 ] Daniel Oliveira commented on BEAM-5975: --- I looked through the available build history for beam_PostRelease_NightlySnapshot, and it doesn't seem to be having this error anywhere anymore. There is an error in runMobileGamingJavaDataflow, but it seems completely unrelated, so I'll create a new bug for that if one doesn't exist. > [beam_PostRelease_NightlySnapshot] Nightly has been failing due to errors > relating to runMobileGamingJavaDirect > --- > > Key: BEAM-5975 > URL: https://issues.apache.org/jira/browse/BEAM-5975 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Daniel Oliveira >Assignee: Daniel Oliveira >Priority: Major > > beam_PostRelease_NightlySnapshot has failed two nights in a row due to issues > relating to runMobileGamingJavaDirect. It's possible that these are two > separate bugs since the failures are different, but I am grouping them > together because they happened on consecutive nights and very closely to each > other in the test logs, so it seems likely they're related. > > [https://builds.apache.org/job/beam_PostRelease_NightlySnapshot/421/] > In the first failure the test times out while executing > runMobileGamingJavaDirect: > {noformat} > 04:40:03 Waiting on bqjob_r2bb92817cbd78f51_0166debcb7da_1 ... (0s) > Current status: DONE > 04:40:03 +---+ > 04:40:03 | table_id | > 04:40:03 +---+ > 04:40:03 | hourly_team_score_python_dataflow | > 04:40:03 | hourly_team_score_python_direct | > 04:40:03 | leaderboard_DataflowRunner_team | > 04:40:03 | leaderboard_DataflowRunner_user | > 04:40:03 | leaderboard_DirectRunner_team | > 04:40:03 | leaderboard_DirectRunner_user | > 04:40:03 +---+ > 04:40:06 bq query SELECT table_id FROM > beam_postrelease_mobile_gaming.__TABLES_SUMMARY__ > 04:40:06 Build timed out (after 100 minutes). Marking the build as aborted. > 04:40:06 Build was aborted > 04:40:06 > 04:40:08 Finished: ABORTED{noformat} > > [https://builds.apache.org/job/beam_PostRelease_NightlySnapshot/422/] > In the next day's run the test fails due to an issue building a class while > building runMobileGamingJavaDirect: > {noformat} > [INFO] --- maven-compiler-plugin:3.7.0:compile (default-compile) @ > word-count-beam --- > [INFO] Changes detected - recompiling the module! > [WARNING] File encoding has not been set, using platform encoding UTF-8, i.e. > build is platform dependent! > [INFO] Compiling 31 source files to > /tmp/groovy-generated-7468741840914398994-tmpdir/word-count-beam/target/classes > [ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.6.0:java > (default-cli) on project word-count-beam: An exception occured while > executing the Java class. org.apache.beam.examples.complete.game.LeaderBoard > -> [Help 1] > [ERROR]{noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5315) Finish Python 3 porting for io module
[ https://issues.apache.org/jira/browse/BEAM-5315?focusedWorklogId=182017&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182017 ] ASF GitHub Bot logged work on BEAM-5315: Author: ASF GitHub Bot Created on: 07/Jan/19 20:02 Start Date: 07/Jan/19 20:02 Worklog Time Spent: 10m Work Description: RobbeSneyders commented on pull request #7394: [BEAM-5315] Python 3 port io.textio module URL: https://github.com/apache/beam/pull/7394#discussion_r245780402 ## File path: sdks/python/apache_beam/io/filebasedsource_test.py ## @@ -87,6 +87,22 @@ class EOL(object): def write_data( num_lines, no_data=False, directory=None, prefix=tempfile.template, eol=EOL.LF): + """Writes test data to a temporary file. + + Args: +num_lines (int): The number of lines to write. +no_data (bool): If :data:`True`, empty lines will be written, otherwise + each line will contain a concatenation of b'line' and the line number. +directory (str): The name of the directory to create the temporary file in. +prefix (str): The prefix to use for the temporary file. +eol (int): The line ending to use when writing. + :class:`~apache_beam.io.textio_test.EOL` exposes attributes that can be + used here to define the eol. + + Returns: +Tuple[str, List[str]): A tuple of the filename and a list of the written Review comment: You're right. Fixed it. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 182017) Time Spent: 12h 10m (was: 12h) > Finish Python 3 porting for io module > - > > Key: BEAM-5315 > URL: https://issues.apache.org/jira/browse/BEAM-5315 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Robbe >Assignee: Robbe >Priority: Major > Time Spent: 12h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6326) Fix test failures in streaming mode of PortableValidatesRunner
[ https://issues.apache.org/jira/browse/BEAM-6326?focusedWorklogId=182016&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182016 ] ASF GitHub Bot logged work on BEAM-6326: Author: ASF GitHub Bot Created on: 07/Jan/19 20:02 Start Date: 07/Jan/19 20:02 Worklog Time Spent: 10m Work Description: mxm commented on pull request #7432: [BEAM-6326] Relax test assumption for PAssertTest#testWindowedIsEqualTo URL: https://github.com/apache/beam/pull/7432 The Flink Runner returns a Pane which is early because it just assigns the Windows to the test values. At this point without further transformations, e.g. GroupByKey, it does not know that this is the only Pane. Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | --- | --- | --- | --- Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | 
[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/) | --- | --- | --- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 182016) Time Spent: 10m Remaining Estimate: 0h > Fix test failures in streaming mode of PortableValidatesRunner > -- > > Key: BEAM-6326 > URL: https://issues.apache.org/jira/browse/BEAM-6326 > Project: Beam > Issue Type: Test > Components: runner-flink >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Time Spent: 10m > Remaining Estimat
[jira] [Work logged] (BEAM-6258) Data channel failing after some time for 1G data input
[ https://issues.apache.org/jira/browse/BEAM-6258?focusedWorklogId=181997&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-181997 ] ASF GitHub Bot logged work on BEAM-6258: Author: ASF GitHub Bot Created on: 07/Jan/19 19:48 Start Date: 07/Jan/19 19:48 Worklog Time Spent: 10m Work Description: mxm commented on pull request #7415: [BEAM-6258] Set grpc keep alive on server creation URL: https://github.com/apache/beam/pull/7415#discussion_r245776335 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/ServerFactory.java ## @@ -144,7 +145,9 @@ private static Server createServer(List services, InetSocketAdd NettyServerBuilder.forPort(socket.getPort()) // Set the message size to max value here. The actual size is governed by the // buffer size in the layers above. - .maxMessageSize(Integer.MAX_VALUE); + .maxMessageSize(Integer.MAX_VALUE) + .permitKeepAliveTime(1, TimeUnit.SECONDS) + .permitKeepAliveWithoutCalls(true); Review comment: What if we set the keep alive time explicitly on both ends? Just wondering whether it made sense to set the values instead of using this workaround. https://github.com/grpc/grpc/blob/master/doc/keepalive.md#defaults-values This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 181997) Time Spent: 1h (was: 50m) > Data channel failing after some time for 1G data input > -- > > Key: BEAM-6258 > URL: https://issues.apache.org/jira/browse/BEAM-6258 > Project: Beam > Issue Type: Bug > Components: sdk-py-harness >Reporter: Ankur Goenka >Assignee: Ankur Goenka >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > Data channel and logging channel are failing after some time with 1GB input > data for chicago taxi. 
> > E1218 02:44:02.837680206 72 chttp2_transport.cc:1148] Received a GOAWAY with > error code ENHANCE_YOUR_CALM and debug data equal to "too_many_pings" > Exception in thread read_grpc_client_inputs: > Traceback (most recent call last): > File "/usr/local/lib/python2.7/threading.py", line 801, in __bootstrap_inner > self.run() > File "/usr/local/lib/python2.7/threading.py", line 754, in run > self.__target(*self.__args, **self.__kwargs) > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/data_plane.py", > line 273, in > target=lambda: self._read_inputs(elements_iterator), > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/data_plane.py", > line 260, in _read_inputs > for elements in elements_iterator: > File "/usr/local/lib/python2.7/site-packages/grpc/_channel.py", line 347, in > next > return self._next() > File "/usr/local/lib/python2.7/site-packages/grpc/_channel.py", line 338, in > _next > raise self > _Rendezvous: <_Rendezvous of RPC that terminated with > (StatusCode.RESOURCE_EXHAUSTED, GOAWAY received)> > Traceback (most recent call last): > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 145, in _execute > response = task() > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 180, in > self._execute(lambda: worker.do_instruction(work), work) > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 253, in do_instruction > request.instruction_id) > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 269, in process_bundle > bundle_processor.process_bundle(instruction_id) > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/bundle_processor.py", > line 481, in process_bundle > instruction_id, expected_targets): > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/data_plane.py", > line 209, in input_elements > raise_(t, v, tb) > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/data_plane.py", > line 260, in _read_inputs > for elements in elements_iterator: > File "/usr/local/lib/python2.7/site-packages/grpc/_channel.py", line 347, in > next > return self._next() > File "/usr/local/lib/python2.7/site-packages/grpc/_channel.py", line 338, in > _next > raise self > _Rendezvous: <_Rendezvous of RPC that terminated with > (
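For context on the keep-alive discussion above, a rough sketch of configuring it explicitly on both ends with plain grpc-java (per the defaults document linked in the review) might look like the following. The port and intervals are made-up values; the relevant constraint is that the client must not ping more often than the server's permitKeepAliveTime allows, otherwise the server answers with the GOAWAY/too_many_pings seen in the log above.

{code:java}
import java.util.concurrent.TimeUnit;
import io.grpc.ManagedChannel;
import io.grpc.Server;
import io.grpc.netty.NettyChannelBuilder;
import io.grpc.netty.NettyServerBuilder;

public class KeepAliveSketch {
  public static void main(String[] args) throws Exception {
    // Server side: state how often clients may ping, and allow pings while idle.
    Server server =
        NettyServerBuilder.forPort(8099)
            .permitKeepAliveTime(10, TimeUnit.SECONDS)  // reject pings more frequent than this
            .permitKeepAliveWithoutCalls(true)          // allow pings with no RPC in flight
            .build()
            .start();

    // Client side: ping no more often than the server permits.
    ManagedChannel channel =
        NettyChannelBuilder.forAddress("localhost", 8099)
            .keepAliveTime(30, TimeUnit.SECONDS)
            .keepAliveTimeout(10, TimeUnit.SECONDS)
            .keepAliveWithoutCalls(true)
            .usePlaintext()
            .build();

    // ... use the channel, then shut both down.
    channel.shutdownNow();
    server.shutdownNow();
  }
}
{code}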
[jira] [Work logged] (BEAM-6346) Portable Flink Job hangs if the sdk harness fails to start
[ https://issues.apache.org/jira/browse/BEAM-6346?focusedWorklogId=181991&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-181991 ] ASF GitHub Bot logged work on BEAM-6346: Author: ASF GitHub Bot Created on: 07/Jan/19 19:44 Start Date: 07/Jan/19 19:44 Worklog Time Spent: 10m Work Description: mxm commented on pull request #7395: [BEAM-6346] Inspect the docker container state URL: https://github.com/apache/beam/pull/7395#discussion_r245775105 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/DockerEnvironmentFactory.java ## @@ -152,8 +152,10 @@ public RemoteEnvironment createEnvironment(Environment environment) throws Excep LOG.debug("Created Docker Container with Container ID {}", containerId); // Wait on a client from the gRPC server. while (instructionHandler == null) { +Preconditions.checkArgument( +docker.isContainerRunning(containerId), "No container running for id " + containerId); Review comment: The check is too strict then. We won't get `RUNNING` directly after starting the container. We need something like a repeated check which times out after some max startup time. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 181991) Time Spent: 1h (was: 50m) > Portable Flink Job hangs if the sdk harness fails to start > -- > > Key: BEAM-6346 > URL: https://issues.apache.org/jira/browse/BEAM-6346 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Ankur Goenka >Assignee: Ankur Goenka >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > Portable Beam job on Flink fails if the SDK harness container fails to start, > as it's started in detached mode. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
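The "repeated check with a startup timeout" suggested in the review comment above could be sketched as a small polling loop. This is not the Beam code from the PR: ContainerStatus is a placeholder for whatever object exposes the isContainerRunning check shown in the diff, and the timeout and poll interval are made-up values.

{code:java}
import java.time.Duration;
import java.time.Instant;

public class ContainerStartupCheck {

  /** Placeholder for whatever exposes the isContainerRunning check in the diff above. */
  public interface ContainerStatus {
    boolean isContainerRunning(String containerId) throws Exception;
  }

  /** Poll until the container is running, failing after a maximum startup time. */
  public static void waitUntilRunning(ContainerStatus docker, String containerId)
      throws Exception {
    Instant deadline = Instant.now().plus(Duration.ofSeconds(60)); // made-up startup budget
    while (!docker.isContainerRunning(containerId)) {
      if (Instant.now().isAfter(deadline)) {
        throw new IllegalStateException(
            "Docker container " + containerId + " did not reach a running state in time");
      }
      Thread.sleep(500); // made-up poll interval
    }
  }
}
{code}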
[jira] [Work logged] (BEAM-6248) Add Flink 1.7.x build target to Flink Runner
[ https://issues.apache.org/jira/browse/BEAM-6248?focusedWorklogId=181983&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-181983 ] ASF GitHub Bot logged work on BEAM-6248: Author: ASF GitHub Bot Created on: 07/Jan/19 19:34 Start Date: 07/Jan/19 19:34 Worklog Time Spent: 10m Work Description: tweise commented on pull request #7300: [BEAM-6248] Add Flink v1.7 build target to Flink Runner URL: https://github.com/apache/beam/pull/7300#discussion_r245772111 ## File path: runners/flink/src/test/java/org/apache/beam/runners/flink/PortableStateExecutionTest.java ## @@ -86,6 +86,7 @@ public void testExecution() throws Exception { options.setRunner(CrashingRunner.class); options.as(FlinkPipelineOptions.class).setFlinkMaster("[local]"); options.as(FlinkPipelineOptions.class).setStreaming(isStreaming); +options.as(FlinkPipelineOptions.class).setParallelism(4); Review comment: To give a meaningful signal, the test should probably be different/specialized and assert something about the parallel execution? If not, then we can probably just set it to 1 ? As it stands we are operating on hope :) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 181983) Time Spent: 4h (was: 3h 50m) > Add Flink 1.7.x build target to Flink Runner > > > Key: BEAM-6248 > URL: https://issues.apache.org/jira/browse/BEAM-6248 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Fix For: 2.10.0 > > Time Spent: 4h > Remaining Estimate: 0h > > With BEAM-5419 we can add a Flink 1.7.x build target. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3580) Do not use Go BytesCoder to encode string
[ https://issues.apache.org/jira/browse/BEAM-3580?focusedWorklogId=181982&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-181982 ] ASF GitHub Bot logged work on BEAM-3580: Author: ASF GitHub Bot Created on: 07/Jan/19 19:30 Start Date: 07/Jan/19 19:30 Worklog Time Spent: 10m Work Description: htyleo commented on issue #7422: [BEAM-3580] Use a separate coder for string URL: https://github.com/apache/beam/pull/7422#issuecomment-452053094 > These coders represent "standard" beam coders as defined by the FnAPI, so we can't simply add to them. An alternative would be to have the String Coder be a custom coder instead, that we "install" by default, similar to how we're currently handling Protocol Buffers or JSON. > > This will need to be tested against the Universal Local Runner, to verify the semantics. Thanks @lostluck for correcting me. I'm going to close this PR and file another one, which should implement a custom coder (instead of a built-in coder) for string. I will try testing it against the Universal Local Runner. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 181982) Time Spent: 1h 10m (was: 1h) > Do not use Go BytesCoder to encode string > - > > Key: BEAM-3580 > URL: https://issues.apache.org/jira/browse/BEAM-3580 > Project: Beam > Issue Type: Bug > Components: sdk-go >Reporter: Henning Rohde >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > We should not use the same built-in coder for two different types. It creates > the need for conversions at inopportune times in the runtime. > > One option would be to add a custom coder that shares encoding with bytes, given > that bytes are length prefixed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3580) Do not use Go BytesCoder to encode string
[ https://issues.apache.org/jira/browse/BEAM-3580?focusedWorklogId=181981&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-181981 ] ASF GitHub Bot logged work on BEAM-3580: Author: ASF GitHub Bot Created on: 07/Jan/19 19:30 Start Date: 07/Jan/19 19:30 Worklog Time Spent: 10m Work Description: htyleo commented on pull request #7422: [BEAM-3580] Use a separate coder for string URL: https://github.com/apache/beam/pull/7422 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 181981) Time Spent: 1h (was: 50m) > Do not use Go BytesCoder to encode string > - > > Key: BEAM-3580 > URL: https://issues.apache.org/jira/browse/BEAM-3580 > Project: Beam > Issue Type: Bug > Components: sdk-go >Reporter: Henning Rohde >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > We should not use the same built-in coder for two different types. It creates > the need for conversions at inopportune times in the runtime. > > One option would be to a custom coder that shares encoding with bytes, given > that bytes are length prefixed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5315) Finish Python 3 porting for io module
[ https://issues.apache.org/jira/browse/BEAM-5315?focusedWorklogId=181973&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-181973 ] ASF GitHub Bot logged work on BEAM-5315: Author: ASF GitHub Bot Created on: 07/Jan/19 19:18 Start Date: 07/Jan/19 19:18 Worklog Time Spent: 10m Work Description: tvalentyn commented on pull request #7394: [BEAM-5315] Python 3 port io.textio module URL: https://github.com/apache/beam/pull/7394#discussion_r245767467 ## File path: sdks/python/apache_beam/io/filebasedsource_test.py ## @@ -87,6 +87,22 @@ class EOL(object): def write_data( num_lines, no_data=False, directory=None, prefix=tempfile.template, eol=EOL.LF): + """Writes test data to a temporary file. + + Args: +num_lines (int): The number of lines to write. +no_data (bool): If :data:`True`, empty lines will be written, otherwise + each line will contain a concatenation of b'line' and the line number. +directory (str): The name of the directory to create the temporary file in. +prefix (str): The prefix to use for the temporary file. +eol (int): The line ending to use when writing. + :class:`~apache_beam.io.textio_test.EOL` exposes attributes that can be + used here to define the eol. + + Returns: +Tuple[str, List[str]): A tuple of the filename and a list of the written Review comment: Thanks. I wish we had more docstrings like this one throughout SDK internals. Tuple[str, List[bytes]] in Line 103? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 181973) Time Spent: 12h (was: 11h 50m) > Finish Python 3 porting for io module > - > > Key: BEAM-5315 > URL: https://issues.apache.org/jira/browse/BEAM-5315 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Robbe >Assignee: Robbe >Priority: Major > Time Spent: 12h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6346) Portable Flink Job hangs if the sdk harness fails to start
[ https://issues.apache.org/jira/browse/BEAM-6346?focusedWorklogId=181969&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-181969 ] ASF GitHub Bot logged work on BEAM-6346: Author: ASF GitHub Bot Created on: 07/Jan/19 19:10 Start Date: 07/Jan/19 19:10 Worklog Time Spent: 10m Work Description: angoenka commented on pull request #7395: [BEAM-6346] Inspect the docker container state URL: https://github.com/apache/beam/pull/7395#discussion_r245765042 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/DockerEnvironmentFactory.java ## @@ -152,8 +152,10 @@ public RemoteEnvironment createEnvironment(Environment environment) throws Excep LOG.debug("Created Docker Container with Container ID {}", containerId); // Wait on a client from the gRPC server. while (instructionHandler == null) { +Preconditions.checkArgument( +docker.isContainerRunning(containerId), "No container running for id " + containerId); Review comment: The main issue that I observed here was when the container failed to start. As it is started in detached mode, we don't know the state of the container and simply wait on the control client to connect. Synchronous container start will not work and will be very difficult to orchestrate. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 181969) Time Spent: 50m (was: 40m) > Portable Flink Job hangs if the sdk harness fails to start > -- > > Key: BEAM-6346 > URL: https://issues.apache.org/jira/browse/BEAM-6346 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Ankur Goenka >Assignee: Ankur Goenka >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > Portable beam job on Flink fails if the sdk harness container fails to start > as its started in detached mode. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6258) Data channel failing after some time for 1G data input
[ https://issues.apache.org/jira/browse/BEAM-6258?focusedWorklogId=181971&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-181971 ] ASF GitHub Bot logged work on BEAM-6258: Author: ASF GitHub Bot Created on: 07/Jan/19 19:13 Start Date: 07/Jan/19 19:13 Worklog Time Spent: 10m Work Description: angoenka commented on pull request #7415: [BEAM-6258] Set grpc keep alive on server creation URL: https://github.com/apache/beam/pull/7415#discussion_r245763524 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/ServerFactory.java ## @@ -144,7 +145,9 @@ private static Server createServer(List services, InetSocketAdd NettyServerBuilder.forPort(socket.getPort()) // Set the message size to max value here. The actual size is governed by the // buffer size in the layers above. - .maxMessageSize(Integer.MAX_VALUE); + .maxMessageSize(Integer.MAX_VALUE) + .permitKeepAliveTime(1, TimeUnit.SECONDS) + .permitKeepAliveWithoutCalls(true); Review comment: We are not setting the client configuration anywhere. However, I also feel that this might be a server/client configuration issues. This is just a work around till we find the root cause. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 181971) Time Spent: 50m (was: 40m) > Data channel failing after some time for 1G data input > -- > > Key: BEAM-6258 > URL: https://issues.apache.org/jira/browse/BEAM-6258 > Project: Beam > Issue Type: Bug > Components: sdk-py-harness >Reporter: Ankur Goenka >Assignee: Ankur Goenka >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > Data channel and logging channel are failing after some time with 1GB input > data for chicago taxi. 
> > E1218 02:44:02.837680206 72 chttp2_transport.cc:1148] Received a GOAWAY with > error code ENHANCE_YOUR_CALM and debug data equal to "too_many_pings" > Exception in thread read_grpc_client_inputs: > Traceback (most recent call last): > File "/usr/local/lib/python2.7/threading.py", line 801, in __bootstrap_inner > self.run() > File "/usr/local/lib/python2.7/threading.py", line 754, in run > self.__target(*self.__args, **self.__kwargs) > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/data_plane.py", > line 273, in > target=lambda: self._read_inputs(elements_iterator), > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/data_plane.py", > line 260, in _read_inputs > for elements in elements_iterator: > File "/usr/local/lib/python2.7/site-packages/grpc/_channel.py", line 347, in > next > return self._next() > File "/usr/local/lib/python2.7/site-packages/grpc/_channel.py", line 338, in > _next > raise self > _Rendezvous: <_Rendezvous of RPC that terminated with > (StatusCode.RESOURCE_EXHAUSTED, GOAWAY received)> > Traceback (most recent call last): > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 145, in _execute > response = task() > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 180, in > self._execute(lambda: worker.do_instruction(work), work) > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 253, in do_instruction > request.instruction_id) > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 269, in process_bundle > bundle_processor.process_bundle(instruction_id) > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/bundle_processor.py", > line 481, in process_bundle > instruction_id, expected_targets): > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/data_plane.py", > line 209, in input_elements > raise_(t, v, tb) > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/data_plane.py", > line 260, in _read_inputs > for elements in elements_iterator: > File "/usr/local/lib/python2.7/site-packages/grpc/_channel.py", line 347, in > next > return self._next() > File "/usr/local/lib/python2.7/site-packages/grpc/_channel.py", line 338, in > _next > raise self > _Rendezvous: <_Rendezvous of RPC that terminated with > (StatusCode.RESOURCE_EXHAUSTED, GOAWAY
[jira] [Work logged] (BEAM-6258) Data channel failing after some time for 1G data input
[ https://issues.apache.org/jira/browse/BEAM-6258?focusedWorklogId=181966&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-181966 ] ASF GitHub Bot logged work on BEAM-6258: Author: ASF GitHub Bot Created on: 07/Jan/19 19:05 Start Date: 07/Jan/19 19:05 Worklog Time Spent: 10m Work Description: angoenka commented on pull request #7415: [BEAM-6258] Set grpc keep alive on server creation URL: https://github.com/apache/beam/pull/7415#discussion_r245763524 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/ServerFactory.java ## @@ -144,7 +145,9 @@ private static Server createServer(List services, InetSocketAdd NettyServerBuilder.forPort(socket.getPort()) // Set the message size to max value here. The actual size is governed by the // buffer size in the layers above. - .maxMessageSize(Integer.MAX_VALUE); + .maxMessageSize(Integer.MAX_VALUE) + .permitKeepAliveTime(1, TimeUnit.SECONDS) + .permitKeepAliveWithoutCalls(true); Review comment: We are not checking the client configuration anywhere. However, I also feel that this might be a server/client configuration issues. This is just a work around till we find the root cause. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 181966) Time Spent: 40m (was: 0.5h) > Data channel failing after some time for 1G data input > -- > > Key: BEAM-6258 > URL: https://issues.apache.org/jira/browse/BEAM-6258 > Project: Beam > Issue Type: Bug > Components: sdk-py-harness >Reporter: Ankur Goenka >Assignee: Ankur Goenka >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > Data channel and logging channel are failing after some time with 1GB input > data for chicago taxi. 
> > E1218 02:44:02.837680206 72 chttp2_transport.cc:1148] Received a GOAWAY with > error code ENHANCE_YOUR_CALM and debug data equal to "too_many_pings" > Exception in thread read_grpc_client_inputs: > Traceback (most recent call last): > File "/usr/local/lib/python2.7/threading.py", line 801, in __bootstrap_inner > self.run() > File "/usr/local/lib/python2.7/threading.py", line 754, in run > self.__target(*self.__args, **self.__kwargs) > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/data_plane.py", > line 273, in > target=lambda: self._read_inputs(elements_iterator), > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/data_plane.py", > line 260, in _read_inputs > for elements in elements_iterator: > File "/usr/local/lib/python2.7/site-packages/grpc/_channel.py", line 347, in > next > return self._next() > File "/usr/local/lib/python2.7/site-packages/grpc/_channel.py", line 338, in > _next > raise self > _Rendezvous: <_Rendezvous of RPC that terminated with > (StatusCode.RESOURCE_EXHAUSTED, GOAWAY received)> > Traceback (most recent call last): > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 145, in _execute > response = task() > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 180, in > self._execute(lambda: worker.do_instruction(work), work) > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 253, in do_instruction > request.instruction_id) > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 269, in process_bundle > bundle_processor.process_bundle(instruction_id) > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/bundle_processor.py", > line 481, in process_bundle > instruction_id, expected_targets): > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/data_plane.py", > line 209, in input_elements > raise_(t, v, tb) > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/data_plane.py", > line 260, in _read_inputs > for elements in elements_iterator: > File "/usr/local/lib/python2.7/site-packages/grpc/_channel.py", line 347, in > next > return self._next() > File "/usr/local/lib/python2.7/site-packages/grpc/_channel.py", line 338, in > _next > raise self > _Rendezvous: <_Rendezvous of RPC that terminated with > (StatusCode.RESOURCE_EXHAUSTED, GOAW
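The keep-alive discussion above involves both ends of the gRPC connection. As a rough sketch (not the Beam code itself), the server side below mirrors the settings in the patch on a NettyServerBuilder, and the client side shows the corresponding ManagedChannelBuilder knobs that would need to agree with the server's permitted ping rate; ports, targets, and time values are illustrative assumptions:

{code:java}
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import io.grpc.Server;
import io.grpc.netty.NettyServerBuilder;
import java.util.concurrent.TimeUnit;

public class KeepAliveSketch {
  // Server side: permit frequent keep-alive pings so idle data channels are not
  // closed with GOAWAY / ENHANCE_YOUR_CALM ("too_many_pings").
  static Server server(int port) {
    return NettyServerBuilder.forPort(port)
        .maxMessageSize(Integer.MAX_VALUE)        // actual size governed by the layers above
        .permitKeepAliveTime(1, TimeUnit.SECONDS) // allow pings as often as once per second
        .permitKeepAliveWithoutCalls(true)        // even when no RPC is in flight
        .build();
  }

  // Client side (hypothetical counterpart): if the harness channel sends keep-alive
  // pings, their interval must be no shorter than what the server permits.
  static ManagedChannel channel(String target) {
    return ManagedChannelBuilder.forTarget(target)
        .usePlaintext()
        .keepAliveTime(20, TimeUnit.SECONDS)
        .keepAliveWithoutCalls(true)
        .build();
  }
}
{code}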
[jira] [Commented] (BEAM-6372) Direct Runner should marshal data in a similar way to Dataflow runner
[ https://issues.apache.org/jira/browse/BEAM-6372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736201#comment-16736201 ] Robert Burke commented on BEAM-6372: Off the top of my head, I think I agree with the assessment, the error message would ideally be very clear about what's missing and possibly why. One reason it hasn't been done this way is that for testing with ptest and `go test`, it's inconvenient to have the registration scaffolding everywhere. In particular we'd probably want to maintain that ease, but still have the semantics check everything correctly, likely with a --test flag that's used by default in the testing harness. > Direct Runner should marshal data in a similar way to Dataflow runner > - > > Key: BEAM-6372 > URL: https://issues.apache.org/jira/browse/BEAM-6372 > Project: Beam > Issue Type: Improvement > Components: runner-direct, sdk-go >Reporter: Andrew Brampton >Assignee: Robert Burke >Priority: Major > > I would test my pipeline using the direct runner, and it would happily run on > a sample. When I ran it on the Dataflow runner, it'll run for a hour, then > get to a stage that would crash like so: > > {quote}java.util.concurrent.ExecutionException: java.lang.RuntimeException: > Error received from SDK harness for instruction -224: execute failed: panic: > reflect: Call using main.HistogramResult as type struct \{ Key string > "json:\"key\""; Files []string "json:\"files\""; Histogram > palette.ColorHistogram "json:\"histogram,omitempty\""; Palette []struct { R > uint8; G uint8; B uint8; A uint8 } "json:\"palette\"" } goroutine 70 > [running]:{quote} > This was because I forgot to register my HistogramResult type. > It would be useful if the direct runner tried to marshal and unmarshal all > types, to help expose issues like this earlier. > Also, when running on Dataflow, the value of flags, and captured variables, > would be the empty/default value. It would be good if direct also caused this > behaviour. For example: > {code} > prefix := “X” > s = s.Scope(“Prefix ” + prefix) > c = beam.ParDo(s, func(value string) string { > return prefix + value > }, c) > {code} > Will work prefix "X" on the Direct runner, but will prefix "" on Dataflow. > Subtle behaviour, but I suspect the direct runner could expose this if it > marshalled and unmarshalled the func like the dataflow runner. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3580) Do not use Go BytesCoder to encode string
[ https://issues.apache.org/jira/browse/BEAM-3580?focusedWorklogId=181970&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-181970 ] ASF GitHub Bot logged work on BEAM-3580: Author: ASF GitHub Bot Created on: 07/Jan/19 19:10 Start Date: 07/Jan/19 19:10 Worklog Time Spent: 10m Work Description: htyleo commented on pull request #7422: [BEAM-3580] Use a separate coder for string URL: https://github.com/apache/beam/pull/7422#discussion_r245765155 ## File path: sdks/go/pkg/beam/core/graph/coder/coder.go ## @@ -122,6 +122,7 @@ type Kind string const ( CustomKind = "Custom" // Implicitly length-prefixed Bytes Kind = "bytes" // Implicitly length-prefixed as part of the encoding + StringKind = "string" // Implicitly length-prefixed as part of the encoding Review comment: Oh, thanks for pointing that out! I misunderstood the meaning of "custom coder" in the bug description. I'm going to close this PR and file another one. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 181970) Time Spent: 50m (was: 40m) > Do not use Go BytesCoder to encode string > - > > Key: BEAM-3580 > URL: https://issues.apache.org/jira/browse/BEAM-3580 > Project: Beam > Issue Type: Bug > Components: sdk-go >Reporter: Henning Rohde >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > We should not use the same built-in coder for two different types. It creates > the need for conversions at inopportune times in the runtime. > > One option would be to a custom coder that shares encoding with bytes, given > that bytes are length prefixed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6346) Portable Flink Job hangs if the sdk harness fails to start
[ https://issues.apache.org/jira/browse/BEAM-6346?focusedWorklogId=181968&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-181968 ] ASF GitHub Bot logged work on BEAM-6346: Author: ASF GitHub Bot Created on: 07/Jan/19 19:10 Start Date: 07/Jan/19 19:10 Worklog Time Spent: 10m Work Description: angoenka commented on pull request #7395: [BEAM-6346] Inspect the docker container state URL: https://github.com/apache/beam/pull/7395#discussion_r245765030 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/DockerEnvironmentFactory.java ## @@ -152,8 +152,10 @@ public RemoteEnvironment createEnvironment(Environment environment) throws Excep LOG.debug("Created Docker Container with Container ID {}", containerId); // Wait on a client from the gRPC server. while (instructionHandler == null) { +Preconditions.checkArgument( +docker.isContainerRunning(containerId), "No container running for id " + containerId); try { - instructionHandler = clientSource.take(workerId, Duration.ofMinutes(2)); + instructionHandler = clientSource.take(workerId, Duration.ofSeconds(30)); Review comment: 1 Min sounds good to me. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 181968) Time Spent: 40m (was: 0.5h) > Portable Flink Job hangs if the sdk harness fails to start > -- > > Key: BEAM-6346 > URL: https://issues.apache.org/jira/browse/BEAM-6346 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Ankur Goenka >Assignee: Ankur Goenka >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > Portable beam job on Flink fails if the sdk harness container fails to start > as its started in detached mode. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
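Taken together, the two review threads above amount to a wait loop that re-checks the container state on every timeout instead of blocking on the control client indefinitely. The sketch below only illustrates that shape, with stand-in interfaces for the Docker wrapper and client source referenced in the diff and the roughly one-minute timeout the reviewers settled on:

{code:java}
import java.time.Duration;
import java.util.concurrent.TimeoutException;

// Sketch of the startup wait loop discussed above. Because the SDK harness
// container is started in detached mode, the loop checks the container state
// on every iteration so a failed start surfaces as an error, not a hang.
class DockerStartupSketch {
  InstructionHandler waitForClient(
      DockerWrapper docker, ClientSource clientSource, String containerId, String workerId)
      throws Exception {
    InstructionHandler handler = null;
    while (handler == null) {
      if (!docker.isContainerRunning(containerId)) {
        throw new IllegalStateException("No container running for id " + containerId);
      }
      try {
        // Shorter timeout (about a minute per the review) so a dead container
        // is noticed quickly rather than after the original two-minute wait.
        handler = clientSource.take(workerId, Duration.ofMinutes(1));
      } catch (TimeoutException e) {
        // Loop around and re-check the container state before waiting again.
      }
    }
    return handler;
  }

  // Stand-in types; the real classes live in the Beam runners code base.
  interface DockerWrapper { boolean isContainerRunning(String containerId); }
  interface ClientSource { InstructionHandler take(String workerId, Duration timeout) throws Exception; }
  interface InstructionHandler {}
}
{code}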
[jira] [Assigned] (BEAM-3306) Consider: Go coder registry
[ https://issues.apache.org/jira/browse/BEAM-3306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke reassigned BEAM-3306: -- Assignee: Robert Burke > Consider: Go coder registry > --- > > Key: BEAM-3306 > URL: https://issues.apache.org/jira/browse/BEAM-3306 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Henning Rohde >Assignee: Robert Burke >Priority: Minor > > Add coder registry to allow easier overwrite of default coders. We may also > allow otherwise un-encodable types, but that would require that function > analysis depends on it. > If we're hardcoding support for proto/avro, then there may be little need for > such a feature. Conversely, this may be how we implement such support. > > Proposal Doc: > [https://docs.google.com/document/d/1kQwx4Ah6PzG8z2ZMuNsNEXkGsLXm6gADOZaIO7reUOg/edit#|https://docs.google.com/document/d/1kQwx4Ah6PzG8z2ZMuNsNEXkGsLXm6gADOZaIO7reUOg/edit] > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6280) Failure in PortableRunnerTest.test_error_traceback_includes_user_code
[ https://issues.apache.org/jira/browse/BEAM-6280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736178#comment-16736178 ] Boyuan Zhang commented on BEAM-6280: Feel like it's related to python3 since all tests failed on python3 project. [~tvalentyn] could you please help take a look at this one? > Failure in PortableRunnerTest.test_error_traceback_includes_user_code > - > > Key: BEAM-6280 > URL: https://issues.apache.org/jira/browse/BEAM-6280 > Project: Beam > Issue Type: Bug > Components: sdk-py-core, test-failures >Reporter: Kenneth Knowles >Assignee: Sam Rohde >Priority: Critical > Labels: flaky-test > > [https://builds.apache.org/job/beam_PreCommit_Python_Cron/732/] > [https://builds.apache.org/job/beam_PreCommit_Python_Cron/732/testReport/apache_beam.runners.portability.portable_runner_test/PortableRunnerTest/test_error_traceback_includes_user_code/] > [https://scans.gradle.com/s/do3hjulee3gaa/console-log?task=:beam-sdks-python:testPython3] > {code:java} > 'second' not found in 'Traceback (most recent call last):\n File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/apache_beam/runners/portability/fn_api_runner_test.py", > line 466, in test_error_traceback_includes_user_code\np | > beam.Create([0]) | beam.Map(first) # pylint: > disable=expression-not-assigned\n File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/apache_beam/pipeline.py", > line 425, in __exit__\nself.run().wait_until_finish()\n File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/apache_beam/runners/portability/portable_runner.py", > line 314, in wait_until_finish\nself._job_id, self._state, > self._last_error_message()))\nRuntimeError: Pipeline > job-cdcefe6d-1caa-4487-9e63-e971f67ec68c failed in state FAILED: start > coder=WindowedValueCoder[BytesCoder], len(consumers)=1]]>\n'{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6345) BigtableWriteIT. testE2EBigtableWrite got stuck with portable dataflow worker
[ https://issues.apache.org/jira/browse/BEAM-6345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736172#comment-16736172 ] Boyuan Zhang commented on BEAM-6345: https://builds.apache.org/job/beam_PreCommit_JavaPortabilityApi_Cron/253/ > BigtableWriteIT. testE2EBigtableWrite got stuck with portable dataflow worker > - > > Key: BEAM-6345 > URL: https://issues.apache.org/jira/browse/BEAM-6345 > Project: Beam > Issue Type: Test > Components: test-failures >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > > https://builds.apache.org/job/beam_PostCommit_Java_PortabilityApi/576/ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6280) Failure in PortableRunnerTest.test_error_traceback_includes_user_code
[ https://issues.apache.org/jira/browse/BEAM-6280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736180#comment-16736180 ] Sam Rohde commented on BEAM-6280: - [~boyuanz] I appreciate the help, but I've narrowed the bug down to a data race in the local_job_service. A fix is forthcoming > Failure in PortableRunnerTest.test_error_traceback_includes_user_code > - > > Key: BEAM-6280 > URL: https://issues.apache.org/jira/browse/BEAM-6280 > Project: Beam > Issue Type: Bug > Components: sdk-py-core, test-failures >Reporter: Kenneth Knowles >Assignee: Sam Rohde >Priority: Critical > Labels: flaky-test > > [https://builds.apache.org/job/beam_PreCommit_Python_Cron/732/] > [https://builds.apache.org/job/beam_PreCommit_Python_Cron/732/testReport/apache_beam.runners.portability.portable_runner_test/PortableRunnerTest/test_error_traceback_includes_user_code/] > [https://scans.gradle.com/s/do3hjulee3gaa/console-log?task=:beam-sdks-python:testPython3] > {code:java} > 'second' not found in 'Traceback (most recent call last):\n File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/apache_beam/runners/portability/fn_api_runner_test.py", > line 466, in test_error_traceback_includes_user_code\np | > beam.Create([0]) | beam.Map(first) # pylint: > disable=expression-not-assigned\n File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/apache_beam/pipeline.py", > line 425, in __exit__\nself.run().wait_until_finish()\n File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/apache_beam/runners/portability/portable_runner.py", > line 314, in wait_until_finish\nself._job_id, self._state, > self._last_error_message()))\nRuntimeError: Pipeline > job-cdcefe6d-1caa-4487-9e63-e971f67ec68c failed in state FAILED: start > coder=WindowedValueCoder[BytesCoder], len(consumers)=1]]>\n'{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6280) Failure in PortableRunnerTest.test_error_traceback_includes_user_code
[ https://issues.apache.org/jira/browse/BEAM-6280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736176#comment-16736176 ] Boyuan Zhang commented on BEAM-6280: https://builds.apache.org/job/beam_PreCommit_Python_Cron/794/ > Failure in PortableRunnerTest.test_error_traceback_includes_user_code > - > > Key: BEAM-6280 > URL: https://issues.apache.org/jira/browse/BEAM-6280 > Project: Beam > Issue Type: Bug > Components: sdk-py-core, test-failures >Reporter: Kenneth Knowles >Assignee: Sam Rohde >Priority: Critical > Labels: flaky-test > > [https://builds.apache.org/job/beam_PreCommit_Python_Cron/732/] > [https://builds.apache.org/job/beam_PreCommit_Python_Cron/732/testReport/apache_beam.runners.portability.portable_runner_test/PortableRunnerTest/test_error_traceback_includes_user_code/] > [https://scans.gradle.com/s/do3hjulee3gaa/console-log?task=:beam-sdks-python:testPython3] > {code:java} > 'second' not found in 'Traceback (most recent call last):\n File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/apache_beam/runners/portability/fn_api_runner_test.py", > line 466, in test_error_traceback_includes_user_code\np | > beam.Create([0]) | beam.Map(first) # pylint: > disable=expression-not-assigned\n File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/apache_beam/pipeline.py", > line 425, in __exit__\nself.run().wait_until_finish()\n File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/apache_beam/runners/portability/portable_runner.py", > line 314, in wait_until_finish\nself._job_id, self._state, > self._last_error_message()))\nRuntimeError: Pipeline > job-cdcefe6d-1caa-4487-9e63-e971f67ec68c failed in state FAILED: start > coder=WindowedValueCoder[BytesCoder], len(consumers)=1]]>\n'{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6161) Add ElementCount MonitoringInfos for the Java SDK
[ https://issues.apache.org/jira/browse/BEAM-6161?focusedWorklogId=181962&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-181962 ] ASF GitHub Bot logged work on BEAM-6161: Author: ASF GitHub Bot Created on: 07/Jan/19 18:49 Start Date: 07/Jan/19 18:49 Worklog Time Spent: 10m Work Description: ajamato commented on issue #7272: [BEAM-6161] Introduce PCollectionConsumerRegistry and add ElementCoun… URL: https://github.com/apache/beam/pull/7272#issuecomment-452039844 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 181962) Time Spent: 4h 50m (was: 4h 40m) > Add ElementCount MonitoringInfos for the Java SDK > - > > Key: BEAM-6161 > URL: https://issues.apache.org/jira/browse/BEAM-6161 > Project: Beam > Issue Type: New Feature > Components: java-fn-execution, sdk-java-harness >Reporter: Alex Amato >Assignee: Alex Amato >Priority: Major > Time Spent: 4h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6161) Add ElementCount MonitoringInfos for the Java SDK
[ https://issues.apache.org/jira/browse/BEAM-6161?focusedWorklogId=181960&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-181960 ] ASF GitHub Bot logged work on BEAM-6161: Author: ASF GitHub Bot Created on: 07/Jan/19 18:48 Start Date: 07/Jan/19 18:48 Worklog Time Spent: 10m Work Description: ajamato commented on issue #7272: [BEAM-6161] Introduce PCollectionConsumerRegistry and add ElementCoun… URL: https://github.com/apache/beam/pull/7272#issuecomment-452039639 Seems to be an issue starting up the VMs, due to not getting the built dataflow container. E Handler for GET /v1.27/images/us.gcr.io/apache-beam-testing/java-postcommit-it/java:20190105011654/json returned error: No such image: us.gcr.io/apache-beam-testing/java-postcommit-it/java:20190105011654 undefined E Error syncing pod ab3a90f13f7e8f68b0508d2fbee851d2 ("dataflow-testpipeline-jenkins-0105-01041719-00vb-harness-18hs_default(ab3a90f13f7e8f68b0508d2fbee851d2)"), skipping: failed to "StartContainer" for "sdk" with ImagePullBackOff: "Back-off pulling image \"us.gcr.io/apache-beam-testing/java-postcommit-it/java:20190105011654\"" undefined This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 181960) Time Spent: 4.5h (was: 4h 20m) > Add ElementCount MonitoringInfos for the Java SDK > - > > Key: BEAM-6161 > URL: https://issues.apache.org/jira/browse/BEAM-6161 > Project: Beam > Issue Type: New Feature > Components: java-fn-execution, sdk-java-harness >Reporter: Alex Amato >Assignee: Alex Amato >Priority: Major > Time Spent: 4.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6161) Add ElementCount MonitoringInfos for the Java SDK
[ https://issues.apache.org/jira/browse/BEAM-6161?focusedWorklogId=181961&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-181961 ] ASF GitHub Bot logged work on BEAM-6161: Author: ASF GitHub Bot Created on: 07/Jan/19 18:48 Start Date: 07/Jan/19 18:48 Worklog Time Spent: 10m Work Description: ajamato commented on issue #7272: [BEAM-6161] Introduce PCollectionConsumerRegistry and add ElementCoun… URL: https://github.com/apache/beam/pull/7272#issuecomment-452039766 https://pantheon.corp.google.com/logs/viewer?resource=dataflow_step%2Fjob_id%2F2019-01-04_17_19_42-763666818691945485&interval=NO_LIMIT&project=apache-beam-testing&e=13803970%2C13803205%2C13804191%2C13804201%2C13804391%2C13807341&organizationId=433637338589&minLogLevel=500&expandAll=false×tamp=2019-01-07T18%3A46%3A56.72400Z&customFacets&limitCustomFacetWidth=true&scrollTimestamp=2019-01-05T02%3A31%3A53.793816000Z&advancedFilter=resource.type%3D%22dataflow_step%22%0Aresource.labels.region%3D%22us-central1%22%0Aresource.labels.job_name%3D%22testpipeline-jenkins-0105011930-95e3af97%22%0Aresource.labels.job_id%3D%222019-01-04_17_19_42-763666818691945485%22%0Aresource.labels.project_id%3D%22apache-beam-testing%22%0Atimestamp%3D%222019-01-05T02%3A31%3A53.792919067Z%22%0AinsertId%3D%22s%3D0c43aa38f1dc473b84d3c4a5aae74b06%3Bi%3Db7f%3Bb%3D21484f6efe6c4a7598aaf3520458d553%3Bm%3Dd4c0a671%3Bt%3D57eaccc7d520d%3Bx%3De456feb0b9171076%22 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 181961) Time Spent: 4h 40m (was: 4.5h) > Add ElementCount MonitoringInfos for the Java SDK > - > > Key: BEAM-6161 > URL: https://issues.apache.org/jira/browse/BEAM-6161 > Project: Beam > Issue Type: New Feature > Components: java-fn-execution, sdk-java-harness >Reporter: Alex Amato >Assignee: Alex Amato >Priority: Major > Time Spent: 4h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-3306) Consider: Go coder registry
[ https://issues.apache.org/jira/browse/BEAM-3306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke updated BEAM-3306: --- Description: Add coder registry to allow easier overwrite of default coders. We may also allow otherwise un-encodable types, but that would require that function analysis depends on it. If we're hardcoding support for proto/avro, then there may be little need for such a feature. Conversely, this may be how we implement such support. Proposal Doc: [https://docs.google.com/document/d/1kQwx4Ah6PzG8z2ZMuNsNEXkGsLXm6gADOZaIO7reUOg/edit#|https://docs.google.com/document/d/1kQwx4Ah6PzG8z2ZMuNsNEXkGsLXm6gADOZaIO7reUOg/edit] was: Add coder registry to allow easier overwrite of default coders. We may also allow otherwise un-encodable types, but that would require that function analysis depends on it. If we're hardcoding support for proto/avro, then there may be little need for such a feature. Conversely, this may be how we implement such support. > Consider: Go coder registry > --- > > Key: BEAM-3306 > URL: https://issues.apache.org/jira/browse/BEAM-3306 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Henning Rohde >Priority: Minor > > Add coder registry to allow easier overwrite of default coders. We may also > allow otherwise un-encodable types, but that would require that function > analysis depends on it. > If we're hardcoding support for proto/avro, then there may be little need for > such a feature. Conversely, this may be how we implement such support. > > Proposal Doc: > [https://docs.google.com/document/d/1kQwx4Ah6PzG8z2ZMuNsNEXkGsLXm6gADOZaIO7reUOg/edit#|https://docs.google.com/document/d/1kQwx4Ah6PzG8z2ZMuNsNEXkGsLXm6gADOZaIO7reUOg/edit] > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-3306) Consider: Go coder registry
[ https://issues.apache.org/jira/browse/BEAM-3306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736154#comment-16736154 ] Robert Burke commented on BEAM-3306: Added proposal doc link to description. https://docs.google.com/document/d/1kQwx4Ah6PzG8z2ZMuNsNEXkGsLXm6gADOZaIO7reUOg/edit# > Consider: Go coder registry > --- > > Key: BEAM-3306 > URL: https://issues.apache.org/jira/browse/BEAM-3306 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Henning Rohde >Priority: Minor > > Add coder registry to allow easier overwrite of default coders. We may also > allow otherwise un-encodable types, but that would require that function > analysis depends on it. > If we're hardcoding support for proto/avro, then there may be little need for > such a feature. Conversely, this may be how we implement such support. > > Proposal Doc: > [https://docs.google.com/document/d/1kQwx4Ah6PzG8z2ZMuNsNEXkGsLXm6gADOZaIO7reUOg/edit#|https://docs.google.com/document/d/1kQwx4Ah6PzG8z2ZMuNsNEXkGsLXm6gADOZaIO7reUOg/edit] > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6379) Research and introduce thread sanitizer code into the java SDK, and java runner harnesses
[ https://issues.apache.org/jira/browse/BEAM-6379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736143#comment-16736143 ] Alex Amato commented on BEAM-6379: -- One open source thread sanitizer, though there is little documentation and it was last committed to several years ago: https://github.com/google/java-thread-sanitizer > Research and introduce thread sanitizer code into the java SDK, and java > runner harnesses > - > > Key: BEAM-6379 > URL: https://issues.apache.org/jira/browse/BEAM-6379 > Project: Beam > Issue Type: New Feature > Components: java-fn-execution >Reporter: Alex Amato >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6376) Java PreCommit failed continually owing to OOM on beam6
[ https://issues.apache.org/jira/browse/BEAM-6376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736122#comment-16736122 ] Boyuan Zhang commented on BEAM-6376: [https://builds.apache.org/job/beam_PreCommit_Java_Cron/794/] https://builds.apache.org/job/beam_PreCommit_Java_Cron/795/ > Java PreCommit failed continually owing to OOM on beam6 > --- > > Key: BEAM-6376 > URL: https://issues.apache.org/jira/browse/BEAM-6376 > Project: Beam > Issue Type: Test > Components: test-failures >Reporter: Boyuan Zhang >Assignee: Alan Myrvold >Priority: Major > > [https://builds.apache.org/job/beam_PreCommit_Java_Cron/788/] > [https://builds.apache.org/job/beam_PreCommit_Java_Cron/789/] > [https://builds.apache.org/job/beam_PreCommit_Java_Cron/790/] > [https://builds.apache.org/job/beam_PreCommit_Java_Cron/791/] > [https://builds.apache.org/job/beam_PreCommit_Java_Cron/792/] > https://builds.apache.org/job/beam_PreCommit_Java_Cron/793/ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-6379) Research and introduce thread sanitizer code into the java SDK, and java runner harnesses
Alex Amato created BEAM-6379: Summary: Research and introduce thread sanitizer code into the java SDK, and java runner harnesses Key: BEAM-6379 URL: https://issues.apache.org/jira/browse/BEAM-6379 Project: Beam Issue Type: New Feature Components: java-fn-execution Reporter: Alex Amato -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6375) Walltime seems incorrect with the Go SDK and Dataflow
[ https://issues.apache.org/jira/browse/BEAM-6375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736128#comment-16736128 ] Alex Amato commented on BEAM-6375: -- I will be visiting the metrics work in the SDKs and making sure the metrics are calculated properly, though we will be focusing on Java and Python in the current quarter. Some of the metrics require clever techniques to measure. > Walltime seems incorrect with the Go SDK and Dataflow > - > > Key: BEAM-6375 > URL: https://issues.apache.org/jira/browse/BEAM-6375 > Project: Beam > Issue Type: Bug > Components: runner-dataflow, sdk-go >Reporter: Andrew Brampton >Assignee: Robert Burke >Priority: Major > > The pipeline takes hours to run, but the walltime for each step is in > seconds. For example, this pipeline took 22 hours, but no stage too more than > 3 seconds: https://pasteboard.co/HVf8PCp.png -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6374) "elements added" for input and output collections is always empty
[ https://issues.apache.org/jira/browse/BEAM-6374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736129#comment-16736129 ] Alex Amato commented on BEAM-6374: -- I will be visiting the metrics work in the SDKs and making sure the metrics are calculated properly, though we will be focusing on Java and Python in the current quarter. Some of the metrics require clever techniques to measure. > "elements added" for input and output collections is always empty > - > > Key: BEAM-6374 > URL: https://issues.apache.org/jira/browse/BEAM-6374 > Project: Beam > Issue Type: Bug > Components: runner-dataflow, sdk-go >Reporter: Andrew Brampton >Assignee: Robert Burke >Priority: Major > > The field for "Elements added" and "Estimated size" is always blank when > running a Go binary on Dataflow. For example when running the work count > example: https://pasteboard.co/HVf80BU.png -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-6372) Direct Runner should marshal data in a similar way to Dataflow runner
[ https://issues.apache.org/jira/browse/BEAM-6372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke reassigned BEAM-6372: -- Assignee: Robert Burke > Direct Runner should marshal data in a similar way to Dataflow runner > - > > Key: BEAM-6372 > URL: https://issues.apache.org/jira/browse/BEAM-6372 > Project: Beam > Issue Type: Improvement > Components: runner-direct, sdk-go >Reporter: Andrew Brampton >Assignee: Robert Burke >Priority: Major > > I would test my pipeline using the direct runner, and it would happily run on > a sample. When I ran it on the Dataflow runner, it'll run for a hour, then > get to a stage that would crash like so: > > {quote}java.util.concurrent.ExecutionException: java.lang.RuntimeException: > Error received from SDK harness for instruction -224: execute failed: panic: > reflect: Call using main.HistogramResult as type struct \{ Key string > "json:\"key\""; Files []string "json:\"files\""; Histogram > palette.ColorHistogram "json:\"histogram,omitempty\""; Palette []struct { R > uint8; G uint8; B uint8; A uint8 } "json:\"palette\"" } goroutine 70 > [running]:{quote} > This was because I forgot to register my HistogramResult type. > It would be useful if the direct runner tried to marshal and unmarshal all > types, to help expose issues like this earlier. > Also, when running on Dataflow, the value of flags, and captured variables, > would be the empty/default value. It would be good if direct also caused this > behaviour. For example: > {code} > prefix := “X” > s = s.Scope(“Prefix ” + prefix) > c = beam.ParDo(s, func(value string) string { > return prefix + value > }, c) > {code} > Will work prefix "X" on the Direct runner, but will prefix "" on Dataflow. > Subtle behaviour, but I suspect the direct runner could expose this if it > marshalled and unmarshalled the func like the dataflow runner. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6240) Allow users to annotate POJOs and JavaBeans for richer functionality
[ https://issues.apache.org/jira/browse/BEAM-6240?focusedWorklogId=181928&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-181928 ] ASF GitHub Bot logged work on BEAM-6240: Author: ASF GitHub Bot Created on: 07/Jan/19 17:45 Start Date: 07/Jan/19 17:45 Worklog Time Spent: 10m Work Description: kanterov commented on issue #7289: [BEAM-6240] Add a library of schema annotations for POJO and JavaBeans URL: https://github.com/apache/beam/pull/7289#issuecomment-452019113 There is similar code working with reflection in Jackson, didn't have time to look deep into it. https://github.com/FasterXML/jackson-databind/blob/master/src/main/java/com/fasterxml/jackson/databind/introspect/JacksonAnnotationIntrospector.java#L291 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 181928) Time Spent: 3h 50m (was: 3h 40m) > Allow users to annotate POJOs and JavaBeans for richer functionality > > > Key: BEAM-6240 > URL: https://issues.apache.org/jira/browse/BEAM-6240 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Reuven Lax >Assignee: Reuven Lax >Priority: Major > Time Spent: 3h 50m > Remaining Estimate: 0h > > Desired annotations: > * SchemaIgnore - ignore this field > * FieldName - allow the user to explicitly specify a field name > * SchemaCreate - register a function to be used to create an object (so > fields can be final, and no default constructor need be assumed). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
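The three annotations listed in the issue are easiest to picture on a concrete class. The sketch below is only a guess at the intended usage; the annotation and package names are assumptions based on the issue text and the linked PR, not confirmed API:

{code:java}
// Assumed package for the annotations; the PR under review defines the real one.
import org.apache.beam.sdk.schemas.annotations.SchemaCreate;
import org.apache.beam.sdk.schemas.annotations.SchemaFieldName;
import org.apache.beam.sdk.schemas.annotations.SchemaIgnore;

public class Transaction {
  private final String bank;
  private final double purchaseAmount;
  private final String cachedDisplayString;

  @SchemaCreate // registered creation function: fields can stay final, no default constructor needed
  public Transaction(String bank, double purchaseAmount) {
    this.bank = bank;
    this.purchaseAmount = purchaseAmount;
    this.cachedDisplayString = bank + ": " + purchaseAmount;
  }

  @SchemaFieldName("bank_name") // explicitly chosen field name in the generated schema
  public String getBank() {
    return bank;
  }

  public double getPurchaseAmount() {
    return purchaseAmount;
  }

  @SchemaIgnore // excluded from the schema entirely
  public String getCachedDisplayString() {
    return cachedDisplayString;
  }
}
{code}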
[jira] [Commented] (BEAM-6373) Make Go SDK on Dataflow multi-threaded
[ https://issues.apache.org/jira/browse/BEAM-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736082#comment-16736082 ] Robert Burke commented on BEAM-6373: Without SplittableDoFn (SDF), Go SDK jobs are unable to scale and divide work, which includes any kind of parallelism. This is being worked on this quarter. See [https://beam.apache.org/roadmap/go-sdk] for additional information. > Make Go SDK on Dataflow multi-threaded > -- > > Key: BEAM-6373 > URL: https://issues.apache.org/jira/browse/BEAM-6373 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow, sdk-go >Reporter: Andrew Brampton >Assignee: Robert Burke >Priority: Major > > When running on dataflow, with a machine with more than 1 V-CPU, the worker > will never use more than 1 core of CPU. I presume this is because the > Dataflow runner does not allow the worker to run multiple steps concurrently. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-6373) Make Go SDK on Dataflow multi-threaded
[ https://issues.apache.org/jira/browse/BEAM-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke reassigned BEAM-6373: -- Assignee: Tyler Akidau (was: Robert Burke) > Make Go SDK on Dataflow multi-threaded > -- > > Key: BEAM-6373 > URL: https://issues.apache.org/jira/browse/BEAM-6373 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow, sdk-go >Reporter: Andrew Brampton >Assignee: Tyler Akidau >Priority: Major > > When running on dataflow, with a machine with more than 1 V-CPU, the worker > will never use more than 1 core of CPU. I presume this is because the > Dataflow runner does not allow the worker to run multiple steps concurrently. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6374) "elements added" for input and output collections is always empty
[ https://issues.apache.org/jira/browse/BEAM-6374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736080#comment-16736080 ] Robert Burke commented on BEAM-6374: As with https://issues.apache.org/jira/browse/BEAM-6375, this is because the SDK doesn't currently record and communicate these values to the runner. [~ajam...@google.com] will be working on this in the next few months. Running the Go SDK on Dataflow is unofficial and unsupported at this time, both because of this and because SplittableDoFn support is still missing. > "elements added" for input and output collections is always empty > - > > Key: BEAM-6374 > URL: https://issues.apache.org/jira/browse/BEAM-6374 > Project: Beam > Issue Type: Bug > Components: runner-dataflow, sdk-go >Reporter: Andrew Brampton >Assignee: Robert Burke >Priority: Major > > The field for "Elements added" and "Estimated size" is always blank when > running a Go binary on Dataflow. For example when running the work count > example: https://pasteboard.co/HVf80BU.png -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-6373) Make Go SDK on Dataflow multi-threaded
[ https://issues.apache.org/jira/browse/BEAM-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke reassigned BEAM-6373: -- Assignee: Robert Burke (was: Tyler Akidau) > Make Go SDK on Dataflow multi-threaded > -- > > Key: BEAM-6373 > URL: https://issues.apache.org/jira/browse/BEAM-6373 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow, sdk-go >Reporter: Andrew Brampton >Assignee: Robert Burke >Priority: Major > > When running on dataflow, with a machine with more than 1 V-CPU, the worker > will never use more than 1 core of CPU. I presume this is because the > Dataflow runner does not allow the worker to run multiple steps concurrently. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-6372) Direct Runner should marshal data in a similar way to Dataflow runner
[ https://issues.apache.org/jira/browse/BEAM-6372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke reassigned BEAM-6372: -- Assignee: (was: Robert Burke) > Direct Runner should marshal data in a similar way to Dataflow runner > - > > Key: BEAM-6372 > URL: https://issues.apache.org/jira/browse/BEAM-6372 > Project: Beam > Issue Type: Improvement > Components: runner-direct, sdk-go >Reporter: Andrew Brampton >Priority: Major > > I would test my pipeline using the direct runner, and it would happily run on > a sample. When I ran it on the Dataflow runner, it'll run for a hour, then > get to a stage that would crash like so: > > {quote}java.util.concurrent.ExecutionException: java.lang.RuntimeException: > Error received from SDK harness for instruction -224: execute failed: panic: > reflect: Call using main.HistogramResult as type struct \{ Key string > "json:\"key\""; Files []string "json:\"files\""; Histogram > palette.ColorHistogram "json:\"histogram,omitempty\""; Palette []struct { R > uint8; G uint8; B uint8; A uint8 } "json:\"palette\"" } goroutine 70 > [running]:{quote} > This was because I forgot to register my HistogramResult type. > It would be useful if the direct runner tried to marshal and unmarshal all > types, to help expose issues like this earlier. > Also, when running on Dataflow, the value of flags, and captured variables, > would be the empty/default value. It would be good if direct also caused this > behaviour. For example: > {code} > prefix := “X” > s = s.Scope(“Prefix ” + prefix) > c = beam.ParDo(s, func(value string) string { > return prefix + value > }, c) > {code} > Will work prefix "X" on the Direct runner, but will prefix "" on Dataflow. > Subtle behaviour, but I suspect the direct runner could expose this if it > marshalled and unmarshalled the func like the dataflow runner. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6375) Walltime seems incorrect with the Go SDK and Dataflow
[ https://issues.apache.org/jira/browse/BEAM-6375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736071#comment-16736071 ] Robert Burke commented on BEAM-6375: I suspect this is because ParDos don't currently communicate work estimates and similar values to the runner. IIRC [~ajam...@google.com] is intending to work on that in the next few months as part of updating Portable Metrics handling. > Walltime seems incorrect with the Go SDK and Dataflow > - > > Key: BEAM-6375 > URL: https://issues.apache.org/jira/browse/BEAM-6375 > Project: Beam > Issue Type: Bug > Components: runner-dataflow, sdk-go >Reporter: Andrew Brampton >Assignee: Robert Burke >Priority: Major > > The pipeline takes hours to run, but the walltime for each step is in > seconds. For example, this pipeline took 22 hours, but no stage too more than > 3 seconds: https://pasteboard.co/HVf8PCp.png -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-6374) "elements added" for input and output collections is always empty
[ https://issues.apache.org/jira/browse/BEAM-6374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke reassigned BEAM-6374: -- Assignee: Robert Burke (was: Tyler Akidau) > "elements added" for input and output collections is always empty > - > > Key: BEAM-6374 > URL: https://issues.apache.org/jira/browse/BEAM-6374 > Project: Beam > Issue Type: Bug > Components: runner-dataflow, sdk-go >Reporter: Andrew Brampton >Assignee: Robert Burke >Priority: Major > > The field for "Elements added" and "Estimated size" is always blank when > running a Go binary on Dataflow. For example when running the work count > example: https://pasteboard.co/HVf80BU.png -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-6375) Walltime seems incorrect with the Go SDK and Dataflow
[ https://issues.apache.org/jira/browse/BEAM-6375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Akidau reassigned BEAM-6375: -- Assignee: Robert Burke (was: Tyler Akidau) > Walltime seems incorrect with the Go SDK and Dataflow > - > > Key: BEAM-6375 > URL: https://issues.apache.org/jira/browse/BEAM-6375 > Project: Beam > Issue Type: Bug > Components: runner-dataflow, sdk-go >Reporter: Andrew Brampton >Assignee: Robert Burke >Priority: Major > > The pipeline takes hours to run, but the walltime for each step is in > seconds. For example, this pipeline took 22 hours, but no stage too more than > 3 seconds: https://pasteboard.co/HVf8PCp.png -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3612) Make it easy to generate type-specialized Go SDK reflectx.Funcs
[ https://issues.apache.org/jira/browse/BEAM-3612?focusedWorklogId=181922&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-181922 ] ASF GitHub Bot logged work on BEAM-3612: Author: ASF GitHub Bot Created on: 07/Jan/19 17:15 Start Date: 07/Jan/19 17:15 Worklog Time Spent: 10m Work Description: lostluck commented on issue #7361: [BEAM-3612] Generate type assertion shims for beam. URL: https://github.com/apache/beam/pull/7361#issuecomment-452009410 R: @aaltay This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 181922) Time Spent: 8h 50m (was: 8h 40m) > Make it easy to generate type-specialized Go SDK reflectx.Funcs > --- > > Key: BEAM-3612 > URL: https://issues.apache.org/jira/browse/BEAM-3612 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Henning Rohde >Assignee: Robert Burke >Priority: Major > Time Spent: 8h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)