[jira] [Work logged] (BEAM-8733) The "KeyError: u'-47'" error from line 305 of sdk_worker.py

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8733?focusedWorklogId=349598=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349598
 ]

ASF GitHub Bot logged work on BEAM-8733:


Author: ASF GitHub Bot
Created on: 26/Nov/19 06:32
Start Date: 26/Nov/19 06:32
Worklog Time Spent: 10m 
  Work Description: sunjincheng121 commented on issue #10208: [BEAM-8733] 
Handle the registration request synchronously in the Python SDK harness
URL: https://github.com/apache/beam/pull/10208#issuecomment-558482209
 
 
   Hi @lukecwik, great thanks for tracing the `Run Java PreCommit`, I think 
there is a race condition in the constructor of `BeamFnControlClient`. I try to 
fix it and already updated the PR.  
   
   Besides, the suggestion makes sense to me. I have added a new commit which 
could address your comments.
   
   Best, Jincheng
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349598)
Time Spent: 2h 50m  (was: 2h 40m)

> The "KeyError: u'-47'" error from line 305 of sdk_worker.py
> ---
>
> Key: BEAM-8733
> URL: https://issues.apache.org/jira/browse/BEAM-8733
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> The issue reported by [~chamikara], error message as follows:
> apache_beam/runners/worker/sdk_worker.py", line 305, in get
> self.fns[bundle_descriptor_id],
> KeyError: u'-47'
> {code}
> at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
> at org.apache.beam.sdk.util.MoreFutures.get(MoreFutures.java:57)
> at 
> org.apache.beam.runners.dataflow.worker.fn.control.RegisterAndProcessBundleOperation.finish(RegisterAndProcessBundleOperation.java:330)
> at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:85)
> at 
> org.apache.beam.runners.dataflow.worker.fn.control.BeamFnMapTaskExecutor.execute(BeamFnMapTaskExecutor.java:125)
> at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:411)
> at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:380)
> at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:305)
> at 
> org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness.start(DataflowRunnerHarness.java:195)
> at 
> org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness.main(DataflowRunnerHarness.java:123)
> Suppressed: java.lang.IllegalStateException: Already closed.
>   at 
> org.apache.beam.sdk.fn.data.BeamFnDataBufferingOutboundObserver.close(BeamFnDataBufferingOutboundObserver.java:93)
>   at 
> org.apache.beam.runners.dataflow.worker.fn.data.RemoteGrpcPortWriteOperation.abort(RemoteGrpcPortWriteOperation.java:220)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:91)
> {code}
> More discussion info can be found here: 
> https://github.com/apache/beam/pull/10004



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8747) Remove Unused non-vendored Guava compile dependencies

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8747?focusedWorklogId=349567=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349567
 ]

ASF GitHub Bot logged work on BEAM-8747:


Author: ASF GitHub Bot
Created on: 26/Nov/19 05:07
Start Date: 26/Nov/19 05:07
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #10172: 
[BEAM-8747] Guava dependency cleanup
URL: https://github.com/apache/beam/pull/10172
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349567)
Time Spent: 2h 40m  (was: 2.5h)

> Remove Unused non-vendored Guava compile dependencies
> -
>
> Key: BEAM-8747
> URL: https://issues.apache.org/jira/browse/BEAM-8747
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: Major
> Attachments: Guava used as fully-qualified class name.png
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> [~kenn] says:
> BeamModulePlugin just contains lists of versions to ease coordination across 
> Beam modules, but mostly does not create dependencies. Most of Beam's modules 
> only depend on a few things there. For example Guava is not a core 
> dependency, but here is where it is actually depended upon:
> $ find . -name build.gradle | xargs grep library.java.guava
> ./sdks/java/core/build.gradle:  shadowTest library.java.guava_testlib
> ./sdks/java/extensions/sql/jdbc/build.gradle:  compile library.java.guava
> ./sdks/java/io/google-cloud-platform/build.gradle:  compile library.java.guava
> ./sdks/java/io/kinesis/build.gradle:  testCompile library.java.guava_testlib
> These results appear to be misleading. Grepping for 'import 
> com.google.common', I see this as the actual state of things:
>  - GCP connector does not appear to actually depend on Guava in compile scope
>  - The Beam SQL JDBC driver does not appear to actually depend on Guava in 
> compile scope
>  - The Dataflow Java worker does depend on Guava at compile scope but has 
> incorrect dependencies (and it probably shouldn't)
>  - KinesisIO does depend on Guava at compile scope but has incorrect 
> dependencies (Kinesis libs have Guava on API surface so it is OK here, but 
> should be correctly declared)
>  - ZetaSQL translator does depend on Guava at compile scope but has incorrect 
> dependencies (ZetaSQL has it on API surface so it is OK here, but should be 
> correctly declared)
> We used to have an analysis that prevented this class of error.
> Once the errors are fixed, the guava_version is simply a version that we have 
> discovered that seems to work for both Kinesis and ZetaSQL, libraries we do 
> not control. Kinesis producer is built against 18.0. Kinesis client against 
> 26.0-jre. ZetaSQL against 26.0-android.
> (or maybe I messed up in my analysis)
> Kenn



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8733) The "KeyError: u'-47'" error from line 305 of sdk_worker.py

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8733?focusedWorklogId=349566=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349566
 ]

ASF GitHub Bot logged work on BEAM-8733:


Author: ASF GitHub Bot
Created on: 26/Nov/19 04:55
Start Date: 26/Nov/19 04:55
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #10208: [BEAM-8733] Handle 
the registration request synchronously in the Python SDK harness
URL: https://github.com/apache/beam/pull/10208#issuecomment-558459654
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349566)
Time Spent: 2h 40m  (was: 2.5h)

> The "KeyError: u'-47'" error from line 305 of sdk_worker.py
> ---
>
> Key: BEAM-8733
> URL: https://issues.apache.org/jira/browse/BEAM-8733
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> The issue reported by [~chamikara], error message as follows:
> apache_beam/runners/worker/sdk_worker.py", line 305, in get
> self.fns[bundle_descriptor_id],
> KeyError: u'-47'
> {code}
> at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
> at org.apache.beam.sdk.util.MoreFutures.get(MoreFutures.java:57)
> at 
> org.apache.beam.runners.dataflow.worker.fn.control.RegisterAndProcessBundleOperation.finish(RegisterAndProcessBundleOperation.java:330)
> at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:85)
> at 
> org.apache.beam.runners.dataflow.worker.fn.control.BeamFnMapTaskExecutor.execute(BeamFnMapTaskExecutor.java:125)
> at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:411)
> at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:380)
> at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:305)
> at 
> org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness.start(DataflowRunnerHarness.java:195)
> at 
> org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness.main(DataflowRunnerHarness.java:123)
> Suppressed: java.lang.IllegalStateException: Already closed.
>   at 
> org.apache.beam.sdk.fn.data.BeamFnDataBufferingOutboundObserver.close(BeamFnDataBufferingOutboundObserver.java:93)
>   at 
> org.apache.beam.runners.dataflow.worker.fn.data.RemoteGrpcPortWriteOperation.abort(RemoteGrpcPortWriteOperation.java:220)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:91)
> {code}
> More discussion info can be found here: 
> https://github.com/apache/beam/pull/10004



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8733) The "KeyError: u'-47'" error from line 305 of sdk_worker.py

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8733?focusedWorklogId=349565=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349565
 ]

ASF GitHub Bot logged work on BEAM-8733:


Author: ASF GitHub Bot
Created on: 26/Nov/19 04:54
Start Date: 26/Nov/19 04:54
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #10208: [BEAM-8733] Handle 
the registration request synchronously in the Python SDK harness
URL: https://github.com/apache/beam/pull/10208#issuecomment-558459615
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349565)
Time Spent: 2.5h  (was: 2h 20m)

> The "KeyError: u'-47'" error from line 305 of sdk_worker.py
> ---
>
> Key: BEAM-8733
> URL: https://issues.apache.org/jira/browse/BEAM-8733
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> The issue reported by [~chamikara], error message as follows:
> apache_beam/runners/worker/sdk_worker.py", line 305, in get
> self.fns[bundle_descriptor_id],
> KeyError: u'-47'
> {code}
> at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
> at org.apache.beam.sdk.util.MoreFutures.get(MoreFutures.java:57)
> at 
> org.apache.beam.runners.dataflow.worker.fn.control.RegisterAndProcessBundleOperation.finish(RegisterAndProcessBundleOperation.java:330)
> at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:85)
> at 
> org.apache.beam.runners.dataflow.worker.fn.control.BeamFnMapTaskExecutor.execute(BeamFnMapTaskExecutor.java:125)
> at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:411)
> at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:380)
> at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:305)
> at 
> org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness.start(DataflowRunnerHarness.java:195)
> at 
> org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness.main(DataflowRunnerHarness.java:123)
> Suppressed: java.lang.IllegalStateException: Already closed.
>   at 
> org.apache.beam.sdk.fn.data.BeamFnDataBufferingOutboundObserver.close(BeamFnDataBufferingOutboundObserver.java:93)
>   at 
> org.apache.beam.runners.dataflow.worker.fn.data.RemoteGrpcPortWriteOperation.abort(RemoteGrpcPortWriteOperation.java:220)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:91)
> {code}
> More discussion info can be found here: 
> https://github.com/apache/beam/pull/10004



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8542) Add async write to AWS SNS IO & remove retry logic

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8542?focusedWorklogId=349545=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349545
 ]

ASF GitHub Bot logged work on BEAM-8542:


Author: ASF GitHub Bot
Created on: 26/Nov/19 03:37
Start Date: 26/Nov/19 03:37
Worklog Time Spent: 10m 
  Work Description: cmachgodaddy commented on pull request #10078: 
[BEAM-8542] Change write to async in AWS SNS IO & remove retry logic
URL: https://github.com/apache/beam/pull/10078#discussion_r350530346
 
 

 ##
 File path: 
sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/sns/SnsIO.java
 ##
 @@ -62,310 +58,272 @@
  * PCollection data = ...;
  *
  * data.apply(SnsIO.write()
- * .withPublishRequestFn(m -> 
PublishRequest.builder().topicArn("topicArn").message(m).build())
- * .withTopicArn("topicArn")
- * .withRetryConfiguration(
- *SnsIO.RetryConfiguration.create(
- *  4, org.joda.time.Duration.standardSeconds(10)))
- * .withSnsClientProvider(new 
BasicSnsClientProvider(awsCredentialsProvider, region));
+ * .withElementCoder(StringUtf8Coder.of())
+ * .withPublishRequestFn(createPublishRequestFn())
+ * .withSnsClientProvider(new 
BasicSnsClientProvider(awsCredentialsProvider, region));
+ *
  * }
  *
  * As a client, you need to provide at least the following things:
  *
  * 
- *   SNS topic arn you're going to publish to
- *   Retry Configuration
- *   AwsCredentialsProvider, which you can pass on to 
BasicSnsClientProvider
+ *   Coder for element T.
  *   publishRequestFn, a function to convert your message into 
PublishRequest
+ *   SnsClientProvider, a provider to create an async client.
  * 
  */
 @Experimental(Experimental.Kind.SOURCE_SINK)
-public final class SnsIO {
-
-  // Write data tp SNS
-  public static  Write write() {
-return new AutoValue_SnsIO_Write.Builder().build();
+final class SnsIO {
+  // Write data to SNS
+  static  Write write() {
+return new 
AutoValue_SnsIO_Write.Builder().setThrowOnError(false).build();
   }
 
-  /**
-   * A POJO encapsulating a configuration for retry behavior when issuing 
requests to SNS. A retry
-   * will be attempted until the maxAttempts or maxDuration is exceeded, 
whichever comes first, for
-   * any of the following exceptions:
-   *
-   * 
-   *   {@link IOException}
-   * 
-   */
-  @AutoValue
-  public abstract static class RetryConfiguration implements Serializable {
-@VisibleForTesting
-static final RetryPredicate DEFAULT_RETRY_PREDICATE = new 
DefaultRetryPredicate();
-
-abstract int getMaxAttempts();
-
-abstract Duration getMaxDuration();
-
-abstract RetryPredicate getRetryPredicate();
-
-abstract Builder builder();
-
-public static RetryConfiguration create(int maxAttempts, Duration 
maxDuration) {
-  checkArgument(maxAttempts > 0, "maxAttempts should be greater than 0");
-  checkArgument(
-  maxDuration != null && maxDuration.isLongerThan(Duration.ZERO),
-  "maxDuration should be greater than 0");
-  return new AutoValue_SnsIO_RetryConfiguration.Builder()
-  .setMaxAttempts(maxAttempts)
-  .setMaxDuration(maxDuration)
-  .setRetryPredicate(DEFAULT_RETRY_PREDICATE)
-  .build();
-}
-
-@AutoValue.Builder
-abstract static class Builder {
-  abstract Builder setMaxAttempts(int maxAttempts);
-
-  abstract Builder setMaxDuration(Duration maxDuration);
-
-  abstract Builder setRetryPredicate(RetryPredicate retryPredicate);
-
-  abstract RetryConfiguration build();
-}
-
-/**
- * An interface used to control if we retry the SNS Publish call when a 
{@link Throwable}
- * occurs. If {@link RetryPredicate#test(Object)} returns true, {@link 
Write} tries to resend
- * the requests to the Solr server if the {@link RetryConfiguration} 
permits it.
- */
-@FunctionalInterface
-interface RetryPredicate extends Predicate, Serializable {}
-
-private static class DefaultRetryPredicate implements RetryPredicate {
-  private static final ImmutableSet ELIGIBLE_CODES =
-  ImmutableSet.of(HttpStatus.SC_SERVICE_UNAVAILABLE);
-
-  @Override
-  public boolean test(Throwable throwable) {
-return (throwable instanceof IOException
-|| (throwable instanceof InternalErrorException)
-|| (throwable instanceof InternalErrorException
-&& ELIGIBLE_CODES.contains(((InternalErrorException) 
throwable).statusCode(;
-  }
-}
-  }
-
-  /** Implementation of {@link #write}. */
+  /** Implementation of {@link #write()}. */
   @AutoValue
   public abstract static class Write
-  extends PTransform, PCollection> {
-@Nullable
-abstract String getTopicArn();
+  extends PTransform, PCollection>> {
 
 @Nullable
-abstract 

[jira] [Work logged] (BEAM-8511) Support for enhanced fan-out in KinesisIO.Read

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8511?focusedWorklogId=349542=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349542
 ]

ASF GitHub Bot logged work on BEAM-8511:


Author: ASF GitHub Bot
Created on: 26/Nov/19 03:30
Start Date: 26/Nov/19 03:30
Worklog Time Spent: 10m 
  Work Description: cmachgodaddy commented on pull request #9899: 
[BEAM-8511] [WIP] KinesisIO.Read enhanced fanout
URL: https://github.com/apache/beam/pull/9899#discussion_r350483171
 
 

 ##
 File path: 
sdks/java/io/kinesis2/src/main/java/org/apache/beam/sdk/io/kinesis2/KinesisRecord.java
 ##
 @@ -0,0 +1,142 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.kinesis2;
+
+import static 
org.apache.commons.lang.builder.HashCodeBuilder.reflectionHashCode;
+
+import java.nio.ByteBuffer;
+import java.nio.charset.StandardCharsets;
+import org.apache.commons.lang.builder.EqualsBuilder;
+import org.apache.commons.lang.builder.ToStringBuilder;
+import org.joda.time.Instant;
+import software.amazon.kinesis.retrieval.KinesisClientRecord;
+import software.amazon.kinesis.retrieval.kpl.ExtendedSequenceNumber;
+
+/** {@link KinesisClientRecord} enhanced with utility methods. */
+public class KinesisRecord {
+
+  private Instant readTime;
+  private String streamName;
+  private String shardId;
+  private long subSequenceNumber;
+  private String sequenceNumber;
+  private Instant approximateArrivalTimestamp;
+  private ByteBuffer data;
+  private String partitionKey;
+
+  public KinesisRecord(KinesisClientRecord record, String streamName, String 
shardId) {
+this(
+record.data(),
+record.sequenceNumber(),
+record.subSequenceNumber(),
+record.partitionKey(),
+TimeUtil.toJoda(record.approximateArrivalTimestamp()),
+Instant.now(),
+streamName,
+shardId);
+  }
+
+  public KinesisRecord(
+  ByteBuffer data,
+  String sequenceNumber,
+  long subSequenceNumber,
+  String partitionKey,
+  Instant approximateArrivalTimestamp,
+  Instant readTime,
+  String streamName,
+  String shardId) {
+this.data = copyData(data);
+this.sequenceNumber = sequenceNumber;
+this.subSequenceNumber = subSequenceNumber;
+this.partitionKey = partitionKey;
+this.approximateArrivalTimestamp = approximateArrivalTimestamp;
+this.readTime = readTime;
+this.streamName = streamName;
+this.shardId = shardId;
+  }
+
+  private ByteBuffer copyData(ByteBuffer data) {
+data.rewind();
+byte[] bytes = new byte[data.remaining()];
+data.get(bytes);
+return ByteBuffer.wrap(bytes);
+  }
+
+  public ExtendedSequenceNumber getExtendedSequenceNumber() {
+return new ExtendedSequenceNumber(getSequenceNumber(), 
getSubSequenceNumber());
+  }
+
+  /** @return The unique identifier of the record based on its position in the 
stream. */
+  public byte[] getUniqueId() {
+return 
getExtendedSequenceNumber().toString().getBytes(StandardCharsets.UTF_8);
+  }
+
+  public Instant getReadTime() {
+return readTime;
+  }
+
+  public String getStreamName() {
+return streamName;
+  }
+
+  public String getShardId() {
+return shardId;
+  }
+
+  public byte[] getDataAsBytes() {
+return getData().array();
+  }
+
+  @Override
+  public boolean equals(Object obj) {
+return EqualsBuilder.reflectionEquals(this, obj);
+  }
+
+  @Override
+  public int hashCode() {
+return reflectionHashCode(this);
+  }
+
+  @Override
+  public String toString() {
+return ToStringBuilder.reflectionToString(this);
 
 Review comment:
   This is also new. any reason, @jfarr ?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349542)
Time Spent: 2h 50m  (was: 2h 40m)

> Support for enhanced 

[jira] [Work logged] (BEAM-8523) Add useful timestamp to job servicer GetJobs

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8523?focusedWorklogId=349541=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349541
 ]

ASF GitHub Bot logged work on BEAM-8523:


Author: ASF GitHub Bot
Created on: 26/Nov/19 03:23
Start Date: 26/Nov/19 03:23
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9959: [BEAM-8523] JobAPI: 
Give access to timestamped state change history
URL: https://github.com/apache/beam/pull/9959#issuecomment-558442093
 
 
   Run Python PreCommit
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349541)
Time Spent: 3h 20m  (was: 3h 10m)

> Add useful timestamp to job servicer GetJobs
> 
>
> Key: BEAM-8523
> URL: https://issues.apache.org/jira/browse/BEAM-8523
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> As a user querying jobs with JobService.GetJobs, it would be useful if the 
> JobInfo result contained timestamps indicating various state changes that may 
> have been missed by a client.   Useful timestamps include:
>  
>  * submitted (prepared to the job service)
>  * started (executor enters the RUNNING state)
>  * completed (executor enters a terminal state)
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7961) Add tests for all runner native transforms and some widely used composite transforms to cross-language validates runner test suite

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7961?focusedWorklogId=349515=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349515
 ]

ASF GitHub Bot logged work on BEAM-7961:


Author: ASF GitHub Bot
Created on: 26/Nov/19 02:26
Start Date: 26/Nov/19 02:26
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #10051: [WIP/BEAM-7961] Add 
tests for all runner native transforms for XLang
URL: https://github.com/apache/beam/pull/10051#issuecomment-558415808
 
 
   Run XVR_Flink PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349515)
Time Spent: 3h 10m  (was: 3h)

> Add tests for all runner native transforms and some widely used composite 
> transforms to cross-language validates runner test suite
> --
>
> Key: BEAM-7961
> URL: https://issues.apache.org/jira/browse/BEAM-7961
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Add tests for all runner native transforms and some widely used composite 
> transforms to cross-language validates runner test suite



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7961) Add tests for all runner native transforms and some widely used composite transforms to cross-language validates runner test suite

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7961?focusedWorklogId=349514=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349514
 ]

ASF GitHub Bot logged work on BEAM-7961:


Author: ASF GitHub Bot
Created on: 26/Nov/19 02:26
Start Date: 26/Nov/19 02:26
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #10051: [WIP/BEAM-7961] Add 
tests for all runner native transforms for XLang
URL: https://github.com/apache/beam/pull/10051#issuecomment-558429997
 
 
   Run XVR_Flink PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349514)
Time Spent: 3h  (was: 2h 50m)

> Add tests for all runner native transforms and some widely used composite 
> transforms to cross-language validates runner test suite
> --
>
> Key: BEAM-7961
> URL: https://issues.apache.org/jira/browse/BEAM-7961
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Add tests for all runner native transforms and some widely used composite 
> transforms to cross-language validates runner test suite



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8824) Add support for allowed lateness in python sdk

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8824?focusedWorklogId=349512=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349512
 ]

ASF GitHub Bot logged work on BEAM-8824:


Author: ASF GitHub Bot
Created on: 26/Nov/19 02:23
Start Date: 26/Nov/19 02:23
Worklog Time Spent: 10m 
  Work Description: y1chi commented on pull request #10216: [BEAM-8824] Add 
support to specify window allowed_lateness in python sdk
URL: https://github.com/apache/beam/pull/10216
 
 
   **Please** add a meaningful description for your change here
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 

[jira] [Created] (BEAM-8824) Add support for allowed lateness in python sdk

2019-11-25 Thread Yichi Zhang (Jira)
Yichi Zhang created BEAM-8824:
-

 Summary: Add support for allowed lateness in python sdk
 Key: BEAM-8824
 URL: https://issues.apache.org/jira/browse/BEAM-8824
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py-core
Reporter: Yichi Zhang
Assignee: Yichi Zhang






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-8512) Add integration tests for Python "flink_runner.py"

2019-11-25 Thread Kyle Weaver (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982047#comment-16982047
 ] 

Kyle Weaver edited comment on BEAM-8512 at 11/26/19 1:56 AM:
-

I'm trying to get the MiniCluster to work with FlinkUberJarJobServer, but I'm 
hitting a 404 error when it tries to upload the job jar. When I access 
v1/jobs/overview in my browser, I get 200 with response {"jobs":[]} However, 
when I go to v1/jars (which is supposed to just list the jars) I get 404 with 
response {"errors":["Not found."]}. For reference, when I use a "real" Flink 
cluster, the response is {"address":"http://localhost:8081","files":[]}. Any 
idea why this might be happening? [~mxm]


was (Author: ibzib):
I'm trying to get the MiniCluster to work with FlinkUberJarJobServer, but I'm 
hitting a 404 error when it tries to upload the job jar. When I access 
v1/jobs/overview in my browser, I get 200 with response {"jobs":[]} However, 
when I go to v1/jars (which is supposed to just list the jars) I get 404 with 
response {"errors":["Not found."]}. Any idea why this might be happening?

> Add integration tests for Python "flink_runner.py"
> --
>
> Key: BEAM-8512
> URL: https://issues.apache.org/jira/browse/BEAM-8512
> Project: Beam
>  Issue Type: Test
>  Components: runner-flink, sdk-py-core
>Reporter: Maximilian Michels
>Assignee: Kyle Weaver
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> There are currently no integration tests for the Python FlinkRunner. We need 
> a set of tests similar to {{flink_runner_test.py}} which currently use the 
> PortableRunner and not the FlinkRunner.
> CC [~robertwb] [~ibzib] [~thw]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8512) Add integration tests for Python "flink_runner.py"

2019-11-25 Thread Kyle Weaver (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982047#comment-16982047
 ] 

Kyle Weaver commented on BEAM-8512:
---

I'm trying to get the MiniCluster to work with FlinkUberJarJobServer, but I'm 
hitting a 404 error when it tries to upload the job jar. When I access 
v1/jobs/overview in my browser, I get 200 with response {"jobs":[]} However, 
when I go to v1/jars (which is supposed to just list the jars) I get 404 with 
response {"errors":["Not found."]}. Any idea why this might be happening?

> Add integration tests for Python "flink_runner.py"
> --
>
> Key: BEAM-8512
> URL: https://issues.apache.org/jira/browse/BEAM-8512
> Project: Beam
>  Issue Type: Test
>  Components: runner-flink, sdk-py-core
>Reporter: Maximilian Michels
>Assignee: Kyle Weaver
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> There are currently no integration tests for the Python FlinkRunner. We need 
> a set of tests similar to {{flink_runner_test.py}} which currently use the 
> PortableRunner and not the FlinkRunner.
> CC [~robertwb] [~ibzib] [~thw]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8664) [SQL] MongoDb should use project push-down

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8664?focusedWorklogId=349495=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349495
 ]

ASF GitHub Bot logged work on BEAM-8664:


Author: ASF GitHub Bot
Created on: 26/Nov/19 01:46
Start Date: 26/Nov/19 01:46
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on issue #10095: [BEAM-8664] [SQL] 
MongoDb project push down
URL: https://github.com/apache/beam/pull/10095#issuecomment-558420709
 
 
   Run JavaPortabilityApi PreCommit
   Run Java_Examples_Dataflow PreCommit
   Run sql postcommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349495)
Time Spent: 1.5h  (was: 1h 20m)

> [SQL] MongoDb should use project push-down
> --
>
> Key: BEAM-8664
> URL: https://issues.apache.org/jira/browse/BEAM-8664
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> MongoDbTable should implement the following methods:
> {code:java}
> public PCollection buildIOReader(
> PBegin begin, BeamSqlTableFilter filters, List fieldNames);
> public ProjectSupport supportsProjects();
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8664) [SQL] MongoDb should use project push-down

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8664?focusedWorklogId=349494=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349494
 ]

ASF GitHub Bot logged work on BEAM-8664:


Author: ASF GitHub Bot
Created on: 26/Nov/19 01:45
Start Date: 26/Nov/19 01:45
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on issue #10095: [BEAM-8664] [SQL] 
MongoDb project push down
URL: https://github.com/apache/beam/pull/10095#issuecomment-558420866
 
 
   Run JavaPortabilityApi PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349494)
Time Spent: 1h 20m  (was: 1h 10m)

> [SQL] MongoDb should use project push-down
> --
>
> Key: BEAM-8664
> URL: https://issues.apache.org/jira/browse/BEAM-8664
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> MongoDbTable should implement the following methods:
> {code:java}
> public PCollection buildIOReader(
> PBegin begin, BeamSqlTableFilter filters, List fieldNames);
> public ProjectSupport supportsProjects();
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8664) [SQL] MongoDb should use project push-down

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8664?focusedWorklogId=349493=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349493
 ]

ASF GitHub Bot logged work on BEAM-8664:


Author: ASF GitHub Bot
Created on: 26/Nov/19 01:45
Start Date: 26/Nov/19 01:45
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on issue #10095: [BEAM-8664] [SQL] 
MongoDb project push down
URL: https://github.com/apache/beam/pull/10095#issuecomment-558420826
 
 
   Run Java_Examples_Dataflow PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349493)
Time Spent: 1h 10m  (was: 1h)

> [SQL] MongoDb should use project push-down
> --
>
> Key: BEAM-8664
> URL: https://issues.apache.org/jira/browse/BEAM-8664
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> MongoDbTable should implement the following methods:
> {code:java}
> public PCollection buildIOReader(
> PBegin begin, BeamSqlTableFilter filters, List fieldNames);
> public ProjectSupport supportsProjects();
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8664) [SQL] MongoDb should use project push-down

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8664?focusedWorklogId=349491=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349491
 ]

ASF GitHub Bot logged work on BEAM-8664:


Author: ASF GitHub Bot
Created on: 26/Nov/19 01:45
Start Date: 26/Nov/19 01:45
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on issue #10095: [BEAM-8664] [SQL] 
MongoDb project push down
URL: https://github.com/apache/beam/pull/10095#issuecomment-558420709
 
 
   Run JavaPortabilityApi PreCommit
   Run Java_Examples_Dataflow PreCommit
   Run sql postcommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349491)
Time Spent: 50m  (was: 40m)

> [SQL] MongoDb should use project push-down
> --
>
> Key: BEAM-8664
> URL: https://issues.apache.org/jira/browse/BEAM-8664
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> MongoDbTable should implement the following methods:
> {code:java}
> public PCollection buildIOReader(
> PBegin begin, BeamSqlTableFilter filters, List fieldNames);
> public ProjectSupport supportsProjects();
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8664) [SQL] MongoDb should use project push-down

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8664?focusedWorklogId=349492=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349492
 ]

ASF GitHub Bot logged work on BEAM-8664:


Author: ASF GitHub Bot
Created on: 26/Nov/19 01:45
Start Date: 26/Nov/19 01:45
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on issue #10095: [BEAM-8664] [SQL] 
MongoDb project push down
URL: https://github.com/apache/beam/pull/10095#issuecomment-558420795
 
 
   Run sql postcommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349492)
Time Spent: 1h  (was: 50m)

> [SQL] MongoDb should use project push-down
> --
>
> Key: BEAM-8664
> URL: https://issues.apache.org/jira/browse/BEAM-8664
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> MongoDbTable should implement the following methods:
> {code:java}
> public PCollection buildIOReader(
> PBegin begin, BeamSqlTableFilter filters, List fieldNames);
> public ProjectSupport supportsProjects();
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8664) [SQL] MongoDb should use project push-down

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8664?focusedWorklogId=349490=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349490
 ]

ASF GitHub Bot logged work on BEAM-8664:


Author: ASF GitHub Bot
Created on: 26/Nov/19 01:44
Start Date: 26/Nov/19 01:44
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on issue #10095: [BEAM-8664] [SQL] 
MongoDb project push down
URL: https://github.com/apache/beam/pull/10095#issuecomment-558293922
 
 
   Run sql postcommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349490)
Time Spent: 40m  (was: 0.5h)

> [SQL] MongoDb should use project push-down
> --
>
> Key: BEAM-8664
> URL: https://issues.apache.org/jira/browse/BEAM-8664
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> MongoDbTable should implement the following methods:
> {code:java}
> public PCollection buildIOReader(
> PBegin begin, BeamSqlTableFilter filters, List fieldNames);
> public ProjectSupport supportsProjects();
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8446) apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types is flaky

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8446?focusedWorklogId=349488=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349488
 ]

ASF GitHub Bot logged work on BEAM-8446:


Author: ASF GitHub Bot
Created on: 26/Nov/19 01:27
Start Date: 26/Nov/19 01:27
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9855: [BEAM-8446] Retrying 
BQ query on timeouts
URL: https://github.com/apache/beam/pull/9855#issuecomment-558416781
 
 
   @udim 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349488)
Time Spent: 6.5h  (was: 6h 20m)

> apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types
>  is flaky
> ---
>
> Key: BEAM-8446
> URL: https://issues.apache.org/jira/browse/BEAM-8446
> Project: Beam
>  Issue Type: New Feature
>  Components: test-failures
>Reporter: Boyuan Zhang
>Assignee: Pablo Estrada
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> test_big_query_write_new_types appears to be flaky in 
> beam_PostCommit_Python37 test suite.
> https://builds.apache.org/job/beam_PostCommit_Python37/733/
> https://builds.apache.org/job/beam_PostCommit_Python37/739/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7961) Add tests for all runner native transforms and some widely used composite transforms to cross-language validates runner test suite

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7961?focusedWorklogId=349485=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349485
 ]

ASF GitHub Bot logged work on BEAM-7961:


Author: ASF GitHub Bot
Created on: 26/Nov/19 01:23
Start Date: 26/Nov/19 01:23
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #10051: [WIP/BEAM-7961] Add 
tests for all runner native transforms for XLang
URL: https://github.com/apache/beam/pull/10051#issuecomment-558401002
 
 
   Run XVR_Flink PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349485)
Time Spent: 2h 50m  (was: 2h 40m)

> Add tests for all runner native transforms and some widely used composite 
> transforms to cross-language validates runner test suite
> --
>
> Key: BEAM-7961
> URL: https://issues.apache.org/jira/browse/BEAM-7961
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Add tests for all runner native transforms and some widely used composite 
> transforms to cross-language validates runner test suite



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7961) Add tests for all runner native transforms and some widely used composite transforms to cross-language validates runner test suite

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7961?focusedWorklogId=349484=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349484
 ]

ASF GitHub Bot logged work on BEAM-7961:


Author: ASF GitHub Bot
Created on: 26/Nov/19 01:23
Start Date: 26/Nov/19 01:23
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #10051: [WIP/BEAM-7961] Add 
tests for all runner native transforms for XLang
URL: https://github.com/apache/beam/pull/10051#issuecomment-558415808
 
 
   Run XVR_Flink PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349484)
Time Spent: 2h 40m  (was: 2.5h)

> Add tests for all runner native transforms and some widely used composite 
> transforms to cross-language validates runner test suite
> --
>
> Key: BEAM-7961
> URL: https://issues.apache.org/jira/browse/BEAM-7961
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Add tests for all runner native transforms and some widely used composite 
> transforms to cross-language validates runner test suite



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8823) Make FnApiRunner work by executing ready elements instead of stages

2019-11-25 Thread Pablo Estrada (Jira)
Pablo Estrada created BEAM-8823:
---

 Summary: Make FnApiRunner work by executing ready elements instead 
of stages
 Key: BEAM-8823
 URL: https://issues.apache.org/jira/browse/BEAM-8823
 Project: Beam
  Issue Type: Sub-task
  Components: sdk-py-core
Reporter: Pablo Estrada
Assignee: Pablo Estrada






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8523) Add useful timestamp to job servicer GetJobs

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8523?focusedWorklogId=349482=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349482
 ]

ASF GitHub Bot logged work on BEAM-8523:


Author: ASF GitHub Bot
Created on: 26/Nov/19 01:06
Start Date: 26/Nov/19 01:06
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9959: [BEAM-8523] JobAPI: 
Give access to timestamped state change history
URL: https://github.com/apache/beam/pull/9959#issuecomment-558412031
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349482)
Time Spent: 3h 10m  (was: 3h)

> Add useful timestamp to job servicer GetJobs
> 
>
> Key: BEAM-8523
> URL: https://issues.apache.org/jira/browse/BEAM-8523
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> As a user querying jobs with JobService.GetJobs, it would be useful if the 
> JobInfo result contained timestamps indicating various state changes that may 
> have been missed by a client.   Useful timestamps include:
>  
>  * submitted (prepared to the job service)
>  * started (executor enters the RUNNING state)
>  * completed (executor enters a terminal state)
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8733) The "KeyError: u'-47'" error from line 305 of sdk_worker.py

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8733?focusedWorklogId=349476=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349476
 ]

ASF GitHub Bot logged work on BEAM-8733:


Author: ASF GitHub Bot
Created on: 26/Nov/19 00:47
Start Date: 26/Nov/19 00:47
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #10208: [BEAM-8733] Handle 
the registration request synchronously in the Python SDK harness
URL: https://github.com/apache/beam/pull/10208#issuecomment-558407669
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349476)
Time Spent: 2h 20m  (was: 2h 10m)

> The "KeyError: u'-47'" error from line 305 of sdk_worker.py
> ---
>
> Key: BEAM-8733
> URL: https://issues.apache.org/jira/browse/BEAM-8733
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> The issue reported by [~chamikara], error message as follows:
> apache_beam/runners/worker/sdk_worker.py", line 305, in get
> self.fns[bundle_descriptor_id],
> KeyError: u'-47'
> {code}
> at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
> at org.apache.beam.sdk.util.MoreFutures.get(MoreFutures.java:57)
> at 
> org.apache.beam.runners.dataflow.worker.fn.control.RegisterAndProcessBundleOperation.finish(RegisterAndProcessBundleOperation.java:330)
> at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:85)
> at 
> org.apache.beam.runners.dataflow.worker.fn.control.BeamFnMapTaskExecutor.execute(BeamFnMapTaskExecutor.java:125)
> at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:411)
> at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:380)
> at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:305)
> at 
> org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness.start(DataflowRunnerHarness.java:195)
> at 
> org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness.main(DataflowRunnerHarness.java:123)
> Suppressed: java.lang.IllegalStateException: Already closed.
>   at 
> org.apache.beam.sdk.fn.data.BeamFnDataBufferingOutboundObserver.close(BeamFnDataBufferingOutboundObserver.java:93)
>   at 
> org.apache.beam.runners.dataflow.worker.fn.data.RemoteGrpcPortWriteOperation.abort(RemoteGrpcPortWriteOperation.java:220)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:91)
> {code}
> More discussion info can be found here: 
> https://github.com/apache/beam/pull/10004



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8733) The "KeyError: u'-47'" error from line 305 of sdk_worker.py

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8733?focusedWorklogId=349472=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349472
 ]

ASF GitHub Bot logged work on BEAM-8733:


Author: ASF GitHub Bot
Created on: 26/Nov/19 00:36
Start Date: 26/Nov/19 00:36
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #10208: [BEAM-8733] Handle 
the registration request synchronously in the Python SDK harness
URL: https://github.com/apache/beam/pull/10208#issuecomment-558405511
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349472)
Time Spent: 2h 10m  (was: 2h)

> The "KeyError: u'-47'" error from line 305 of sdk_worker.py
> ---
>
> Key: BEAM-8733
> URL: https://issues.apache.org/jira/browse/BEAM-8733
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> The issue reported by [~chamikara], error message as follows:
> apache_beam/runners/worker/sdk_worker.py", line 305, in get
> self.fns[bundle_descriptor_id],
> KeyError: u'-47'
> {code}
> at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
> at org.apache.beam.sdk.util.MoreFutures.get(MoreFutures.java:57)
> at 
> org.apache.beam.runners.dataflow.worker.fn.control.RegisterAndProcessBundleOperation.finish(RegisterAndProcessBundleOperation.java:330)
> at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:85)
> at 
> org.apache.beam.runners.dataflow.worker.fn.control.BeamFnMapTaskExecutor.execute(BeamFnMapTaskExecutor.java:125)
> at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:411)
> at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:380)
> at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:305)
> at 
> org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness.start(DataflowRunnerHarness.java:195)
> at 
> org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness.main(DataflowRunnerHarness.java:123)
> Suppressed: java.lang.IllegalStateException: Already closed.
>   at 
> org.apache.beam.sdk.fn.data.BeamFnDataBufferingOutboundObserver.close(BeamFnDataBufferingOutboundObserver.java:93)
>   at 
> org.apache.beam.runners.dataflow.worker.fn.data.RemoteGrpcPortWriteOperation.abort(RemoteGrpcPortWriteOperation.java:220)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:91)
> {code}
> More discussion info can be found here: 
> https://github.com/apache/beam/pull/10004



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8652) test_streaming_pipeline_returns_expected_user_metrics_fnapi_it test flaky

2019-11-25 Thread Valentyn Tymofieiev (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982002#comment-16982002
 ] 

Valentyn Tymofieiev commented on BEAM-8652:
---

Now that BEAM-8651 is fixed, let's close this issue and reopen if this flake 
comes up again.

> test_streaming_pipeline_returns_expected_user_metrics_fnapi_it test flaky
> -
>
> Key: BEAM-8652
> URL: https://issues.apache.org/jira/browse/BEAM-8652
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Assignee: Valentyn Tymofieiev
>Priority: Major
> Attachments: BJL2pfOB2Qa.png
>
>
> Looks like a streaming metrics issues. Logs: 
> [https://builds.apache.org/job/beam_PostCommit_Python36/985/testReport/junit/apache_beam.runners.dataflow.dataflow_exercise_streaming_metrics_pipeline_test/ExerciseStreamingMetricsPipelineTest/test_streaming_pipeline_returns_expected_user_metrics_fnapi_it/]
>  Error Message:
>  "Unable to match metrics for matcher name: 'ElementCount' (label_key: 
> 'output_user_name' label_value: 'ReadFromPubSub/Read-out0'). (label_key: 
> 'original_name' label_value: 'ReadFromPubSub/Read-out0-ElementCount'). 
> attempted: <3> committed: <3>\nActual 
> MetricResults:\nMetricResult(key=MetricKey(step=generate_metrics, 
> metric=MetricName(namespace=apache_beam.runners.dataflow.dataflow_exercise_streaming_metrics_pipeline.StreamingUserMetricsDoFn,
>  name=double_msg_counter_name), labels={}), committed=6, 
> attempted=6)\nMetricResult(key=MetricKey(step=generate_metrics,



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8251) Add worker_region and worker_zone options

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8251?focusedWorklogId=349458=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349458
 ]

ASF GitHub Bot logged work on BEAM-8251:


Author: ASF GitHub Bot
Created on: 26/Nov/19 00:22
Start Date: 26/Nov/19 00:22
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #10150: [BEAM-8251] plumb 
worker_(region|zone) to Environment proto
URL: https://github.com/apache/beam/pull/10150#issuecomment-558401828
 
 
   Run Portable_Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349458)
Time Spent: 1h 10m  (was: 1h)

> Add worker_region and worker_zone options
> -
>
> Key: BEAM-8251
> URL: https://issues.apache.org/jira/browse/BEAM-8251
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> We are refining the way the user specifies worker regions and zones to the 
> Dataflow service. We need to add worker_region and worker_zone pipeline 
> options that will be preferred over the old experiments=worker_region and 
> --zone flags. I will create subtasks for adding these options to each SDK.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-8652) test_streaming_pipeline_returns_expected_user_metrics_fnapi_it test flaky

2019-11-25 Thread Valentyn Tymofieiev (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev resolved BEAM-8652.
---
Fix Version/s: Not applicable
   Resolution: Fixed

> test_streaming_pipeline_returns_expected_user_metrics_fnapi_it test flaky
> -
>
> Key: BEAM-8652
> URL: https://issues.apache.org/jira/browse/BEAM-8652
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Assignee: Valentyn Tymofieiev
>Priority: Major
> Fix For: Not applicable
>
> Attachments: BJL2pfOB2Qa.png
>
>
> Looks like a streaming metrics issues. Logs: 
> [https://builds.apache.org/job/beam_PostCommit_Python36/985/testReport/junit/apache_beam.runners.dataflow.dataflow_exercise_streaming_metrics_pipeline_test/ExerciseStreamingMetricsPipelineTest/test_streaming_pipeline_returns_expected_user_metrics_fnapi_it/]
>  Error Message:
>  "Unable to match metrics for matcher name: 'ElementCount' (label_key: 
> 'output_user_name' label_value: 'ReadFromPubSub/Read-out0'). (label_key: 
> 'original_name' label_value: 'ReadFromPubSub/Read-out0-ElementCount'). 
> attempted: <3> committed: <3>\nActual 
> MetricResults:\nMetricResult(key=MetricKey(step=generate_metrics, 
> metric=MetricName(namespace=apache_beam.runners.dataflow.dataflow_exercise_streaming_metrics_pipeline.StreamingUserMetricsDoFn,
>  name=double_msg_counter_name), labels={}), committed=6, 
> attempted=6)\nMetricResult(key=MetricKey(step=generate_metrics,



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8581) Python SDK labels ontime empty panes as late

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8581?focusedWorklogId=349457=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349457
 ]

ASF GitHub Bot logged work on BEAM-8581:


Author: ASF GitHub Bot
Created on: 26/Nov/19 00:19
Start Date: 26/Nov/19 00:19
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on issue #10035: [BEAM-8581] and 
[BEAM-8582] watermark and trigger fixes
URL: https://github.com/apache/beam/pull/10035#issuecomment-558401225
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349457)
Time Spent: 5h 50m  (was: 5h 40m)

> Python SDK labels ontime empty panes as late
> 
>
> Key: BEAM-8581
> URL: https://issues.apache.org/jira/browse/BEAM-8581
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> The GeneralTriggerDriver does not put watermark holds on timers, leading to 
> the ontime empty pane being considered late data.
> Fix: Add a new notion of whether a trigger has an ontime pane. If it does, 
> then set a watermark hold to end of window - 1.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8581) Python SDK labels ontime empty panes as late

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8581?focusedWorklogId=349455=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349455
 ]

ASF GitHub Bot logged work on BEAM-8581:


Author: ASF GitHub Bot
Created on: 26/Nov/19 00:19
Start Date: 26/Nov/19 00:19
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on issue #10035: [BEAM-8581] and 
[BEAM-8582] watermark and trigger fixes
URL: https://github.com/apache/beam/pull/10035#issuecomment-558401205
 
 
   Run Portable_Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349455)
Time Spent: 5h 40m  (was: 5.5h)

> Python SDK labels ontime empty panes as late
> 
>
> Key: BEAM-8581
> URL: https://issues.apache.org/jira/browse/BEAM-8581
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> The GeneralTriggerDriver does not put watermark holds on timers, leading to 
> the ontime empty pane being considered late data.
> Fix: Add a new notion of whether a trigger has an ontime pane. If it does, 
> then set a watermark hold to end of window - 1.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7961) Add tests for all runner native transforms and some widely used composite transforms to cross-language validates runner test suite

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7961?focusedWorklogId=349454=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349454
 ]

ASF GitHub Bot logged work on BEAM-7961:


Author: ASF GitHub Bot
Created on: 26/Nov/19 00:18
Start Date: 26/Nov/19 00:18
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #10051: [WIP/BEAM-7961] Add 
tests for all runner native transforms for XLang
URL: https://github.com/apache/beam/pull/10051#issuecomment-558387277
 
 
   Run XVR_Flink PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349454)
Time Spent: 2.5h  (was: 2h 20m)

> Add tests for all runner native transforms and some widely used composite 
> transforms to cross-language validates runner test suite
> --
>
> Key: BEAM-7961
> URL: https://issues.apache.org/jira/browse/BEAM-7961
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Add tests for all runner native transforms and some widely used composite 
> transforms to cross-language validates runner test suite



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7961) Add tests for all runner native transforms and some widely used composite transforms to cross-language validates runner test suite

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7961?focusedWorklogId=349453=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349453
 ]

ASF GitHub Bot logged work on BEAM-7961:


Author: ASF GitHub Bot
Created on: 26/Nov/19 00:18
Start Date: 26/Nov/19 00:18
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #10051: [WIP/BEAM-7961] Add 
tests for all runner native transforms for XLang
URL: https://github.com/apache/beam/pull/10051#issuecomment-558401002
 
 
   Run XVR_Flink PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349453)
Time Spent: 2h 20m  (was: 2h 10m)

> Add tests for all runner native transforms and some widely used composite 
> transforms to cross-language validates runner test suite
> --
>
> Key: BEAM-7961
> URL: https://issues.apache.org/jira/browse/BEAM-7961
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Add tests for all runner native transforms and some widely used composite 
> transforms to cross-language validates runner test suite



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7961) Add tests for all runner native transforms and some widely used composite transforms to cross-language validates runner test suite

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7961?focusedWorklogId=349435=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349435
 ]

ASF GitHub Bot logged work on BEAM-7961:


Author: ASF GitHub Bot
Created on: 25/Nov/19 23:43
Start Date: 25/Nov/19 23:43
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #10051: [WIP/BEAM-7961] Add 
tests for all runner native transforms for XLang
URL: https://github.com/apache/beam/pull/10051#issuecomment-558392002
 
 
   run portable_python precommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349435)
Time Spent: 2h 10m  (was: 2h)

> Add tests for all runner native transforms and some widely used composite 
> transforms to cross-language validates runner test suite
> --
>
> Key: BEAM-7961
> URL: https://issues.apache.org/jira/browse/BEAM-7961
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Add tests for all runner native transforms and some widely used composite 
> transforms to cross-language validates runner test suite



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7961) Add tests for all runner native transforms and some widely used composite transforms to cross-language validates runner test suite

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7961?focusedWorklogId=349434=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349434
 ]

ASF GitHub Bot logged work on BEAM-7961:


Author: ASF GitHub Bot
Created on: 25/Nov/19 23:43
Start Date: 25/Nov/19 23:43
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #10051: [WIP/BEAM-7961] Add 
tests for all runner native transforms for XLang
URL: https://github.com/apache/beam/pull/10051#issuecomment-558392002
 
 
   run portable_python precommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349434)
Time Spent: 2h  (was: 1h 50m)

> Add tests for all runner native transforms and some widely used composite 
> transforms to cross-language validates runner test suite
> --
>
> Key: BEAM-7961
> URL: https://issues.apache.org/jira/browse/BEAM-7961
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Add tests for all runner native transforms and some widely used composite 
> transforms to cross-language validates runner test suite



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8251) Add worker_region and worker_zone options

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8251?focusedWorklogId=349432=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349432
 ]

ASF GitHub Bot logged work on BEAM-8251:


Author: ASF GitHub Bot
Created on: 25/Nov/19 23:27
Start Date: 25/Nov/19 23:27
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #10150: [BEAM-8251] plumb 
worker_(region|zone) to Environment proto
URL: https://github.com/apache/beam/pull/10150#issuecomment-558387750
 
 
   Run Portable_Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349432)
Time Spent: 1h  (was: 50m)

> Add worker_region and worker_zone options
> -
>
> Key: BEAM-8251
> URL: https://issues.apache.org/jira/browse/BEAM-8251
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> We are refining the way the user specifies worker regions and zones to the 
> Dataflow service. We need to add worker_region and worker_zone pipeline 
> options that will be preferred over the old experiments=worker_region and 
> --zone flags. I will create subtasks for adding these options to each SDK.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7961) Add tests for all runner native transforms and some widely used composite transforms to cross-language validates runner test suite

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7961?focusedWorklogId=349431=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349431
 ]

ASF GitHub Bot logged work on BEAM-7961:


Author: ASF GitHub Bot
Created on: 25/Nov/19 23:25
Start Date: 25/Nov/19 23:25
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #10051: [WIP/BEAM-7961] Add 
tests for all runner native transforms for XLang
URL: https://github.com/apache/beam/pull/10051#issuecomment-558337637
 
 
   Run XVR_Flink PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349431)
Time Spent: 1h 50m  (was: 1h 40m)

> Add tests for all runner native transforms and some widely used composite 
> transforms to cross-language validates runner test suite
> --
>
> Key: BEAM-7961
> URL: https://issues.apache.org/jira/browse/BEAM-7961
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Add tests for all runner native transforms and some widely used composite 
> transforms to cross-language validates runner test suite



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7961) Add tests for all runner native transforms and some widely used composite transforms to cross-language validates runner test suite

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7961?focusedWorklogId=349430=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349430
 ]

ASF GitHub Bot logged work on BEAM-7961:


Author: ASF GitHub Bot
Created on: 25/Nov/19 23:25
Start Date: 25/Nov/19 23:25
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #10051: [WIP/BEAM-7961] Add 
tests for all runner native transforms for XLang
URL: https://github.com/apache/beam/pull/10051#issuecomment-558387277
 
 
   Run XVR_Flink PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349430)
Time Spent: 1h 40m  (was: 1.5h)

> Add tests for all runner native transforms and some widely used composite 
> transforms to cross-language validates runner test suite
> --
>
> Key: BEAM-7961
> URL: https://issues.apache.org/jira/browse/BEAM-7961
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Add tests for all runner native transforms and some widely used composite 
> transforms to cross-language validates runner test suite



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8545) don't docker pull before docker run

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8545?focusedWorklogId=349428=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349428
 ]

ASF GitHub Bot logged work on BEAM-8545:


Author: ASF GitHub Bot
Created on: 25/Nov/19 23:24
Start Date: 25/Nov/19 23:24
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #9972: [BEAM-8545] don't docker 
pull before docker run
URL: https://github.com/apache/beam/pull/9972#issuecomment-558387028
 
 
   closing since this bug is fixed now.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349428)
Time Spent: 1h 50m  (was: 1h 40m)

> don't docker pull  before docker run
> 
>
> Key: BEAM-8545
> URL: https://issues.apache.org/jira/browse/BEAM-8545
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Since 'docker run' automatically pulls when the image doesn't exist locally, 
> I think it's safe to remove explicit 'docker pull' before 'docker run'. 
> Without 'docker pull', we won't update the local image with the remote image 
> (for the same tag) but it shouldn't be a problem in prod that the unique tag 
> is assumed for each released version.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8545) don't docker pull before docker run

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8545?focusedWorklogId=349429=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349429
 ]

ASF GitHub Bot logged work on BEAM-8545:


Author: ASF GitHub Bot
Created on: 25/Nov/19 23:24
Start Date: 25/Nov/19 23:24
Worklog Time Spent: 10m 
  Work Description: ihji commented on pull request #9972: [BEAM-8545] don't 
docker pull before docker run
URL: https://github.com/apache/beam/pull/9972
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349429)
Time Spent: 2h  (was: 1h 50m)

> don't docker pull  before docker run
> 
>
> Key: BEAM-8545
> URL: https://issues.apache.org/jira/browse/BEAM-8545
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Since 'docker run' automatically pulls when the image doesn't exist locally, 
> I think it's safe to remove explicit 'docker pull' before 'docker run'. 
> Without 'docker pull', we won't update the local image with the remote image 
> (for the same tag) but it shouldn't be a problem in prod that the unique tag 
> is assumed for each released version.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8568) Local file system does not match relative path with wildcards

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8568?focusedWorklogId=349427=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349427
 ]

ASF GitHub Bot logged work on BEAM-8568:


Author: ASF GitHub Bot
Created on: 25/Nov/19 23:24
Start Date: 25/Nov/19 23:24
Worklog Time Spent: 10m 
  Work Description: Ardagan commented on issue #10028: [BEAM-8568] Fixed 
problem that LocalFileSystem no longer supports wil…
URL: https://github.com/apache/beam/pull/10028#issuecomment-558386949
 
 
   Run Java_Examples_Dataflow PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349427)
Time Spent: 4h 40m  (was: 4.5h)

> Local file system does not match relative path with wildcards
> -
>
> Key: BEAM-8568
> URL: https://issues.apache.org/jira/browse/BEAM-8568
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.16.0
>Reporter: Ondrej Cerny
>Assignee: David Moravek
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> CWD structure:
> {code}
> src/test/resources/input/sometestfile.txt
> {code}
>  
> Code:
> {code:java}
> input 
> .apply(Create.of("src/test/resources/input/*)) 
> .apply(FileIO.matchAll()) 
> .apply(FileIO.readMatches())
> {code}
> The code above doesn't match any file starting Beam 2.16.0. The regression 
> has been introduced in BEAM-7854.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8733) The "KeyError: u'-47'" error from line 305 of sdk_worker.py

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8733?focusedWorklogId=349424=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349424
 ]

ASF GitHub Bot logged work on BEAM-8733:


Author: ASF GitHub Bot
Created on: 25/Nov/19 23:15
Start Date: 25/Nov/19 23:15
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #10208: [BEAM-8733] Handle 
the registration request synchronously in the Python SDK harness
URL: https://github.com/apache/beam/pull/10208#issuecomment-558384241
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349424)
Time Spent: 2h  (was: 1h 50m)

> The "KeyError: u'-47'" error from line 305 of sdk_worker.py
> ---
>
> Key: BEAM-8733
> URL: https://issues.apache.org/jira/browse/BEAM-8733
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> The issue reported by [~chamikara], error message as follows:
> apache_beam/runners/worker/sdk_worker.py", line 305, in get
> self.fns[bundle_descriptor_id],
> KeyError: u'-47'
> {code}
> at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
> at org.apache.beam.sdk.util.MoreFutures.get(MoreFutures.java:57)
> at 
> org.apache.beam.runners.dataflow.worker.fn.control.RegisterAndProcessBundleOperation.finish(RegisterAndProcessBundleOperation.java:330)
> at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:85)
> at 
> org.apache.beam.runners.dataflow.worker.fn.control.BeamFnMapTaskExecutor.execute(BeamFnMapTaskExecutor.java:125)
> at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:411)
> at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:380)
> at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:305)
> at 
> org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness.start(DataflowRunnerHarness.java:195)
> at 
> org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness.main(DataflowRunnerHarness.java:123)
> Suppressed: java.lang.IllegalStateException: Already closed.
>   at 
> org.apache.beam.sdk.fn.data.BeamFnDataBufferingOutboundObserver.close(BeamFnDataBufferingOutboundObserver.java:93)
>   at 
> org.apache.beam.runners.dataflow.worker.fn.data.RemoteGrpcPortWriteOperation.abort(RemoteGrpcPortWriteOperation.java:220)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:91)
> {code}
> More discussion info can be found here: 
> https://github.com/apache/beam/pull/10004



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8733) The "KeyError: u'-47'" error from line 305 of sdk_worker.py

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8733?focusedWorklogId=349423=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349423
 ]

ASF GitHub Bot logged work on BEAM-8733:


Author: ASF GitHub Bot
Created on: 25/Nov/19 23:15
Start Date: 25/Nov/19 23:15
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #10208: [BEAM-8733] Handle 
the registration request synchronously in the Python SDK harness
URL: https://github.com/apache/beam/pull/10208#issuecomment-558384184
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349423)
Time Spent: 1h 50m  (was: 1h 40m)

> The "KeyError: u'-47'" error from line 305 of sdk_worker.py
> ---
>
> Key: BEAM-8733
> URL: https://issues.apache.org/jira/browse/BEAM-8733
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The issue reported by [~chamikara], error message as follows:
> apache_beam/runners/worker/sdk_worker.py", line 305, in get
> self.fns[bundle_descriptor_id],
> KeyError: u'-47'
> {code}
> at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
> at org.apache.beam.sdk.util.MoreFutures.get(MoreFutures.java:57)
> at 
> org.apache.beam.runners.dataflow.worker.fn.control.RegisterAndProcessBundleOperation.finish(RegisterAndProcessBundleOperation.java:330)
> at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:85)
> at 
> org.apache.beam.runners.dataflow.worker.fn.control.BeamFnMapTaskExecutor.execute(BeamFnMapTaskExecutor.java:125)
> at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:411)
> at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:380)
> at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:305)
> at 
> org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness.start(DataflowRunnerHarness.java:195)
> at 
> org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness.main(DataflowRunnerHarness.java:123)
> Suppressed: java.lang.IllegalStateException: Already closed.
>   at 
> org.apache.beam.sdk.fn.data.BeamFnDataBufferingOutboundObserver.close(BeamFnDataBufferingOutboundObserver.java:93)
>   at 
> org.apache.beam.runners.dataflow.worker.fn.data.RemoteGrpcPortWriteOperation.abort(RemoteGrpcPortWriteOperation.java:220)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:91)
> {code}
> More discussion info can be found here: 
> https://github.com/apache/beam/pull/10004



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8523) Add useful timestamp to job servicer GetJobs

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8523?focusedWorklogId=349405=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349405
 ]

ASF GitHub Bot logged work on BEAM-8523:


Author: ASF GitHub Bot
Created on: 25/Nov/19 22:39
Start Date: 25/Nov/19 22:39
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9959: [BEAM-8523] JobAPI: 
Give access to timestamped state change history
URL: https://github.com/apache/beam/pull/9959#issuecomment-558373015
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349405)
Time Spent: 3h  (was: 2h 50m)

> Add useful timestamp to job servicer GetJobs
> 
>
> Key: BEAM-8523
> URL: https://issues.apache.org/jira/browse/BEAM-8523
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> As a user querying jobs with JobService.GetJobs, it would be useful if the 
> JobInfo result contained timestamps indicating various state changes that may 
> have been missed by a client.   Useful timestamps include:
>  
>  * submitted (prepared to the job service)
>  * started (executor enters the RUNNING state)
>  * completed (executor enters a terminal state)
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8815) Portable pipeline execution without artifact staging

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8815?focusedWorklogId=349403=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349403
 ]

ASF GitHub Bot logged work on BEAM-8815:


Author: ASF GitHub Bot
Created on: 25/Nov/19 22:37
Start Date: 25/Nov/19 22:37
Worklog Time Spent: 10m 
  Work Description: tweise commented on pull request #10213: [BEAM-8815] 
Skip manifest when no artifacts are staged
URL: https://github.com/apache/beam/pull/10213
 
 
   **Please** add a meaningful description for your change here
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 

[jira] [Work logged] (BEAM-8815) Portable pipeline execution without artifact staging

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8815?focusedWorklogId=349395=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349395
 ]

ASF GitHub Bot logged work on BEAM-8815:


Author: ASF GitHub Bot
Created on: 25/Nov/19 22:26
Start Date: 25/Nov/19 22:26
Worklog Time Spent: 10m 
  Work Description: tweise commented on pull request #10205: [BEAM-8815] 
Skip manifest when no artifacts are staged
URL: https://github.com/apache/beam/pull/10205
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349395)
Time Spent: 1h 40m  (was: 1.5h)

> Portable pipeline execution without artifact staging
> 
>
> Key: BEAM-8815
> URL: https://issues.apache.org/jira/browse/BEAM-8815
> Project: Beam
>  Issue Type: Task
>  Components: runner-core, runner-flink
>Affects Versions: 2.17.0
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Major
>  Labels: portability
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The default artifact staging implementation relies on a distributed 
> filesystem. A directory and manifest will be created even when artifact 
> staging isn't used, and the container boot code will fail retrieving 
> artifacts, even though there are non. In a containerized environment it is 
> common to package artifacts into containers. It should be possible to run the 
> pipeline w/o a distributed filesystem. 
> [https://lists.apache.org/thread.html/1b0d545955a80688ea19f227ad943683747b17beb45368ad0908fd21@%3Cdev.beam.apache.org%3E]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-8822) Hadoop Client version 2.8 from 2.7

2019-11-25 Thread Tomo Suzuki (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16981950#comment-16981950
 ] 

Tomo Suzuki edited comment on BEAM-8822 at 11/25/19 10:26 PM:
--

org.apache.beam.sdk.io.hadoop.format.HadoopFormatIOElasticTest

{noformat}
org/apache/commons/httpclient/URIException
java.lang.NoClassDefFoundError: org/apache/commons/httpclient/URIException
at org.elasticsearch.hadoop.util.Version.(Version.java:58)
at 
org.elasticsearch.hadoop.rest.RestService.findPartitions(RestService.java:214)
at 
org.elasticsearch.hadoop.mr.EsInputFormat.getSplits(EsInputFormat.java:405)
at 
org.elasticsearch.hadoop.mr.EsInputFormat.getSplits(EsInputFormat.java:386)
at 
org.apache.beam.sdk.io.hadoop.format.HadoopFormatIO$HadoopInputFormatBoundedSource.computeSplitsIfNecessary(HadoopFormatIO.java:678)
at 
org.apache.beam.sdk.io.hadoop.format.HadoopFormatIO$HadoopInputFormatBoundedSource.getEstimatedSizeBytes(HadoopFormatIO.java:661)
at 
org.apache.beam.runners.direct.BoundedReadEvaluatorFactory$InputProvider.getInitialInputs(BoundedReadEvaluatorFactory.java:212)
at 
org.apache.beam.runners.direct.ReadEvaluatorFactory$InputProvider.getInitialInputs(ReadEvaluatorFactory.java:89)
at 
org.apache.beam.runners.direct.RootProviderRegistry.getInitialInputs(RootProviderRegistry.java:76)
at 
org.apache.beam.runners.direct.ExecutorServiceParallelExecutor.start(ExecutorServiceParallelExecutor.java:155)
at 
org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:208)
at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:67)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:315)
at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:350)
at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:331)
at 
org.apache.beam.sdk.io.hadoop.format.HadoopFormatIOElasticTest.testHifIOWithElastic(HadoopFormatIOElasticTest.java:117)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
at 
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:365)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
at org.junit.runners.ParentRunner$4.run(ParentRunner.java:330)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:78)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:328)
at org.junit.runners.ParentRunner.access$100(ParentRunner.java:65)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:292)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
at org.junit.runners.ParentRunner.run(ParentRunner.java:412)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
at 
org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
at 
org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 

[jira] [Work logged] (BEAM-8815) Portable pipeline execution without artifact staging

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8815?focusedWorklogId=349394=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349394
 ]

ASF GitHub Bot logged work on BEAM-8815:


Author: ASF GitHub Bot
Created on: 25/Nov/19 22:25
Start Date: 25/Nov/19 22:25
Worklog Time Spent: 10m 
  Work Description: tweise commented on issue #10205: [BEAM-8815] Skip 
manifest when no artifacts are staged
URL: https://github.com/apache/beam/pull/10205#issuecomment-558368475
 
 
   Merging, unrelated CI issue is tracked in  
https://issues.apache.org/jira/browse/BEAM-8793
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349394)
Time Spent: 1.5h  (was: 1h 20m)

> Portable pipeline execution without artifact staging
> 
>
> Key: BEAM-8815
> URL: https://issues.apache.org/jira/browse/BEAM-8815
> Project: Beam
>  Issue Type: Task
>  Components: runner-core, runner-flink
>Affects Versions: 2.17.0
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Major
>  Labels: portability
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The default artifact staging implementation relies on a distributed 
> filesystem. A directory and manifest will be created even when artifact 
> staging isn't used, and the container boot code will fail retrieving 
> artifacts, even though there are non. In a containerized environment it is 
> common to package artifacts into containers. It should be possible to run the 
> pipeline w/o a distributed filesystem. 
> [https://lists.apache.org/thread.html/1b0d545955a80688ea19f227ad943683747b17beb45368ad0908fd21@%3Cdev.beam.apache.org%3E]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8822) Hadoop Client version 2.8 from 2.7

2019-11-25 Thread Tomo Suzuki (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16981950#comment-16981950
 ] 

Tomo Suzuki commented on BEAM-8822:
---

org.apache.beam.sdk.io.hadoop.format.HadoopFormatIOElasticTest

{noformat}
org/apache/commons/httpclient/URIException
java.lang.NoClassDefFoundError: org/apache/commons/httpclient/URIException
at org.elasticsearch.hadoop.util.Version.(Version.java:58)
at 
org.elasticsearch.hadoop.rest.RestService.findPartitions(RestService.java:214)
at 
org.elasticsearch.hadoop.mr.EsInputFormat.getSplits(EsInputFormat.java:405)
at 
org.elasticsearch.hadoop.mr.EsInputFormat.getSplits(EsInputFormat.java:386)
at 
org.apache.beam.sdk.io.hadoop.format.HadoopFormatIO$HadoopInputFormatBoundedSource.computeSplitsIfNecessary(HadoopFormatIO.java:678)
at 
org.apache.beam.sdk.io.hadoop.format.HadoopFormatIO$HadoopInputFormatBoundedSource.getEstimatedSizeBytes(HadoopFormatIO.java:661)
at 
org.apache.beam.runners.direct.BoundedReadEvaluatorFactory$InputProvider.getInitialInputs(BoundedReadEvaluatorFactory.java:212)
at 
org.apache.beam.runners.direct.ReadEvaluatorFactory$InputProvider.getInitialInputs(ReadEvaluatorFactory.java:89)
at 
org.apache.beam.runners.direct.RootProviderRegistry.getInitialInputs(RootProviderRegistry.java:76)
at 
org.apache.beam.runners.direct.ExecutorServiceParallelExecutor.start(ExecutorServiceParallelExecutor.java:155)
at 
org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:208)
at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:67)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:315)
at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:350)
at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:331)
at 
org.apache.beam.sdk.io.hadoop.format.HadoopFormatIOElasticTest.testHifIOWithElastic(HadoopFormatIOElasticTest.java:117)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
at 
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:365)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
at org.junit.runners.ParentRunner$4.run(ParentRunner.java:330)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:78)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:328)
at org.junit.runners.ParentRunner.access$100(ParentRunner.java:65)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:292)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
at org.junit.runners.ParentRunner.run(ParentRunner.java:412)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
at 
org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
at 
org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 

[jira] [Updated] (BEAM-8822) Hadoop Client version 2.8 from 2.7

2019-11-25 Thread Tomo Suzuki (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomo Suzuki updated BEAM-8822:
--
Description: 
[~iemejia] says:

bq. probably a quicker way forward is to unblock the bigtable issue is to move 
our Hadoop dependency to Hadoop 2.8 given that Hadoop 2.7 is now EOL we have a 
good reason to do so 
https://cwiki.apache.org/confluence/display/HADOOP/EOL+%28End-of-life%29+Release+Branches

The URL says

{quote}Following branches are EOL: 

[2.0.x - 2.7.x]{quote}


https://issues.apache.org/jira/browse/BEAM-8569?focusedCommentId=16980532=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16980532

About compatibility with other library:

Hadoop client 2.7 is not compatible with Guava > 21 because of 
Objects.toStringHelper. Fortunately Hadoop client 2.8 removed the use of the 
method 
([detail|https://github.com/GoogleCloudPlatform/cloud-opensource-java/issues/1028#issuecomment-557709027]).

  was:
[~iemejia] says:

bq. probably a quicker way forward is to unblock the bigtable issue is to move 
our Hadoop dependency to Hadoop 2.8 given that Hadoop 2.7 is now EOL we have a 
good reason to do so 
https://cwiki.apache.org/confluence/display/HADOOP/EOL+%28End-of-life%29+Release+Branches

https://issues.apache.org/jira/browse/BEAM-8569?focusedCommentId=16980532=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16980532


Hadoop client 2.7 is not compatible with Guava > 21 because of 
Objects.toStringHelper. Fortunately Hadoop client 2.8 removed the use of the 
method 
([detail|https://github.com/GoogleCloudPlatform/cloud-opensource-java/issues/1028#issuecomment-557709027]).


> Hadoop Client version 2.8 from 2.7
> --
>
> Key: BEAM-8822
> URL: https://issues.apache.org/jira/browse/BEAM-8822
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: Major
>
> [~iemejia] says:
> bq. probably a quicker way forward is to unblock the bigtable issue is to 
> move our Hadoop dependency to Hadoop 2.8 given that Hadoop 2.7 is now EOL we 
> have a good reason to do so 
> https://cwiki.apache.org/confluence/display/HADOOP/EOL+%28End-of-life%29+Release+Branches
> The URL says
> {quote}Following branches are EOL: 
> [2.0.x - 2.7.x]{quote}
> https://issues.apache.org/jira/browse/BEAM-8569?focusedCommentId=16980532=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16980532
> About compatibility with other library:
> Hadoop client 2.7 is not compatible with Guava > 21 because of 
> Objects.toStringHelper. Fortunately Hadoop client 2.8 removed the use of the 
> method 
> ([detail|https://github.com/GoogleCloudPlatform/cloud-opensource-java/issues/1028#issuecomment-557709027]).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8822) Hadoop Client version 2.8 from 2.7

2019-11-25 Thread Tomo Suzuki (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomo Suzuki updated BEAM-8822:
--
Description: 
[~iemejia] says:

bq. probably a quicker way forward is to unblock the bigtable issue is to move 
our Hadoop dependency to Hadoop 2.8 given that Hadoop 2.7 is now EOL we have a 
good reason to do so 
https://cwiki.apache.org/confluence/display/HADOOP/EOL+%28End-of-life%29+Release+Branches

https://issues.apache.org/jira/browse/BEAM-8569?focusedCommentId=16980532=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16980532


Hadoop client 2.7 is not compatible with Guava > 21 because of 
Objects.toStringHelper. Fortunately Hadoop client 2.8 removed the use of the 
method 
([detail|https://github.com/GoogleCloudPlatform/cloud-opensource-java/issues/1028#issuecomment-557709027]).

  was:
[~iemejia] says:

bq. probably a quicker way forward is to unblock the bigtable issue is to move 
our Hadoop dependency to Hadoop 2.8 given that Hadoop 2.7 is now EOL we have a 
good reason to do so 
https://cwiki.apache.org/confluence/display/HADOOP/EOL+%28End-of-life%29+Release+Branches

https://issues.apache.org/jira/browse/BEAM-8569?focusedCommentId=16980532=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16980532


> Hadoop Client version 2.8 from 2.7
> --
>
> Key: BEAM-8822
> URL: https://issues.apache.org/jira/browse/BEAM-8822
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: Major
>
> [~iemejia] says:
> bq. probably a quicker way forward is to unblock the bigtable issue is to 
> move our Hadoop dependency to Hadoop 2.8 given that Hadoop 2.7 is now EOL we 
> have a good reason to do so 
> https://cwiki.apache.org/confluence/display/HADOOP/EOL+%28End-of-life%29+Release+Branches
> https://issues.apache.org/jira/browse/BEAM-8569?focusedCommentId=16980532=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16980532
> Hadoop client 2.7 is not compatible with Guava > 21 because of 
> Objects.toStringHelper. Fortunately Hadoop client 2.8 removed the use of the 
> method 
> ([detail|https://github.com/GoogleCloudPlatform/cloud-opensource-java/issues/1028#issuecomment-557709027]).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8822) Hadoop Client version 2.8 from 2.7

2019-11-25 Thread Tomo Suzuki (Jira)
Tomo Suzuki created BEAM-8822:
-

 Summary: Hadoop Client version 2.8 from 2.7
 Key: BEAM-8822
 URL: https://issues.apache.org/jira/browse/BEAM-8822
 Project: Beam
  Issue Type: Bug
  Components: build-system
Reporter: Tomo Suzuki
Assignee: Tomo Suzuki


[~iemejia] says:

bq. probably a quicker way forward is to unblock the bigtable issue is to move 
our Hadoop dependency to Hadoop 2.8 given that Hadoop 2.7 is now EOL we have a 
good reason to do so 
https://cwiki.apache.org/confluence/display/HADOOP/EOL+%28End-of-life%29+Release+Branches

https://issues.apache.org/jira/browse/BEAM-8569?focusedCommentId=16980532=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16980532



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8569) Support Hadoop 3 on Beam

2019-11-25 Thread Tomo Suzuki (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16981940#comment-16981940
 ] 

Tomo Suzuki commented on BEAM-8569:
---

Yes, I'll create a PR.

> Support Hadoop 3 on Beam
> 
>
> Key: BEAM-8569
> URL: https://issues.apache.org/jira/browse/BEAM-8569
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-hadoop-file-system, io-java-hadoop-format, 
> runner-spark
>Reporter: Ismaël Mejía
>Priority: Minor
>
> It seems that Hadoop 3 in production is finally happening. CDH supports it in 
> their latest version and Spark 3 will include support for Hadoop 3 too.
> This is an uber ticket to cover the required changes to the codebase to 
> ensure compliance with Hadoop 3.x
> Hadoop dependencies in Beam are mostly provided and APIs are until some point 
> compatible, but we may require changes in the CI to test that new changes 
> work both in Hadoop 2 and Hadoop 3 until we decide to remove support for 
> Hadoop 3.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8569) Support Hadoop 3 on Beam

2019-11-25 Thread Jira


[ 
https://issues.apache.org/jira/browse/BEAM-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16981913#comment-16981913
 ] 

Ismaël Mejía commented on BEAM-8569:


[~suztomo] probably a quicker way forward is to unblock the bigtable issue is 
to move our Hadoop dependency to Hadoop 2.8 given that Hadoop 2.7 is now EOL we 
have a good reason to do so 
[https://cwiki.apache.org/confluence/display/HADOOP/EOL+%28End-of-life%29+Release+Branches]

Can you create a PR for that?

> Support Hadoop 3 on Beam
> 
>
> Key: BEAM-8569
> URL: https://issues.apache.org/jira/browse/BEAM-8569
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-hadoop-file-system, io-java-hadoop-format, 
> runner-spark
>Reporter: Ismaël Mejía
>Priority: Minor
>
> It seems that Hadoop 3 in production is finally happening. CDH supports it in 
> their latest version and Spark 3 will include support for Hadoop 3 too.
> This is an uber ticket to cover the required changes to the codebase to 
> ensure compliance with Hadoop 3.x
> Hadoop dependencies in Beam are mostly provided and APIs are until some point 
> compatible, but we may require changes in the CI to test that new changes 
> work both in Hadoop 2 and Hadoop 3 until we decide to remove support for 
> Hadoop 3.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=349383=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349383
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 25/Nov/19 21:50
Start Date: 25/Nov/19 21:50
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on issue #10159: [BEAM-8575] 
Added a unit test to CombineTest class to test that Combi…
URL: https://github.com/apache/beam/pull/10159#issuecomment-558355825
 
 
   I re-implemented the test in two different ways:
   1. Use TimestampedValue only.
  test_hot_key_combining_with_accumulation_mode
   2. Use TestStream.
  test_hot_key_combining_with_accumulation_mode2
   
   Both tests succeeded no matter the accumulating mode is ACCUMULATING or 
DISCARDING.
   Is that a bug?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349383)
Time Spent: 20h 40m  (was: 20.5h)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 20h 40m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=349382=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349382
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 25/Nov/19 21:48
Start Date: 25/Nov/19 21:48
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #10107: [BEAM-8335] 
Change has_unbounded_sources to predetermined list of sources
URL: https://github.com/apache/beam/pull/10107#discussion_r350440308
 
 

 ##
 File path: sdks/python/apache_beam/runners/interactive/pipeline_instrument.py
 ##
 @@ -24,13 +24,27 @@
 from __future__ import absolute_import
 
 import apache_beam as beam
+from apache_beam.io.external.gcp.pubsub import ReadFromPubSub as 
ExternalReadFromPubSub
+from apache_beam.io.external.kafka import ReadFromKafka
+from apache_beam.io.gcp.bigquery_tools import BigQueryReader
+from apache_beam.io.gcp.pubsub import ReadFromPubSub
 from apache_beam.pipeline import PipelineVisitor
 from apache_beam.runners.interactive import cache_manager as cache
 from apache_beam.runners.interactive import interactive_environment as ie
 
 READ_CACHE = "_ReadCache_"
 WRITE_CACHE = "_WriteCache_"
 
+# Use a tuple to define the list of unbounded sources. It is not always 
feasible
+# to correctly find all the unbounded sources in the SDF world. This is
+# because SDF allows the source to dynamically create sources at runtime.
+REPLACEABLE_UNBOUNDED_SOURCES = (
+ExternalReadFromPubSub,
+ReadFromKafka,
+ReadFromPubSub,
+BigQueryReader,
 
 Review comment:
   Hm .. after thinking about it again - is this intended to be for local 
execution? Or should it work with other runners? (e.g. Dataflow)
   
   If it's only for local execution, then External transforms won't work either.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349382)
Time Spent: 34.5h  (was: 34h 20m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 34.5h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7951) Allow runner to configure customization WindowedValue coder such as ValueOnlyWindowedValueCoder

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7951?focusedWorklogId=349368=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349368
 ]

ASF GitHub Bot logged work on BEAM-7951:


Author: ASF GitHub Bot
Created on: 25/Nov/19 21:25
Start Date: 25/Nov/19 21:25
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #9979: [BEAM-7951] 
Allow runner to configure customization WindowedValue coder.
URL: https://github.com/apache/beam/pull/9979#discussion_r350426810
 
 

 ##
 File path: model/pipeline/src/main/proto/beam_runner_api.proto
 ##
 @@ -1338,3 +1359,44 @@ message ExecutableStagePayload {
 string local_name = 2;
   }
 }
+
+// Window information in WindowedValue
+message WindowInfo {
+  // timestamp in milliseconds
+  int64 timestamp = 1;
+  repeated BoundedWindow bounded_windows = 2;
+  PaneInfo pane_info = 3;
+}
+
+// Represents window information assigned to data elements
+message BoundedWindow {
 
 Review comment:
   This limits the set of window types and the number of windows that can be 
specified. I think we should go with a model where we encode a "dummy element" 
which is replaced by the actual element when decoding. For example, the payload 
would be:
   ```
   message ParameterizedWindowedValuePayload {
 // A specification of a windowed value coder where the element coder is 
always "beam:coder:bytes:v1".
 string coder_id;
 // Contains an encoded windowed value with an empty byte[] element. This 
elements windowing
 // information should be copied to all elements that are decoded with this 
coder.
 bytes value;
   }
   ```
   
   This allows us to use the full range of window encodings and the already 
defined pane/timestamp encodings.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349368)
Time Spent: 2h 50m  (was: 2h 40m)

> Allow runner to configure customization WindowedValue coder such as 
> ValueOnlyWindowedValueCoder
> ---
>
> Key: BEAM-7951
> URL: https://issues.apache.org/jira/browse/BEAM-7951
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> The coder of WindowedValue cannot be configured and it’s always 
> FullWindowedValueCoder. We don't need to serialize the timestamp, window and 
> pane properties in Flink and so it will be better to make the coder 
> configurable (i.e. allowing to use ValueOnlyWindowedValueCoder)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8815) Portable pipeline execution without artifact staging

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8815?focusedWorklogId=349364=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349364
 ]

ASF GitHub Bot logged work on BEAM-8815:


Author: ASF GitHub Bot
Created on: 25/Nov/19 21:21
Start Date: 25/Nov/19 21:21
Worklog Time Spent: 10m 
  Work Description: tweise commented on pull request #10205: [BEAM-8815] 
Skip manifest when no artifacts are staged
URL: https://github.com/apache/beam/pull/10205#discussion_r350428294
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/AbstractArtifactStagingService.java
 ##
 @@ -50,6 +50,8 @@
 public abstract class AbstractArtifactStagingService extends 
ArtifactStagingServiceImplBase
 implements FnService {
 
+  public static final String NO_ARTIFACTS_STAGED_TOKEN = 
"__no_artifacts_staged__";
 
 Review comment:
   For boot.go it's not required to check the token as the service will return 
the empty manifest. But for the py implementation it would be needed. I'm going 
to look at moving this to proto as follow-up.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349364)
Time Spent: 1h 20m  (was: 1h 10m)

> Portable pipeline execution without artifact staging
> 
>
> Key: BEAM-8815
> URL: https://issues.apache.org/jira/browse/BEAM-8815
> Project: Beam
>  Issue Type: Task
>  Components: runner-core, runner-flink
>Affects Versions: 2.17.0
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Major
>  Labels: portability
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The default artifact staging implementation relies on a distributed 
> filesystem. A directory and manifest will be created even when artifact 
> staging isn't used, and the container boot code will fail retrieving 
> artifacts, even though there are non. In a containerized environment it is 
> common to package artifacts into containers. It should be possible to run the 
> pipeline w/o a distributed filesystem. 
> [https://lists.apache.org/thread.html/1b0d545955a80688ea19f227ad943683747b17beb45368ad0908fd21@%3Cdev.beam.apache.org%3E]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7961) Add tests for all runner native transforms and some widely used composite transforms to cross-language validates runner test suite

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7961?focusedWorklogId=349353=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349353
 ]

ASF GitHub Bot logged work on BEAM-7961:


Author: ASF GitHub Bot
Created on: 25/Nov/19 21:00
Start Date: 25/Nov/19 21:00
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #10051: [WIP/BEAM-7961] Add 
tests for all runner native transforms for XLang
URL: https://github.com/apache/beam/pull/10051#issuecomment-557366844
 
 
   Run XVR_Flink PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349353)
Time Spent: 1.5h  (was: 1h 20m)

> Add tests for all runner native transforms and some widely used composite 
> transforms to cross-language validates runner test suite
> --
>
> Key: BEAM-7961
> URL: https://issues.apache.org/jira/browse/BEAM-7961
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Add tests for all runner native transforms and some widely used composite 
> transforms to cross-language validates runner test suite



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-1251) Python 3 Support

2019-11-25 Thread Valentyn Tymofieiev (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev updated BEAM-1251:
--
Description: 
FAQ
 Does Apache Beam support Python 3?
 - *Yes!*

Is there any remaining work?
 - We continue to improve user experience of Python 3 users, add support for 
new Python minor versions, and phase out support of old ones.
 Check out the Python SDK roadmap on [how to contribute or report a Python 3 
issue|https://beam.apache.org/roadmap/python-sdk/#python-3-support]!

Which SDK version should I use?
 - For best experience, use the latest released SDK. For summary of Py3-related 
changes, read this thread.

Help! I am getting a pickling error in StockUnpickler.find_class() on Python 3.
 - Does the error happens in load_session call? See BEAM-6158 . Do you use Beam 
SDK less than 2.17.0? See BEAM-8651.

  was:
I have been trying to use google datalab with python3. As I see there are 
several packages that does not support python3 yet which google datalab depends 
on. This is one of them.

https://github.com/GoogleCloudPlatform/DataflowPythonSDK/issues/6


> Python 3 Support
> 
>
> Key: BEAM-1251
> URL: https://issues.apache.org/jira/browse/BEAM-1251
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Eyad Sibai
>Assignee: Valentyn Tymofieiev
>Priority: Major
> Fix For: 2.11.0
>
>  Time Spent: 30h 10m
>  Remaining Estimate: 0h
>
> FAQ
>  Does Apache Beam support Python 3?
>  - *Yes!*
> Is there any remaining work?
>  - We continue to improve user experience of Python 3 users, add support for 
> new Python minor versions, and phase out support of old ones.
>  Check out the Python SDK roadmap on [how to contribute or report a Python 3 
> issue|https://beam.apache.org/roadmap/python-sdk/#python-3-support]!
> Which SDK version should I use?
>  - For best experience, use the latest released SDK. For summary of 
> Py3-related changes, read this thread.
> Help! I am getting a pickling error in StockUnpickler.find_class() on Python 
> 3.
>  - Does the error happens in load_session call? See BEAM-6158 . Do you use 
> Beam SDK less than 2.17.0? See BEAM-8651.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7961) Add tests for all runner native transforms and some widely used composite transforms to cross-language validates runner test suite

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7961?focusedWorklogId=349352=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349352
 ]

ASF GitHub Bot logged work on BEAM-7961:


Author: ASF GitHub Bot
Created on: 25/Nov/19 21:00
Start Date: 25/Nov/19 21:00
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #10051: [WIP/BEAM-7961] Add 
tests for all runner native transforms for XLang
URL: https://github.com/apache/beam/pull/10051#issuecomment-558337637
 
 
   Run XVR_Flink PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349352)
Time Spent: 1h 20m  (was: 1h 10m)

> Add tests for all runner native transforms and some widely used composite 
> transforms to cross-language validates runner test suite
> --
>
> Key: BEAM-7961
> URL: https://issues.apache.org/jira/browse/BEAM-7961
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Add tests for all runner native transforms and some widely used composite 
> transforms to cross-language validates runner test suite



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8815) Portable pipeline execution without artifact staging

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8815?focusedWorklogId=349351=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349351
 ]

ASF GitHub Bot logged work on BEAM-8815:


Author: ASF GitHub Bot
Created on: 25/Nov/19 20:58
Start Date: 25/Nov/19 20:58
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #10205: [BEAM-8815] Skip 
manifest when no artifacts are staged
URL: https://github.com/apache/beam/pull/10205#issuecomment-558337063
 
 
   (tracking those Python flakes here: 
https://issues.apache.org/jira/browse/BEAM-8793)
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349351)
Time Spent: 1h 10m  (was: 1h)

> Portable pipeline execution without artifact staging
> 
>
> Key: BEAM-8815
> URL: https://issues.apache.org/jira/browse/BEAM-8815
> Project: Beam
>  Issue Type: Task
>  Components: runner-core, runner-flink
>Affects Versions: 2.17.0
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Major
>  Labels: portability
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The default artifact staging implementation relies on a distributed 
> filesystem. A directory and manifest will be created even when artifact 
> staging isn't used, and the container boot code will fail retrieving 
> artifacts, even though there are non. In a containerized environment it is 
> common to package artifacts into containers. It should be possible to run the 
> pipeline w/o a distributed filesystem. 
> [https://lists.apache.org/thread.html/1b0d545955a80688ea19f227ad943683747b17beb45368ad0908fd21@%3Cdev.beam.apache.org%3E]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8581) Python SDK labels ontime empty panes as late

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8581?focusedWorklogId=349362=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349362
 ]

ASF GitHub Bot logged work on BEAM-8581:


Author: ASF GitHub Bot
Created on: 25/Nov/19 21:12
Start Date: 25/Nov/19 21:12
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #10035: 
[BEAM-8581] and [BEAM-8582] watermark and trigger fixes
URL: https://github.com/apache/beam/pull/10035#discussion_r350424819
 
 

 ##
 File path: sdks/python/apache_beam/transforms/trigger.py
 ##
 @@ -1036,14 +1074,17 @@ class BatchGlobalTriggerDriver(TriggerDriver):
   index=0,
   nonspeculative_index=0)
 
-  def process_elements(self, state, windowed_values, unused_output_watermark):
+  def process_elements(self, state, windowed_values,
+   unused_output_watermark=MIN_TIMESTAMP,
 
 Review comment:
   Ah gotcha. I removed the default value on the output_watermarks, but the 
input_watermark needs a default value to keep backwards compatibility.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349362)
Time Spent: 5.5h  (was: 5h 20m)

> Python SDK labels ontime empty panes as late
> 
>
> Key: BEAM-8581
> URL: https://issues.apache.org/jira/browse/BEAM-8581
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> The GeneralTriggerDriver does not put watermark holds on timers, leading to 
> the ontime empty pane being considered late data.
> Fix: Add a new notion of whether a trigger has an ontime pane. If it does, 
> then set a watermark hold to end of window - 1.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=349347=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349347
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 25/Nov/19 20:58
Start Date: 25/Nov/19 20:58
Worklog Time Spent: 10m 
  Work Description: liumomo315 commented on issue #10130: [BEAM-8575] Test 
DoFn context params
URL: https://github.com/apache/beam/pull/10130#issuecomment-558336927
 
 
   Run Python Dataflow ValidatesRunner
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349347)
Time Spent: 20.5h  (was: 20h 20m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 20.5h
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8733) The "KeyError: u'-47'" error from line 305 of sdk_worker.py

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8733?focusedWorklogId=349350=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349350
 ]

ASF GitHub Bot logged work on BEAM-8733:


Author: ASF GitHub Bot
Created on: 25/Nov/19 20:58
Start Date: 25/Nov/19 20:58
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #10208: [BEAM-8733] Handle 
the registration request synchronously in the Python SDK harness
URL: https://github.com/apache/beam/pull/10208#issuecomment-558336961
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349350)
Time Spent: 1h 40m  (was: 1.5h)

> The "KeyError: u'-47'" error from line 305 of sdk_worker.py
> ---
>
> Key: BEAM-8733
> URL: https://issues.apache.org/jira/browse/BEAM-8733
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The issue reported by [~chamikara], error message as follows:
> apache_beam/runners/worker/sdk_worker.py", line 305, in get
> self.fns[bundle_descriptor_id],
> KeyError: u'-47'
> {code}
> at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
> at org.apache.beam.sdk.util.MoreFutures.get(MoreFutures.java:57)
> at 
> org.apache.beam.runners.dataflow.worker.fn.control.RegisterAndProcessBundleOperation.finish(RegisterAndProcessBundleOperation.java:330)
> at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:85)
> at 
> org.apache.beam.runners.dataflow.worker.fn.control.BeamFnMapTaskExecutor.execute(BeamFnMapTaskExecutor.java:125)
> at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:411)
> at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:380)
> at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:305)
> at 
> org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness.start(DataflowRunnerHarness.java:195)
> at 
> org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness.main(DataflowRunnerHarness.java:123)
> Suppressed: java.lang.IllegalStateException: Already closed.
>   at 
> org.apache.beam.sdk.fn.data.BeamFnDataBufferingOutboundObserver.close(BeamFnDataBufferingOutboundObserver.java:93)
>   at 
> org.apache.beam.runners.dataflow.worker.fn.data.RemoteGrpcPortWriteOperation.abort(RemoteGrpcPortWriteOperation.java:220)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:91)
> {code}
> More discussion info can be found here: 
> https://github.com/apache/beam/pull/10004



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8733) The "KeyError: u'-47'" error from line 305 of sdk_worker.py

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8733?focusedWorklogId=349349=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349349
 ]

ASF GitHub Bot logged work on BEAM-8733:


Author: ASF GitHub Bot
Created on: 25/Nov/19 20:58
Start Date: 25/Nov/19 20:58
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #10208: [BEAM-8733] Handle 
the registration request synchronously in the Python SDK harness
URL: https://github.com/apache/beam/pull/10208#issuecomment-558337011
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349349)
Time Spent: 1.5h  (was: 1h 20m)

> The "KeyError: u'-47'" error from line 305 of sdk_worker.py
> ---
>
> Key: BEAM-8733
> URL: https://issues.apache.org/jira/browse/BEAM-8733
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The issue reported by [~chamikara], error message as follows:
> apache_beam/runners/worker/sdk_worker.py", line 305, in get
> self.fns[bundle_descriptor_id],
> KeyError: u'-47'
> {code}
> at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
> at org.apache.beam.sdk.util.MoreFutures.get(MoreFutures.java:57)
> at 
> org.apache.beam.runners.dataflow.worker.fn.control.RegisterAndProcessBundleOperation.finish(RegisterAndProcessBundleOperation.java:330)
> at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:85)
> at 
> org.apache.beam.runners.dataflow.worker.fn.control.BeamFnMapTaskExecutor.execute(BeamFnMapTaskExecutor.java:125)
> at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:411)
> at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:380)
> at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:305)
> at 
> org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness.start(DataflowRunnerHarness.java:195)
> at 
> org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness.main(DataflowRunnerHarness.java:123)
> Suppressed: java.lang.IllegalStateException: Already closed.
>   at 
> org.apache.beam.sdk.fn.data.BeamFnDataBufferingOutboundObserver.close(BeamFnDataBufferingOutboundObserver.java:93)
>   at 
> org.apache.beam.runners.dataflow.worker.fn.data.RemoteGrpcPortWriteOperation.abort(RemoteGrpcPortWriteOperation.java:220)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:91)
> {code}
> More discussion info can be found here: 
> https://github.com/apache/beam/pull/10004



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=349348=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349348
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 25/Nov/19 20:58
Start Date: 25/Nov/19 20:58
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on issue #10107: [BEAM-8335] 
Change has_unbounded_sources to predetermined list of sources
URL: https://github.com/apache/beam/pull/10107#issuecomment-558336973
 
 
   Run Portable_Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349348)
Time Spent: 34h 20m  (was: 34h 10m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 34h 20m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8815) Portable pipeline execution without artifact staging

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8815?focusedWorklogId=349345=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349345
 ]

ASF GitHub Bot logged work on BEAM-8815:


Author: ASF GitHub Bot
Created on: 25/Nov/19 20:55
Start Date: 25/Nov/19 20:55
Worklog Time Spent: 10m 
  Work Description: tweise commented on issue #10205: [BEAM-8815] Skip 
manifest when no artifacts are staged
URL: https://github.com/apache/beam/pull/10205#issuecomment-558336121
 
 
   Run Portable_Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349345)
Time Spent: 1h  (was: 50m)

> Portable pipeline execution without artifact staging
> 
>
> Key: BEAM-8815
> URL: https://issues.apache.org/jira/browse/BEAM-8815
> Project: Beam
>  Issue Type: Task
>  Components: runner-core, runner-flink
>Affects Versions: 2.17.0
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Major
>  Labels: portability
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The default artifact staging implementation relies on a distributed 
> filesystem. A directory and manifest will be created even when artifact 
> staging isn't used, and the container boot code will fail retrieving 
> artifacts, even though there are non. In a containerized environment it is 
> common to package artifacts into containers. It should be possible to run the 
> pipeline w/o a distributed filesystem. 
> [https://lists.apache.org/thread.html/1b0d545955a80688ea19f227ad943683747b17beb45368ad0908fd21@%3Cdev.beam.apache.org%3E]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8581) Python SDK labels ontime empty panes as late

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8581?focusedWorklogId=349342=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349342
 ]

ASF GitHub Bot logged work on BEAM-8581:


Author: ASF GitHub Bot
Created on: 25/Nov/19 20:50
Start Date: 25/Nov/19 20:50
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on issue #10035: [BEAM-8581] and 
[BEAM-8582] watermark and trigger fixes
URL: https://github.com/apache/beam/pull/10035#issuecomment-558334258
 
 
   Ah gotcha, misread the comment. I just fixed up the commits as asked.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349342)
Time Spent: 5h 20m  (was: 5h 10m)

> Python SDK labels ontime empty panes as late
> 
>
> Key: BEAM-8581
> URL: https://issues.apache.org/jira/browse/BEAM-8581
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> The GeneralTriggerDriver does not put watermark holds on timers, leading to 
> the ontime empty pane being considered late data.
> Fix: Add a new notion of whether a trigger has an ontime pane. If it does, 
> then set a watermark hold to end of window - 1.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-1251) Python 3 Support

2019-11-25 Thread Valentyn Tymofieiev (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16981882#comment-16981882
 ] 

Valentyn Tymofieiev commented on BEAM-1251:
---

There is a lot of linked issues to BEAM-1251, which may be overwhelming for 
most users. I will reword the initial message here to capture the most 
essential information.

> Python 3 Support
> 
>
> Key: BEAM-1251
> URL: https://issues.apache.org/jira/browse/BEAM-1251
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Eyad Sibai
>Assignee: Valentyn Tymofieiev
>Priority: Major
> Fix For: 2.11.0
>
>  Time Spent: 30h 10m
>  Remaining Estimate: 0h
>
> I have been trying to use google datalab with python3. As I see there are 
> several packages that does not support python3 yet which google datalab 
> depends on. This is one of them.
> https://github.com/GoogleCloudPlatform/DataflowPythonSDK/issues/6



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8821) Update Python dependencies page for 2.17.0

2019-11-25 Thread Cyrus Maden (Jira)
Cyrus Maden created BEAM-8821:
-

 Summary: Update Python dependencies page for 2.17.0
 Key: BEAM-8821
 URL: https://issues.apache.org/jira/browse/BEAM-8821
 Project: Beam
  Issue Type: Improvement
  Components: website
Reporter: Cyrus Maden






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8441) Python 3 pipeline fails with errors in StockUnpickler.find_class() during loading a main session.

2019-11-25 Thread Valentyn Tymofieiev (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev updated BEAM-8441:
--
Description: 
When running Apache Beam with Python3 on Google Cloud Dataflow the pipeline 
fails during pickler.load_session(session_file): 
StockUnpickler.find_class(self, module, name) AttributeError: Can't get 
attribute 'SomeAttribute' on 

Note that this is different from BEAM-8651, since the error happens in a Batch 
Pipeline on a Dataflow runner and the error happens consistently.  

When testing it in the local/direct runner there seems to be no issue. 
 
{code:java}
class FlattenCustomActions(beam.PTransform):
""" Transforms Facebook Day ActionsOnly retains actions with 
custom_conversions
Flattens the actions
Adds custom conversions names using a side input
"""
def __init__(self, conversions):
super(FlattenCustomActions, self).__init__()
self.conversions = conversionsdef expand(self, input_or_inputs):
return (
input_or_inputs
| "FlattenActions" >> beam.ParDo(flatten_filter_actions)
| "AddConversionName" >> beam.Map(add_conversion_name, 
self.conversions)
)

# ...
# in run():
pipeline_options = PipelineOptions(pipeline_args)
pipeline_options.view_as(SetupOptions).save_main_session = True
p = beam.Pipeline(options=pipeline_options)
conversions_output = (
p
| "ReadConversions" >> ReadFromText(known_args.input_conversions, 
coder=JsonCoder())
| TransformConversionMetadata()
)(
conversions_output
| "WriteConversions"
>> WriteCoerced(
known_args.output_conversions,
known_args.output_type,
schema_path=BIGQUERY_SCHEMA_CONVERSIONS_PATH,
)
)(
p
| ReadFacebookJson(known_args.input, retain_root_fields=True)
| FlattenCustomActions(beam.pvalue.AsList(conversions_output))
| "WriteActions"
>> WriteCoerced(
known_args.output, known_args.output_type, 
schema_path=BIGQUERY_SCHEMA_ACTIONS_PATH
)
){code}
 

I receive the following Traceback in Dataflow:
{code:java}
Traceback (most recent call last): 
  File "/usr/local/lib/python3.6/site-packages/dataflow_worker/batchworker.py", 
line 773, in run self._load_main_session(self.local_staging_directory) 
  File "/usr/local/lib/python3.6/site-packages/dataflow_worker/batchworker.py", 
line 489, in _load_main_session pickler.load_session(session_file) 
  File 
"/usr/local/lib/python3.6/site-packages/apache_beam/internal/pickler.py", line 
287, in load_session return dill.load_session(file_path) 
  File "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 410, in 
load_session module = unpickler.load() 
  File "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 474, in 
find_class return StockUnpickler.find_class(self, module, name) AttributeError: 
Can't get attribute 'FlattenCustomActions' on 
{code}
 

 

 

  was:
When running Apache Beam with Python3 on Google Cloud Dataflow the pipeline 
fails during pickler.load_session(session_file): 
{noformat}
StockUnpickler.find_class(self, module, name) AttributeError: Can't get 
attribute 'SomeAttribute' on 
{noformat}

Note that this is different from BEAM-8651, since the error happens in a Batch 
Pipeline on a Dataflow runner and the error happens consistently.  

When testing it in the local/direct runner there seems to be no issue. 
 
{code:java}
class FlattenCustomActions(beam.PTransform):
""" Transforms Facebook Day ActionsOnly retains actions with 
custom_conversions
Flattens the actions
Adds custom conversions names using a side input
"""
def __init__(self, conversions):
super(FlattenCustomActions, self).__init__()
self.conversions = conversionsdef expand(self, input_or_inputs):
return (
input_or_inputs
| "FlattenActions" >> beam.ParDo(flatten_filter_actions)
| "AddConversionName" >> beam.Map(add_conversion_name, 
self.conversions)
)

# ...
# in run():
pipeline_options = PipelineOptions(pipeline_args)
pipeline_options.view_as(SetupOptions).save_main_session = True
p = beam.Pipeline(options=pipeline_options)
conversions_output = (
p
| "ReadConversions" >> ReadFromText(known_args.input_conversions, 
coder=JsonCoder())
| TransformConversionMetadata()
)(
conversions_output
| "WriteConversions"
>> WriteCoerced(
known_args.output_conversions,
known_args.output_type,
schema_path=BIGQUERY_SCHEMA_CONVERSIONS_PATH,
)
)(
p
| ReadFacebookJson(known_args.input, retain_root_fields=True)
| FlattenCustomActions(beam.pvalue.AsList(conversions_output))
| 

[jira] [Resolved] (BEAM-8309) Update Python dependencies page for 2.16.0

2019-11-25 Thread Cyrus Maden (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cyrus Maden resolved BEAM-8309.
---
Fix Version/s: 2.16.0
   Resolution: Fixed

> Update Python dependencies page for 2.16.0
> --
>
> Key: BEAM-8309
> URL: https://issues.apache.org/jira/browse/BEAM-8309
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Cyrus Maden
>Assignee: Cyrus Maden
>Priority: Minor
> Fix For: 2.16.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Update Python dependencies page for 2.16.0
> [https://beam.apache.org/documentation/sdks/python-dependencies/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8441) Python 3 pipeline fails with errors in StockUnpickler.find_class() during loading a main session.

2019-11-25 Thread Valentyn Tymofieiev (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev updated BEAM-8441:
--
Description: 
When running Apache Beam with Python3 on Google Cloud Dataflow the pipeline 
fails during pickler.load_session(session_file): 
{noformat}
StockUnpickler.find_class(self, module, name) AttributeError: Can't get 
attribute 'SomeAttribute' on 
{noformat}

Note that this is different from BEAM-8651, since the error happens in a Batch 
Pipeline on a Dataflow runner and the error happens consistently.  

When testing it in the local/direct runner there seems to be no issue. 
 
{code:java}
class FlattenCustomActions(beam.PTransform):
""" Transforms Facebook Day ActionsOnly retains actions with 
custom_conversions
Flattens the actions
Adds custom conversions names using a side input
"""
def __init__(self, conversions):
super(FlattenCustomActions, self).__init__()
self.conversions = conversionsdef expand(self, input_or_inputs):
return (
input_or_inputs
| "FlattenActions" >> beam.ParDo(flatten_filter_actions)
| "AddConversionName" >> beam.Map(add_conversion_name, 
self.conversions)
)

# ...
# in run():
pipeline_options = PipelineOptions(pipeline_args)
pipeline_options.view_as(SetupOptions).save_main_session = True
p = beam.Pipeline(options=pipeline_options)
conversions_output = (
p
| "ReadConversions" >> ReadFromText(known_args.input_conversions, 
coder=JsonCoder())
| TransformConversionMetadata()
)(
conversions_output
| "WriteConversions"
>> WriteCoerced(
known_args.output_conversions,
known_args.output_type,
schema_path=BIGQUERY_SCHEMA_CONVERSIONS_PATH,
)
)(
p
| ReadFacebookJson(known_args.input, retain_root_fields=True)
| FlattenCustomActions(beam.pvalue.AsList(conversions_output))
| "WriteActions"
>> WriteCoerced(
known_args.output, known_args.output_type, 
schema_path=BIGQUERY_SCHEMA_ACTIONS_PATH
)
){code}
 

I receive the following Traceback in Dataflow:
{code:java}
Traceback (most recent call last): 
  File "/usr/local/lib/python3.6/site-packages/dataflow_worker/batchworker.py", 
line 773, in run self._load_main_session(self.local_staging_directory) 
  File "/usr/local/lib/python3.6/site-packages/dataflow_worker/batchworker.py", 
line 489, in _load_main_session pickler.load_session(session_file) 
  File 
"/usr/local/lib/python3.6/site-packages/apache_beam/internal/pickler.py", line 
287, in load_session return dill.load_session(file_path) 
  File "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 410, in 
load_session module = unpickler.load() 
  File "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 474, in 
find_class return StockUnpickler.find_class(self, module, name) AttributeError: 
Can't get attribute 'FlattenCustomActions' on 
{code}
 

 

 

  was:
When running Apache Beam with Python3 on Google Cloud Dataflow the pipeline 
consistently fails during pickler.load_session(session_file): 
{code:java}
StockUnpickler.find_class(self, module, name) AttributeError: Can't get 
attribute 'SomeAttribute' on 
{code}
.

When testing it in the local/direct runner there seems to be no issue.
 
{code:java}
class FlattenCustomActions(beam.PTransform):
""" Transforms Facebook Day ActionsOnly retains actions with 
custom_conversions
Flattens the actions
Adds custom conversions names using a side input
"""
def __init__(self, conversions):
super(FlattenCustomActions, self).__init__()
self.conversions = conversionsdef expand(self, input_or_inputs):
return (
input_or_inputs
| "FlattenActions" >> beam.ParDo(flatten_filter_actions)
| "AddConversionName" >> beam.Map(add_conversion_name, 
self.conversions)
)

# ...
# in run():
pipeline_options = PipelineOptions(pipeline_args)
pipeline_options.view_as(SetupOptions).save_main_session = True
p = beam.Pipeline(options=pipeline_options)
conversions_output = (
p
| "ReadConversions" >> ReadFromText(known_args.input_conversions, 
coder=JsonCoder())
| TransformConversionMetadata()
)(
conversions_output
| "WriteConversions"
>> WriteCoerced(
known_args.output_conversions,
known_args.output_type,
schema_path=BIGQUERY_SCHEMA_CONVERSIONS_PATH,
)
)(
p
| ReadFacebookJson(known_args.input, retain_root_fields=True)
| FlattenCustomActions(beam.pvalue.AsList(conversions_output))
| "WriteActions"
>> WriteCoerced(
known_args.output, known_args.output_type, 

[jira] [Updated] (BEAM-8441) Python 3 pipeline fails with errors in StockUnpickler.find_class() during loading a main session.

2019-11-25 Thread Valentyn Tymofieiev (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev updated BEAM-8441:
--
Description: 
When running Apache Beam with Python3 on Google Cloud Dataflow the pipeline 
consistently fails during pickler.load_session(session_file): 
{code:java}
StockUnpickler.find_class(self, module, name) AttributeError: Can't get 
attribute 'SomeAttribute' on 
{code}
.

When testing it in the local/direct runner there seems to be no issue.
 
{code:java}
class FlattenCustomActions(beam.PTransform):
""" Transforms Facebook Day ActionsOnly retains actions with 
custom_conversions
Flattens the actions
Adds custom conversions names using a side input
"""
def __init__(self, conversions):
super(FlattenCustomActions, self).__init__()
self.conversions = conversionsdef expand(self, input_or_inputs):
return (
input_or_inputs
| "FlattenActions" >> beam.ParDo(flatten_filter_actions)
| "AddConversionName" >> beam.Map(add_conversion_name, 
self.conversions)
)

# ...
# in run():
pipeline_options = PipelineOptions(pipeline_args)
pipeline_options.view_as(SetupOptions).save_main_session = True
p = beam.Pipeline(options=pipeline_options)
conversions_output = (
p
| "ReadConversions" >> ReadFromText(known_args.input_conversions, 
coder=JsonCoder())
| TransformConversionMetadata()
)(
conversions_output
| "WriteConversions"
>> WriteCoerced(
known_args.output_conversions,
known_args.output_type,
schema_path=BIGQUERY_SCHEMA_CONVERSIONS_PATH,
)
)(
p
| ReadFacebookJson(known_args.input, retain_root_fields=True)
| FlattenCustomActions(beam.pvalue.AsList(conversions_output))
| "WriteActions"
>> WriteCoerced(
known_args.output, known_args.output_type, 
schema_path=BIGQUERY_SCHEMA_ACTIONS_PATH
)
){code}
 

I receive the following Traceback in Dataflow:
{code:java}
Traceback (most recent call last): 
  File "/usr/local/lib/python3.6/site-packages/dataflow_worker/batchworker.py", 
line 773, in run self._load_main_session(self.local_staging_directory) 
  File "/usr/local/lib/python3.6/site-packages/dataflow_worker/batchworker.py", 
line 489, in _load_main_session pickler.load_session(session_file) 
  File 
"/usr/local/lib/python3.6/site-packages/apache_beam/internal/pickler.py", line 
287, in load_session return dill.load_session(file_path) 
  File "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 410, in 
load_session module = unpickler.load() 
  File "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 474, in 
find_class return StockUnpickler.find_class(self, module, name) AttributeError: 
Can't get attribute 'FlattenCustomActions' on 
{code}
 

 

 

  was:
When running Apache Beam with Python3 on Google Cloud Dataflow the pipeline 
consistently fails during pickler.load_session(session_file): 
StockUnpickler.find_class(self, module, name) AttributeError: Can't get 
attribute 'SomeAttribute' on .

When testing it in the local/direct runner there seems to be no issue.
 
{code:java}
class FlattenCustomActions(beam.PTransform):
""" Transforms Facebook Day ActionsOnly retains actions with 
custom_conversions
Flattens the actions
Adds custom conversions names using a side input
"""
def __init__(self, conversions):
super(FlattenCustomActions, self).__init__()
self.conversions = conversionsdef expand(self, input_or_inputs):
return (
input_or_inputs
| "FlattenActions" >> beam.ParDo(flatten_filter_actions)
| "AddConversionName" >> beam.Map(add_conversion_name, 
self.conversions)
)

# ...
# in run():
pipeline_options = PipelineOptions(pipeline_args)
pipeline_options.view_as(SetupOptions).save_main_session = True
p = beam.Pipeline(options=pipeline_options)
conversions_output = (
p
| "ReadConversions" >> ReadFromText(known_args.input_conversions, 
coder=JsonCoder())
| TransformConversionMetadata()
)(
conversions_output
| "WriteConversions"
>> WriteCoerced(
known_args.output_conversions,
known_args.output_type,
schema_path=BIGQUERY_SCHEMA_CONVERSIONS_PATH,
)
)(
p
| ReadFacebookJson(known_args.input, retain_root_fields=True)
| FlattenCustomActions(beam.pvalue.AsList(conversions_output))
| "WriteActions"
>> WriteCoerced(
known_args.output, known_args.output_type, 
schema_path=BIGQUERY_SCHEMA_ACTIONS_PATH
)
){code}
 

I receive the following Traceback in Dataflow:
{code:java}
Traceback (most recent call last): 
  File 

[jira] [Updated] (BEAM-8793) installGcpTest task flakes (python setup.py egg_info fails)

2019-11-25 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver updated BEAM-8793:
--
Summary: installGcpTest task flakes (python setup.py egg_info fails)  (was: 
installGcpTest task flakes)

> installGcpTest task flakes (python setup.py egg_info fails)
> ---
>
> Key: BEAM-8793
> URL: https://issues.apache.org/jira/browse/BEAM-8793
> Project: Beam
>  Issue Type: Improvement
>  Components: test-failures
>Reporter: Kyle Weaver
>Priority: Major
>
> This seems to affect the installGcpTest task for all python versions. There 
> are a few different failure modes, but they look like they might be related.
> == 1 ==
> 11:01:38 > Task :sdks:python:test-suites:direct:py35:installGcpTest FAILED
> 11:01:38 Obtaining 
> file:///home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/sdks/python
> 11:01:38 ERROR: Command errored out with exit status 1:
> 11:01:38  command: 
> /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/bin/python3.5
>  -c 'import sys, setuptools, tokenize; sys.argv[0] = 
> '"'"'/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/sdks/python/setup.py'"'"';
>  
> __file__='"'"'/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/sdks/python/setup.py'"'"';f=getattr(tokenize,
>  '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', 
> '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' 
> egg_info
> 11:01:38  cwd: 
> /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/sdks/python/
> 11:01:38 Complete output (37 lines):
> 11:01:38 Traceback (most recent call last):
> 11:01:38   File "", line 1, in 
> 11:01:38   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/sdks/python/setup.py",
>  line 264, in 
> 11:01:38 'test': generate_protos_first(test),
> 11:01:38   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/__init__.py",
>  line 144, in setup
> 11:01:38 _install_setup_requires(attrs)
> 11:01:38   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/__init__.py",
>  line 139, in _install_setup_requires
> 11:01:38 dist.fetch_build_eggs(dist.setup_requires)
> 11:01:38   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/dist.py",
>  line 720, in fetch_build_eggs
> 11:01:38 replace_conflicting=True,
> 11:01:38   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/pkg_resources/__init__.py",
>  line 782, in resolve
> 11:01:38 replace_conflicting=replace_conflicting
> 11:01:38   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/pkg_resources/__init__.py",
>  line 1065, in best_match
> 11:01:38 return self.obtain(req, installer)
> 11:01:38   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/pkg_resources/__init__.py",
>  line 1077, in obtain
> 11:01:38 return installer(requirement)
> 11:01:38   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/dist.py",
>  line 787, in fetch_build_egg
> 11:01:38 return cmd.easy_install(req)
> 11:01:38   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/command/easy_install.py",
>  line 679, in easy_install
> 11:01:38 return self.install_item(spec, dist.location, tmpdir, deps)
> 11:01:38   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/command/easy_install.py",
>  line 705, in install_item
> 11:01:38 dists = self.install_eggs(spec, download, tmpdir)
> 11:01:38   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/command/easy_install.py",
>  line 855, in install_eggs
> 11:01:38 return [self.install_wheel(dist_filename, tmpdir)]
> 11:01:38   File 
> 

[jira] [Updated] (BEAM-8793) installGcpTest task flakes

2019-11-25 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver updated BEAM-8793:
--
Description: 
This seems to affect the installGcpTest task for all python versions. There are 
a few different failure modes, but they look like they might be related.

== 1 ==
11:01:38 > Task :sdks:python:test-suites:direct:py35:installGcpTest FAILED
11:01:38 Obtaining 
file:///home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/sdks/python
11:01:38 ERROR: Command errored out with exit status 1:
11:01:38  command: 
/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/bin/python3.5
 -c 'import sys, setuptools, tokenize; sys.argv[0] = 
'"'"'/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/sdks/python/setup.py'"'"';
 
__file__='"'"'/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/sdks/python/setup.py'"'"';f=getattr(tokenize,
 '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', 
'"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info
11:01:38  cwd: 
/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/sdks/python/
11:01:38 Complete output (37 lines):
11:01:38 Traceback (most recent call last):
11:01:38   File "", line 1, in 
11:01:38   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/sdks/python/setup.py",
 line 264, in 
11:01:38 'test': generate_protos_first(test),
11:01:38   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/__init__.py",
 line 144, in setup
11:01:38 _install_setup_requires(attrs)
11:01:38   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/__init__.py",
 line 139, in _install_setup_requires
11:01:38 dist.fetch_build_eggs(dist.setup_requires)
11:01:38   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/dist.py",
 line 720, in fetch_build_eggs
11:01:38 replace_conflicting=True,
11:01:38   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/pkg_resources/__init__.py",
 line 782, in resolve
11:01:38 replace_conflicting=replace_conflicting
11:01:38   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/pkg_resources/__init__.py",
 line 1065, in best_match
11:01:38 return self.obtain(req, installer)
11:01:38   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/pkg_resources/__init__.py",
 line 1077, in obtain
11:01:38 return installer(requirement)
11:01:38   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/dist.py",
 line 787, in fetch_build_egg
11:01:38 return cmd.easy_install(req)
11:01:38   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/command/easy_install.py",
 line 679, in easy_install
11:01:38 return self.install_item(spec, dist.location, tmpdir, deps)
11:01:38   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/command/easy_install.py",
 line 705, in install_item
11:01:38 dists = self.install_eggs(spec, download, tmpdir)
11:01:38   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/command/easy_install.py",
 line 855, in install_eggs
11:01:38 return [self.install_wheel(dist_filename, tmpdir)]
11:01:38   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/command/easy_install.py",
 line 1073, in install_wheel
11:01:38 os.path.dirname(destination)
11:01:38   File "/usr/lib/python3.5/distutils/cmd.py", line 336, in execute
11:01:38 util.execute(func, args, msg, dry_run=self.dry_run)
11:01:38   File "/usr/lib/python3.5/distutils/util.py", line 301, in execute
11:01:38 func(*args)
11:01:38   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/wheel.py",
 line 101, in install_as_egg
11:01:38 self._install_as_egg(destination_eggdir, zf)
11:01:38   File 

[jira] [Comment Edited] (BEAM-8441) Python 3 pipeline fails with errors in StockUnpickler.find_class() during loading a main session.

2019-11-25 Thread Valentyn Tymofieiev (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16981875#comment-16981875
 ] 

Valentyn Tymofieiev edited comment on BEAM-8441 at 11/25/19 8:37 PM:
-

I have renamed and reworded the issue so that we keep it open, in case it makes 
it easier for other users to discover 
[BEAM-6158|https://issues.apache.org/jira/browse/BEAM-6158?focusedCommentId=16919945=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16919945].


was (Author: tvalentyn):
I have renamed and reworded the issue so that we keep it open, in case it makes 
it easier for other users to discover 
[BEAM-6158|https://issues.apache.org/jira/browse/BEAM-6158?focusedCommentId=16919945=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16919945]

> Python 3 pipeline fails with errors in StockUnpickler.find_class() during 
> loading a main session. 
> --
>
> Key: BEAM-8441
> URL: https://issues.apache.org/jira/browse/BEAM-8441
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Jannik Franz
>Assignee: Valentyn Tymofieiev
>Priority: Major
>
> When running Apache Beam with Python3 on Google Cloud Dataflow the pipeline 
> consistently fails during pickler.load_session(session_file): 
> StockUnpickler.find_class(self, module, name) AttributeError: Can't get 
> attribute 'SomeAttribute' on  '/usr/local/lib/python3.6/site-packages/dataflow_worker/start.py'>.
> When testing it in the local/direct runner there seems to be no issue.
>  
> {code:java}
> class FlattenCustomActions(beam.PTransform):
> """ Transforms Facebook Day ActionsOnly retains actions with 
> custom_conversions
> Flattens the actions
> Adds custom conversions names using a side input
> """
> def __init__(self, conversions):
> super(FlattenCustomActions, self).__init__()
> self.conversions = conversionsdef expand(self, input_or_inputs):
> return (
> input_or_inputs
> | "FlattenActions" >> beam.ParDo(flatten_filter_actions)
> | "AddConversionName" >> beam.Map(add_conversion_name, 
> self.conversions)
> )
> # ...
> # in run():
> pipeline_options = PipelineOptions(pipeline_args)
> pipeline_options.view_as(SetupOptions).save_main_session = True
> p = beam.Pipeline(options=pipeline_options)
> conversions_output = (
> p
> | "ReadConversions" >> ReadFromText(known_args.input_conversions, 
> coder=JsonCoder())
> | TransformConversionMetadata()
> )(
> conversions_output
> | "WriteConversions"
> >> WriteCoerced(
> known_args.output_conversions,
> known_args.output_type,
> schema_path=BIGQUERY_SCHEMA_CONVERSIONS_PATH,
> )
> )(
> p
> | ReadFacebookJson(known_args.input, retain_root_fields=True)
> | FlattenCustomActions(beam.pvalue.AsList(conversions_output))
> | "WriteActions"
> >> WriteCoerced(
> known_args.output, known_args.output_type, 
> schema_path=BIGQUERY_SCHEMA_ACTIONS_PATH
> )
> ){code}
>  
> I receive the following Traceback in Dataflow:
> {code:java}
> Traceback (most recent call last): 
>   File 
> "/usr/local/lib/python3.6/site-packages/dataflow_worker/batchworker.py", line 
> 773, in run self._load_main_session(self.local_staging_directory) 
>   File 
> "/usr/local/lib/python3.6/site-packages/dataflow_worker/batchworker.py", line 
> 489, in _load_main_session pickler.load_session(session_file) 
>   File 
> "/usr/local/lib/python3.6/site-packages/apache_beam/internal/pickler.py", 
> line 287, in load_session return dill.load_session(file_path) 
>   File "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 410, in 
> load_session module = unpickler.load() 
>   File "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 474, in 
> find_class return StockUnpickler.find_class(self, module, name) 
> AttributeError: Can't get attribute 'FlattenCustomActions' on  'dataflow_worker.start' from 
> '/usr/local/lib/python3.6/site-packages/dataflow_worker/start.py'>
> {code}
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8441) Python 3 pipeline fails with errors in StockUnpickler.find_class() during loading a main session.

2019-11-25 Thread Valentyn Tymofieiev (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16981875#comment-16981875
 ] 

Valentyn Tymofieiev commented on BEAM-8441:
---

I have renamed and reworded the issue so that we keep it open, in case it makes 
it easier for other users to discover 
[BEAM-6158|https://issues.apache.org/jira/browse/BEAM-6158?focusedCommentId=16919945=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16919945]

> Python 3 pipeline fails with errors in StockUnpickler.find_class() during 
> loading a main session. 
> --
>
> Key: BEAM-8441
> URL: https://issues.apache.org/jira/browse/BEAM-8441
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Jannik Franz
>Assignee: Valentyn Tymofieiev
>Priority: Major
>
> When running Apache Beam with Python3 on Google Cloud Dataflow the pipeline 
> consistently fails during pickler.load_session(session_file): 
> StockUnpickler.find_class(self, module, name) AttributeError: Can't get 
> attribute 'SomeAttribute' on  '/usr/local/lib/python3.6/site-packages/dataflow_worker/start.py'>.
> When testing it in the local/direct runner there seems to be no issue.
>  
> {code:java}
> class FlattenCustomActions(beam.PTransform):
> """ Transforms Facebook Day ActionsOnly retains actions with 
> custom_conversions
> Flattens the actions
> Adds custom conversions names using a side input
> """
> def __init__(self, conversions):
> super(FlattenCustomActions, self).__init__()
> self.conversions = conversionsdef expand(self, input_or_inputs):
> return (
> input_or_inputs
> | "FlattenActions" >> beam.ParDo(flatten_filter_actions)
> | "AddConversionName" >> beam.Map(add_conversion_name, 
> self.conversions)
> )
> # ...
> # in run():
> pipeline_options = PipelineOptions(pipeline_args)
> pipeline_options.view_as(SetupOptions).save_main_session = True
> p = beam.Pipeline(options=pipeline_options)
> conversions_output = (
> p
> | "ReadConversions" >> ReadFromText(known_args.input_conversions, 
> coder=JsonCoder())
> | TransformConversionMetadata()
> )(
> conversions_output
> | "WriteConversions"
> >> WriteCoerced(
> known_args.output_conversions,
> known_args.output_type,
> schema_path=BIGQUERY_SCHEMA_CONVERSIONS_PATH,
> )
> )(
> p
> | ReadFacebookJson(known_args.input, retain_root_fields=True)
> | FlattenCustomActions(beam.pvalue.AsList(conversions_output))
> | "WriteActions"
> >> WriteCoerced(
> known_args.output, known_args.output_type, 
> schema_path=BIGQUERY_SCHEMA_ACTIONS_PATH
> )
> ){code}
>  
> I receive the following Traceback in Dataflow:
> {code:java}
> Traceback (most recent call last): 
>   File 
> "/usr/local/lib/python3.6/site-packages/dataflow_worker/batchworker.py", line 
> 773, in run self._load_main_session(self.local_staging_directory) 
>   File 
> "/usr/local/lib/python3.6/site-packages/dataflow_worker/batchworker.py", line 
> 489, in _load_main_session pickler.load_session(session_file) 
>   File 
> "/usr/local/lib/python3.6/site-packages/apache_beam/internal/pickler.py", 
> line 287, in load_session return dill.load_session(file_path) 
>   File "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 410, in 
> load_session module = unpickler.load() 
>   File "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 474, in 
> find_class return StockUnpickler.find_class(self, module, name) 
> AttributeError: Can't get attribute 'FlattenCustomActions' on  'dataflow_worker.start' from 
> '/usr/local/lib/python3.6/site-packages/dataflow_worker/start.py'>
> {code}
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8441) Python 3 pipeline fails with errors in StockUnpickler.find_class() during loading a main session.

2019-11-25 Thread Valentyn Tymofieiev (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev updated BEAM-8441:
--
Summary: Python 3 pipeline fails with errors in StockUnpickler.find_class() 
during loading a main session.   (was: Side-Input in Python3 fails to pickle 
class)

> Python 3 pipeline fails with errors in StockUnpickler.find_class() during 
> loading a main session. 
> --
>
> Key: BEAM-8441
> URL: https://issues.apache.org/jira/browse/BEAM-8441
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Jannik Franz
>Assignee: Valentyn Tymofieiev
>Priority: Major
>
> When running Apache Beam with Python3 on Google Cloud Dataflow the pipeline 
> consistently fails during pickler.load_session(session_file): 
> StockUnpickler.find_class(self, module, name) AttributeError: Can't get 
> attribute 'SomeAttribute' on  '/usr/local/lib/python3.6/site-packages/dataflow_worker/start.py'>.
> When testing it in the local/direct runner there seems to be no issue.
>  
> {code:java}
> class FlattenCustomActions(beam.PTransform):
> """ Transforms Facebook Day ActionsOnly retains actions with 
> custom_conversions
> Flattens the actions
> Adds custom conversions names using a side input
> """
> def __init__(self, conversions):
> super(FlattenCustomActions, self).__init__()
> self.conversions = conversionsdef expand(self, input_or_inputs):
> return (
> input_or_inputs
> | "FlattenActions" >> beam.ParDo(flatten_filter_actions)
> | "AddConversionName" >> beam.Map(add_conversion_name, 
> self.conversions)
> )
> # ...
> # in run():
> pipeline_options = PipelineOptions(pipeline_args)
> pipeline_options.view_as(SetupOptions).save_main_session = True
> p = beam.Pipeline(options=pipeline_options)
> conversions_output = (
> p
> | "ReadConversions" >> ReadFromText(known_args.input_conversions, 
> coder=JsonCoder())
> | TransformConversionMetadata()
> )(
> conversions_output
> | "WriteConversions"
> >> WriteCoerced(
> known_args.output_conversions,
> known_args.output_type,
> schema_path=BIGQUERY_SCHEMA_CONVERSIONS_PATH,
> )
> )(
> p
> | ReadFacebookJson(known_args.input, retain_root_fields=True)
> | FlattenCustomActions(beam.pvalue.AsList(conversions_output))
> | "WriteActions"
> >> WriteCoerced(
> known_args.output, known_args.output_type, 
> schema_path=BIGQUERY_SCHEMA_ACTIONS_PATH
> )
> ){code}
>  
> I receive the following Traceback in Dataflow:
> {code:java}
> Traceback (most recent call last): 
>   File 
> "/usr/local/lib/python3.6/site-packages/dataflow_worker/batchworker.py", line 
> 773, in run self._load_main_session(self.local_staging_directory) 
>   File 
> "/usr/local/lib/python3.6/site-packages/dataflow_worker/batchworker.py", line 
> 489, in _load_main_session pickler.load_session(session_file) 
>   File 
> "/usr/local/lib/python3.6/site-packages/apache_beam/internal/pickler.py", 
> line 287, in load_session return dill.load_session(file_path) 
>   File "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 410, in 
> load_session module = unpickler.load() 
>   File "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 474, in 
> find_class return StockUnpickler.find_class(self, module, name) 
> AttributeError: Can't get attribute 'FlattenCustomActions' on  'dataflow_worker.start' from 
> '/usr/local/lib/python3.6/site-packages/dataflow_worker/start.py'>
> {code}
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8441) Side-Input in Python3 fails to pickle class

2019-11-25 Thread Valentyn Tymofieiev (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev updated BEAM-8441:
--
Description: 
When running Apache Beam with Python3 on Google Cloud Dataflow the pipeline 
consistently fails during pickler.load_session(session_file): 
StockUnpickler.find_class(self, module, name) AttributeError: Can't get 
attribute 'SomeAttribute' on .

When testing it in the local/direct runner there seems to be no issue.
 
{code:java}
class FlattenCustomActions(beam.PTransform):
""" Transforms Facebook Day ActionsOnly retains actions with 
custom_conversions
Flattens the actions
Adds custom conversions names using a side input
"""
def __init__(self, conversions):
super(FlattenCustomActions, self).__init__()
self.conversions = conversionsdef expand(self, input_or_inputs):
return (
input_or_inputs
| "FlattenActions" >> beam.ParDo(flatten_filter_actions)
| "AddConversionName" >> beam.Map(add_conversion_name, 
self.conversions)
)

# ...
# in run():
pipeline_options = PipelineOptions(pipeline_args)
pipeline_options.view_as(SetupOptions).save_main_session = True
p = beam.Pipeline(options=pipeline_options)
conversions_output = (
p
| "ReadConversions" >> ReadFromText(known_args.input_conversions, 
coder=JsonCoder())
| TransformConversionMetadata()
)(
conversions_output
| "WriteConversions"
>> WriteCoerced(
known_args.output_conversions,
known_args.output_type,
schema_path=BIGQUERY_SCHEMA_CONVERSIONS_PATH,
)
)(
p
| ReadFacebookJson(known_args.input, retain_root_fields=True)
| FlattenCustomActions(beam.pvalue.AsList(conversions_output))
| "WriteActions"
>> WriteCoerced(
known_args.output, known_args.output_type, 
schema_path=BIGQUERY_SCHEMA_ACTIONS_PATH
)
){code}
 

I receive the following Traceback in Dataflow:
{code:java}
Traceback (most recent call last): 
  File "/usr/local/lib/python3.6/site-packages/dataflow_worker/batchworker.py", 
line 773, in run self._load_main_session(self.local_staging_directory) 
  File "/usr/local/lib/python3.6/site-packages/dataflow_worker/batchworker.py", 
line 489, in _load_main_session pickler.load_session(session_file) 
  File 
"/usr/local/lib/python3.6/site-packages/apache_beam/internal/pickler.py", line 
287, in load_session return dill.load_session(file_path) 
  File "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 410, in 
load_session module = unpickler.load() 
  File "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 474, in 
find_class return StockUnpickler.find_class(self, module, name) AttributeError: 
Can't get attribute 'FlattenCustomActions' on 
{code}
 

 

 

  was:
When running Apache Beam with Python3 on Google Cloud Dataflow Sideinputs don't 
work.

When testing it in the local/direct runner there seems to be no issue.

 

 
{code:java}
class FlattenCustomActions(beam.PTransform):
""" Transforms Facebook Day ActionsOnly retains actions with 
custom_conversions
Flattens the actions
Adds custom conversions names using a side input
"""
def __init__(self, conversions):
super(FlattenCustomActions, self).__init__()
self.conversions = conversionsdef expand(self, input_or_inputs):
return (
input_or_inputs
| "FlattenActions" >> beam.ParDo(flatten_filter_actions)
| "AddConversionName" >> beam.Map(add_conversion_name, 
self.conversions)
)

# ...
# in run():
pipeline_options = PipelineOptions(pipeline_args)
pipeline_options.view_as(SetupOptions).save_main_session = True
p = beam.Pipeline(options=pipeline_options)
conversions_output = (
p
| "ReadConversions" >> ReadFromText(known_args.input_conversions, 
coder=JsonCoder())
| TransformConversionMetadata()
)(
conversions_output
| "WriteConversions"
>> WriteCoerced(
known_args.output_conversions,
known_args.output_type,
schema_path=BIGQUERY_SCHEMA_CONVERSIONS_PATH,
)
)(
p
| ReadFacebookJson(known_args.input, retain_root_fields=True)
| FlattenCustomActions(beam.pvalue.AsList(conversions_output))
| "WriteActions"
>> WriteCoerced(
known_args.output, known_args.output_type, 
schema_path=BIGQUERY_SCHEMA_ACTIONS_PATH
)
){code}
 

I receive the following Traceback in Dataflow:
{code:java}
Traceback (most recent call last): File 
"/usr/local/lib/python3.6/site-packages/dataflow_worker/batchworker.py", line 
773, in run self._load_main_session(self.local_staging_directory) File 

[jira] [Assigned] (BEAM-8441) Side-Input in Python3 fails to pickle class

2019-11-25 Thread Valentyn Tymofieiev (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev reassigned BEAM-8441:
-

Assignee: Valentyn Tymofieiev

> Side-Input in Python3 fails to pickle class
> ---
>
> Key: BEAM-8441
> URL: https://issues.apache.org/jira/browse/BEAM-8441
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Jannik Franz
>Assignee: Valentyn Tymofieiev
>Priority: Major
>
> When running Apache Beam with Python3 on Google Cloud Dataflow Sideinputs 
> don't work.
> When testing it in the local/direct runner there seems to be no issue.
>  
>  
> {code:java}
> class FlattenCustomActions(beam.PTransform):
> """ Transforms Facebook Day ActionsOnly retains actions with 
> custom_conversions
> Flattens the actions
> Adds custom conversions names using a side input
> """
> def __init__(self, conversions):
> super(FlattenCustomActions, self).__init__()
> self.conversions = conversionsdef expand(self, input_or_inputs):
> return (
> input_or_inputs
> | "FlattenActions" >> beam.ParDo(flatten_filter_actions)
> | "AddConversionName" >> beam.Map(add_conversion_name, 
> self.conversions)
> )
> # ...
> # in run():
> pipeline_options = PipelineOptions(pipeline_args)
> pipeline_options.view_as(SetupOptions).save_main_session = True
> p = beam.Pipeline(options=pipeline_options)
> conversions_output = (
> p
> | "ReadConversions" >> ReadFromText(known_args.input_conversions, 
> coder=JsonCoder())
> | TransformConversionMetadata()
> )(
> conversions_output
> | "WriteConversions"
> >> WriteCoerced(
> known_args.output_conversions,
> known_args.output_type,
> schema_path=BIGQUERY_SCHEMA_CONVERSIONS_PATH,
> )
> )(
> p
> | ReadFacebookJson(known_args.input, retain_root_fields=True)
> | FlattenCustomActions(beam.pvalue.AsList(conversions_output))
> | "WriteActions"
> >> WriteCoerced(
> known_args.output, known_args.output_type, 
> schema_path=BIGQUERY_SCHEMA_ACTIONS_PATH
> )
> ){code}
>  
> I receive the following Traceback in Dataflow:
> {code:java}
> Traceback (most recent call last): File 
> "/usr/local/lib/python3.6/site-packages/dataflow_worker/batchworker.py", line 
> 773, in run self._load_main_session(self.local_staging_directory) File 
> "/usr/local/lib/python3.6/site-packages/dataflow_worker/batchworker.py", line 
> 489, in _load_main_session pickler.load_session(session_file) File 
> "/usr/local/lib/python3.6/site-packages/apache_beam/internal/pickler.py", 
> line 287, in load_session return dill.load_session(file_path) File 
> "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 410, in 
> load_session module = unpickler.load() File 
> "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 474, in 
> find_class return StockUnpickler.find_class(self, module, name) 
> AttributeError: Can't get attribute 'FlattenCustomActions' on  'dataflow_worker.start' from 
> '/usr/local/lib/python3.6/site-packages/dataflow_worker/start.py'>
> {code}
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8441) Side-Input in Python3 fails to pickle class

2019-11-25 Thread Valentyn Tymofieiev (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev updated BEAM-8441:
--
Status: Open  (was: Triage Needed)

> Side-Input in Python3 fails to pickle class
> ---
>
> Key: BEAM-8441
> URL: https://issues.apache.org/jira/browse/BEAM-8441
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Jannik Franz
>Priority: Major
>
> When running Apache Beam with Python3 on Google Cloud Dataflow Sideinputs 
> don't work.
> When testing it in the local/direct runner there seems to be no issue.
>  
>  
> {code:java}
> class FlattenCustomActions(beam.PTransform):
> """ Transforms Facebook Day ActionsOnly retains actions with 
> custom_conversions
> Flattens the actions
> Adds custom conversions names using a side input
> """
> def __init__(self, conversions):
> super(FlattenCustomActions, self).__init__()
> self.conversions = conversionsdef expand(self, input_or_inputs):
> return (
> input_or_inputs
> | "FlattenActions" >> beam.ParDo(flatten_filter_actions)
> | "AddConversionName" >> beam.Map(add_conversion_name, 
> self.conversions)
> )
> # ...
> # in run():
> pipeline_options = PipelineOptions(pipeline_args)
> pipeline_options.view_as(SetupOptions).save_main_session = True
> p = beam.Pipeline(options=pipeline_options)
> conversions_output = (
> p
> | "ReadConversions" >> ReadFromText(known_args.input_conversions, 
> coder=JsonCoder())
> | TransformConversionMetadata()
> )(
> conversions_output
> | "WriteConversions"
> >> WriteCoerced(
> known_args.output_conversions,
> known_args.output_type,
> schema_path=BIGQUERY_SCHEMA_CONVERSIONS_PATH,
> )
> )(
> p
> | ReadFacebookJson(known_args.input, retain_root_fields=True)
> | FlattenCustomActions(beam.pvalue.AsList(conversions_output))
> | "WriteActions"
> >> WriteCoerced(
> known_args.output, known_args.output_type, 
> schema_path=BIGQUERY_SCHEMA_ACTIONS_PATH
> )
> ){code}
>  
> I receive the following Traceback in Dataflow:
> {code:java}
> Traceback (most recent call last): File 
> "/usr/local/lib/python3.6/site-packages/dataflow_worker/batchworker.py", line 
> 773, in run self._load_main_session(self.local_staging_directory) File 
> "/usr/local/lib/python3.6/site-packages/dataflow_worker/batchworker.py", line 
> 489, in _load_main_session pickler.load_session(session_file) File 
> "/usr/local/lib/python3.6/site-packages/apache_beam/internal/pickler.py", 
> line 287, in load_session return dill.load_session(file_path) File 
> "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 410, in 
> load_session module = unpickler.load() File 
> "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 474, in 
> find_class return StockUnpickler.find_class(self, module, name) 
> AttributeError: Can't get attribute 'FlattenCustomActions' on  'dataflow_worker.start' from 
> '/usr/local/lib/python3.6/site-packages/dataflow_worker/start.py'>
> {code}
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-8441) Side-Input in Python3 fails to pickle class

2019-11-25 Thread Valentyn Tymofieiev (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16981828#comment-16981828
 ] 

Valentyn Tymofieiev edited comment on BEAM-8441 at 11/25/19 8:21 PM:
-

Since there is the pipeline main module includes a "super()" invocation, and 
there is "dill.load_session" in the stack trace, this error is caused by 
BEAM-6158. Can you please take a look at [1] and see if this can help address 
your issue?

[1] 
https://issues.apache.org/jira/browse/BEAM-6158?focusedCommentId=16919945=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16919945


was (Author: tvalentyn):
Since I see "dill.load_session" in stack trace, I suspect you hit BEAM-6158. 
Can you please take a look at [1] and see if this can help address your issue?

[1] 
https://issues.apache.org/jira/browse/BEAM-6158?focusedCommentId=16919945=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16919945

> Side-Input in Python3 fails to pickle class
> ---
>
> Key: BEAM-8441
> URL: https://issues.apache.org/jira/browse/BEAM-8441
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Jannik Franz
>Priority: Major
>
> When running Apache Beam with Python3 on Google Cloud Dataflow Sideinputs 
> don't work.
> When testing it in the local/direct runner there seems to be no issue.
>  
>  
> {code:java}
> class FlattenCustomActions(beam.PTransform):
> """ Transforms Facebook Day ActionsOnly retains actions with 
> custom_conversions
> Flattens the actions
> Adds custom conversions names using a side input
> """
> def __init__(self, conversions):
> super(FlattenCustomActions, self).__init__()
> self.conversions = conversionsdef expand(self, input_or_inputs):
> return (
> input_or_inputs
> | "FlattenActions" >> beam.ParDo(flatten_filter_actions)
> | "AddConversionName" >> beam.Map(add_conversion_name, 
> self.conversions)
> )
> # ...
> # in run():
> pipeline_options = PipelineOptions(pipeline_args)
> pipeline_options.view_as(SetupOptions).save_main_session = True
> p = beam.Pipeline(options=pipeline_options)
> conversions_output = (
> p
> | "ReadConversions" >> ReadFromText(known_args.input_conversions, 
> coder=JsonCoder())
> | TransformConversionMetadata()
> )(
> conversions_output
> | "WriteConversions"
> >> WriteCoerced(
> known_args.output_conversions,
> known_args.output_type,
> schema_path=BIGQUERY_SCHEMA_CONVERSIONS_PATH,
> )
> )(
> p
> | ReadFacebookJson(known_args.input, retain_root_fields=True)
> | FlattenCustomActions(beam.pvalue.AsList(conversions_output))
> | "WriteActions"
> >> WriteCoerced(
> known_args.output, known_args.output_type, 
> schema_path=BIGQUERY_SCHEMA_ACTIONS_PATH
> )
> ){code}
>  
> I receive the following Traceback in Dataflow:
> {code:java}
> Traceback (most recent call last): File 
> "/usr/local/lib/python3.6/site-packages/dataflow_worker/batchworker.py", line 
> 773, in run self._load_main_session(self.local_staging_directory) File 
> "/usr/local/lib/python3.6/site-packages/dataflow_worker/batchworker.py", line 
> 489, in _load_main_session pickler.load_session(session_file) File 
> "/usr/local/lib/python3.6/site-packages/apache_beam/internal/pickler.py", 
> line 287, in load_session return dill.load_session(file_path) File 
> "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 410, in 
> load_session module = unpickler.load() File 
> "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 474, in 
> find_class return StockUnpickler.find_class(self, module, name) 
> AttributeError: Can't get attribute 'FlattenCustomActions' on  'dataflow_worker.start' from 
> '/usr/local/lib/python3.6/site-packages/dataflow_worker/start.py'>
> {code}
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-6158) Using --save_main_session fails on Python 3 when main module has invocations of superclass method using 'super' .

2019-11-25 Thread Valentyn Tymofieiev (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-6158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16919945#comment-16919945
 ] 

Valentyn Tymofieiev edited comment on BEAM-6158 at 11/25/19 8:17 PM:
-

The error is happens when main pipeline module has class methods that refer to 
superclass methods using super(). A reference to super in the method code 
creates a cyclical reference inside the object, which dill currently handles 
via pickling objects by reference. Such approach does not work for restoring a 
pickled a main session, since object classes need to be defined at the moment 
of unpickling . This issue will be addressed after 
[https://github.com/uqfoundation/dill/issues/300]. is fixed or we start using 
CloudPickle as a pickler, which is investigated in BEAM-8123.

In the meantime following workarounds are available:
 - [restructure the pipeline|https://stackoverflow.com/a/58845832] so that the 
pipeline code does not depend on the entities defined in the main module, and 
don't pass --save_main_session.
 - refer to superclass methods in the main module via 
SuperClassName.method(self, ...). This is not an equivalent replacement, but 
will suffice in simple class hierarchies as a temporary workaround. 
[Example|https://github.com/apache/beam/blob/7a8a26b6f1e67c619bfe283492a3f9fe83a983bb/sdks/python/apache_beam/examples/wordcount.py#L43].
 - don't use super() in the main module.



was (Author: tvalentyn):
The error is happens when main pipeline module has class methods that refer to 
superclass methods using super(). A reference to super in the method code 
creates a cyclical reference inside the object, which dill currently handles 
via pickling objects by reference. Such approach does not work for restoring a 
pickled a main session, since object classes need to be defined at the moment 
of unpickling . This issue will be addressed after 
[https://github.com/uqfoundation/dill/issues/300]. is fixed or we start using 
CloudPickle as a pickler, which is investigated in BEAM-8123.

In the meantime following workarounds are available:
 - [restructure the pipeline|https://stackoverflow.com/a/58845832] so that the 
pipeline code does not depend on the entities defined in the main module, and 
don't pass --save_main_session.
 - don't use super() in the main module.
 - refer to superclass methods in the main module via 
SuperClassName.method(self, ...). This is NOT an equivalent replacement, but 
may work in simple class hierarchies. 
[Example|https://github.com/apache/beam/blob/7a8a26b6f1e67c619bfe283492a3f9fe83a983bb/sdks/python/apache_beam/examples/wordcount.py#L43].

> Using --save_main_session fails on Python 3 when main module has invocations 
> of superclass method using 'super' .
> -
>
> Key: BEAM-6158
> URL: https://issues.apache.org/jira/browse/BEAM-6158
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-harness
>Reporter: Mark Liu
>Assignee: Valentyn Tymofieiev
>Priority: Major
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> A typical manifestation of this failure, which can be observed on several 
> Beam examples:
> {noformat}
> Traceback (most recent call last):
>   File "/usr/lib/python3.5/runpy.py", line 193, in _run_module_as_main
> "__main__", mod_spec)
>   File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
> exec(code, run_globals)
>   File 
> "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/examples/complete/game/user_score.py",
>  line 164, in 
> run()
>   File 
> "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/examples/complete/game/user_score.py",
>  line 158, in run 
> | 'WriteUserScoreSums' >> beam.io.WriteToText(args.output))
>   File 
> "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/pipeline.py",
>  line 426, in __exit__
>  
> self.run().wait_until_finish()
>   File 
> "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/runners/dataflow/dataflow_runner.py",
>  line 1338, in wait_until_finish   
> (self.state, getattr(self._runner, 'last_error_msg', None)), self)
> apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: 
> Dataflow pipeline failed. State: FAILED, Error:   
>  
> Traceback (most recent call last):
>   File 
> 

[jira] [Updated] (BEAM-8812) [release 2.17.0 validation] org.apache.beam.sdk.transforms.ParDoTest$TimerTests.testEventTimeTimerOrderingWithCreate failure

2019-11-25 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-8812:
--
Status: Open  (was: Triage Needed)

> [release 2.17.0 validation] 
> org.apache.beam.sdk.transforms.ParDoTest$TimerTests.testEventTimeTimerOrderingWithCreate
>  failure
> 
>
> Key: BEAM-8812
> URL: https://issues.apache.org/jira/browse/BEAM-8812
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Mikhail Gryzykhin
>Assignee: Jan Lukavský
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Job: 
> [https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_PortabilityApi_Dataflow_PR/61/testReport/junit/org.apache.beam.sdk.transforms/ParDoTest$TimerTests/testEventTimeTimerOrderingWithCreate/]
> Error from log:
> INFO: Dataflow job 2019-11-21_13_37_48-7347702521640752127 threw exception. 
> Failure message was: java.util.concurrent.ExecutionException: 
> java.lang.RuntimeException: Error received from SDK harness for instruction 
> -672: java.lang.IllegalArgumentException



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8814) --no_auth flag is boolean type and is misleading

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8814?focusedWorklogId=349306=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349306
 ]

ASF GitHub Bot logged work on BEAM-8814:


Author: ASF GitHub Bot
Created on: 25/Nov/19 20:06
Start Date: 25/Nov/19 20:06
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #10202: [BEAM-8814] Changed 
no_auth option from bool to store_true
URL: https://github.com/apache/beam/pull/10202#issuecomment-558319611
 
 
   This would be backward incompatible. I believe that is OK, since use of this 
flag on Dataflow would actually not work anyway and it will only have limited 
test usage.
   
   Should we add a proper help statement as part of this change?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349306)
Remaining Estimate: 167.5h  (was: 167h 40m)
Time Spent: 0.5h  (was: 20m)

> --no_auth flag is boolean type and is misleading
> 
>
> Key: BEAM-8814
> URL: https://issues.apache.org/jira/browse/BEAM-8814
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Affects Versions: 2.14.0, 2.15.0, 2.16.0, 2.17.0
> Environment: Python2, Python3
>Reporter: David Song
>Priority: Blocker
> Fix For: 2.18.0
>
>   Original Estimate: 168h
>  Time Spent: 0.5h
>  Remaining Estimate: 167.5h
>
> Pipeline options defines a 
> [no_auth|[https://github.com/apache/beam/blob/master/sdks/python/apache_beam/options/pipeline_options.py#L468]]
>  flag that is type=bool. This type is known to be ambiguous because it will 
> expect a value, but anything passed to it will be considered True. For 
> example, passing in "--no_auth=False" would still evaluate to True. We should 
> instead use action="store_true" which only detects whether the flag is passed 
> or not. 
> Furthermore, 
> [PipelineOptions.from_dictionary|[https://github.com/apache/beam/blob/master/sdks/python/apache_beam/options/pipeline_options.py#L229]]
>  will assume that boolean flags are passed in without values (e.g. passing 
> --no_auth, instead of --no_auth=True). This, combined with type=bool failing 
> without a value, will ensure that it always fails. 
> sdk_worker_main is the only place that uses from_dictionary (aside from 
> tests), and it will crash if no_auth flag is passed. Looking at 
> pipeline_options_test, tests that call 
> [from_dictionary|[https://github.com/apache/beam/blob/master/sdks/python/apache_beam/options/pipeline_options_test.py#L218]]
>  will feed in get_all_options, which means it have intended to only be used 
> for serializing/deserializing flag options.
> So from here, to support the no_auth flag:
>  * we change no_auth so that it is action="store_true", or
>  * we change sdk_worker_main so that it does not use from_dictionary
> Or both.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8523) Add useful timestamp to job servicer GetJobs

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8523?focusedWorklogId=349305=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349305
 ]

ASF GitHub Bot logged work on BEAM-8523:


Author: ASF GitHub Bot
Created on: 25/Nov/19 20:00
Start Date: 25/Nov/19 20:00
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9959: [BEAM-8523] JobAPI: 
Give access to timestamped state change history
URL: https://github.com/apache/beam/pull/9959#issuecomment-558317064
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349305)
Time Spent: 2h 50m  (was: 2h 40m)

> Add useful timestamp to job servicer GetJobs
> 
>
> Key: BEAM-8523
> URL: https://issues.apache.org/jira/browse/BEAM-8523
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> As a user querying jobs with JobService.GetJobs, it would be useful if the 
> JobInfo result contained timestamps indicating various state changes that may 
> have been missed by a client.   Useful timestamps include:
>  
>  * submitted (prepared to the job service)
>  * started (executor enters the RUNNING state)
>  * completed (executor enters a terminal state)
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=349304=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349304
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 25/Nov/19 19:59
Start Date: 25/Nov/19 19:59
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9915: [BEAM-7746] Add 
python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#issuecomment-558316782
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349304)
Time Spent: 31h 20m  (was: 31h 10m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 31h 20m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8651) Python 3 portable pipelines sometimes fail with errors in StockUnpickler.find_class()

2019-11-25 Thread Valentyn Tymofieiev (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16981837#comment-16981837
 ] 

Valentyn Tymofieiev commented on BEAM-8651:
---

{quote}Just to add to this, it also affects Python 3 users on Dataflow Python 
with Side-Inputs: https://issues.apache.org/jira/browse/BEAM-8441
{quote}

Since there is dill.load_session in the stacktrace in BEAM-8441, it is likely 
that this is BEAM-6158, see [1] for ways to get unblocked.

[1] 
https://issues.apache.org/jira/browse/BEAM-6158?focusedCommentId=16919945=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16919945

> Python 3 portable pipelines sometimes fail with errors in 
> StockUnpickler.find_class()
> -
>
> Key: BEAM-8651
> URL: https://issues.apache.org/jira/browse/BEAM-8651
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Valentyn Tymofieiev
>Priority: Blocker
> Fix For: 2.17.0
>
> Attachments: beam8651.py
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Several Beam users reported an intermittent error which happens during 
> unpickling in StockUnpickler.find_class. A similar error happens consistently 
> when user's pipelines have instances of super() in their main module, and use 
> --save_main_session, see: 
> [BEAM-6158|https://issues.apache.org/jira/browse/BEAM-6158?focusedCommentId=16919945=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16919945].
>  
> In this case the error happens only sometimes, and super() calls don't play a 
> role.  
> So far I've seen reports of the error on Python 3.5, 3.6, and 3.7.1, on Flink 
> and Dataflow runners. On Dataflow runner so far I have seen this in streaming 
> pipelines only, which use portable SDK worker.
> Typical stack trace:
> {noformat}
> File 
> "python3.5/site-packages/apache_beam/runners/worker/bundle_processor.py", 
> line 1148, in _create_pardo_operation
>     dofn_data = pickler.loads(serialized_fn)  
>  
>   File "python3.5/site-packages/apache_beam/internal/pickler.py", line 265, 
> in loads
>     return dill.loads(s)  
>  
>   File "python3.5/site-packages/dill/_dill.py", line 317, in loads
>  
>     return load(file, ignore) 
>  
>   File "python3.5/site-packages/dill/_dill.py", line 305, in load 
>  
>     obj = pik.load()  
>  
>   File "python3.5/site-packages/dill/_dill.py", line 474, in find_class   
>  
>     return StockUnpickler.find_class(self, module, name)  
>  
> AttributeError: Can't get attribute 'ClassName' on  'python3.5/site-packages/filename.py'>
> {noformat}
> According to Guenther from [1]:
> {quote}
> This looks exactly like a race condition that we've encountered on Python
> 3.7.1: There's a bug in some older 3.7.x releases that breaks the
> thread-safety of the unpickler, as concurrent unpickle threads can access a
> module before it has been fully imported. See
> https://bugs.python.org/issue34572 for more information.
> The traceback shows a Python 3.6 venv so this could be a different issue
> (the unpickle bug was introduced in version 3.7). If it's the same bug then
> upgrading to Python 3.7.3 or higher should fix that issue. One potential
> workaround is to ensure that all of the modules get imported during the
> initialization of the sdk_worker, as this bug only affects imports done by
> the unpickler.
> {quote}
> Opening this for visibility. Current open questions are:
> 1. Find a minimal example to reproduce this issue.
> 2. Figure out whether users are still affected by this issue on Python 3.7.3.
> 3. Communicate a workarounds for 3.5, 3.6 users affected by this.
> [1] 
> https://lists.apache.org/thread.html/5581ddfcf6d2ae10d25b834b8a61ebee265ffbcf650c6ec8d1e69408@%3Cdev.beam.apache.org%3E



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8441) Side-Input in Python3 fails to pickle class

2019-11-25 Thread Valentyn Tymofieiev (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16981828#comment-16981828
 ] 

Valentyn Tymofieiev commented on BEAM-8441:
---

Since I see "dill.load_session" in stack trace, I suspect you hit BEAM-6158. 
Can you please take a look at [1] and see if this can help address your issue?

[1] 
https://issues.apache.org/jira/browse/BEAM-6158?focusedCommentId=16919945=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16919945

> Side-Input in Python3 fails to pickle class
> ---
>
> Key: BEAM-8441
> URL: https://issues.apache.org/jira/browse/BEAM-8441
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Jannik Franz
>Priority: Major
>
> When running Apache Beam with Python3 on Google Cloud Dataflow Sideinputs 
> don't work.
> When testing it in the local/direct runner there seems to be no issue.
>  
>  
> {code:java}
> class FlattenCustomActions(beam.PTransform):
> """ Transforms Facebook Day ActionsOnly retains actions with 
> custom_conversions
> Flattens the actions
> Adds custom conversions names using a side input
> """
> def __init__(self, conversions):
> super(FlattenCustomActions, self).__init__()
> self.conversions = conversionsdef expand(self, input_or_inputs):
> return (
> input_or_inputs
> | "FlattenActions" >> beam.ParDo(flatten_filter_actions)
> | "AddConversionName" >> beam.Map(add_conversion_name, 
> self.conversions)
> )
> # ...
> # in run():
> pipeline_options = PipelineOptions(pipeline_args)
> pipeline_options.view_as(SetupOptions).save_main_session = True
> p = beam.Pipeline(options=pipeline_options)
> conversions_output = (
> p
> | "ReadConversions" >> ReadFromText(known_args.input_conversions, 
> coder=JsonCoder())
> | TransformConversionMetadata()
> )(
> conversions_output
> | "WriteConversions"
> >> WriteCoerced(
> known_args.output_conversions,
> known_args.output_type,
> schema_path=BIGQUERY_SCHEMA_CONVERSIONS_PATH,
> )
> )(
> p
> | ReadFacebookJson(known_args.input, retain_root_fields=True)
> | FlattenCustomActions(beam.pvalue.AsList(conversions_output))
> | "WriteActions"
> >> WriteCoerced(
> known_args.output, known_args.output_type, 
> schema_path=BIGQUERY_SCHEMA_ACTIONS_PATH
> )
> ){code}
>  
> I receive the following Traceback in Dataflow:
> {code:java}
> Traceback (most recent call last): File 
> "/usr/local/lib/python3.6/site-packages/dataflow_worker/batchworker.py", line 
> 773, in run self._load_main_session(self.local_staging_directory) File 
> "/usr/local/lib/python3.6/site-packages/dataflow_worker/batchworker.py", line 
> 489, in _load_main_session pickler.load_session(session_file) File 
> "/usr/local/lib/python3.6/site-packages/apache_beam/internal/pickler.py", 
> line 287, in load_session return dill.load_session(file_path) File 
> "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 410, in 
> load_session module = unpickler.load() File 
> "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 474, in 
> find_class return StockUnpickler.find_class(self, module, name) 
> AttributeError: Can't get attribute 'FlattenCustomActions' on  'dataflow_worker.start' from 
> '/usr/local/lib/python3.6/site-packages/dataflow_worker/start.py'>
> {code}
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8581) Python SDK labels ontime empty panes as late

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8581?focusedWorklogId=349276=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349276
 ]

ASF GitHub Bot logged work on BEAM-8581:


Author: ASF GitHub Bot
Created on: 25/Nov/19 19:19
Start Date: 25/Nov/19 19:19
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10035: [BEAM-8581] 
and [BEAM-8582] watermark and trigger fixes
URL: https://github.com/apache/beam/pull/10035#discussion_r350374385
 
 

 ##
 File path: sdks/python/apache_beam/transforms/trigger.py
 ##
 @@ -1036,14 +1074,17 @@ class BatchGlobalTriggerDriver(TriggerDriver):
   index=0,
   nonspeculative_index=0)
 
-  def process_elements(self, state, windowed_values, unused_output_watermark):
+  def process_elements(self, state, windowed_values,
+   unused_output_watermark=MIN_TIMESTAMP,
 
 Review comment:
   I meant not provide default (None or otherwise). 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349276)
Time Spent: 5h 10m  (was: 5h)

> Python SDK labels ontime empty panes as late
> 
>
> Key: BEAM-8581
> URL: https://issues.apache.org/jira/browse/BEAM-8581
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> The GeneralTriggerDriver does not put watermark holds on timers, leading to 
> the ontime empty pane being considered late data.
> Fix: Add a new notion of whether a trigger has an ontime pane. If it does, 
> then set a watermark hold to end of window - 1.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=349275=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349275
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 25/Nov/19 19:18
Start Date: 25/Nov/19 19:18
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10070: [BEAM-8575] 
Added a unit test for Reshuffle to test that Reshuffle pr…
URL: https://github.com/apache/beam/pull/10070
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349275)
Time Spent: 20h 20m  (was: 20h 10m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 20h 20m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=349274=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349274
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 25/Nov/19 19:17
Start Date: 25/Nov/19 19:17
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10070: [BEAM-8575] 
Added a unit test for Reshuffle to test that Reshuffle pr…
URL: https://github.com/apache/beam/pull/10070#discussion_r349839904
 
 

 ##
 File path: sdks/python/apache_beam/transforms/util_test.py
 ##
 @@ -477,6 +479,59 @@ def test_reshuffle_streaming_global_window(self):
 label='after reshuffle')
 pipeline.run()
 
+  @attr('ValidatesRunner')
+  def test_reshuffle_preserves_timestamps(self):
+with TestPipeline() as pipeline:
+
+  # Create a PCollection and assign each element with a different 
timestamp.
+  before_reshuffle = (pipeline
+  | beam.Create([
+  {'name': 'foo', 'timestamp': MIN_TIMESTAMP},
+  {'name': 'foo', 'timestamp': 0},
+  {'name': 'bar', 'timestamp': 33},
+  {'name': 'bar', 'timestamp': MAX_TIMESTAMP},
+  ])
+  | beam.Map(
+  lambda element: beam.window.TimestampedValue(
+  element, element['timestamp'])))
+
+  # Reshuffle the PCollection above and assign the timestamp of an element
+  # to that element again.
+  after_reshuffle = before_reshuffle | beam.Reshuffle()
+
+  # Given an element, emits a string which contains the timestamp and the
+  # name field of the element.
+  def format_with_timestamp(element, timestamp=beam.DoFn.TimestampParam):
+t = str(timestamp)
 
 Review comment:
   Nit: I'd put this in an else clause below. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349274)
Time Spent: 20h 10m  (was: 20h)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 20h 10m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8811) Upgrade Beam pipeline diagrams in docs

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8811?focusedWorklogId=349272=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349272
 ]

ASF GitHub Bot logged work on BEAM-8811:


Author: ASF GitHub Bot
Created on: 25/Nov/19 19:01
Start Date: 25/Nov/19 19:01
Worklog Time Spent: 10m 
  Work Description: soyrice commented on issue #10200: [BEAM-8811] Upgrade 
Beam pipeline diagrams in docs
URL: https://github.com/apache/beam/pull/10200#issuecomment-558294511
 
 
   > My bad if we cleared this during review, but an Aggregation is also a 
PTransform. The diagrams seem to imply that they're different. Maybe an 
'Aggregating PTransform', or a 'Grouping PTransform', vs a 'per-element 
PTransform'.
   
   Good catch - thanks. I like "Aggregating PTransform" since it's also 
consistent with the language in the transform catalog. Will update the legends 
accordingly.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349272)
Time Spent: 0.5h  (was: 20m)

> Upgrade Beam pipeline diagrams in docs
> --
>
> Key: BEAM-8811
> URL: https://issues.apache.org/jira/browse/BEAM-8811
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Cyrus Maden
>Assignee: Cyrus Maden
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8664) [SQL] MongoDb should use project push-down

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8664?focusedWorklogId=349270=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349270
 ]

ASF GitHub Bot logged work on BEAM-8664:


Author: ASF GitHub Bot
Created on: 25/Nov/19 19:00
Start Date: 25/Nov/19 19:00
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on issue #10095: [BEAM-8664] [SQL] 
MongoDb project push down
URL: https://github.com/apache/beam/pull/10095#issuecomment-558293922
 
 
   Run sql postcommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349270)
Time Spent: 0.5h  (was: 20m)

> [SQL] MongoDb should use project push-down
> --
>
> Key: BEAM-8664
> URL: https://issues.apache.org/jira/browse/BEAM-8664
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> MongoDbTable should implement the following methods:
> {code:java}
> public PCollection buildIOReader(
> PBegin begin, BeamSqlTableFilter filters, List fieldNames);
> public ProjectSupport supportsProjects();
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=349268=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349268
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 25/Nov/19 18:58
Start Date: 25/Nov/19 18:58
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #10107: 
[BEAM-8335] Change has_unbounded_sources to predetermined list of sources
URL: https://github.com/apache/beam/pull/10107#discussion_r350364684
 
 

 ##
 File path: sdks/python/apache_beam/runners/interactive/pipeline_instrument.py
 ##
 @@ -24,13 +24,27 @@
 from __future__ import absolute_import
 
 import apache_beam as beam
+from apache_beam.io.external.gcp.pubsub import ReadFromPubSub as 
ExternalReadFromPubSub
+from apache_beam.io.external.kafka import ReadFromKafka
+from apache_beam.io.gcp.bigquery_tools import BigQueryReader
+from apache_beam.io.gcp.pubsub import ReadFromPubSub
 from apache_beam.pipeline import PipelineVisitor
 from apache_beam.runners.interactive import cache_manager as cache
 from apache_beam.runners.interactive import interactive_environment as ie
 
 READ_CACHE = "_ReadCache_"
 WRITE_CACHE = "_WriteCache_"
 
+# Use a tuple to define the list of unbounded sources. It is not always 
feasible
+# to correctly find all the unbounded sources in the SDF world. This is
+# because SDF allows the source to dynamically create sources at runtime.
+REPLACEABLE_UNBOUNDED_SOURCES = (
+ExternalReadFromPubSub,
+ReadFromKafka,
+ReadFromPubSub,
+BigQueryReader,
 
 Review comment:
   kk, removed
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349268)
Time Spent: 34h 10m  (was: 34h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 34h 10m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7390) Colab examples for aggregation transforms (Python)

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7390?focusedWorklogId=349265=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349265
 ]

ASF GitHub Bot logged work on BEAM-7390:


Author: ASF GitHub Bot
Created on: 25/Nov/19 18:42
Start Date: 25/Nov/19 18:42
Worklog Time Spent: 10m 
  Work Description: davidcavazos commented on issue #9926: [BEAM-7390] Add 
code snippet for GroupByKey
URL: https://github.com/apache/beam/pull/9926#issuecomment-558287027
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349265)
Time Spent: 4.5h  (was: 4h 20m)

> Colab examples for aggregation transforms (Python)
> --
>
> Key: BEAM-7390
> URL: https://issues.apache.org/jira/browse/BEAM-7390
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Merge aggregation Colabs into the transform catalog



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7390) Colab examples for aggregation transforms (Python)

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7390?focusedWorklogId=349266=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349266
 ]

ASF GitHub Bot logged work on BEAM-7390:


Author: ASF GitHub Bot
Created on: 25/Nov/19 18:42
Start Date: 25/Nov/19 18:42
Worklog Time Spent: 10m 
  Work Description: davidcavazos commented on issue #9926: [BEAM-7390] Add 
code snippet for GroupByKey
URL: https://github.com/apache/beam/pull/9926#issuecomment-558287027
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349266)
Time Spent: 4h 40m  (was: 4.5h)

> Colab examples for aggregation transforms (Python)
> --
>
> Key: BEAM-7390
> URL: https://issues.apache.org/jira/browse/BEAM-7390
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Merge aggregation Colabs into the transform catalog



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8733) The "KeyError: u'-47'" error from line 305 of sdk_worker.py

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8733?focusedWorklogId=349262=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349262
 ]

ASF GitHub Bot logged work on BEAM-8733:


Author: ASF GitHub Bot
Created on: 25/Nov/19 18:37
Start Date: 25/Nov/19 18:37
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #10208: [BEAM-8733] Handle 
the registration request synchronously in the Python SDK harness
URL: https://github.com/apache/beam/pull/10208#issuecomment-558285051
 
 
   Run Portable_Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349262)
Time Spent: 1h 20m  (was: 1h 10m)

> The "KeyError: u'-47'" error from line 305 of sdk_worker.py
> ---
>
> Key: BEAM-8733
> URL: https://issues.apache.org/jira/browse/BEAM-8733
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The issue reported by [~chamikara], error message as follows:
> apache_beam/runners/worker/sdk_worker.py", line 305, in get
> self.fns[bundle_descriptor_id],
> KeyError: u'-47'
> {code}
> at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
> at org.apache.beam.sdk.util.MoreFutures.get(MoreFutures.java:57)
> at 
> org.apache.beam.runners.dataflow.worker.fn.control.RegisterAndProcessBundleOperation.finish(RegisterAndProcessBundleOperation.java:330)
> at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:85)
> at 
> org.apache.beam.runners.dataflow.worker.fn.control.BeamFnMapTaskExecutor.execute(BeamFnMapTaskExecutor.java:125)
> at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:411)
> at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:380)
> at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:305)
> at 
> org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness.start(DataflowRunnerHarness.java:195)
> at 
> org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness.main(DataflowRunnerHarness.java:123)
> Suppressed: java.lang.IllegalStateException: Already closed.
>   at 
> org.apache.beam.sdk.fn.data.BeamFnDataBufferingOutboundObserver.close(BeamFnDataBufferingOutboundObserver.java:93)
>   at 
> org.apache.beam.runners.dataflow.worker.fn.data.RemoteGrpcPortWriteOperation.abort(RemoteGrpcPortWriteOperation.java:220)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:91)
> {code}
> More discussion info can be found here: 
> https://github.com/apache/beam/pull/10004



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


  1   2   >