[jira] [Work logged] (BEAM-8213) Run and report python tox tasks separately within Jenkins
[ https://issues.apache.org/jira/browse/BEAM-8213?focusedWorklogId=317173&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317173 ] ASF GitHub Bot logged work on BEAM-8213: Author: ASF GitHub Bot Created on: 24/Sep/19 05:03 Start Date: 24/Sep/19 05:03 Worklog Time Spent: 10m Work Description: yifanzou commented on issue #9642: [BEAM-8213] Split up monolithic python preCommit tests on jenkins URL: https://github.com/apache/beam/pull/9642#issuecomment-534389982 > We're still doing the same amount of work, so IIUC, assuming we get similar CPU-utilization in this new configuration, these 5 jobs should finish in the time it took the previous single job to finish, plus whatever overhead is required per job to bootstrap the tests. I think the concern is that we may have higher test concurrency since we are splitting the big python job into multiple pieces. The Jenkins job waiting queue could be longer. In addition, this may also lead to higher stress on GCP resource usage, such as CreateJobPerMinutePerUser, and Compute Engine quotas. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 317173) Time Spent: 1h 40m (was: 1.5h) > Run and report python tox tasks separately within Jenkins > - > > Key: BEAM-8213 > URL: https://issues.apache.org/jira/browse/BEAM-8213 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Chad Dombrova >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > > As a python developer, the speed and comprehensibility of the jenkins > PreCommit job could be greatly improved.
> Here are some of the problems > - when a lint job fails, it's not reported in the test results summary, so > even though the job is marked as failed, I see "Test Result (no failures)" > which is quite confusing > - I have to wait for over an hour to discover the lint failed, which takes > about a minute to run on its own > - The logs are a jumbled mess of all the different tasks running on top of > each other > - The test results give no indication of which version of python they use. I > click on Test results, then the test module, then the test class, then I see > 4 tests named the same thing. I assume that the first is python 2.7, the > second is 3.5 and so on. It takes 5 clicks and then reading the log output > to know which version of python a single error pertains to, then I need to > repeat for each failure. This makes it very difficult to discover problems, > and deduce that they may have something to do with python version mismatches. > I believe the solution to this is to split up the single monolithic python > PreCommit job into sub-jobs (possibly using a pipeline with steps). This > would give us the following benefits: > - sub job results should become available as they finish, so for example, > lint results should be available very early on > - sub job results will be reported separately, and there will be a job for > each py2, py35, py36 and so on, so it will be clear when an error is related > to a particular python version > - sub jobs without reports, like docs and lint, will have their own failure > status and logs, so when they fail it will be more obvious what went wrong. > I'm happy to help out once I get some feedback on the desired way forward. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-6923) OOM errors in jobServer when using GCS artifactDir
[ https://issues.apache.org/jira/browse/BEAM-6923?focusedWorklogId=317151&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317151 ] ASF GitHub Bot logged work on BEAM-6923: Author: ASF GitHub Bot Created on: 24/Sep/19 04:15 Start Date: 24/Sep/19 04:15 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #9647: [BEAM-6923] limit number of concurrent artifact write to 8 URL: https://github.com/apache/beam/pull/9647#discussion_r327420681 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java ## @@ -77,6 +78,13 @@ public static final String MANIFEST = "MANIFEST"; public static final String ARTIFACTS = "artifacts"; + private final Semaphore permittedConcurrentWrite; + + public BeamFileSystemArtifactStagingService() { +super(); +permittedConcurrentWrite = new Semaphore(8); Review comment: Maybe make the `8` a constant? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 317151) Time Spent: 0.5h (was: 20m) > OOM errors in jobServer when using GCS artifactDir > -- > > Key: BEAM-6923 > URL: https://issues.apache.org/jira/browse/BEAM-6923 > Project: Beam > Issue Type: Bug > Components: sdk-java-harness >Reporter: Lukasz Gajowy >Assignee: Ankur Goenka >Priority: Major > Attachments: Instance counts.png, Paths to GC root.png, > Telemetries.png, beam6923-flink156.m4v, beam6923flink182.m4v, heapdump > size-sorted.png > > Time Spent: 0.5h > Remaining Estimate: 0h > > When starting jobServer with artifactDir pointing to a GCS bucket: > {code:java} > ./gradlew :beam-runners-flink_2.11-job-server:runShadow > -PflinkMasterUrl=localhost:8081 -PartifactsDir=gs://the-bucket{code} > and running a Java portable pipeline with the following, portability related > pipeline options: > {code:java} > --runner=PortableRunner --jobEndpoint=localhost:8099 > --defaultEnvironmentType=DOCKER > --defaultEnvironmentConfig=gcr.io//java:latest'{code} > > I'm facing a series of OOM errors, like this: > {code:java} > Exception in thread "grpc-default-executor-3" java.lang.OutOfMemoryError: > Java heap space > at > com.google.api.client.googleapis.media.MediaHttpUploader.buildContentChunk(MediaHttpUploader.java:606) > at > com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:408) > at > com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336) > at > com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:508) > at > com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:432) > at > com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:549) > at > 
com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:301) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745){code} > > This does not happen when I'm using a local filesystem for the artifact > staging location. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
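To illustrate the review suggestion in the message above (naming the `8` rather than leaving it as a literal), here is a minimal sketch of a semaphore-bounded writer. It is not the code from the pull request: the class `BoundedArtifactWriter`, the constant `MAX_CONCURRENT_WRITES`, and the `write` helper are hypothetical names chosen for illustration, and only `java.util.concurrent.Semaphore` from the JDK is assumed.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical class for illustration only; not part of Beam.
class BoundedArtifactWriter {
    // Named constant instead of a bare literal; 8 mirrors the value in the diff.
    static final int MAX_CONCURRENT_WRITES = 8;

    private final Semaphore permits = new Semaphore(MAX_CONCURRENT_WRITES);

    // Runs the given upload while holding one permit, so at most
    // MAX_CONCURRENT_WRITES uploads are in flight at any time.
    <T> T write(Supplier<T> upload) {
        permits.acquireUninterruptibly();
        try {
            return upload.get();
        } finally {
            // Always return the permit, even if the upload throws.
            permits.release();
        }
    }

    // Exposed so callers (and tests) can observe how many permits are free.
    int availablePermits() {
        return permits.availablePermits();
    }
}
```

Releasing the permit in a `finally` block keeps a failed upload from permanently shrinking the pool of permits.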
[jira] [Work logged] (BEAM-6923) OOM errors in jobServer when using GCS artifactDir
[ https://issues.apache.org/jira/browse/BEAM-6923?focusedWorklogId=317150&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317150 ] ASF GitHub Bot logged work on BEAM-6923: Author: ASF GitHub Bot Created on: 24/Sep/19 04:04 Start Date: 24/Sep/19 04:04 Worklog Time Spent: 10m Work Description: angoenka commented on issue #9647: [BEAM-6923] limit number of concurrent artifact write to 8 URL: https://github.com/apache/beam/pull/9647#issuecomment-534378966 R: @robertwb @pabloem @lgajowy This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 317150) Time Spent: 20m (was: 10m) > OOM errors in jobServer when using GCS artifactDir > -- > > Key: BEAM-6923 > URL: https://issues.apache.org/jira/browse/BEAM-6923 > Project: Beam > Issue Type: Bug > Components: sdk-java-harness >Reporter: Lukasz Gajowy >Assignee: Ankur Goenka >Priority: Major > Attachments: Instance counts.png, Paths to GC root.png, > Telemetries.png, beam6923-flink156.m4v, beam6923flink182.m4v, heapdump > size-sorted.png > > Time Spent: 20m > Remaining Estimate: 0h > > When starting jobServer with artifactDir pointing to a GCS bucket: > {code:java} > ./gradlew :beam-runners-flink_2.11-job-server:runShadow > -PflinkMasterUrl=localhost:8081 -PartifactsDir=gs://the-bucket{code} > and running a Java portable pipeline with the following, portability related > pipeline options: > {code:java} > --runner=PortableRunner --jobEndpoint=localhost:8099 > --defaultEnvironmentType=DOCKER > --defaultEnvironmentConfig=gcr.io//java:latest'{code} > > I'm facing a series of OOM errors, like this: > {code:java} > Exception in thread "grpc-default-executor-3" java.lang.OutOfMemoryError: > Java heap space > at > 
com.google.api.client.googleapis.media.MediaHttpUploader.buildContentChunk(MediaHttpUploader.java:606) > at > com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:408) > at > com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336) > at > com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:508) > at > com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:432) > at > com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:549) > at > com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:301) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745){code} > > This does not happen when I'm using a local filesystem for the artifact > staging location. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-6923) OOM errors in jobServer when using GCS artifactDir
[ https://issues.apache.org/jira/browse/BEAM-6923?focusedWorklogId=317149&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317149 ] ASF GitHub Bot logged work on BEAM-6923: Author: ASF GitHub Bot Created on: 24/Sep/19 04:03 Start Date: 24/Sep/19 04:03 Worklog Time Spent: 10m Work Description: angoenka commented on pull request #9647: [BEAM-6923] limit number of concurrent artifact write to 8 URL: https://github.com/apache/beam/pull/9647 **Please** add a meaningful description for your change here Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). 
[jira] [Work logged] (BEAM-5820) Vendor Calcite
[ https://issues.apache.org/jira/browse/BEAM-5820?focusedWorklogId=317127&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317127 ] ASF GitHub Bot logged work on BEAM-5820: Author: ASF GitHub Bot Created on: 24/Sep/19 01:56 Start Date: 24/Sep/19 01:56 Worklog Time Spent: 10m Work Description: vectorijk commented on issue #9189: [BEAM-5820] vendor calcite URL: https://github.com/apache/beam/pull/9189#issuecomment-534354241 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 317127) Time Spent: 12h 40m (was: 12.5h) > Vendor Calcite > -- > > Key: BEAM-5820 > URL: https://issues.apache.org/jira/browse/BEAM-5820 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql >Reporter: Kenneth Knowles >Assignee: Kai Jiang >Priority: Major > Time Spent: 12h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-6923) OOM errors in jobServer when using GCS artifactDir
[ https://issues.apache.org/jira/browse/BEAM-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936291#comment-16936291 ] Ankur Goenka edited comment on BEAM-6923 at 9/24/19 1:40 AM: - Gcsutil.java sets the default buffer size for an individual file write to 64MB when the VM memory is more than 1GB. Artifact staging tends to upload multiple files in parallel, and each upload reserves 64MB, causing this issue. A couple of potential fixes are: # Limit concurrent uploads of files to a lower number. # Limit the gcs util buffer size per file. # Limit concurrent gcs connections so that it applies to all the file uploads. 1 applies only to artifact staging, but theoretically this problem can impact a pipeline which writes to a bunch of files. 2 has a performance penalty when writing to a single file. 3 applies to all the files but can lead to cases where we keep a file open for a long time in pipeline processing. I am in favor of 1 as the impact will be limited to artifact staging. cc: [~robertwb] was (Author: angoenka): Gcsutil.java sets the default buffer size for an individual file write to 64MB when the VM memory is more than 1GB. Artifact staging tends to upload multiple files in parallel, and each upload reserves 64MB, causing this issue. A couple of potential fixes are: # Limit concurrent uploads of files to a lower number. # Limit the gcs util buffer size per file. # Limit concurrent gcs connections so that it applies to all the file uploads. 1 applies only to artifact staging, but theoretically this problem can impact a pipeline which writes to a bunch of files. 2 has a performance penalty when writing to a single file. I am in favor of 3 as it should not have any performance penalty and applies to all the gcs related file IO.
cc: [~robertwb] > OOM errors in jobServer when using GCS artifactDir > -- > > Key: BEAM-6923 > URL: https://issues.apache.org/jira/browse/BEAM-6923 > Project: Beam > Issue Type: Bug > Components: sdk-java-harness >Reporter: Lukasz Gajowy >Assignee: Ankur Goenka >Priority: Major > Attachments: Instance counts.png, Paths to GC root.png, > Telemetries.png, beam6923-flink156.m4v, beam6923flink182.m4v, heapdump > size-sorted.png > > > When starting jobServer with artifactDir pointing to a GCS bucket: > {code:java} > ./gradlew :beam-runners-flink_2.11-job-server:runShadow > -PflinkMasterUrl=localhost:8081 -PartifactsDir=gs://the-bucket{code} > and running a Java portable pipeline with the following, portability related > pipeline options: > {code:java} > --runner=PortableRunner --jobEndpoint=localhost:8099 > --defaultEnvironmentType=DOCKER > --defaultEnvironmentConfig=gcr.io//java:latest'{code} > > I'm facing a series of OOM errors, like this: > {code:java} > Exception in thread "grpc-default-executor-3" java.lang.OutOfMemoryError: > Java heap space > at > com.google.api.client.googleapis.media.MediaHttpUploader.buildContentChunk(MediaHttpUploader.java:606) > at > com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:408) > at > com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336) > at > com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:508) > at > com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:432) > at > com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:549) > at > com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:301) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745){code} > > This does not happen when I'm using a local filesystem for the artifact > staging location. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-6923) OOM errors in jobServer when using GCS artifactDir
[ https://issues.apache.org/jira/browse/BEAM-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936291#comment-16936291 ] Ankur Goenka commented on BEAM-6923: Gcsutil.java sets the default buffer size for an individual file write to 64MB when the VM memory is more than 1GB. Artifact staging tends to upload multiple files in parallel, and each upload reserves 64MB, causing this issue. A couple of potential fixes are: # Limit concurrent uploads of files to a lower number. # Limit the gcs util buffer size per file. # Limit concurrent gcs connections so that it applies to all the file uploads. 1 applies only to artifact staging, but theoretically this problem can impact a pipeline which writes to a bunch of files. 2 has a performance penalty when writing to a single file. I am in favor of 3 as it should not have any performance penalty and applies to all the gcs related file IO. cc: [~robertwb] > OOM errors in jobServer when using GCS artifactDir > -- > > Key: BEAM-6923 > URL: https://issues.apache.org/jira/browse/BEAM-6923 > Project: Beam > Issue Type: Bug > Components: sdk-java-harness >Reporter: Lukasz Gajowy >Assignee: Ankur Goenka >Priority: Major > Attachments: Instance counts.png, Paths to GC root.png, > Telemetries.png, beam6923-flink156.m4v, beam6923flink182.m4v, heapdump > size-sorted.png > > > When starting jobServer with artifactDir pointing to a GCS bucket: > {code:java} > ./gradlew :beam-runners-flink_2.11-job-server:runShadow > -PflinkMasterUrl=localhost:8081 -PartifactsDir=gs://the-bucket{code} > and running a Java portable pipeline with the following, portability related > pipeline options: > {code:java} > --runner=PortableRunner --jobEndpoint=localhost:8099 > --defaultEnvironmentType=DOCKER > --defaultEnvironmentConfig=gcr.io//java:latest'{code} > > I'm facing a series of OOM errors, like this: > {code:java} > Exception in thread "grpc-default-executor-3" java.lang.OutOfMemoryError: > Java heap space > at >
com.google.api.client.googleapis.media.MediaHttpUploader.buildContentChunk(MediaHttpUploader.java:606) > at > com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:408) > at > com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336) > at > com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:508) > at > com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:432) > at > com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:549) > at > com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:301) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745){code} > > This does not happen when I'm using a local filesystem for the artifact > staging location. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
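The arithmetic behind the comment above (each parallel upload reserving a 64MB buffer) can be sketched as follows. This is only a back-of-the-envelope estimate under stated assumptions: "64MB" is read as 64 MiB, the class `UploadBufferEstimate` and its methods are hypothetical names, and real heap usage would also include everything else the job server allocates.

```java
// Hypothetical helper for illustration only; not part of Beam.
class UploadBufferEstimate {
    static final long MIB = 1024L * 1024L;

    // Per-upload buffer size quoted in the comment (assumed here to mean 64 MiB).
    static final long BUFFER_BYTES_PER_UPLOAD = 64 * MIB;

    // Heap reserved by upload buffers alone for a given number of parallel uploads.
    static long reservedBytes(int concurrentUploads) {
        return concurrentUploads * BUFFER_BYTES_PER_UPLOAD;
    }

    // Upper bound on parallel uploads whose buffers fit in the given heap size,
    // ignoring every other allocation.
    static int maxUploadsWithin(long heapBytes) {
        return (int) (heapBytes / BUFFER_BYTES_PER_UPLOAD);
    }
}
```

Under these assumptions, 8 parallel uploads reserve 512 MiB of buffers, and a 1 GiB heap cannot hold the buffers for more than 16 uploads even if nothing else is allocated.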
[jira] [Work logged] (BEAM-8209) Document custom docker containers
[ https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=317117&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317117 ] ASF GitHub Bot logged work on BEAM-8209: Author: ASF GitHub Bot Created on: 24/Sep/19 01:31 Start Date: 24/Sep/19 01:31 Worklog Time Spent: 10m Work Description: rosetn commented on pull request #9607: [BEAM-8209] Custom container docs URL: https://github.com/apache/beam/pull/9607#discussion_r327393965 ## File path: website/src/documentation/runtime/environments.md ## @@ -0,0 +1,187 @@ +--- +layout: section +title: "Runtime environments" +section_menu: section-menu/documentation.html +permalink: /documentation/runtime/environments/ +redirect_from: + - /documentation/execution-model/ +--- + + +# Runtime environments + +Any execution engine can run the Beam SDK because the SDK runtime environment is [containerized](https://s.apache.org/beam-fn-api-container-contract) with [Docker](https://www.docker.com/) and isolated from other runtime systems. This page describes how to build, customize, and push Beam SDK container images. + +## Building container images + +Before building Beam SDK container images: +* Register a [Bintray](https://bintray.com/) account with a Docker repository named `apache`. +* Install [Docker](https://www.docker.com/) on your workstation. + +To build Beam SDK container images: + + + +Navigate to your local copy of the <a href="https://github.com/apache/beam">beam</a> + + +Run Gradle with the docker target: ./gradlew docker + + + +> **Note**: It may take a long time to build all of the container images. You can instead build the images for specific SDKs: +> +> ``` +> ./gradlew -p sdks/java/container docker +> ./gradlew -p sdks/python/container docker +> ./gradlew -p sdks/go/container docker +> ``` + +Run `docker images` to examine the containers.
For example, if you successfully built the container images, the command prompt displays a response like: + +``` +REPOSITORY TAG IMAGE ID CREATED SIZE +$USER-docker-apache.bintray.io/beam/python latest 4ea515403a1a 3 minutes ago 1.27GB +$USER-docker-apache.bintray.io/beam/java latest 0103512f1d8f 34 minutes ago 780MB +$USER-docker-apache.bintray.io/beam/go latest ce055985808a 35 minutes ago 121MB +``` + +Although the repository names look like URLs, the container images are stored locally on your workstation. After building the container images locally, you can [push](#pushing-container-images) them to an eponymous repository online. + +### Overriding default Docker targets + +The default SDK version is `latest` and the default Docker repository is the following Bintray location: + +``` +$USER-docker-apache.bintray.io/beam +``` + +When you [build SDK container images](#building-container-images), you can override the default version and location. + +To specify an older Python SDK version, like 2.3.0, build the container with the `docker-tag` option: + +``` +./gradlew docker -Pdocker-tag=2.3.0 +``` + +To change the `docker` target, build the container with the `docker-repository-root` option: + +``` +./gradlew docker -Pdocker-repository-root=$LOCATION +``` + +## Customizing container images + +You can add extra dependencies or serialization files to container images so the execution engine doesn't need them. Review comment: This sentence is unclear-- Do you mean that if you add extra dependencies or serialization files, then you don't need to supply them again later? What does it mean that "the execution engine doesn't need them"? What is "them" in this sentence? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 317117) Time Spent: 1h 20m (was: 1h 10m) > Document custom docker containers > - > > Key: BEAM-8209 > URL: https://issues.apache.org/jira/browse/BEAM-8209 > Project: Beam > Issue Type: Sub-task > Components: website >Reporter: Cyrus Maden >Assignee: Cyrus Maden >Priority: Minor > Fix For: 2.16.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8209) Document custom docker containers
[ https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=317119&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317119 ] ASF GitHub Bot logged work on BEAM-8209: Author: ASF GitHub Bot Created on: 24/Sep/19 01:31 Start Date: 24/Sep/19 01:31 Worklog Time Spent: 10m Work Description: rosetn commented on pull request #9607: [BEAM-8209] Custom container docs URL: https://github.com/apache/beam/pull/9607#discussion_r327392732 ## File path: website/src/documentation/runtime/environments.md ## @@ -0,0 +1,187 @@ +--- +layout: section +title: "Runtime environments" +section_menu: section-menu/documentation.html +permalink: /documentation/runtime/environments/ +redirect_from: + - /documentation/execution-model/ +--- + + +# Runtime environments + +Any execution engine can run the Beam SDK because the SDK runtime environment is [containerized](https://s.apache.org/beam-fn-api-container-contract) with [Docker](https://www.docker.com/) and isolated from other runtime systems. This page describes how to build, customize, and push Beam SDK container images. + +## Building container images + +Before building Beam SDK container images: +* Register a [Bintray](https://bintray.com/) account with a Docker repository named `apache`. +* Install [Docker](https://www.docker.com/) on your workstation. + +To build Beam SDK container images: + + + +Navigate to your local copy of the <a href="https://github.com/apache/beam">beam</a> + + +Run Gradle with the docker target: ./gradlew docker Review comment: (I realize this comment might look weird because it gets rendered, but see if you can view the source) I would use code fences instead of the pre ... tags here. I don't see the white/grey background on the Beam site for other code boxes, so maybe change this to remain consistent. Then, you need to rewrite this list in markdown 1. Navigate to your local copy of <a href="https://github.com/apache/beam">beam</a> 1. 
Run Gradle with the `docker` target: ``` ./gradlew docker ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 317119) Time Spent: 1.5h (was: 1h 20m) > Document custom docker containers > - > > Key: BEAM-8209 > URL: https://issues.apache.org/jira/browse/BEAM-8209 > Project: Beam > Issue Type: Sub-task > Components: website >Reporter: Cyrus Maden >Assignee: Cyrus Maden >Priority: Minor > Fix For: 2.16.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8209) Document custom docker containers
[ https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=317112&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317112 ] ASF GitHub Bot logged work on BEAM-8209: Author: ASF GitHub Bot Created on: 24/Sep/19 01:31 Start Date: 24/Sep/19 01:31 Worklog Time Spent: 10m Work Description: rosetn commented on pull request #9607: [BEAM-8209] Custom container docs URL: https://github.com/apache/beam/pull/9607#discussion_r327393049 ## File path: website/src/documentation/runtime/environments.md ## @@ -0,0 +1,187 @@ +--- +layout: section +title: "Runtime environments" +section_menu: section-menu/documentation.html +permalink: /documentation/runtime/environments/ +redirect_from: + - /documentation/execution-model/ +--- + + +# Runtime environments + +Any execution engine can run the Beam SDK because the SDK runtime environment is [containerized](https://s.apache.org/beam-fn-api-container-contract) with [Docker](https://www.docker.com/) and isolated from other runtime systems. This page describes how to build, customize, and push Beam SDK container images. + +## Building container images + +Before building Beam SDK container images: +* Register a [Bintray](https://bintray.com/) account with a Docker repository named `apache`. +* Install [Docker](https://www.docker.com/) on your workstation. + +To build Beam SDK container images: + + Review comment: Missing or extra word "Navigate to your local copy of [the]... " This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
Issue Time Tracking --- Worklog Id: (was: 317112) Time Spent: 1h (was: 50m)
[jira] [Work logged] (BEAM-8209) Document custom docker containers
[ https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=317111&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317111 ] ASF GitHub Bot logged work on BEAM-8209: Author: ASF GitHub Bot Created on: 24/Sep/19 01:31 Start Date: 24/Sep/19 01:31 Worklog Time Spent: 10m Work Description: rosetn commented on pull request #9607: [BEAM-8209] Custom container docs URL: https://github.com/apache/beam/pull/9607#discussion_r327389718 ## File path: website/src/documentation/runtime/environments.md ## @@ -0,0 +1,187 @@ +--- +layout: section +title: "Runtime environments" +section_menu: section-menu/documentation.html +permalink: /documentation/runtime/environments/ +redirect_from: + - /documentation/execution-model/ +--- + + +# Runtime environments + +Any execution engine can run the Beam SDK beacuse the SDK runtime environment is [containerized](https://s.apache.org/beam-fn-api-container-contract) with [Docker](https://www.docker.com/) and isolated from other runtime systems. This page describes how to build, customize, and push Beam SDK container images. + +## Building container images + +Before building Beam SDK container images: +* Register a [Bintray](https://bintray.com/) account with a Docker repository named `apache`. +* Install [Docker](https://www.docker.com/) on your workstation. + +To build Beam SDK container images: + + + +Navigate to your local copy of the https://github.com/apache/beam";>beam + + +Run Gradle with the docker target: ./gradlew docker + + + +> **Note**: It may take a long time to build all of the container images. You can instead build the images for specific SDKs: Review comment: "may"->"might" This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 317111) Time Spent: 50m (was: 40m)
[jira] [Work logged] (BEAM-8209) Document custom docker containers
[ https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=317115&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317115 ] ASF GitHub Bot logged work on BEAM-8209: Author: ASF GitHub Bot Created on: 24/Sep/19 01:31 Start Date: 24/Sep/19 01:31 Worklog Time Spent: 10m Work Description: rosetn commented on pull request #9607: [BEAM-8209] Custom container docs URL: https://github.com/apache/beam/pull/9607#discussion_r327393615 ## File path: website/src/documentation/runtime/environments.md ## @@ -0,0 +1,187 @@ +--- +layout: section +title: "Runtime environments" +section_menu: section-menu/documentation.html +permalink: /documentation/runtime/environments/ +redirect_from: + - /documentation/execution-model/ +--- + + +# Runtime environments + +Any execution engine can run the Beam SDK beacuse the SDK runtime environment is [containerized](https://s.apache.org/beam-fn-api-container-contract) with [Docker](https://www.docker.com/) and isolated from other runtime systems. This page describes how to build, customize, and push Beam SDK container images. + +## Building container images + +Before building Beam SDK container images: +* Register a [Bintray](https://bintray.com/) account with a Docker repository named `apache`. +* Install [Docker](https://www.docker.com/) on your workstation. + +To build Beam SDK container images: + + + +Navigate to your local copy of the https://github.com/apache/beam";>beam + + +Run Gradle with the docker target: ./gradlew docker + + + +> **Note**: It may take a long time to build all of the container images. You can instead build the images for specific SDKs: +> +> ``` +> ./gradlew -p sdks/java/container docker +> ./gradlew -p sdks/python/container docker +> ./gradlew -p sdks/go/container docker +> ``` + +Run `docker images` to examine the containers. 
For example, if you successfully built the container images, the command prompt displays a response like: + +``` +REPOSITORY TAG IMAGE ID CREATED SIZE +$USER-docker-apache.bintray.io/beam/python latest 4ea515403a1a 3 minutes ago 1.27GB +$USER-docker-apache.bintray.io/beam/java latest 0103512f1d8f 34 minutes ago 780MB +$USER-docker-apache.bintray.io/beam/go latest ce055985808a 35 minutes ago 121MB +``` + +Although the respository names look like URLs, the container images are stored locally on your workstation. After building the container images locally, you can [push](#pushing-container-images) them to an eponymous repository online. + +### Overriding default Docker targets + +The default SDK version is `latest` and the default Docker repository is the following Bintray location: + +``` +$USER-docker-apache.bintray.io/beam +``` + +When you [build SDK container images](#building-container-images), you can override the default version and location. + +To specify an older Python SDK version, like 2.3.0, build the container with the `docker-tag` option: + +``` +./gradlew docker -Pdocker-tag=2.3.0 +``` + +To change the `docker` target, build the container with the `docker-repository-root` option: + +``` +./gradlew docker -Pdocker-repository-root=$LOCATION +``` + +## Customizing container images + +You can add extra dependencies or serialization files to container images so the execution engine doesn't need them. + +To customize a container image, either: +* [Write a new](#writing-new-dockerfiles) [Dockerfile](https://docs.docker.com/engine/reference/builder/) on top of the original +* [Modify](#modifying-dockerfiles) the [original Dockerfile](https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile) and reimage the container + +It's often easier to write a new Dockerfile, but you can customize anything, including the base OS, by modifying the original.
Review comment: "It's often easier to write a new Dockerfile, but by modifying the original Dockerfile, you can customize anything (including the base OS)." Just a suggestion--I don't really like parentheses, but this sentence is confusing with too many clauses. Another possibility is to break it up: "It's often easier to write a new Dockerfile. However, by modifying the original Dockerfile, you can customize anything, including the base OS." Issue Time Tracking --- Worklog Id: (was: 317115) Time Spent: 1h 10m (was: 1h)
[jira] [Work logged] (BEAM-8209) Document custom docker containers
[ https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=317113&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317113 ] ASF GitHub Bot logged work on BEAM-8209: Author: ASF GitHub Bot Created on: 24/Sep/19 01:31 Start Date: 24/Sep/19 01:31 Worklog Time Spent: 10m Work Description: rosetn commented on pull request #9607: [BEAM-8209] Custom container docs URL: https://github.com/apache/beam/pull/9607#discussion_r327390032 ## File path: website/src/documentation/runtime/environments.md ## @@ -0,0 +1,187 @@ +--- +layout: section +title: "Runtime environments" +section_menu: section-menu/documentation.html +permalink: /documentation/runtime/environments/ +redirect_from: + - /documentation/execution-model/ +--- + + +# Runtime environments + +Any execution engine can run the Beam SDK beacuse the SDK runtime environment is [containerized](https://s.apache.org/beam-fn-api-container-contract) with [Docker](https://www.docker.com/) and isolated from other runtime systems. This page describes how to build, customize, and push Beam SDK container images. + +## Building container images + +Before building Beam SDK container images: +* Register a [Bintray](https://bintray.com/) account with a Docker repository named `apache`. +* Install [Docker](https://www.docker.com/) on your workstation. + +To build Beam SDK container images: + + + +Navigate to your local copy of the https://github.com/apache/beam";>beam + + +Run Gradle with the docker target: ./gradlew docker + + + +> **Note**: It may take a long time to build all of the container images. You can instead build the images for specific SDKs: +> +> ``` +> ./gradlew -p sdks/java/container docker +> ./gradlew -p sdks/python/container docker +> ./gradlew -p sdks/go/container docker +> ``` + +Run `docker images` to examine the containers. 
For example, if you successfully built the container images, the command prompt displays a response like: + +``` +REPOSITORY TAG IMAGE ID CREATED SIZE +$USER-docker-apache.bintray.io/beam/python latest 4ea515403a1a 3 minutes ago 1.27GB +$USER-docker-apache.bintray.io/beam/java latest 0103512f1d8f 34 minutes ago 780MB +$USER-docker-apache.bintray.io/beam/go latest ce055985808a 35 minutes ago 121MB +``` + +Although the respository names look like URLs, the container images are stored locally on your workstation. After building the container images locally, you can [push](#pushing-container-images) them to an eponymous repository online. Review comment: I suggest simplifying "an eponymous repository", maybe "a repository of the same name"? Issue Time Tracking --- Worklog Id: (was: 317113) Time Spent: 1h (was: 50m)
[jira] [Work logged] (BEAM-8209) Document custom docker containers
[ https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=317116&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317116 ] ASF GitHub Bot logged work on BEAM-8209: Author: ASF GitHub Bot Created on: 24/Sep/19 01:31 Start Date: 24/Sep/19 01:31 Worklog Time Spent: 10m Work Description: rosetn commented on pull request #9607: [BEAM-8209] Custom container docs URL: https://github.com/apache/beam/pull/9607#discussion_r327391188 ## File path: website/src/documentation/runtime/environments.md ## @@ -0,0 +1,187 @@ +--- +layout: section +title: "Runtime environments" +section_menu: section-menu/documentation.html +permalink: /documentation/runtime/environments/ +redirect_from: + - /documentation/execution-model/ +--- + + +# Runtime environments + +Any execution engine can run the Beam SDK beacuse the SDK runtime environment is [containerized](https://s.apache.org/beam-fn-api-container-contract) with [Docker](https://www.docker.com/) and isolated from other runtime systems. This page describes how to build, customize, and push Beam SDK container images. + +## Building container images + +Before building Beam SDK container images: +* Register a [Bintray](https://bintray.com/) account with a Docker repository named `apache`. +* Install [Docker](https://www.docker.com/) on your workstation. + +To build Beam SDK container images: + + + +Navigate to your local copy of the https://github.com/apache/beam";>beam + + +Run Gradle with the docker target: ./gradlew docker + + + +> **Note**: It may take a long time to build all of the container images. You can instead build the images for specific SDKs: +> +> ``` +> ./gradlew -p sdks/java/container docker +> ./gradlew -p sdks/python/container docker +> ./gradlew -p sdks/go/container docker +> ``` + +Run `docker images` to examine the containers. 
For example, if you successfully built the container images, the command prompt displays a response like: + +``` +REPOSITORY TAG IMAGE ID CREATED SIZE +$USER-docker-apache.bintray.io/beam/python latest 4ea515403a1a 3 minutes ago 1.27GB +$USER-docker-apache.bintray.io/beam/java latest 0103512f1d8f 34 minutes ago 780MB +$USER-docker-apache.bintray.io/beam/go latest ce055985808a 35 minutes ago 121MB +``` + +Although the respository names look like URLs, the container images are stored locally on your workstation. After building the container images locally, you can [push](#pushing-container-images) them to an eponymous repository online. + +### Overriding default Docker targets + +The default SDK version is `latest` and the default Docker repository is the following Bintray location: + +``` +$USER-docker-apache.bintray.io/beam +``` + +When you [build SDK container images](#building-container-images), you can override the default version and location. + +To specify an older Python SDK version, like 2.3.0, build the container with the `docker-tag` option: + +``` +./gradlew docker -Pdocker-tag=2.3.0 +``` + +To change the `docker` target, build the container with the `docker-repository-root` option: + +``` +./gradlew docker -Pdocker-repository-root=$LOCATION +``` + +## Customizing container images + +You can add extra dependencies or serialization files to container images so the execution engine doesn't need them. + +To customize a container image, either: +* [Write a new](#writing-new-dockerfiles) [Dockerfile](https://docs.docker.com/engine/reference/builder/) on top of the original +* [Modify](#modifying-dockerfiles) the [original Dockerfile](https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile) and reimage the container + +It's often easier to write a new Dockerfile, but you can customize anything, including the base OS, by modifying the original.
+ +### Writing new Dockerfiles on top of the original {#writing-new-dockerfiles} + + + +Pull a https://console.cloud.google.com/gcr/images/apache-beam-testing/GLOBAL/beam/sdks/release";>prebuilt SDK container image for your target language and version. + + +https://docs.docker.com/develop/develop-images/dockerfile_best-practices/";>Write a new Dockerfile that https://docs.docker.com/engine/reference/builder/#from";>designates the original as its https://docs.docker.com/glossary/?term=parent%20image";>parent + + +Build a child image: docker build -f /path/to/new/Dockerfile + + + +### Modifying the original Dockerfile {#modifying-dockerfiles} + +1. Pull the [prebuilt SDK container image](https://console.cloud.google.com/gcr/images/apache-beam-testing/GLOBAL/beam/sdks/release) for your target language and version Review comment: add periods at the
[jira] [Work logged] (BEAM-8209) Document custom docker containers
[ https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=317120&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317120 ] ASF GitHub Bot logged work on BEAM-8209: Author: ASF GitHub Bot Created on: 24/Sep/19 01:31 Start Date: 24/Sep/19 01:31 Worklog Time Spent: 10m Work Description: rosetn commented on pull request #9607: [BEAM-8209] Custom container docs URL: https://github.com/apache/beam/pull/9607#discussion_r327394115 ## File path: website/src/documentation/runtime/environments.md ## @@ -0,0 +1,187 @@ +--- +layout: section +title: "Runtime environments" +section_menu: section-menu/documentation.html +permalink: /documentation/runtime/environments/ +redirect_from: + - /documentation/execution-model/ +--- + + +# Runtime environments + +Any execution engine can run the Beam SDK beacuse the SDK runtime environment is [containerized](https://s.apache.org/beam-fn-api-container-contract) with [Docker](https://www.docker.com/) and isolated from other runtime systems. This page describes how to build, customize, and push Beam SDK container images. + +## Building container images + +Before building Beam SDK container images: +* Register a [Bintray](https://bintray.com/) account with a Docker repository named `apache`. +* Install [Docker](https://www.docker.com/) on your workstation. + +To build Beam SDK container images: + + + +Navigate to your local copy of the https://github.com/apache/beam";>beam + + +Run Gradle with the docker target: ./gradlew docker + + + +> **Note**: It may take a long time to build all of the container images. You can instead build the images for specific SDKs: +> +> ``` +> ./gradlew -p sdks/java/container docker +> ./gradlew -p sdks/python/container docker +> ./gradlew -p sdks/go/container docker +> ``` + +Run `docker images` to examine the containers. 
For example, if you successfully built the container images, the command prompt displays a response like: + +``` +REPOSITORY TAG IMAGE ID CREATED SIZE +$USER-docker-apache.bintray.io/beam/python latest 4ea515403a1a 3 minutes ago 1.27GB +$USER-docker-apache.bintray.io/beam/java latest 0103512f1d8f 34 minutes ago 780MB +$USER-docker-apache.bintray.io/beam/go latest ce055985808a 35 minutes ago 121MB +``` + +Although the respository names look like URLs, the container images are stored locally on your workstation. After building the container images locally, you can [push](#pushing-container-images) them to an eponymous repository online. + +### Overriding default Docker targets + +The default SDK version is `latest` and the default Docker repository is the following Bintray location: + +``` +$USER-docker-apache.bintray.io/beam +``` + +When you [build SDK container images](#building-container-images), you can override the default version and location. + +To specify an older Python SDK version, like 2.3.0, build the container with the `docker-tag` option: + +``` +./gradlew docker -Pdocker-tag=2.3.0 +``` + +To change the `docker` target, build the container with the `docker-repository-root` option: + +``` +./gradlew docker -Pdocker-repository-root=$LOCATION +``` + +## Customizing container images + +You can add extra dependencies or serialization files to container images so the execution engine doesn't need them. + +To customize a container image, either: Review comment: add periods at the end of these sentences This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
Issue Time Tracking --- Worklog Id: (was: 317120) Time Spent: 1h 40m (was: 1.5h)
[jira] [Work logged] (BEAM-8209) Document custom docker containers
[ https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=317109&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317109 ] ASF GitHub Bot logged work on BEAM-8209: Author: ASF GitHub Bot Created on: 24/Sep/19 01:31 Start Date: 24/Sep/19 01:31 Worklog Time Spent: 10m Work Description: rosetn commented on pull request #9607: [BEAM-8209] Custom container docs URL: https://github.com/apache/beam/pull/9607#discussion_r327388352 ## File path: website/src/documentation/runtime/environments.md ## @@ -0,0 +1,187 @@ +--- +layout: section +title: "Runtime environments" +section_menu: section-menu/documentation.html +permalink: /documentation/runtime/environments/ +redirect_from: + - /documentation/execution-model/ +--- + + +# Runtime environments + +Any execution engine can run the Beam SDK beacuse the SDK runtime environment is [containerized](https://s.apache.org/beam-fn-api-container-contract) with [Docker](https://www.docker.com/) and isolated from other runtime systems. This page describes how to build, customize, and push Beam SDK container images. Review comment: This sentence is confusing and might need to be split. Did you mean: "The Beam SDK runtime environment is isolated from other runtime systems because the SDK runtime environment is [containerized](https://s.apache.org/beam-fn-api-container-contract) with [Docker](https://www.docker.com/). This means that any execution engine can run the Beam SDK."? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
Issue Time Tracking --- Worklog Id: (was: 317109) Time Spent: 0.5h (was: 20m)
[jira] [Work logged] (BEAM-8209) Document custom docker containers
[ https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=317110&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317110 ] ASF GitHub Bot logged work on BEAM-8209: Author: ASF GitHub Bot Created on: 24/Sep/19 01:31 Start Date: 24/Sep/19 01:31 Worklog Time Spent: 10m Work Description: rosetn commented on pull request #9607: [BEAM-8209] Custom container docs URL: https://github.com/apache/beam/pull/9607#discussion_r327387167 ## File path: website/src/documentation/runtime/environments.md ## @@ -0,0 +1,187 @@ +--- +layout: section +title: "Runtime environments" +section_menu: section-menu/documentation.html +permalink: /documentation/runtime/environments/ +redirect_from: Review comment: I don't understand this redirect--is there more context/rationale? If someone had previously bookmarked the Beam Execution Model page, shouldn't they be redirected to /documentation/runtime/model/ page now? i.e. this redirect should be in the other file This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 317110) Time Spent: 40m (was: 0.5h) > Document custom docker containers > - > > Key: BEAM-8209 > URL: https://issues.apache.org/jira/browse/BEAM-8209 > Project: Beam > Issue Type: Sub-task > Components: website >Reporter: Cyrus Maden >Assignee: Cyrus Maden >Priority: Minor > Fix For: 2.16.0 > > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8209) Document custom docker containers
[ https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=317118&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317118 ] ASF GitHub Bot logged work on BEAM-8209: Author: ASF GitHub Bot Created on: 24/Sep/19 01:31 Start Date: 24/Sep/19 01:31 Worklog Time Spent: 10m Work Description: rosetn commented on pull request #9607: [BEAM-8209] Custom container docs URL: https://github.com/apache/beam/pull/9607#discussion_r327391740 ## File path: website/src/documentation/runtime/environments.md ## @@ -0,0 +1,187 @@ +--- +layout: section +title: "Runtime environments" +section_menu: section-menu/documentation.html +permalink: /documentation/runtime/environments/ +redirect_from: + - /documentation/execution-model/ +--- + + +# Runtime environments + +Any execution engine can run the Beam SDK beacuse the SDK runtime environment is [containerized](https://s.apache.org/beam-fn-api-container-contract) with [Docker](https://www.docker.com/) and isolated from other runtime systems. This page describes how to build, customize, and push Beam SDK container images. + +## Building container images + +Before building Beam SDK container images: +* Register a [Bintray](https://bintray.com/) account with a Docker repository named `apache`. +* Install [Docker](https://www.docker.com/) on your workstation. + +To build Beam SDK container images: + + + +Navigate to your local copy of the https://github.com/apache/beam";>beam + + +Run Gradle with the docker target: ./gradlew docker + + + +> **Note**: It may take a long time to build all of the container images. You can instead build the images for specific SDKs: +> +> ``` +> ./gradlew -p sdks/java/container docker +> ./gradlew -p sdks/python/container docker +> ./gradlew -p sdks/go/container docker +> ``` + +Run `docker images` to examine the containers. 
For example, if you successfully built the container images, the command prompt displays a response like: + +``` +REPOSITORY TAG IMAGE ID CREATED SIZE +$USER-docker-apache.bintray.io/beam/python latest 4ea515403a1a 3 minutes ago 1.27GB +$USER-docker-apache.bintray.io/beam/java latest 0103512f1d8f 34 minutes ago 780MB +$USER-docker-apache.bintray.io/beam/go latest ce055985808a 35 minutes ago 121MB +``` + +Although the respository names look like URLs, the container images are stored locally on your workstation. After building the container images locally, you can [push](#pushing-container-images) them to an eponymous repository online. + +### Overriding default Docker targets + +The default SDK version is `latest` and the default Docker repository is the following Bintray location: + +``` +$USER-docker-apache.bintray.io/beam +``` + +When you [build SDK container images](#building-container-images), you can override the default version and location. + +To specify an older Python SDK version, like 2.3.0, build the container with the `docker-tag` option: + +``` +./gradlew docker -Pdocker-tag=2.3.0 +``` + +To change the `docker` target, build the container with the `docker-repository-root` option: + +``` +./gradlew docker -Pdocker-repository-root=$LOCATION +``` + +## Customizing container images + +You can add extra dependencies or serialization files to container images so the execution engine doesn't need them. + +To customize a container image, either: +* [Write a new](#writing-new-dockerfiles) [Dockerfile](https://docs.docker.com/engine/reference/builder/) on top of the original +* [Modify](#modifying-dockerfiles) the [original Dockerfile](https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile) and reimage the container + +It's often easier to write a new Dockerfile, but you can customize anything, including the base OS, by modifying the original.
+ +### Writing new Dockerfiles on top of the original {#writing-new-dockerfiles} + + + +Pull a https://console.cloud.google.com/gcr/images/apache-beam-testing/GLOBAL/beam/sdks/release";>prebuilt SDK container image for your target language and version. + + +https://docs.docker.com/develop/develop-images/dockerfile_best-practices/";>Write a new Dockerfile that https://docs.docker.com/engine/reference/builder/#from";>designates the original as its https://docs.docker.com/glossary/?term=parent%20image";>parent + + +Build a child image: docker build -f /path/to/new/Dockerfile + + + +### Modifying the original Dockerfile {#modifying-dockerfiles} + +1. Pull the [prebuilt SDK container image](https://console.cloud.google.com/gcr/images/apache-beam-testing/GLOBAL/beam/sdks/release) for your target language and version +2. Customize the [Dockerfile](https://
[jira] [Work logged] (BEAM-8209) Document custom docker containers
[ https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=317114&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317114 ] ASF GitHub Bot logged work on BEAM-8209: Author: ASF GitHub Bot Created on: 24/Sep/19 01:31 Start Date: 24/Sep/19 01:31 Worklog Time Spent: 10m Work Description: rosetn commented on pull request #9607: [BEAM-8209] Custom container docs URL: https://github.com/apache/beam/pull/9607#discussion_r327390222 ## File path: website/src/documentation/runtime/environments.md ## @@ -0,0 +1,187 @@ +--- +layout: section +title: "Runtime environments" +section_menu: section-menu/documentation.html +permalink: /documentation/runtime/environments/ +redirect_from: + - /documentation/execution-model/ +--- + + +# Runtime environments + +Any execution engine can run the Beam SDK beacuse the SDK runtime environment is [containerized](https://s.apache.org/beam-fn-api-container-contract) with [Docker](https://www.docker.com/) and isolated from other runtime systems. This page describes how to build, customize, and push Beam SDK container images. + +## Building container images + +Before building Beam SDK container images: +* Register a [Bintray](https://bintray.com/) account with a Docker repository named `apache`. +* Install [Docker](https://www.docker.com/) on your workstation. + +To build Beam SDK container images: + + + +Navigate to your local copy of the https://github.com/apache/beam";>beam + + +Run Gradle with the docker target: ./gradlew docker + + + +> **Note**: It may take a long time to build all of the container images. You can instead build the images for specific SDKs: +> +> ``` +> ./gradlew -p sdks/java/container docker +> ./gradlew -p sdks/python/container docker +> ./gradlew -p sdks/go/container docker +> ``` + +Run `docker images` to examine the containers. 
For example, if you successfully built the container images, the command prompt displays a response like: Review comment: "like:"->"like the following:" or "such as the following:" Issue Time Tracking --- Worklog Id: (was: 317114) Time Spent: 1h 10m (was: 1h)
[jira] [Work logged] (BEAM-8209) Document custom docker containers
[ https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=317108&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317108 ] ASF GitHub Bot logged work on BEAM-8209: Author: ASF GitHub Bot Created on: 24/Sep/19 01:31 Start Date: 24/Sep/19 01:31 Worklog Time Spent: 10m Work Description: rosetn commented on pull request #9607: [BEAM-8209] Custom container docs URL: https://github.com/apache/beam/pull/9607#discussion_r327387472 ## File path: website/src/documentation/runtime/environments.md ## @@ -0,0 +1,187 @@ +--- +layout: section +title: "Runtime environments" +section_menu: section-menu/documentation.html +permalink: /documentation/runtime/environments/ +redirect_from: + - /documentation/execution-model/ +--- + + +# Runtime environments + +Any execution engine can run the Beam SDK beacuse the SDK runtime environment is [containerized](https://s.apache.org/beam-fn-api-container-contract) with [Docker](https://www.docker.com/) and isolated from other runtime systems. This page describes how to build, customize, and push Beam SDK container images. Review comment: "beacuse"->"because" This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 317108) Time Spent: 0.5h (was: 20m) > Document custom docker containers > - > > Key: BEAM-8209 > URL: https://issues.apache.org/jira/browse/BEAM-8209 > Project: Beam > Issue Type: Sub-task > Components: website >Reporter: Cyrus Maden >Assignee: Cyrus Maden >Priority: Minor > Fix For: 2.16.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8213) Run and report python tox tasks separately within Jenkins
[ https://issues.apache.org/jira/browse/BEAM-8213?focusedWorklogId=317104&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317104 ] ASF GitHub Bot logged work on BEAM-8213: Author: ASF GitHub Bot Created on: 24/Sep/19 01:10 Start Date: 24/Sep/19 01:10 Worklog Time Spent: 10m Work Description: chadrik commented on issue #9642: [BEAM-8213] Split up monolithic python preCommit tests on jenkins URL: https://github.com/apache/beam/pull/9642#issuecomment-534344709
> Main concern: this change may increase precommit queue, since each job will require a jenkins slot, of which we have 16 VMs * 2 slots per VM. What required 1 slot will now require 5.
We're still doing the same amount of work, so IIUC, assuming we get similar CPU utilization in this new configuration, these 5 jobs should finish in the time it took the previous single job to finish, plus whatever overhead is required per job to bootstrap the tests. The previous job was taking 75 minutes for me, so I'm hoping that the per-job overhead is relatively small in comparison (e.g. if bootstrap time is 1 minute per job, adding an extra 4 minutes for 4 more jobs is a ~5% increase).
Issue Time Tracking --- Worklog Id: (was: 317104) Time Spent: 1.5h (was: 1h 20m) > Run and report python tox tasks separately within Jenkins > - > > Key: BEAM-8213 > URL: https://issues.apache.org/jira/browse/BEAM-8213 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Chad Dombrova >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
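The overhead estimate in chadrik's comment can be checked with a few lines of arithmetic. This is a minimal sketch that takes the 75-minute runtime and the 1-minute bootstrap time from the comment as given; the function name is illustrative and does not come from the PR.

```python
def relative_overhead(baseline_minutes, extra_jobs, bootstrap_minutes_per_job):
    """Return the added bootstrap time as a fraction of the baseline runtime."""
    added_minutes = extra_jobs * bootstrap_minutes_per_job
    return added_minutes / baseline_minutes


# Figures quoted in the comment: a 75-minute job split into 5 jobs (4 extra),
# with an assumed bootstrap cost of 1 minute per job.
overhead = relative_overhead(baseline_minutes=75, extra_jobs=4,
                             bootstrap_minutes_per_job=1)
print(f"{overhead:.1%}")  # 4 / 75 is about 5.3%
```

The estimate only covers total machine time; it says nothing about how long the jobs wait in the queue, which is the separate concern raised elsewhere in the thread.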
[jira] [Work logged] (BEAM-8233) Separate loopback and docker modes on Flink runner guide
[ https://issues.apache.org/jira/browse/BEAM-8233?focusedWorklogId=317103&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317103 ] ASF GitHub Bot logged work on BEAM-8233: Author: ASF GitHub Bot Created on: 24/Sep/19 01:08 Start Date: 24/Sep/19 01:08 Worklog Time Spent: 10m Work Description: ibzib commented on issue #9605: [BEAM-8233] [BEAM-8214] [BEAM-8232] Document environment_type flag URL: https://github.com/apache/beam/pull/9605#issuecomment-534344275 @tweise I resolved the jiras. (I try to go through every once in a while and clean out the ones I forgot to close.) I'll be sure to tag you on future related PRs -- and yes, we can wait longer for review, especially for this variety of non-pressing documentation change. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 317103) Time Spent: 3h (was: 2h 50m) > Separate loopback and docker modes on Flink runner guide > > > Key: BEAM-8233 > URL: https://issues.apache.org/jira/browse/BEAM-8233 > Project: Beam > Issue Type: Improvement > Components: runner-flink, website >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Fix For: Not applicable > > Time Spent: 3h > Remaining Estimate: 0h > > Running loopback should be the "getting started" option, and docker mode > should be an "advanced" option with its own section of the Flink runner guide > with instructions and explanations (you need to build the docker container > images, you can't see your output in a local filesystem without > workarounds..) [https://beam.apache.org/documentation/runners/flink/] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-7933) Adding timeout to JobServer grpc calls
[ https://issues.apache.org/jira/browse/BEAM-7933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936278#comment-16936278 ] Enrico Canzonieri commented on BEAM-7933: - Yes, I'm planning to work on this. It shouldn't take me too long to get a PR out. I should have some time by the end of this week. > Adding timeout to JobServer grpc calls > -- > > Key: BEAM-7933 > URL: https://issues.apache.org/jira/browse/BEAM-7933 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Affects Versions: 2.14.0 >Reporter: Enrico Canzonieri >Assignee: Enrico Canzonieri >Priority: Minor > Labels: portability > > grpc calls to the JobServer from the Python SDK do not have timeouts. That > means that the call to pipeline.run() could hang forever if the JobServer is > not running (or fails to start). > E.g. > [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/portable_runner.py#L307] > the call to Prepare() doesn't provide any timeout value and the same applies > to other JobServer requests. > As part of this ticket we could add a default timeout of 60 seconds > for the http client. > Additionally, we could consider adding a --job-server-request-timeout to the > [PortableOptions|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/options/pipeline_options.py#L805] > class to be used in the JobServer interactions inside portable_runner.py. -- This message was sent by Atlassian Jira (v8.3.4#803005)
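A minimal sketch of the idea in BEAM-7933, not the actual Beam change. It assumes a hypothetical `--job-server-request-timeout` flag parsed with plain `argparse` (not Beam's `PortableOptions`) and a stand-in `FakeJobServiceStub`; the only real behaviour relied on is that gRPC Python stub methods accept a `timeout` keyword argument in seconds.

```python
import argparse

DEFAULT_TIMEOUT_SECONDS = 60  # default suggested in the ticket


def parse_timeout(argv):
    """Parse the hypothetical --job-server-request-timeout flag."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--job-server-request-timeout', type=float,
                        default=DEFAULT_TIMEOUT_SECONDS,
                        dest='job_server_request_timeout')
    known, _ = parser.parse_known_args(argv)
    return known.job_server_request_timeout


class FakeJobServiceStub:
    """Stand-in for a generated gRPC stub; records the timeout it was given."""

    def __init__(self):
        self.last_timeout = None

    def Prepare(self, request, timeout=None):
        self.last_timeout = timeout
        return 'prepared:' + request


def prepare_job(stub, request, timeout):
    # Passing timeout= bounds the call. With a real gRPC stub, an unresponsive
    # JobServer would then fail with DEADLINE_EXCEEDED instead of hanging.
    return stub.Prepare(request, timeout=timeout)


stub = FakeJobServiceStub()
timeout = parse_timeout(['--job-server-request-timeout', '30'])
print(prepare_job(stub, 'my-pipeline', timeout), stub.last_timeout)
```

Threading one timeout value through a single helper keeps the default in one place, so the other JobServer requests mentioned in the ticket could reuse it.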
[jira] [Work logged] (BEAM-8299) Upgrade Jackson to version 2.9.10
[ https://issues.apache.org/jira/browse/BEAM-8299?focusedWorklogId=317099&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317099 ] ASF GitHub Bot logged work on BEAM-8299: Author: ASF GitHub Bot Created on: 24/Sep/19 00:49 Start Date: 24/Sep/19 00:49 Worklog Time Spent: 10m Work Description: markflyhigh commented on issue #9637: [release-2.16.0][BEAM-8299] Upgrade Jackson to version 2.9.10 URL: https://github.com/apache/beam/pull/9637#issuecomment-534340488 Run Java_Examples_Dataflow PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 317099) Time Spent: 1h 20m (was: 1h 10m) > Upgrade Jackson to version 2.9.10 > - > > Key: BEAM-8299 > URL: https://issues.apache.org/jira/browse/BEAM-8299 > Project: Beam > Issue Type: Improvement > Components: build-system, sdk-java-core >Affects Versions: 2.15.0 >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Blocker > Fix For: 2.16.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > [Jackson 2.9.10 addresses multiple CVE > issues|https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.9.10] from > previous Jackson versions, so we need to upgrade it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8293) Document or log file system issues with docker
[ https://issues.apache.org/jira/browse/BEAM-8293?focusedWorklogId=317098&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317098 ] ASF GitHub Bot logged work on BEAM-8293: Author: ASF GitHub Bot Created on: 24/Sep/19 00:49 Start Date: 24/Sep/19 00:49 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #9646: [BEAM-8293] prescriptive log message for artifact retrieval failure URL: https://github.com/apache/beam/pull/9646 Also boosted the level for higher visibility. R: @robertwb
[jira] [Work logged] (BEAM-8213) Run and report python tox tasks separately within Jenkins
[ https://issues.apache.org/jira/browse/BEAM-8213?focusedWorklogId=317096&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317096 ] ASF GitHub Bot logged work on BEAM-8213: Author: ASF GitHub Bot Created on: 24/Sep/19 00:45 Start Date: 24/Sep/19 00:45 Worklog Time Spent: 10m Work Description: aaltay commented on issue #9642: [BEAM-8213] Split up monolithic python preCommit tests on jenkins URL: https://github.com/apache/beam/pull/9642#issuecomment-534339765
/cc @youngoli on the last comment. Daniel was pointing out long queue times due to the increase in the number of jobs.
Issue Time Tracking --- Worklog Id: (was: 317096) Time Spent: 1h 20m (was: 1h 10m) > Run and report python tox tasks separately within Jenkins > - > > Key: BEAM-8213 > URL: https://issues.apache.org/jira/browse/BEAM-8213 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Chad Dombrova >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8213) Run and report python tox tasks separately within Jenkins
[ https://issues.apache.org/jira/browse/BEAM-8213?focusedWorklogId=317095&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317095 ] ASF GitHub Bot logged work on BEAM-8213: Author: ASF GitHub Bot Created on: 24/Sep/19 00:43 Start Date: 24/Sep/19 00:43 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #9642: [BEAM-8213] Split up monolithic python preCommit tests on jenkins URL: https://github.com/apache/beam/pull/9642#issuecomment-534339420
Overall, this LGTM. We need to make sure that trigger phrases work, are visible, and by default all precommits run on python PRs. Main concern: this change may increase precommit queue, since each job will require a jenkins slot, of which we have 16 VMs * 2 slots per VM. What required 1 slot will now require 5. @yifanzou what's your take on this? We may want to increase the number of slots, and should monitor the precommit queue time after this is merged.
Issue Time Tracking --- Worklog Id: (was: 317095) Time Spent: 1h 10m (was: 1h) > Run and report python tox tasks separately within Jenkins > - > > Key: BEAM-8213 > URL: https://issues.apache.org/jira/browse/BEAM-8213 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Chad Dombrova >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
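The slot arithmetic behind tvalentyn's concern, as a minimal sketch. The VM and slot counts are the ones quoted in the comment; treating every PR as occupying all of its slots at the same time is a simplifying assumption, and the function name is illustrative.

```python
def concurrent_prs(vms, slots_per_vm, slots_per_pr):
    """Number of PRs whose Python precommit can run fully in parallel."""
    total_slots = vms * slots_per_vm
    return total_slots // slots_per_pr


# 16 VMs * 2 slots per VM = 32 slots in total.
print(concurrent_prs(16, 2, slots_per_pr=1))  # monolithic job: 32
print(concurrent_prs(16, 2, slots_per_pr=5))  # split into 5 jobs: 6
```

This is a pessimistic view: the split jobs are shorter and release their slots as they finish, which is presumably why the comment suggests monitoring the actual queue time after the merge.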
[jira] [Work logged] (BEAM-8213) Run and report python tox tasks separately within Jenkins
[ https://issues.apache.org/jira/browse/BEAM-8213?focusedWorklogId=317086&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317086 ] ASF GitHub Bot logged work on BEAM-8213: Author: ASF GitHub Bot Created on: 24/Sep/19 00:32 Start Date: 24/Sep/19 00:32 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #9642: [BEAM-8213] Split up monolithic python preCommit tests on jenkins URL: https://github.com/apache/beam/pull/9642#issuecomment-534337307
To run a seed job on your PR you can say "Run Seed Jøb" (using ø to avoid triggering the job by this comment). After the seed job finishes (~10 min), you can run the jenkins jobs defined in this PR. Note that a seed job launched on the PR will affect other test executions, so consider giving a heads-up on dev. The seed job runs periodically, so at some point we will restore to the job specs using SOT in master.
Issue Time Tracking --- Worklog Id: (was: 317086) Time Spent: 1h (was: 50m) > Run and report python tox tasks separately within Jenkins > - > > Key: BEAM-8213 > URL: https://issues.apache.org/jira/browse/BEAM-8213 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Chad Dombrova >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-8214) Remove vestigial docker commands from portability instructions
[ https://issues.apache.org/jira/browse/BEAM-8214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver resolved BEAM-8214. --- Fix Version/s: Not applicable Resolution: Fixed > Remove vestigial docker commands from portability instructions > -- > > Key: BEAM-8214 > URL: https://issues.apache.org/jira/browse/BEAM-8214 > Project: Beam > Issue Type: Bug > Components: website >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Fix For: Not applicable > > > Right now [https://beam.apache.org/roadmap/portability/] contains docker > commands which are useless, as we are using loopback mode. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-8232) Document LOOPBACK environment type
[ https://issues.apache.org/jira/browse/BEAM-8232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver resolved BEAM-8232. --- Fix Version/s: Not applicable Resolution: Fixed > Document LOOPBACK environment type > -- > > Key: BEAM-8232 > URL: https://issues.apache.org/jira/browse/BEAM-8232 > Project: Beam > Issue Type: Improvement > Components: sdk-py-harness, website >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Fix For: Not applicable > > > * Right now, LOOPBACK is not mentioned as a possible option for > environment_type [1]. It seems that it was intended for testing [2], but it's > useful for getting started on Beam and debugging as well, so it's worth a > mention. > * The meaning of each environment type should be documented somewhere, such > as on the runner pipeline options tables on the website. > > [1] > [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/options/pipeline_options.py#L818-L819] > [2] > [https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L82] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-8233) Separate loopback and docker modes on Flink runner guide
[ https://issues.apache.org/jira/browse/BEAM-8233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver resolved BEAM-8233. --- Fix Version/s: Not applicable Resolution: Fixed > Separate loopback and docker modes on Flink runner guide > > > Key: BEAM-8233 > URL: https://issues.apache.org/jira/browse/BEAM-8233 > Project: Beam > Issue Type: Improvement > Components: runner-flink, website >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Fix For: Not applicable > > Time Spent: 2h 50m > Remaining Estimate: 0h > > Running loopback should be the "getting started" option, and docker mode > should be an "advanced" option with its own section of the Flink runner guide > with instructions and explanations (you need to build the docker container > images, you can't see your output in a local filesystem without > workarounds..) [https://beam.apache.org/documentation/runners/flink/] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8131) Provide Kubernetes setup with Prometheus
[ https://issues.apache.org/jira/browse/BEAM-8131?focusedWorklogId=317079&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317079 ] ASF GitHub Bot logged work on BEAM-8131: Author: ASF GitHub Bot Created on: 24/Sep/19 00:24 Start Date: 24/Sep/19 00:24 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #9482: [BEAM-8131] Provide Kubernetes setup for Prometheus URL: https://github.com/apache/beam/pull/9482#discussion_r327381413
## File path: .test-infra/metrics/prometheus/prometheus/config/rules.yml ##
@@ -0,0 +1,30 @@
+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+
+groups:
+- name: beamTests
+  rules:
+  - alert: TestRegression
+    expr: ((avg_over_time({job="beam",instance="",__name__!="push_time_seconds"}[1d])
+      - avg_over_time({job="beam",instance="",__name__!="push_time_seconds"}[6d] offset 1d))
+      / avg_over_time({job="beam",instance="",__name__!="push_time_seconds"}[6d] offset 1d))
+      > 0.2
+    labels:
+      job: beamAlert
+    annotations:
+      summary: 'Average runtime over 24 hours is 20% greater than average from six previous days'
Review comment: Have you verified that this formula is appropriate? It may be good to formulate it in terms of the standard deviation? I'm not sure. Just thinking out loud.
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Issue Time Tracking --- Worklog Id: (was: 317079) Time Spent: 2h 50m (was: 2h 40m) > Provide Kubernetes setup with Prometheus > > > Key: BEAM-8131 > URL: https://issues.apache.org/jira/browse/BEAM-8131 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Kamil Wasilewski >Assignee: Kamil Wasilewski >Priority: Major > Time Spent: 2h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
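To make the alert expression and the reviewer's suggestion concrete, here is a minimal sketch in Python rather than PromQL. `relative_increase` mirrors the rule's `(recent - baseline) / baseline > 0.2` check; `z_score` is one possible standard-deviation formulation, which is an assumption and not part of the PR. The sample runtimes are made up for illustration.

```python
from statistics import mean, stdev


def relative_increase(recent, baseline):
    """(avg(recent) - avg(baseline)) / avg(baseline), as in the alert rule."""
    base = mean(baseline)
    return (mean(recent) - base) / base


def z_score(recent, baseline):
    """Distance of the recent average from the baseline average,
    measured in baseline sample standard deviations."""
    return (mean(recent) - mean(baseline)) / stdev(baseline)


# Made-up runtimes in seconds: samples from the six previous days,
# then samples from the last 24 hours.
previous_six_days = [100.0, 104.0, 98.0, 102.0, 96.0, 100.0]
last_day = [125.0, 125.0]

print(relative_increase(last_day, previous_six_days) > 0.2)  # True: 25% slower
print(z_score(last_day, previous_six_days) > 3)  # True: about 8.8 deviations
```

The two checks answer different questions: the fixed 20% threshold ignores how noisy a test normally is, while the z-score scales the threshold to the baseline's own variability, at the cost of being undefined when the baseline is perfectly flat.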
[jira] [Work logged] (BEAM-8131) Provide Kubernetes setup with Prometheus
[ https://issues.apache.org/jira/browse/BEAM-8131?focusedWorklogId=317081&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317081 ] ASF GitHub Bot logged work on BEAM-8131: Author: ASF GitHub Bot Created on: 24/Sep/19 00:24 Start Date: 24/Sep/19 00:24 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #9482: [BEAM-8131] Provide Kubernetes setup for Prometheus URL: https://github.com/apache/beam/pull/9482#discussion_r327380673 ## File path: .test-infra/metrics/prometheus/alertmanager/config/alertmanager.yml ## @@ -0,0 +1,37 @@ + +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + Review comment: Would you add comments with explanations for what each file does? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 317081) Time Spent: 2h 50m (was: 2h 40m) > Provide Kubernetes setup with Prometheus > > > Key: BEAM-8131 > URL: https://issues.apache.org/jira/browse/BEAM-8131 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Kamil Wasilewski >Assignee: Kamil Wasilewski >Priority: Major > Time Spent: 2h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8131) Provide Kubernetes setup with Prometheus
[ https://issues.apache.org/jira/browse/BEAM-8131?focusedWorklogId=317080&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317080 ] ASF GitHub Bot logged work on BEAM-8131: Author: ASF GitHub Bot Created on: 24/Sep/19 00:24 Start Date: 24/Sep/19 00:24 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #9482: [BEAM-8131] Provide Kubernetes setup for Prometheus URL: https://github.com/apache/beam/pull/9482#discussion_r327380834 ## File path: .test-infra/metrics/docker-compose.yml ## @@ -86,9 +86,35 @@ services: - DB_DBNAME=beam_metrics - DB_DBUSERNAME=admin - DB_DBPWD= + prometheus: Review comment: I feel silly, but why do we need docker-compose configuration, and kubernetes configuration? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 317080) Time Spent: 2h 50m (was: 2h 40m) > Provide Kubernetes setup with Prometheus > > > Key: BEAM-8131 > URL: https://issues.apache.org/jira/browse/BEAM-8131 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Kamil Wasilewski >Assignee: Kamil Wasilewski >Priority: Major > Time Spent: 2h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8209) Document custom docker containers
[ https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=317078&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317078 ] ASF GitHub Bot logged work on BEAM-8209: Author: ASF GitHub Bot Created on: 24/Sep/19 00:21 Start Date: 24/Sep/19 00:21 Worklog Time Spent: 10m Work Description: rosetn commented on issue #9607: [BEAM-8209] Custom container docs URL: https://github.com/apache/beam/pull/9607#issuecomment-534335284 STAGED: http://apache-beam-website-pull-requests.storage.googleapis.com/9607/documentation/runtime/environments/index.html This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 317078) Time Spent: 20m (was: 10m) > Document custom docker containers > - > > Key: BEAM-8209 > URL: https://issues.apache.org/jira/browse/BEAM-8209 > Project: Beam > Issue Type: Sub-task > Components: website >Reporter: Cyrus Maden >Assignee: Cyrus Maden >Priority: Minor > Fix For: 2.16.0 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8213) Run and report python tox tasks separately within Jenkins
[ https://issues.apache.org/jira/browse/BEAM-8213?focusedWorklogId=317077&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317077 ] ASF GitHub Bot logged work on BEAM-8213: Author: ASF GitHub Bot Created on: 24/Sep/19 00:19 Start Date: 24/Sep/19 00:19 Worklog Time Spent: 10m Work Description: aaltay commented on issue #9642: [BEAM-8213] Split up monolithic python preCommit tests on jenkins URL: https://github.com/apache/beam/pull/9642#issuecomment-534334710 I believe you need to run a seed job to get the new jobs recognized by Jenkins. R: @yifanzou could help with the specifics of seed job. R: @tvalentyn for reviewing wrt to python 3 jobs. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 317077) Time Spent: 50m (was: 40m) > Run and report python tox tasks separately within Jenkins > - > > Key: BEAM-8213 > URL: https://issues.apache.org/jira/browse/BEAM-8213 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Chad Dombrova >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > As a python developer, the speed and comprehensibility of the jenkins > PreCommit job could be greatly improved. > Here are some of the problems > - when a lint job fails, it's not reported in the test results summary, so > even though the job is marked as failed, I see "Test Result (no failures)" > which is quite confusing > - I have to wait for over an hour to discover the lint failed, which takes > about a minute to run on its own > - The logs are a jumbled mess of all the different tasks running on top of > each other > - The test results give no indication of which version of python they use. 
I > click on Test results, then the test module, then the test class, then I see > 4 tests named the same thing. I assume that the first is python 2.7, the > second is 3.5 and so on. It takes 5 clicks and then reading the log output > to know which version of python a single error pertains to, then I need to > repeat for each failure. This makes it very difficult to discover problems, > and deduce that they may have something to do with python version mismatches. > I believe the solution to this is to split up the single monolithic python > PreCommit job into sub-jobs (possibly using a pipeline with steps). This > would give us the following benefits: > - sub job results should become available as they finish, so for example, > lint results should be available very early on > - sub job results will be reported separately, and there will be a job for > each py2, py35, py36 and so on, so it will be clear when an error is related > to a particular python version > - sub jobs without reports, like docs and lint, will have their own failure > status and logs, so when they fail it will be more obvious what went wrong. > I'm happy to help out once I get some feedback on the desired way forward. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8213) Run and report python tox tasks separately within Jenkins
[ https://issues.apache.org/jira/browse/BEAM-8213?focusedWorklogId=317076&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317076 ] ASF GitHub Bot logged work on BEAM-8213: Author: ASF GitHub Bot Created on: 24/Sep/19 00:19 Start Date: 24/Sep/19 00:19 Worklog Time Spent: 10m Work Description: aaltay commented on issue #9642: [BEAM-8213] Split up monolithic python preCommit tests on jenkins URL: https://github.com/apache/beam/pull/9642#issuecomment-534334710 I believe you need to run a seed job to get the new jobs recognized by Jenkins. R: @yifanzou could help with the specifics of seed job. R: @tvalentyn for reviewing wrt to python 3 jobs. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 317076) Time Spent: 40m (was: 0.5h) > Run and report python tox tasks separately within Jenkins > - > > Key: BEAM-8213 > URL: https://issues.apache.org/jira/browse/BEAM-8213 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Chad Dombrova >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > As a python developer, the speed and comprehensibility of the jenkins > PreCommit job could be greatly improved. > Here are some of the problems > - when a lint job fails, it's not reported in the test results summary, so > even though the job is marked as failed, I see "Test Result (no failures)" > which is quite confusing > - I have to wait for over an hour to discover the lint failed, which takes > about a minute to run on its own > - The logs are a jumbled mess of all the different tasks running on top of > each other > - The test results give no indication of which version of python they use. 
I > click on Test results, then the test module, then the test class, then I see > 4 tests named the same thing. I assume that the first is python 2.7, the > second is 3.5 and so on. It takes 5 clicks and then reading the log output > to know which version of python a single error pertains to, then I need to > repeat for each failure. This makes it very difficult to discover problems, > and deduce that they may have something to do with python version mismatches. > I believe the solution to this is to split up the single monolithic python > PreCommit job into sub-jobs (possibly using a pipeline with steps). This > would give us the following benefits: > - sub job results should become available as they finish, so for example, > lint results should be available very early on > - sub job results will be reported separately, and there will be a job for > each py2, py35, py36 and so on, so it will be clear when an error is related > to a particular python version > - sub jobs without reports, like docs and lint, will have their own failure > status and logs, so when they fail it will be more obvious what went wrong. > I'm happy to help out once I get some feedback on the desired way forward. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8256) Set fixed number of workers for File-based IOITs
[ https://issues.apache.org/jira/browse/BEAM-8256?focusedWorklogId=317061&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317061 ] ASF GitHub Bot logged work on BEAM-8256: Author: ASF GitHub Bot Created on: 24/Sep/19 00:16 Start Date: 24/Sep/19 00:16 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #9596: [BEAM-8256] Set fixed number of workers for Java IOITs URL: https://github.com/apache/beam/pull/9596#discussion_r327381844 ## File path: .test-infra/jenkins/job_PerformanceTests_FileBasedIO_IT.groovy ## @@ -28,7 +28,9 @@ def jobs = [ pipelineOptions: [ bigQueryDataset: 'beam_performance', bigQueryTable : 'textioit_results', -numberOfRecords: '100' +numberOfRecords: '100', +maxNumWorkers : '5', Review comment: So to conclude: I agree with @lgajowy 's suggestion This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 317061) Time Spent: 1h (was: 50m) > Set fixed number of workers for File-based IOITs > > > Key: BEAM-8256 > URL: https://issues.apache.org/jira/browse/BEAM-8256 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Michal Walenia >Assignee: Michal Walenia >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > Autoscaling is a feature of the Google Cloud Dataflow runner that adds/removes > worker nodes dynamically as the job runs. It can behave differently from run > to run, producing different test (runtime) results in consecutive runs. In integration > tests (such as IOITs, but this applies to others as well) we don't need such nondeterminism > and it's best to have a fixed number of workers for every test execution. > IOITs use autoscaling but they shouldn't. This issue was created to disable > it and set a fixed number of workers. 
> Side note: autoscaling is already disabled in Nexmark and load tests of core > operations. -- This message was sent by Atlassian Jira (v8.3.4#803005)
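The idea in BEAM-8256 (pin the worker count so that test runtimes are comparable between runs) can be sketched in a few lines. This is only an illustration, not the change made in PR #9596: the helper `with_fixed_workers` is hypothetical, and it assumes the Dataflow runner's `autoscalingAlgorithm` and `numWorkers` pipeline options; the option map mirrors the `pipelineOptions` entries visible in the diff above.

```python
# Hypothetical helper (not part of Beam): returns a copy of a pipeline-options
# map with autoscaling turned off and a fixed worker count.
# Assumption: the Dataflow runner options "autoscalingAlgorithm" and "numWorkers".
def with_fixed_workers(options, num_workers):
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    fixed = dict(options)                    # leave the caller's map untouched
    fixed["autoscalingAlgorithm"] = "NONE"   # disable dynamic scaling
    fixed["numWorkers"] = str(num_workers)   # same worker count on every run
    return fixed


# Options taken from the quoted diff; values are strings, as in the Groovy job file.
textio_options = {
    "bigQueryDataset": "beam_performance",
    "bigQueryTable": "textioit_results",
    "numberOfRecords": "100",
}

fixed_options = with_fixed_workers(textio_options, 5)
for name, value in sorted(fixed_options.items()):
    print(f"--{name}={value}")
```

Returning a copy rather than mutating the input keeps the per-test option maps independent, which matters when several jobs are generated from one shared template.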
[jira] [Work logged] (BEAM-8160) Add instructions about how to set FnApi multi-threads/processes
[ https://issues.apache.org/jira/browse/BEAM-8160?focusedWorklogId=317060&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317060 ] ASF GitHub Bot logged work on BEAM-8160: Author: ASF GitHub Bot Created on: 24/Sep/19 00:15 Start Date: 24/Sep/19 00:15 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #9628: [BEAM-8160] Add FnApi execution mode instruction URL: https://github.com/apache/beam/pull/9628 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 317060) Time Spent: 40m (was: 0.5h) > Add instructions about how to set FnApi multi-threads/processes > --- > > Key: BEAM-8160 > URL: https://issues.apache.org/jira/browse/BEAM-8160 > Project: Beam > Issue Type: Task > Components: sdk-py-core >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Fix For: Not applicable > > Time Spent: 40m > Remaining Estimate: 0h > > Add instructions to Beam site or Beam wiki for easy discovery. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-5820) Vendor Calcite
[ https://issues.apache.org/jira/browse/BEAM-5820?focusedWorklogId=317057&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317057 ] ASF GitHub Bot logged work on BEAM-5820: Author: ASF GitHub Bot Created on: 24/Sep/19 00:09 Start Date: 24/Sep/19 00:09 Worklog Time Spent: 10m Work Description: vectorijk commented on issue #9189: [BEAM-5820] vendor calcite URL: https://github.com/apache/beam/pull/9189#issuecomment-534332723 Run Dataflow Runner Nexmark Tests This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 317057) Time Spent: 12.5h (was: 12h 20m) > Vendor Calcite > -- > > Key: BEAM-5820 > URL: https://issues.apache.org/jira/browse/BEAM-5820 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql >Reporter: Kenneth Knowles >Assignee: Kai Jiang >Priority: Major > Time Spent: 12.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-5820) Vendor Calcite
[ https://issues.apache.org/jira/browse/BEAM-5820?focusedWorklogId=317055&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317055 ] ASF GitHub Bot logged work on BEAM-5820: Author: ASF GitHub Bot Created on: 24/Sep/19 00:09 Start Date: 24/Sep/19 00:09 Worklog Time Spent: 10m Work Description: vectorijk commented on issue #9189: [BEAM-5820] vendor calcite URL: https://github.com/apache/beam/pull/9189#issuecomment-534332680 Run Spark Runner Nexmark Tests This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 317055) Time Spent: 12h 20m (was: 12h 10m) > Vendor Calcite > -- > > Key: BEAM-5820 > URL: https://issues.apache.org/jira/browse/BEAM-5820 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql >Reporter: Kenneth Knowles >Assignee: Kai Jiang >Priority: Major > Time Spent: 12h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-5820) Vendor Calcite
[ https://issues.apache.org/jira/browse/BEAM-5820?focusedWorklogId=317054&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317054 ] ASF GitHub Bot logged work on BEAM-5820: Author: ASF GitHub Bot Created on: 24/Sep/19 00:09 Start Date: 24/Sep/19 00:09 Worklog Time Spent: 10m Work Description: vectorijk commented on issue #9189: [BEAM-5820] vendor calcite URL: https://github.com/apache/beam/pull/9189#issuecomment-534332619 Run Direct Runner Nexmark Tests This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 317054) Time Spent: 12h 10m (was: 12h) > Vendor Calcite > -- > > Key: BEAM-5820 > URL: https://issues.apache.org/jira/browse/BEAM-5820 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql >Reporter: Kenneth Knowles >Assignee: Kai Jiang >Priority: Major > Time Spent: 12h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-5820) Vendor Calcite
[ https://issues.apache.org/jira/browse/BEAM-5820?focusedWorklogId=317053&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317053 ] ASF GitHub Bot logged work on BEAM-5820: Author: ASF GitHub Bot Created on: 24/Sep/19 00:08 Start Date: 24/Sep/19 00:08 Worklog Time Spent: 10m Work Description: vectorijk commented on issue #9189: [BEAM-5820] vendor calcite URL: https://github.com/apache/beam/pull/9189#issuecomment-534332583 Run SQL Postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 317053) Time Spent: 12h (was: 11h 50m) > Vendor Calcite > -- > > Key: BEAM-5820 > URL: https://issues.apache.org/jira/browse/BEAM-5820 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql >Reporter: Kenneth Knowles >Assignee: Kai Jiang >Priority: Major > Time Spent: 12h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO
[ https://issues.apache.org/jira/browse/BEAM-8306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Derek He updated BEAM-8306: --- Description: ElasticsearchIO splits BoundedSource based on the Elasticsearch index size. We expect it would be more accurate to split it based on the query result size. Currently, we have a big Elasticsearch index. But the query result contains only a few documents from the index. ElasticsearchIO splits it into up to 1024 BoundedSources in Google Dataflow. It takes a long time to finish processing the small number of Elasticsearch documents in Google Dataflow. was: ElasticsearchIO splits BoundedSource based on the Elasticsearch index size. We expect it would be more accurate to split it based on the query result size. Currently, we have a big Elasticsearch index. But the query result contains only a few documents from the index. But ElasticsearchIO splits it into up to 1024 BoundedSources in Google Dataflow. It takes a long time to finish processing the small number of Elasticsearch documents in Google Dataflow. > improve estimation of data byte size reading from source in ElasticsearchIO > --- > > Key: BEAM-8306 > URL: https://issues.apache.org/jira/browse/BEAM-8306 > Project: Beam > Issue Type: Improvement > Components: io-java-elasticsearch >Affects Versions: 2.14.0 >Reporter: Derek He >Priority: Major > > ElasticsearchIO splits BoundedSource based on the Elasticsearch index size. > We expect it would be more accurate to split it based on the query result size. > Currently, we have a big Elasticsearch index. But the query result > contains only a few documents from the index. ElasticsearchIO splits it into up > to 1024 BoundedSources in Google Dataflow. It takes a long time to finish > processing the small number of Elasticsearch documents in Google Dataflow. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO
[ https://issues.apache.org/jira/browse/BEAM-8306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Derek He updated BEAM-8306: --- Component/s: (was: sdk-java-core) io-java-elasticsearch > improve estimation of data byte size reading from source in ElasticsearchIO > --- > > Key: BEAM-8306 > URL: https://issues.apache.org/jira/browse/BEAM-8306 > Project: Beam > Issue Type: Improvement > Components: io-java-elasticsearch >Affects Versions: 2.14.0 >Reporter: Derek He >Priority: Major > > ElasticsearchIO splits BoundedSource based on the Elasticsearch index size. > We expect it would be more accurate to split it based on the query result size. > Currently, we have a big Elasticsearch index. But the query result > contains only a few documents from the index. But ElasticsearchIO splits it into up > to 1024 BoundedSources in Google Dataflow. It takes a long time to finish > processing the small number of Elasticsearch documents in Google Dataflow. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO
Derek He created BEAM-8306: -- Summary: improve estimation of data byte size reading from source in ElasticsearchIO Key: BEAM-8306 URL: https://issues.apache.org/jira/browse/BEAM-8306 Project: Beam Issue Type: Improvement Components: sdk-java-core Affects Versions: 2.14.0 Reporter: Derek He ElasticsearchIO splits BoundedSource based on the Elasticsearch index size. We expect it would be more accurate to split it based on the query result size. Currently, we have a big Elasticsearch index. But the query result contains only a few documents from the index. But ElasticsearchIO splits it into up to 1024 BoundedSources in Google Dataflow. It takes a long time to finish processing the small number of Elasticsearch documents in Google Dataflow. -- This message was sent by Atlassian Jira (v8.3.4#803005)
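To make the concern in BEAM-8306 concrete, here is a simplified model of how a size estimate drives the number of splits. It is a sketch under stated assumptions, not ElasticsearchIO's actual code: `estimate_splits`, the 64 MiB bundle size, and the example sizes are all hypothetical; only the cap of 1024 sources comes from the issue text.

```python
MAX_SPLITS = 1024  # upper bound on sources mentioned in the issue


def estimate_splits(estimated_bytes, desired_bundle_bytes, max_splits=MAX_SPLITS):
    """Number of sources for a given size estimate (simplified model)."""
    if desired_bundle_bytes <= 0:
        raise ValueError("desired_bundle_bytes must be positive")
    # Ceiling division without floats, then clamp to the range [1, max_splits].
    splits = -(-estimated_bytes // desired_bundle_bytes)
    return max(1, min(splits, max_splits))


MIB = 1024 * 1024
index_bytes = 50 * 1024 * MIB   # hypothetical 50 GiB index
query_bytes = 2 * MIB           # hypothetical 2 MiB query result
bundle_bytes = 64 * MIB         # hypothetical desired bundle size

print(estimate_splits(index_bytes, bundle_bytes))  # estimate based on the whole index
print(estimate_splits(query_bytes, bundle_bytes))  # estimate based on the query result
```

With these made-up numbers, sizing by the index yields hundreds of sources for a result that would fit in a single one, which is the overhead the reporter describes.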
[jira] [Work logged] (BEAM-8240) Fix pipeline proto to contain worker_harness_container_image override
[ https://issues.apache.org/jira/browse/BEAM-8240?focusedWorklogId=317011&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317011 ] ASF GitHub Bot logged work on BEAM-8240: Author: ASF GitHub Bot Created on: 23/Sep/19 22:24 Start Date: 23/Sep/19 22:24 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #9629: [BEAM-8240] Sets workerHarnessContainerImage in the default Environment of DataflowRunner URL: https://github.com/apache/beam/pull/9629 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 317011) Time Spent: 4h 40m (was: 4.5h) > Fix pipeline proto to contain worker_harness_container_image override > - > > Key: BEAM-8240 > URL: https://issues.apache.org/jira/browse/BEAM-8240 > Project: Beam > Issue Type: Bug > Components: sdk-py-harness >Reporter: Luke Cwik >Assignee: Luke Cwik >Priority: Minor > Fix For: 2.17.0 > > Time Spent: 4h 40m > Remaining Estimate: 0h > > SDK harness incorrectly identifies itself when using custom SDK container > within environment field when building pipeline proto. > > Passing in the experiment *worker_harness_container_image=YYY* doesn't > override the pipeline proto environment field and it is still being populated > with *gcr.io/cloud-dataflow/v1beta3/python-fnapi:beam-master-20190802* > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8305) Cleanup external transform tests
Robert Bradshaw created BEAM-8305: - Summary: Cleanup external transform tests Key: BEAM-8305 URL: https://issues.apache.org/jira/browse/BEAM-8305 Project: Beam Issue Type: Bug Components: testing Reporter: Robert Bradshaw Currently apache_beam/transforms/external_test.py has several entry points, sometimes called directly, sometimes via nosetest, sometimes with parameters passed via arguments or via environment variables, and the logic is not always clear to follow (either within the test, or via the several gradle targets that reference it). We should really let this file be a unit test, and create a different script (sharing a common library if needed) in our integration tests. This was the root cause of BEAM-8302 . -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8302) beam_PostCommit_XVR_Flink failing
[ https://issues.apache.org/jira/browse/BEAM-8302?focusedWorklogId=316999&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316999 ] ASF GitHub Bot logged work on BEAM-8302: Author: ASF GitHub Bot Created on: 23/Sep/19 22:06 Start Date: 23/Sep/19 22:06 Worklog Time Spent: 10m Work Description: robertwb commented on issue #9644: [BEAM-8302] Fix PostCommit_XVR_Flink URL: https://github.com/apache/beam/pull/9644#issuecomment-534304066 R: @ihji This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 316999) Time Spent: 1h (was: 50m) > beam_PostCommit_XVR_Flink failing > - > > Key: BEAM-8302 > URL: https://issues.apache.org/jira/browse/BEAM-8302 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > E.g. see https://builds.apache.org/job/beam_PostCommit_XVR_Flink/432/console -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8302) beam_PostCommit_XVR_Flink failing
[ https://issues.apache.org/jira/browse/BEAM-8302?focusedWorklogId=316996&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316996 ] ASF GitHub Bot logged work on BEAM-8302: Author: ASF GitHub Bot Created on: 23/Sep/19 22:01 Start Date: 23/Sep/19 22:01 Worklog Time Spent: 10m Work Description: robertwb commented on issue #9644: [BEAM-8302] Fix PostCommit_XVR_Flink URL: https://github.com/apache/beam/pull/9644#issuecomment-534302437 Run Python 2 PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 316996) Time Spent: 50m (was: 40m) > beam_PostCommit_XVR_Flink failing > - > > Key: BEAM-8302 > URL: https://issues.apache.org/jira/browse/BEAM-8302 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > E.g. see https://builds.apache.org/job/beam_PostCommit_XVR_Flink/432/console -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8302) beam_PostCommit_XVR_Flink failing
[ https://issues.apache.org/jira/browse/BEAM-8302?focusedWorklogId=316995&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316995 ] ASF GitHub Bot logged work on BEAM-8302: Author: ASF GitHub Bot Created on: 23/Sep/19 22:00 Start Date: 23/Sep/19 22:00 Worklog Time Spent: 10m Work Description: robertwb commented on issue #9644: [BEAM-8302] Fix PostCommit_XVR_Flink URL: https://github.com/apache/beam/pull/9644#issuecomment-534302393 Run XVR_Flink PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 316995) Time Spent: 40m (was: 0.5h) > beam_PostCommit_XVR_Flink failing > - > > Key: BEAM-8302 > URL: https://issues.apache.org/jira/browse/BEAM-8302 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > E.g. see https://builds.apache.org/job/beam_PostCommit_XVR_Flink/432/console -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8302) beam_PostCommit_XVR_Flink failing
[ https://issues.apache.org/jira/browse/BEAM-8302?focusedWorklogId=316993&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316993 ] ASF GitHub Bot logged work on BEAM-8302: Author: ASF GitHub Bot Created on: 23/Sep/19 21:58 Start Date: 23/Sep/19 21:58 Worklog Time Spent: 10m Work Description: robertwb commented on issue #9644: [BEAM-8302] Fix PostCommit_XVR_Flink URL: https://github.com/apache/beam/pull/9644#issuecomment-534301585 Run PostCommit_XVR_Flink This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 316993) Time Spent: 0.5h (was: 20m) > beam_PostCommit_XVR_Flink failing > - > > Key: BEAM-8302 > URL: https://issues.apache.org/jira/browse/BEAM-8302 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > E.g. see https://builds.apache.org/job/beam_PostCommit_XVR_Flink/432/console -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8302) beam_PostCommit_XVR_Flink failing
[ https://issues.apache.org/jira/browse/BEAM-8302?focusedWorklogId=316991&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316991 ] ASF GitHub Bot logged work on BEAM-8302: Author: ASF GitHub Bot Created on: 23/Sep/19 21:57 Start Date: 23/Sep/19 21:57 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #9644: [BEAM-8302] Fix PostCommit_XVR_Flink URL: https://github.com/apache/beam/pull/9644 Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). 
[jira] [Work logged] (BEAM-8302) beam_PostCommit_XVR_Flink failing
[ https://issues.apache.org/jira/browse/BEAM-8302?focusedWorklogId=316992&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316992 ] ASF GitHub Bot logged work on BEAM-8302: Author: ASF GitHub Bot Created on: 23/Sep/19 21:57 Start Date: 23/Sep/19 21:57 Worklog Time Spent: 10m Work Description: robertwb commented on issue #9644: [BEAM-8302] Fix PostCommit_XVR_Flink URL: https://github.com/apache/beam/pull/9644#issuecomment-534301471 Run Python PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 316992) Time Spent: 20m (was: 10m) > beam_PostCommit_XVR_Flink failing > - > > Key: BEAM-8302 > URL: https://issues.apache.org/jira/browse/BEAM-8302 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > E.g. see https://builds.apache.org/job/beam_PostCommit_XVR_Flink/432/console -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-6923) OOM errors in jobServer when using GCS artifactDir
[ https://issues.apache.org/jira/browse/BEAM-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936210#comment-16936210 ] Ankur Goenka commented on BEAM-6923: Also able to reproduce on Linux by setting -Xmx for the job server process to 1GB > OOM errors in jobServer when using GCS artifactDir > -- > > Key: BEAM-6923 > URL: https://issues.apache.org/jira/browse/BEAM-6923 > Project: Beam > Issue Type: Bug > Components: sdk-java-harness >Reporter: Lukasz Gajowy >Assignee: Ankur Goenka >Priority: Major > Attachments: Instance counts.png, Paths to GC root.png, > Telemetries.png, beam6923-flink156.m4v, beam6923flink182.m4v, heapdump > size-sorted.png > > > When starting jobServer with artifactDir pointing to a GCS bucket: > {code:java} > ./gradlew :beam-runners-flink_2.11-job-server:runShadow > -PflinkMasterUrl=localhost:8081 -PartifactsDir=gs://the-bucket{code} > and running a Java portable pipeline with the following, portability related > pipeline options: > {code:java} > --runner=PortableRunner --jobEndpoint=localhost:8099 > --defaultEnvironmentType=DOCKER > --defaultEnvironmentConfig=gcr.io//java:latest'{code} > > I'm facing a series of OOM errors, like this: > {code:java} > Exception in thread "grpc-default-executor-3" java.lang.OutOfMemoryError: > Java heap space > at > com.google.api.client.googleapis.media.MediaHttpUploader.buildContentChunk(MediaHttpUploader.java:606) > at > com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:408) > at > com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336) > at > com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:508) > at > com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:432) > at > 
com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:549) > at > com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:301) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745){code} > > This does not happen when I'm using a local filesystem for the artifact > staging location. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-6923) OOM errors in jobServer when using GCS artifactDir
[ https://issues.apache.org/jira/browse/BEAM-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936197#comment-16936197 ] Ankur Goenka commented on BEAM-6923: ACK: I am able to reproduce it on MAC Environment: Java 1.8 Flink 1.5.6 goenka@goenka-macbookpro:~/d/work/beam/beam$ ./gradlew runners:flink:1.5:job-server:runShadow -PflinkMasterUrl=localhost:8081 -PartifactsDir="gs://clouddfe-goenka/tmp/t0" Configuration on demand is an incubating feature. > Task :runners:flink:1.5:job-server:runShadow Listening for transport dt_socket at address: 5005 [main] INFO org.apache.beam.runners.fnexecution.jobsubmission.JobServerDriver - ArtifactStagingService started on localhost:8098 [main] INFO org.apache.beam.runners.fnexecution.jobsubmission.JobServerDriver - Java ExpansionService started on localhost:8097 [main] INFO org.apache.beam.runners.fnexecution.jobsubmission.JobServerDriver - JobService started on localhost:8099 Exception in thread "grpc-default-executor-70" java.lang.OutOfMemoryError: Java heap space at com.google.api.client.googleapis.media.MediaHttpUploader.buildContentChunk(MediaHttpUploader.java:606) at com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:408) at com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336) at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:508) at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:432) at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:549) at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:301) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Exception in thread "grpc-default-executor-146" java.lang.OutOfMemoryError: Java heap space at com.google.api.client.googleapis.media.MediaHttpUploader.buildContentChunk(MediaHttpUploader.java:606) at com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:408) at com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336) at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:508) at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:432) at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:549) at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:301) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Exception in thread "grpc-default-executor-50" java.lang.OutOfMemoryError: Java heap space Exception in thread "grpc-default-executor-23" java.lang.OutOfMemoryError: Java heap space > OOM errors in jobServer when using GCS artifactDir > -- > > Key: BEAM-6923 > URL: https://issues.apache.org/jira/browse/BEAM-6923 > Project: Beam > Issue Type: Bug > Components: sdk-java-harness >Reporter: Lukasz Gajowy >Assignee: Ankur Goenka >Priority: Major > Attachments: Instance counts.png, Paths to GC root.png, > Telemetries.png, beam6923-flink156.m4v, beam6923flink182.m4v, heapdump > size-sorted.png > > > When starting jobServer with 
artifactDir pointing to a GCS bucket: > {code:java} > ./gradlew :beam-runners-flink_2.11-job-server:runShadow > -PflinkMasterUrl=localhost:8081 -PartifactsDir=gs://the-bucket{code} > and running a Java portable pipeline with the following, portability related > pipeline options: > {code:java} > --runner=PortableRunner --jobEndpoint=localhost:8099 > --defaultEnvironmentType=DOCKER > --defaultEnvironmentConfig=gcr.io//java:latest'{code} > > I'm facing a series of OOM errors, like this: > {code:java} > Exception in thread "grpc-default-executor-3" java.lang.OutOfMemoryError: > Java heap space > at > com.google.api.client.googleapis.media.MediaHttpUploader.buildContentChunk(MediaHttpUploader.java:606) > at > com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUp
[jira] [Work logged] (BEAM-8146) SchemaCoder/RowCoder have no equals() function
[ https://issues.apache.org/jira/browse/BEAM-8146?focusedWorklogId=316981&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316981 ] ASF GitHub Bot logged work on BEAM-8146: Author: ASF GitHub Bot Created on: 23/Sep/19 21:18 Start Date: 23/Sep/19 21:18 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on issue #9493: [BEAM-8146,BEAM-8204,BEAM-8205] Add equals and hashCode to SchemaCoder and RowCoder URL: https://github.com/apache/beam/pull/9493#issuecomment-534289161 Run Apex ValidatesRunner This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 316981) Time Spent: 1h 40m (was: 1.5h) > SchemaCoder/RowCoder have no equals() function > -- > > Key: BEAM-8146 > URL: https://issues.apache.org/jira/browse/BEAM-8146 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Affects Versions: 2.15.0 >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > > SchemaCoder has no equals function, so it can't be compared in tests, like > CloudComponentsTests$DefaultCoders, which is being re-enabled in > https://github.com/apache/beam/pull/9446 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8146) SchemaCoder/RowCoder have no equals() function
[ https://issues.apache.org/jira/browse/BEAM-8146?focusedWorklogId=316977&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316977 ] ASF GitHub Bot logged work on BEAM-8146: Author: ASF GitHub Bot Created on: 23/Sep/19 21:09 Start Date: 23/Sep/19 21:09 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on issue #9493: [BEAM-8146,BEAM-8204,BEAM-8205] Add equals and hashCode to SchemaCoder and RowCoder URL: https://github.com/apache/beam/pull/9493#issuecomment-534285609 Run Flink ValidatesRunner This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 316977) Time Spent: 1.5h (was: 1h 20m) > SchemaCoder/RowCoder have no equals() function > -- > > Key: BEAM-8146 > URL: https://issues.apache.org/jira/browse/BEAM-8146 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Affects Versions: 2.15.0 >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > SchemaCoder has no equals function, so it can't be compared in tests, like > CloudComponentsTests$DefaultCoders, which is being re-enabled in > https://github.com/apache/beam/pull/9446 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8111) SchemaCoder broken on DataflowRunner
[ https://issues.apache.org/jira/browse/BEAM-8111?focusedWorklogId=316978&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316978 ]

ASF GitHub Bot logged work on BEAM-8111:

Author: ASF GitHub Bot
Created on: 23/Sep/19 21:09
Start Date: 23/Sep/19 21:09
Worklog Time Spent: 10m
Work Description: TheNeuralBit commented on pull request #9446: [BEAM-8111] Enable CloudObjectsTest$DefaultCoders
URL: https://github.com/apache/beam/pull/9446#discussion_r327330801

## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/SchemaCoder.java

@@ -100,4 +99,47 @@ public boolean consistentWithEquals() {
   public String toString() {
     return "SchemaCoder: " + rowCoder.toString();
   }
+
+  @Override
+  public boolean equals(Object o) {
+    if (this == o) {
+      return true;
+    }
+    if (o == null || getClass() != o.getClass()) {
+      return false;
+    }
+    SchemaCoder that = (SchemaCoder) o;
+    return rowCoder.equals(that.rowCoder)
+        && toRowFunction.equals(that.toRowFunction)
+        && fromRowFunction.equals(that.fromRowFunction);

Review comment: I have a PR up now (https://github.com/apache/beam/pull/9493) that adds `equals` and `hashCode` to the `fromRow` and `toRow` functions created by all the `GetterBasedSchemaProvider` sub-classes.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 316978)
Time Spent: 4h (was: 3h 50m)

> SchemaCoder broken on DataflowRunner
>
> Key: BEAM-8111
> URL: https://issues.apache.org/jira/browse/BEAM-8111
> Project: Beam
> Issue Type: Bug
> Components: runner-dataflow, sdk-java-core
> Affects Versions: 2.15.0
> Reporter: Brian Hulette
> Assignee: Brian Hulette
> Priority: Blocker
> Fix For: 2.16.0
> Time Spent: 4h
> Remaining Estimate: 0h
>
> https://github.com/apache/beam/commit/e65c176a9f34e45d408281e1101a2ae54cef0f6c
> broke SchemaCoder on Dataflow. When translating a schema that uses logical
> types from a cloud object dataflow encounters a runtime error.
> This means any pipelines that use SqlTransform or schema transforms will fail
> on Dataflow in 2.15.0

-- This message was sent by Atlassian Jira (v8.3.4#803005)
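The review comment above turns on one point: a coder's equals() can only be as good as the equals() of the conversion functions it holds. The following is a minimal Python sketch of that idea, not Beam code. `FieldGetter` and `SketchCoder` are hypothetical names invented for illustration, and Python's `__eq__`/`__hash__` stand in for Java's `equals`/`hashCode`.

```python
class FieldGetter:
    """Hypothetical value object standing in for a generated conversion function."""

    def __init__(self, field_names):
        self.field_names = tuple(field_names)

    def __call__(self, record):
        return tuple(record[name] for name in self.field_names)

    def __eq__(self, other):
        if self is other:
            return True
        if other is None or type(self) is not type(other):
            return False
        return self.field_names == other.field_names

    def __hash__(self):
        # Hash the same fields that __eq__ compares.
        return hash((type(self), self.field_names))


class SketchCoder:
    """Hypothetical coder whose equality delegates to its three parts."""

    def __init__(self, row_coder, to_row, from_row):
        self.row_coder = row_coder
        self.to_row = to_row
        self.from_row = from_row

    def __eq__(self, other):
        if self is other:
            return True
        if other is None or type(self) is not type(other):
            return False
        return (self.row_coder == other.row_coder
                and self.to_row == other.to_row
                and self.from_row == other.from_row)

    def __hash__(self):
        return hash((self.row_coder, self.to_row, self.from_row))


def lambda_backed():
    # Anonymous functions compare by identity, so two such coders never match.
    return SketchCoder("row", lambda r: r, lambda r: r)


def getter_backed():
    # Value objects compare by content, so structurally equal coders match.
    return SketchCoder("row", FieldGetter(["id"]), FieldGetter(["id"]))
```

With anonymous functions, two structurally identical coders compare unequal. Once the functions are value objects, equality and hashing of the coder follow from its parts.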
[jira] [Work logged] (BEAM-8146) SchemaCoder/RowCoder have no equals() function
[ https://issues.apache.org/jira/browse/BEAM-8146?focusedWorklogId=316976&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316976 ] ASF GitHub Bot logged work on BEAM-8146: Author: ASF GitHub Bot Created on: 23/Sep/19 21:08 Start Date: 23/Sep/19 21:08 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on issue #9493: [BEAM-8146,BEAM-8204,BEAM-8205] Add equals and hashCode to SchemaCoder and RowCoder URL: https://github.com/apache/beam/pull/9493#issuecomment-534285319 R: @reuvenlax I know you said Flink/Apex shouldn't be relying on coder equality and that should be the real fix for BEAM-8204 and BEAM-8205, but I think this is helpful for testing anyway. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 316976) Time Spent: 1h 20m (was: 1h 10m) > SchemaCoder/RowCoder have no equals() function > -- > > Key: BEAM-8146 > URL: https://issues.apache.org/jira/browse/BEAM-8146 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Affects Versions: 2.15.0 >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > SchemaCoder has no equals function, so it can't be compared in tests, like > CloudComponentsTests$DefaultCoders, which is being re-enabled in > https://github.com/apache/beam/pull/9446 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8240) Fix pipeline proto to contain worker_harness_container_image override
[ https://issues.apache.org/jira/browse/BEAM-8240?focusedWorklogId=316974&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316974 ] ASF GitHub Bot logged work on BEAM-8240: Author: ASF GitHub Bot Created on: 23/Sep/19 21:02 Start Date: 23/Sep/19 21:02 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #9629: [BEAM-8240] Sets workerHarnessContainerImage in the default Environment of DataflowRunner URL: https://github.com/apache/beam/pull/9629#issuecomment-534283269 Moved the test. Thanks. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 316974) Time Spent: 4.5h (was: 4h 20m) > Fix pipeline proto to contain worker_harness_container_image override > - > > Key: BEAM-8240 > URL: https://issues.apache.org/jira/browse/BEAM-8240 > Project: Beam > Issue Type: Bug > Components: sdk-py-harness >Reporter: Luke Cwik >Assignee: Luke Cwik >Priority: Minor > Fix For: 2.17.0 > > Time Spent: 4.5h > Remaining Estimate: 0h > > SDK harness incorrectly identifies itself when using custom SDK container > within environment field when building pipeline proto. > > Passing in the experiment *worker_harness_container_image=YYY* doesn't > override the pipeline proto environment field and it is still being populated > with *gcr.io/cloud-dataflow/v1beta3/python-fnapi:beam-master-20190802* > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7919) Add a Python 3 test scenario for MongoDB IO
[ https://issues.apache.org/jira/browse/BEAM-7919?focusedWorklogId=316972&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316972 ] ASF GitHub Bot logged work on BEAM-7919: Author: ASF GitHub Bot Created on: 23/Sep/19 20:50 Start Date: 23/Sep/19 20:50 Worklog Time Spent: 10m Work Description: y1chi commented on issue #9639: [BEAM-7919] Add MongoDB IO integration test for py3.7 URL: https://github.com/apache/beam/pull/9639#issuecomment-534279086 Run Python MongoDBIO_IT This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 316972) Time Spent: 2h 20m (was: 2h 10m) > Add a Python 3 test scenario for MongoDB IO > --- > > Key: BEAM-7919 > URL: https://issues.apache.org/jira/browse/BEAM-7919 > Project: Beam > Issue Type: Sub-task > Components: io-ideas >Reporter: Valentyn Tymofieiev >Assignee: Yichi Zhang >Priority: Major > Time Spent: 2h 20m > Remaining Estimate: 0h > > Python 2 MongoDB IO suite was added in: > https://github.com/apache/beam/commit/17bf89d6070565b715f44ecb5f6394219b94cfe6 > We should also exercise this IO in Python 3. > cc: [~chamikara] [~altay] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-8204) Newly added Java ValidatesRunner tests failed on ApexRunner
[ https://issues.apache.org/jira/browse/BEAM-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936167#comment-16936167 ] Brian Hulette edited comment on BEAM-8204 at 9/23/19 8:43 PM: -- I think https://github.com/apache/beam/pull/9493 will resolve the issue with the ValidatesRunner test on Flink and Apex, but not for the different side inputs. I made a separate bug for that: BEAM-8304 was (Author: bhulette): I think https://github.com/apache/beam/pull/9493 will resolve the issue with the ValidatesRunner test on Flink and Apex, but not for the different side inputs. I'll make separate bug for that. > Newly added Java ValidatesRunner tests failed on ApexRunner > --- > > Key: BEAM-8204 > URL: https://issues.apache.org/jira/browse/BEAM-8204 > Project: Beam > Issue Type: Bug > Components: sdk-java-core, test-failures >Reporter: Yueyang Qiu >Assignee: Brian Hulette >Priority: Major > Labels: currently-failing > Time Spent: 2h > Remaining Estimate: 0h > > Jenkins link: > [https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/testReport/] > > Initial investigation: > [https://github.com/apache/beam/pull/9454] and > [https://github.com/apache/beam/pull/9372] added new ValidatesRunner tests. > They have been tested on Dataflow runner, but are failing on Apex runner. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8146) SchemaCoder/RowCoder have no equals() function
[ https://issues.apache.org/jira/browse/BEAM-8146?focusedWorklogId=316965&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316965 ] ASF GitHub Bot logged work on BEAM-8146: Author: ASF GitHub Bot Created on: 23/Sep/19 20:39 Start Date: 23/Sep/19 20:39 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on issue #9493: [BEAM-8146,BEAM-8204,BEAM-8205] Add equals and hashCode to SchemaCoder and RowCoder URL: https://github.com/apache/beam/pull/9493#issuecomment-534274859 Run Apex ValidatesRunner This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 316965) Time Spent: 1h (was: 50m) > SchemaCoder/RowCoder have no equals() function > -- > > Key: BEAM-8146 > URL: https://issues.apache.org/jira/browse/BEAM-8146 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Affects Versions: 2.15.0 >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > SchemaCoder has no equals function, so it can't be compared in tests, like > CloudComponentsTests$DefaultCoders, which is being re-enabled in > https://github.com/apache/beam/pull/9446 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8146) SchemaCoder/RowCoder have no equals() function
[ https://issues.apache.org/jira/browse/BEAM-8146?focusedWorklogId=316966&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316966 ] ASF GitHub Bot logged work on BEAM-8146: Author: ASF GitHub Bot Created on: 23/Sep/19 20:39 Start Date: 23/Sep/19 20:39 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on issue #9493: [BEAM-8146,BEAM-8204,BEAM-8205] Add equals and hashCode to SchemaCoder and RowCoder URL: https://github.com/apache/beam/pull/9493#issuecomment-534274910 Run Flink ValidatesRunner This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 316966) Time Spent: 1h 10m (was: 1h) > SchemaCoder/RowCoder have no equals() function > -- > > Key: BEAM-8146 > URL: https://issues.apache.org/jira/browse/BEAM-8146 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Affects Versions: 2.15.0 >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > SchemaCoder has no equals function, so it can't be compared in tests, like > CloudComponentsTests$DefaultCoders, which is being re-enabled in > https://github.com/apache/beam/pull/9446 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7919) Add a Python 3 test scenario for MongoDB IO
[ https://issues.apache.org/jira/browse/BEAM-7919?focusedWorklogId=316964&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316964 ] ASF GitHub Bot logged work on BEAM-7919: Author: ASF GitHub Bot Created on: 23/Sep/19 20:38 Start Date: 23/Sep/19 20:38 Worklog Time Spent: 10m Work Description: y1chi commented on issue #9639: [BEAM-7919] Add MongoDB IO integration test for py3.7 URL: https://github.com/apache/beam/pull/9639#issuecomment-534274330 Run seed job This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 316964) Time Spent: 2h 10m (was: 2h) > Add a Python 3 test scenario for MongoDB IO > --- > > Key: BEAM-7919 > URL: https://issues.apache.org/jira/browse/BEAM-7919 > Project: Beam > Issue Type: Sub-task > Components: io-ideas >Reporter: Valentyn Tymofieiev >Assignee: Yichi Zhang >Priority: Major > Time Spent: 2h 10m > Remaining Estimate: 0h > > Python 2 MongoDB IO suite was added in: > https://github.com/apache/beam/commit/17bf89d6070565b715f44ecb5f6394219b94cfe6 > We should also exercise this IO in Python 3. > cc: [~chamikara] [~altay] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8304) Apex runner doesn't support multiple side inputs with different coders
Brian Hulette created BEAM-8304: --- Summary: Apex runner doesn't support multiple side inputs with different coders Key: BEAM-8304 URL: https://issues.apache.org/jira/browse/BEAM-8304 Project: Beam Issue Type: Bug Components: runner-apex Affects Versions: 2.16.0 Reporter: Brian Hulette -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8204) Newly added Java ValidatesRunner tests failed on ApexRunner
[ https://issues.apache.org/jira/browse/BEAM-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936167#comment-16936167 ] Brian Hulette commented on BEAM-8204: - I think https://github.com/apache/beam/pull/9493 will resolve the issue with the ValidatesRunner test on Flink and Apex, but not for the different side inputs. I'll make a separate bug for that. > Newly added Java ValidatesRunner tests failed on ApexRunner > --- > > Key: BEAM-8204 > URL: https://issues.apache.org/jira/browse/BEAM-8204 > Project: Beam > Issue Type: Bug > Components: sdk-java-core, test-failures >Reporter: Yueyang Qiu >Assignee: Brian Hulette >Priority: Major > Labels: currently-failing > Time Spent: 2h > Remaining Estimate: 0h > > Jenkins link: > [https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/testReport/] > > Initial investigation: > [https://github.com/apache/beam/pull/9454] and > [https://github.com/apache/beam/pull/9372] added new ValidatesRunner tests. > They have been tested on Dataflow runner, but are failing on Apex runner. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8301) Argument inference breaks on incomparable types as defaults.
[ https://issues.apache.org/jira/browse/BEAM-8301?focusedWorklogId=316952&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316952 ] ASF GitHub Bot logged work on BEAM-8301: Author: ASF GitHub Bot Created on: 23/Sep/19 20:07 Start Date: 23/Sep/19 20:07 Worklog Time Spent: 10m Work Description: robertwb commented on issue #9641: [BEAM-8301] Fix incomparable defaults. URL: https://github.com/apache/beam/pull/9641#issuecomment-534262692 R: @markflyhigh This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 316952) Remaining Estimate: 0h Time Spent: 10m > Argument inference breaks on incomparable types as defaults. > > > Key: BEAM-8301 > URL: https://issues.apache.org/jira/browse/BEAM-8301 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.16.0 >Reporter: Robert Bradshaw >Priority: Blocker > Fix For: 2.16.0 > > Time Spent: 10m > Remaining Estimate: 0h > > A common culprit is numpy arrays, e.g. > {code:python} > class MyDoFn(beam.DoFn): > def process(self, element, arg=np.ndarray(...)): > ... > {code} > This bug was introduced as part of [BEAM-7060]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
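The failure mode in BEAM-8301 can be sketched without Beam or numpy. The example assumes, purely for illustration, that the inference code checks a parameter's default with `==`; the actual Beam code path is not shown in this thread. `ArrayLike` is a hypothetical stand-in for a numpy array, whose `==` is elementwise and therefore has no single truth value.

```python
import inspect

_NO_DEFAULT = inspect.Parameter.empty


class _ElementwiseResult:
    """Result of an elementwise comparison: it has no single truth value."""

    def __bool__(self):
        raise ValueError("truth value of an elementwise comparison is ambiguous")


class ArrayLike:
    """Hypothetical stand-in for a numpy array used as a default argument."""

    def __init__(self, values):
        self.values = list(values)

    def __eq__(self, other):
        return _ElementwiseResult()


def defaults_by_equality(fn):
    """Naive check: breaks when == on a default does not yield a plain bool."""
    found = {}
    for name, param in inspect.signature(fn).parameters.items():
        if not (param.default == _NO_DEFAULT):
            found[name] = param.default
    return found


def defaults_by_identity(fn):
    """Identity check: never calls the default value's own __eq__."""
    found = {}
    for name, param in inspect.signature(fn).parameters.items():
        if param.default is not _NO_DEFAULT:
            found[name] = param.default
    return found


def process(element, arg=ArrayLike([1, 2, 3])):
    return element
```

Comparing against a sentinel with `is` / `is not` avoids invoking user-defined equality at all, which is why it is the usual choice when inspecting defaults of unknown type.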
[jira] [Commented] (BEAM-7933) Adding timeout to JobServer grpc calls
[ https://issues.apache.org/jira/browse/BEAM-7933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936154#comment-16936154 ] Kyle Weaver commented on BEAM-7933: --- I think this would be a useful feature, especially for common failure modes such as pipeline submission. Do you still plan on implementing this [~enricoc]? > Adding timeout to JobServer grpc calls > -- > > Key: BEAM-7933 > URL: https://issues.apache.org/jira/browse/BEAM-7933 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Affects Versions: 2.14.0 >Reporter: Enrico Canzonieri >Assignee: Enrico Canzonieri >Priority: Minor > Labels: portability > > grpc calls to the JobServer from the Python SDK do not have timeouts. That > means that the call to pipeline.run() could hang forever if the JobServer is > not running (or is failing to start). > E.g. > [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/portable_runner.py#L307] > the call to Prepare() doesn't provide any timeout value and the same applies > to other JobServer requests. > As part of this ticket we could add a default timeout of 60 seconds for the > http client. > Additionally, we could consider adding a --job-server-request-timeout option to the > [PortableOptions|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/options/pipeline_options.py#L805] > class to be used in the JobServer interactions inside portable_runner.py. -- This message was sent by Atlassian Jira (v8.3.4#803005)
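The proposal in BEAM-7933 could look roughly like the sketch below. The 60-second default and the `--job-server-request-timeout` flag name come from the ticket. `FakeJobServiceStub`, `add_timeout_option`, and `prepare_job` are hypothetical names for illustration only, not Beam APIs. The sketch relies on gRPC Python stub methods accepting a `timeout` keyword in seconds; the fake stub only records the value so the example runs without a server.

```python
import argparse

DEFAULT_TIMEOUT_SECS = 60  # default proposed in the ticket


def add_timeout_option(parser):
    """Registers the flag named in the ticket on an argparse parser."""
    parser.add_argument(
        "--job-server-request-timeout",
        type=int,
        default=DEFAULT_TIMEOUT_SECS,
        help="Timeout in seconds for each JobServer request.")
    return parser


class FakeJobServiceStub:
    """Hypothetical stand-in for a generated gRPC stub; records the timeout."""

    def __init__(self):
        self.last_timeout = None

    def Prepare(self, request, timeout=None):
        # A real gRPC stub would fail with a deadline error once `timeout`
        # seconds have elapsed instead of blocking indefinitely.
        self.last_timeout = timeout
        return "prepared:" + request


def prepare_job(stub, request, options):
    """Passes the configured timeout on every JobServer call."""
    return stub.Prepare(request, timeout=options.job_server_request_timeout)


def parse_options(argv):
    return add_timeout_option(argparse.ArgumentParser()).parse_args(argv)
```

For example, `prepare_job(stub, "job", parse_options(["--job-server-request-timeout", "5"]))` forwards a 5-second deadline, while an empty argument list falls back to 60 seconds.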
[jira] [Commented] (BEAM-8029) Using BigQueryIO.read with DIRECT_READ causes Illegal Mutation
[ https://issues.apache.org/jira/browse/BEAM-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936139#comment-16936139 ] Jason Bowman commented on BEAM-8029: I'm seeing field mutation/corruption in the generic record results using Method.DIRECT_READ with the DataflowRunner and Beam 2.15.0, and this runtime exception from the DirectRunner, so it seems to be legit. > Using BigQueryIO.read with DIRECT_READ causes Illegal Mutation > --- > > Key: BEAM-8029 > URL: https://issues.apache.org/jira/browse/BEAM-8029 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Affects Versions: 2.14.0 >Reporter: Chris Larsen >Priority: Major > > > Code to read from BigQuery that is causing the issue: > {code:java} > pipeline > .apply(BigQueryIO > .read(SchemaAndRecord::getRecord) > .from(options.getTableRef()) > .withMethod(Method.DIRECT_READ) > .withCoder(AvroCoder.of(schema))) > {code} > If we remove .withMethod(Method.DIRECT_READ) then there is no issue. > > The error is: > {code:java} > org.apache.beam.sdk.util.IllegalMutationException: PTransform > BigQueryIO.TypedRead/Read(BigQueryStorageTableSource) mutated value > {"device_id": "rpi-rpi0-thermostat", "temperature_c": 20.0, "temperature_f": > 52.0, "sample_time": 1564412307969368, "humidity": 74.3} after it was output > (new value was {"device_id": "rpi-rpi0-thermostat", "temperature_c": 20.0, > "temperature_f": 52.0, "sample_time": 1564412360458615, "humidity": 74.7}). > Values must not be mutated in any way after being output. 
> at > org.apache.beam.runners.direct.ImmutabilityCheckingBundleFactory$ImmutabilityEnforcingBundle.commit > (ImmutabilityCheckingBundleFactory.java:134) > at org.apache.beam.runners.direct.EvaluationContext.commitBundles > (EvaluationContext.java:210) > at org.apache.beam.runners.direct.EvaluationContext.handleResult > (EvaluationContext.java:151) > at > org.apache.beam.runners.direct.QuiescenceDriver$TimerIterableCompletionCallback.handleResult > (QuiescenceDriver.java:262) > at org.apache.beam.runners.direct.DirectTransformExecutor.finishBundle > (DirectTransformExecutor.java:189) > at org.apache.beam.runners.direct.DirectTransformExecutor.run > (DirectTransformExecutor.java:126) > at java.util.concurrent.Executors$RunnableAdapter.call > (Executors.java:511) > at java.util.concurrent.FutureTask.run (FutureTask.java:266) > at java.util.concurrent.ThreadPoolExecutor.runWorker > (ThreadPoolExecutor.java:1149) > at java.util.concurrent.ThreadPoolExecutor$Worker.run > (ThreadPoolExecutor.java:624) > at java.lang.Thread.run (Thread.java:748) > Caused by: org.apache.beam.sdk.util.IllegalMutationException: Value > {"device_id": "rpi-rpi0-thermostat", "temperature_c": 20.0, "temperature_f": > 52.0, "sample_time": 1564412307969368, "humidity": 74.3} mutated illegally, > new value was {"device_id": "rpi-rpi0-thermostat", "temperature_c": 20.0, > "temperature_f": 52.0, "sample_time": 1564412360458615, "humidity": 74.7}. > Encoding was > AiZycGktcnBpMC10aGVybW9zdGF0AgAAADRAAgAAAEpAArDVsP7jtMcFAjMzMzMzk1JA, > now > AiZycGktcnBpMC10aGVybW9zdGF0AgAAADRAAgAAAEpAAu6FuLDktMcFAs3MzMzMrFJA. 
> at > org.apache.beam.sdk.util.MutationDetectors$CodedValueMutationDetector.illegalMutation > (MutationDetectors.java:153) > at > org.apache.beam.sdk.util.MutationDetectors$CodedValueMutationDetector.verifyUnmodifiedThrowingCheckedExceptions > (MutationDetectors.java:148) > at > org.apache.beam.sdk.util.MutationDetectors$CodedValueMutationDetector.verifyUnmodified > (MutationDetectors.java:123) > at > org.apache.beam.runners.direct.ImmutabilityCheckingBundleFactory$ImmutabilityEnforcingBundle.commit > (ImmutabilityCheckingBundleFactory.java:124) > at org.apache.beam.runners.direct.EvaluationContext.commitBundles > (EvaluationContext.java:210) > at org.apache.beam.runners.direct.EvaluationContext.handleResult > (EvaluationContext.java:151) > at > org.apache.beam.runners.direct.QuiescenceDriver$TimerIterableCompletionCallback.handleResult > (QuiescenceDriver.java:262) > at org.apache.beam.runners.direct.DirectTransformExecutor.finishBundle > (DirectTransformExecutor.java:189) > at org.apache.beam.runners.direct.DirectTransformExecutor.run > (DirectTransformExecutor.java:126) > at java.util.concurrent.Executors$RunnableAdapter.call > (Executors.java:511) > at java.util.concurrent.FutureTask.run (FutureTask.java:266) > at java.util.concurrent.ThreadPoolExecutor.runWorker > (ThreadPoolExecutor.java:1149) > at java.util.concurrent.ThreadPoolExecutor$Worker.run > (ThreadPoolExecutor.java:624) > at java.lang.Thread.run (Thread.java:748){code} > -- Thi
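The "Encoding was ... now ..." part of the trace suggests a check that encodes a value when it is output and compares encodings again when the bundle is committed. A simplified sketch of that idea follows; it is not Beam's implementation, and the record-reusing reader is only one hypothetical way to trigger such an error, not a diagnosis of BEAM-8029. All names below are illustrative.

```python
import json


class IllegalMutationError(Exception):
    pass


def encode(record):
    """Deterministic byte encoding of a record (JSON stands in for a coder)."""
    return json.dumps(record, sort_keys=True).encode('utf-8')


class CodedValueMutationDetector(object):
    """Remembers a value's encoding at output time and re-checks it later."""

    def __init__(self, value):
        self._value = value
        self._encoded_at_output = encode(value)

    def verify_unmodified(self):
        encoded_now = encode(self._value)
        if encoded_now != self._encoded_at_output:
            raise IllegalMutationError(
                'Encoding was %r, now %r' %
                (self._encoded_at_output, encoded_now))


def read_reusing_record(rows):
    """Hypothetical reader that refills one mutable record for every row."""
    record = {}
    for row in rows:
        record.clear()
        record.update(row)
        yield record


if __name__ == '__main__':
    rows = [{'sample_time': 1, 'humidity': 74.3},
            {'sample_time': 2, 'humidity': 74.7}]
    detectors = [
        CodedValueMutationDetector(r) for r in read_reusing_record(rows)]
    try:
        detectors[0].verify_unmodified()
    except IllegalMutationError as e:
        print('mutation detected:', e)
```

In this sketch, copying each record before it is output (for example `dict(r)`) makes the check pass, because later reads no longer touch the object that was already emitted.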
[jira] [Work logged] (BEAM-8299) Upgrade Jackson to version 2.9.10
[ https://issues.apache.org/jira/browse/BEAM-8299?focusedWorklogId=316938&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316938 ] ASF GitHub Bot logged work on BEAM-8299: Author: ASF GitHub Bot Created on: 23/Sep/19 19:26 Start Date: 23/Sep/19 19:26 Worklog Time Spent: 10m Work Description: pabloem commented on issue #9636: [BEAM-8299] Upgrade Jackson to version 2.9.10 URL: https://github.com/apache/beam/pull/9636#issuecomment-534247500 LGTM. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 316938) Time Spent: 50m (was: 40m) > Upgrade Jackson to version 2.9.10 > - > > Key: BEAM-8299 > URL: https://issues.apache.org/jira/browse/BEAM-8299 > Project: Beam > Issue Type: Improvement > Components: build-system, sdk-java-core >Affects Versions: 2.15.0 >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Blocker > Fix For: 2.16.0 > > Time Spent: 50m > Remaining Estimate: 0h > > [Jackson 2.9.10 addresses multiple CVE > issues|https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.9.10] from > previous Jackson versions, so we need to upgrade it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8299) Upgrade Jackson to version 2.9.10
[ https://issues.apache.org/jira/browse/BEAM-8299?focusedWorklogId=316939&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316939 ] ASF GitHub Bot logged work on BEAM-8299: Author: ASF GitHub Bot Created on: 23/Sep/19 19:26 Start Date: 23/Sep/19 19:26 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #9636: [BEAM-8299] Upgrade Jackson to version 2.9.10 URL: https://github.com/apache/beam/pull/9636 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 316939) Time Spent: 1h (was: 50m) > Upgrade Jackson to version 2.9.10 > - > > Key: BEAM-8299 > URL: https://issues.apache.org/jira/browse/BEAM-8299 > Project: Beam > Issue Type: Improvement > Components: build-system, sdk-java-core >Affects Versions: 2.15.0 >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Blocker > Fix For: 2.16.0 > > Time Spent: 1h > Remaining Estimate: 0h > > [Jackson 2.9.10 addresses multiple CVE > issues|https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.9.10] from > previous Jackson versions, so we need to upgrade it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8299) Upgrade Jackson to version 2.9.10
[ https://issues.apache.org/jira/browse/BEAM-8299?focusedWorklogId=316940&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316940 ] ASF GitHub Bot logged work on BEAM-8299: Author: ASF GitHub Bot Created on: 23/Sep/19 19:26 Start Date: 23/Sep/19 19:26 Worklog Time Spent: 10m Work Description: pabloem commented on issue #9637: [release-2.16.0][BEAM-8299] Upgrade Jackson to version 2.9.10 URL: https://github.com/apache/beam/pull/9637#issuecomment-534247693 LGTM. I've merged https://github.com/apache/beam/pull/9636 - I'll let the release manager merge this. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 316940) Time Spent: 1h 10m (was: 1h) > Upgrade Jackson to version 2.9.10 > - > > Key: BEAM-8299 > URL: https://issues.apache.org/jira/browse/BEAM-8299 > Project: Beam > Issue Type: Improvement > Components: build-system, sdk-java-core >Affects Versions: 2.15.0 >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Blocker > Fix For: 2.16.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > [Jackson 2.9.10 addresses multiple CVE > issues|https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.9.10] from > previous Jackson versions, so we need to upgrade it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8213) Run and report python tox tasks separately within Jenkins
[ https://issues.apache.org/jira/browse/BEAM-8213?focusedWorklogId=316937&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316937 ] ASF GitHub Bot logged work on BEAM-8213: Author: ASF GitHub Bot Created on: 23/Sep/19 19:25 Start Date: 23/Sep/19 19:25 Worklog Time Spent: 10m Work Description: chadrik commented on issue #9642: [BEAM-8213] Split up monolithic python preCommit tests on jenkins URL: https://github.com/apache/beam/pull/9642#issuecomment-534247276 Run PythonLint PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 316937) Time Spent: 0.5h (was: 20m) > Run and report python tox tasks separately within Jenkins > - > > Key: BEAM-8213 > URL: https://issues.apache.org/jira/browse/BEAM-8213 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Chad Dombrova >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > As a python developer, the speed and comprehensibility of the jenkins > PreCommit job could be greatly improved. > Here are some of the problems > - when a lint job fails, it's not reported in the test results summary, so > even though the job is marked as failed, I see "Test Result (no failures)" > which is quite confusing > - I have to wait for over an hour to discover the lint failed, which takes > about a minute to run on its own > - The logs are a jumbled mess of all the different tasks running on top of > each other > - The test results give no indication of which version of python they use. I > click on Test results, then the test module, then the test class, then I see > 4 tests named the same thing. I assume that the first is python 2.7, the > second is 3.5 and so on. 
It takes 5 clicks and then reading the log output > to know which version of python a single error pertains to, then I need to > repeat for each failure. This makes it very difficult to discover problems, > and deduce that they may have something to do with python version mismatches. > I believe the solution to this is to split up the single monolithic python > PreCommit job into sub-jobs (possibly using a pipeline with steps). This > would give us the following benefits: > - sub job results should become available as they finish, so for example, > lint results should be available very early on > - sub job results will be reported separately, and there will be a job for > each py2, py35, py36 and so on, so it will be clear when an error is related > to a particular python version > - sub jobs without reports, like docs and lint, will have their own failure > status and logs, so when they fail it will be more obvious what went wrong. > I'm happy to help out once I get some feedback on the desired way forward. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7919) Add a Python 3 test scenario for MongoDB IO
[ https://issues.apache.org/jira/browse/BEAM-7919?focusedWorklogId=316936&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316936 ] ASF GitHub Bot logged work on BEAM-7919: Author: ASF GitHub Bot Created on: 23/Sep/19 19:23 Start Date: 23/Sep/19 19:23 Worklog Time Spent: 10m Work Description: y1chi commented on pull request #9639: [BEAM-7919] Add MongoDB IO integration test for py3.7 URL: https://github.com/apache/beam/pull/9639#discussion_r327287701 ## File path: .test-infra/jenkins/job_PostCommit_Python_MongoDBIO_IT.groovy ## @@ -32,6 +32,7 @@ PostcommitJobBuilder.postCommitJob('beam_PostCommit_Python_MongoDBIO_IT', gradle { rootBuildScriptDir(commonJobProperties.checkoutDir) tasks(':sdks:python:test-suites:direct:py2:mongodbioIT') + tasks(':sdks:python:test-suites:direct:py37:mongodbioIT') Review comment: will change to that. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 316936) Time Spent: 2h (was: 1h 50m) > Add a Python 3 test scenario for MongoDB IO > --- > > Key: BEAM-7919 > URL: https://issues.apache.org/jira/browse/BEAM-7919 > Project: Beam > Issue Type: Sub-task > Components: io-ideas >Reporter: Valentyn Tymofieiev >Assignee: Yichi Zhang >Priority: Major > Time Spent: 2h > Remaining Estimate: 0h > > Python 2 MongoDB IO suite was added in: > https://github.com/apache/beam/commit/17bf89d6070565b715f44ecb5f6394219b94cfe6 > We should also exercise this IO in Python 3. > cc: [~chamikara] [~altay] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8213) Run and report python tox tasks separately within Jenkins
[ https://issues.apache.org/jira/browse/BEAM-8213?focusedWorklogId=316935&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316935 ] ASF GitHub Bot logged work on BEAM-8213: Author: ASF GitHub Bot Created on: 23/Sep/19 19:20 Start Date: 23/Sep/19 19:20 Worklog Time Spent: 10m Work Description: chadrik commented on issue #9642: [BEAM-8213] Split up monolithic python preCommit tests on jenkins URL: https://github.com/apache/beam/pull/9642#issuecomment-534245458 R: @lgajowy R: @kkucharc R: @echauchot R: @robertwb R: @udim Well, I tried my hand at this, but it's not showing the new jobs, so I'm not sure whether there's something I need to update to make the Jenkins config pull from this PR, or if we've got a chicken/egg situation with respect to testing whether this new configuration works. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 316935) Time Spent: 20m (was: 10m) > Run and report python tox tasks separately within Jenkins > - > > Key: BEAM-8213 > URL: https://issues.apache.org/jira/browse/BEAM-8213 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Chad Dombrova >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > As a python developer, the speed and comprehensibility of the jenkins > PreCommit job could be greatly improved. 
> Here are some of the problems > - when a lint job fails, it's not reported in the test results summary, so > even though the job is marked as failed, I see "Test Result (no failures)" > which is quite confusing > - I have to wait for over an hour to discover the lint failed, which takes > about a minute to run on its own > - The logs are a jumbled mess of all the different tasks running on top of > each other > - The test results give no indication of which version of python they use. I > click on Test results, then the test module, then the test class, then I see > 4 tests named the same thing. I assume that the first is python 2.7, the > second is 3.5 and so on. It takes 5 clicks and then reading the log output > to know which version of python a single error pertains to, then I need to > repeat for each failure. This makes it very difficult to discover problems, > and deduce that they may have something to do with python version mismatches. > I believe the solution to this is to split up the single monolithic python > PreCommit job into sub-jobs (possibly using a pipeline with steps). This > would give us the following benefits: > - sub job results should become available as they finish, so for example, > lint results should be available very early on > - sub job results will be reported separately, and there will be a job for > each py2, py35, py36 and so on, so it will be clear when an error is related > to a particular python version > - sub jobs without reports, like docs and lint, will have their own failure > status and logs, so when they fail it will be more obvious what went wrong. > I'm happy to help out once I get some feedback on the desired way forward. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8299) Upgrade Jackson to version 2.9.10
[ https://issues.apache.org/jira/browse/BEAM-8299?focusedWorklogId=316933&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316933 ] ASF GitHub Bot logged work on BEAM-8299: Author: ASF GitHub Bot Created on: 23/Sep/19 19:18 Start Date: 23/Sep/19 19:18 Worklog Time Spent: 10m Work Description: pabloem commented on issue #9637: [release-2.16.0][BEAM-8299] Upgrade Jackson to version 2.9.10 URL: https://github.com/apache/beam/pull/9637#issuecomment-534244691 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 316933) Time Spent: 0.5h (was: 20m) > Upgrade Jackson to version 2.9.10 > - > > Key: BEAM-8299 > URL: https://issues.apache.org/jira/browse/BEAM-8299 > Project: Beam > Issue Type: Improvement > Components: build-system, sdk-java-core >Affects Versions: 2.15.0 >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Blocker > Fix For: 2.16.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > [Jackson 2.9.10 addresses multiple CVE > issues|https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.9.10] from > previous Jackson versions, so we need to upgrade it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8299) Upgrade Jackson to version 2.9.10
[ https://issues.apache.org/jira/browse/BEAM-8299?focusedWorklogId=316934&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316934 ] ASF GitHub Bot logged work on BEAM-8299: Author: ASF GitHub Bot Created on: 23/Sep/19 19:18 Start Date: 23/Sep/19 19:18 Worklog Time Spent: 10m Work Description: pabloem commented on issue #9637: [release-2.16.0][BEAM-8299] Upgrade Jackson to version 2.9.10 URL: https://github.com/apache/beam/pull/9637#issuecomment-534244737 Run Java_Examples_Dataflow PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 316934) Time Spent: 40m (was: 0.5h) > Upgrade Jackson to version 2.9.10 > - > > Key: BEAM-8299 > URL: https://issues.apache.org/jira/browse/BEAM-8299 > Project: Beam > Issue Type: Improvement > Components: build-system, sdk-java-core >Affects Versions: 2.15.0 >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Blocker > Fix For: 2.16.0 > > Time Spent: 40m > Remaining Estimate: 0h > > [Jackson 2.9.10 addresses multiple CVE > issues|https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.9.10] from > previous Jackson versions, so we need to upgrade it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-6923) OOM errors in jobServer when using GCS artifactDir
[ https://issues.apache.org/jira/browse/BEAM-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936132#comment-16936132 ] Ankur Goenka commented on BEAM-6923: That's strange. I am using Linux for testing this. I will try it on Mac as well. > OOM errors in jobServer when using GCS artifactDir > -- > > Key: BEAM-6923 > URL: https://issues.apache.org/jira/browse/BEAM-6923 > Project: Beam > Issue Type: Bug > Components: sdk-java-harness >Reporter: Lukasz Gajowy >Assignee: Ankur Goenka >Priority: Major > Attachments: Instance counts.png, Paths to GC root.png, > Telemetries.png, beam6923-flink156.m4v, beam6923flink182.m4v, heapdump > size-sorted.png > > > When starting jobServer with artifactDir pointing to a GCS bucket: > {code:java} > ./gradlew :beam-runners-flink_2.11-job-server:runShadow > -PflinkMasterUrl=localhost:8081 -PartifactsDir=gs://the-bucket{code} > and running a Java portable pipeline with the following, portability related > pipeline options: > {code:java} > --runner=PortableRunner --jobEndpoint=localhost:8099 > --defaultEnvironmentType=DOCKER > --defaultEnvironmentConfig=gcr.io//java:latest'{code} > > I'm facing a series of OOM errors, like this: > {code:java} > Exception in thread "grpc-default-executor-3" java.lang.OutOfMemoryError: > Java heap space > at > com.google.api.client.googleapis.media.MediaHttpUploader.buildContentChunk(MediaHttpUploader.java:606) > at > com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:408) > at > com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336) > at > com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:508) > at > com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:432) > at > 
com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:549) > at > com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:301) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745){code} > > This does not happen when I'm using a local filesystem for the artifact > staging location. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-7045) Element counters in the Web UI graph representations for transforms for Python streaming jobs in Google Cloud Dataflow
[ https://issues.apache.org/jira/browse/BEAM-7045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yichi Zhang updated BEAM-7045: -- Fix Version/s: 2.16.0 > Element counters in the Web UI graph representations for transforms for > Python streaming jobs in Google Cloud Dataflow > -- > > Key: BEAM-7045 > URL: https://issues.apache.org/jira/browse/BEAM-7045 > Project: Beam > Issue Type: New Feature > Components: runner-dataflow, sdk-py-core > Environment: GCP Dataflow >Reporter: Fim >Priority: Major > Labels: features, usability > Fix For: 2.16.0 > > > Users don't see the element counters in transforms in the Web UI graph > representation when running a Python streaming job, which is expected > behavior according to [this Beam > page|https://beam.apache.org/documentation/sdks/python-streaming/#dataflowrunner-specific-features]. > The feature request is to enable the element counters in the Web UI graph > representations for transforms for Python streaming jobs in Google Cloud > Dataflow. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7919) Add a Python 3 test scenario for MongoDB IO
[ https://issues.apache.org/jira/browse/BEAM-7919?focusedWorklogId=316932&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316932 ] ASF GitHub Bot logged work on BEAM-7919: Author: ASF GitHub Bot Created on: 23/Sep/19 19:14 Start Date: 23/Sep/19 19:14 Worklog Time Spent: 10m Work Description: tvalentyn commented on pull request #9639: [BEAM-7919] Add MongoDB IO integration test for py3.7 URL: https://github.com/apache/beam/pull/9639#discussion_r327283975 ## File path: .test-infra/jenkins/job_PostCommit_Python_MongoDBIO_IT.groovy ## @@ -32,6 +32,7 @@ PostcommitJobBuilder.postCommitJob('beam_PostCommit_Python_MongoDBIO_IT', gradle { rootBuildScriptDir(commonJobProperties.checkoutDir) tasks(':sdks:python:test-suites:direct:py2:mongodbioIT') + tasks(':sdks:python:test-suites:direct:py37:mongodbioIT') Review comment: If we test only one minor version, I suggest python 3.5 as this is the lowest version beam supports. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 316932) Time Spent: 1h 50m (was: 1h 40m) > Add a Python 3 test scenario for MongoDB IO > --- > > Key: BEAM-7919 > URL: https://issues.apache.org/jira/browse/BEAM-7919 > Project: Beam > Issue Type: Sub-task > Components: io-ideas >Reporter: Valentyn Tymofieiev >Assignee: Yichi Zhang >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > > Python 2 MongoDB IO suite was added in: > https://github.com/apache/beam/commit/17bf89d6070565b715f44ecb5f6394219b94cfe6 > We should also exercise this IO in Python 3. > cc: [~chamikara] [~altay] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-7045) Element counters in the Web UI graph representations for transforms for Python streaming jobs in Google Cloud Dataflow
[ https://issues.apache.org/jira/browse/BEAM-7045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yichi Zhang resolved BEAM-7045. --- Resolution: Fixed > Element counters in the Web UI graph representations for transforms for > Python streaming jobs in Google Cloud Dataflow > -- > > Key: BEAM-7045 > URL: https://issues.apache.org/jira/browse/BEAM-7045 > Project: Beam > Issue Type: New Feature > Components: runner-dataflow, sdk-py-core > Environment: GCP Dataflow >Reporter: Fim >Priority: Major > Labels: features, usability > Fix For: 2.16.0 > > > Users don't see the element counters in transforms in the Web UI graph > representation when running a Python streaming job, which is expected > behavior according to [this Beam > page|https://beam.apache.org/documentation/sdks/python-streaming/#dataflowrunner-specific-features]. > The feature request is to enable the element counters in the Web UI graph > representations for transforms for Python streaming jobs in Google Cloud > Dataflow. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8303) Filesystems not properly registered using FileIO.write()
[ https://issues.apache.org/jira/browse/BEAM-8303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936130#comment-16936130 ] Preston Koprivica commented on BEAM-8303: - I'll defer to the experts on the priority of this issue. Currently, I am able to work around it by setting FileIO.write().withIgnoreWindowing(), which is also the default for AvroIO ([https://github.com/apache/beam/blob/release-2.15.0/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java#L516]), and I suspect other FileBasedSink APIs as well. > Filesystems not properly registered using FileIO.write() > > > Key: BEAM-8303 > URL: https://issues.apache.org/jira/browse/BEAM-8303 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Affects Versions: 2.15.0 >Reporter: Preston Koprivica >Priority: Major > > I’m getting the following error when attempting to use the FileIO APIs > (beam-2.15.0) and integrating with AWS S3. I have set up the PipelineOptions > with all the relevant AWS options, so the filesystem registry **should** be > properly seeded by the time the graph is compiled and executed: > {code:java} > java.lang.IllegalArgumentException: No filesystem found for scheme s3 > at > org.apache.beam.sdk.io.FileSystems.getFileSystemInternal(FileSystems.java:456) > at > org.apache.beam.sdk.io.FileSystems.matchNewResource(FileSystems.java:526) > at > org.apache.beam.sdk.io.FileBasedSink$FileResultCoder.decode(FileBasedSink.java:1149) > at > org.apache.beam.sdk.io.FileBasedSink$FileResultCoder.decode(FileBasedSink.java:1105) > at org.apache.beam.sdk.coders.Coder.decode(Coder.java:159) > at > org.apache.beam.sdk.transforms.join.UnionCoder.decode(UnionCoder.java:83) > at > org.apache.beam.sdk.transforms.join.UnionCoder.decode(UnionCoder.java:32) > at > org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:543) > at > org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:534) > at > 
org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:480) > at > org.apache.beam.runners.flink.translation.types.CoderTypeSerializer.deserialize(CoderTypeSerializer.java:93) > at > org.apache.flink.runtime.plugable.NonReusingDeserializationDelegate.read(NonReusingDeserializationDelegate.java:55) > at > org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.getNextRecord(SpillingAdaptiveSpanningRecordDeserializer.java:106) > at > org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:72) > at > org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:47) > at > org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73) > at > org.apache.flink.runtime.operators.FlatMapDriver.run(FlatMapDriver.java:107) > at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:503) > at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:368) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711) > at java.lang.Thread.run(Thread.java:748) > {code} > For reference, the write code resembles this: > {code:java} > FileIO.Write write = FileIO.write() > .via(ParquetIO.sink(schema)) > .to(options.getOutputDir()). // will be something like: > s3:/// > .withSuffix(".parquet"); > records.apply(String.format("Write(%s)", options.getOutputDir()), > write);{code} > The issue does not appear to be related to ParquetIO.sink(). I am able to > reliably reproduce the issue using JSON formatted records and TextIO.sink(), > as well. Moreover, AvroIO is affected if withWindowedWrites() option is > added. > Just trying some different knobs, I went ahead and set the following option: > {code:java} > write = write.withNoSpilling();{code} > This actually seemed to fix the issue, only to have it reemerge as I scaled > up the data set size. 
The stack trace, while very similar, reads: > {code:java} > java.lang.IllegalArgumentException: No filesystem found for scheme s3 > at > org.apache.beam.sdk.io.FileSystems.getFileSystemInternal(FileSystems.java:456) > at > org.apache.beam.sdk.io.FileSystems.matchNewResource(FileSystems.java:526) > at > org.apache.beam.sdk.io.FileBasedSink$FileResultCoder.decode(FileBasedSink.java:1149) > at > org.apache.beam.sdk.io.FileBasedSink$FileResultCoder.decode(FileBasedSink.java:1105) > at org.apache.beam.sdk.coders.Coder.decode(Coder.java:159) > at org.apache.beam.sdk.coders.KvCoder.decode(KvCoder.java:82) > at org.apache.beam.sdk.coders.KvCoder.
[jira] [Work logged] (BEAM-7919) Add a Python 3 test scenario for MongoDB IO
[ https://issues.apache.org/jira/browse/BEAM-7919?focusedWorklogId=316930&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316930 ] ASF GitHub Bot logged work on BEAM-7919: Author: ASF GitHub Bot Created on: 23/Sep/19 19:13 Start Date: 23/Sep/19 19:13 Worklog Time Spent: 10m Work Description: y1chi commented on issue #9639: [BEAM-7919] Add MongoDB IO integration test for py3.7 URL: https://github.com/apache/beam/pull/9639#issuecomment-534242730 R: @tvalentyn This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 316930) Time Spent: 1h 40m (was: 1.5h) > Add a Python 3 test scenario for MongoDB IO > --- > > Key: BEAM-7919 > URL: https://issues.apache.org/jira/browse/BEAM-7919 > Project: Beam > Issue Type: Sub-task > Components: io-ideas >Reporter: Valentyn Tymofieiev >Assignee: Yichi Zhang >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > > Python 2 MongoDB IO suite was added in: > https://github.com/apache/beam/commit/17bf89d6070565b715f44ecb5f6394219b94cfe6 > We should also exercise this IO in Python 3. > cc: [~chamikara] [~altay] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8303) Filesystems not properly registered using FileIO.write()
Preston Koprivica created BEAM-8303:
---
Summary: Filesystems not properly registered using FileIO.write()
Key: BEAM-8303
URL: https://issues.apache.org/jira/browse/BEAM-8303
Project: Beam
Issue Type: Bug
Components: sdk-java-core
Affects Versions: 2.15.0
Reporter: Preston Koprivica

I’m getting the following error when attempting to use the FileIO APIs (beam-2.15.0) and integrating with AWS S3. I have set up the PipelineOptions with all the relevant AWS options, so the filesystem registry **should** be properly seeded by the time the graph is compiled and executed:
{code:java}
java.lang.IllegalArgumentException: No filesystem found for scheme s3
  at org.apache.beam.sdk.io.FileSystems.getFileSystemInternal(FileSystems.java:456)
  at org.apache.beam.sdk.io.FileSystems.matchNewResource(FileSystems.java:526)
  at org.apache.beam.sdk.io.FileBasedSink$FileResultCoder.decode(FileBasedSink.java:1149)
  at org.apache.beam.sdk.io.FileBasedSink$FileResultCoder.decode(FileBasedSink.java:1105)
  at org.apache.beam.sdk.coders.Coder.decode(Coder.java:159)
  at org.apache.beam.sdk.transforms.join.UnionCoder.decode(UnionCoder.java:83)
  at org.apache.beam.sdk.transforms.join.UnionCoder.decode(UnionCoder.java:32)
  at org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:543)
  at org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:534)
  at org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:480)
  at org.apache.beam.runners.flink.translation.types.CoderTypeSerializer.deserialize(CoderTypeSerializer.java:93)
  at org.apache.flink.runtime.plugable.NonReusingDeserializationDelegate.read(NonReusingDeserializationDelegate.java:55)
  at org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.getNextRecord(SpillingAdaptiveSpanningRecordDeserializer.java:106)
  at org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:72)
  at org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:47)
  at org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73)
  at org.apache.flink.runtime.operators.FlatMapDriver.run(FlatMapDriver.java:107)
  at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:503)
  at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:368)
  at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
  at java.lang.Thread.run(Thread.java:748)
{code}
For reference, the write code resembles this:
{code:java}
FileIO.Write write = FileIO.write()
    .via(ParquetIO.sink(schema))
    .to(options.getOutputDir()) // will be something like: s3:///
    .withSuffix(".parquet");

records.apply(String.format("Write(%s)", options.getOutputDir()), write);
{code}
The issue does not appear to be related to ParquetIO.sink(). I am able to reliably reproduce the issue using JSON-formatted records and TextIO.sink() as well. Moreover, AvroIO is affected if the withWindowedWrites() option is added.

Just trying some different knobs, I went ahead and set the following option:
{code:java}
write = write.withNoSpilling();
{code}
This actually seemed to fix the issue, only to have it reemerge as I scaled up the data set size.
The stack trace, while very similar, reads: {code:java} java.lang.IllegalArgumentException: No filesystem found for scheme s3 at org.apache.beam.sdk.io.FileSystems.getFileSystemInternal(FileSystems.java:456) at org.apache.beam.sdk.io.FileSystems.matchNewResource(FileSystems.java:526) at org.apache.beam.sdk.io.FileBasedSink$FileResultCoder.decode(FileBasedSink.java:1149) at org.apache.beam.sdk.io.FileBasedSink$FileResultCoder.decode(FileBasedSink.java:1105) at org.apache.beam.sdk.coders.Coder.decode(Coder.java:159) at org.apache.beam.sdk.coders.KvCoder.decode(KvCoder.java:82) at org.apache.beam.sdk.coders.KvCoder.decode(KvCoder.java:36) at org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:543) at org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:534) at org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:480) at org.apache.beam.runners.flink.translation.types.CoderTypeSerializer.deserialize(CoderTypeSerializer.java:93) at org.apache.flink.runtime.plugable.NonReusingDeserializationDelegate.read(NonReusingDeserializationDelegate.java:55) at org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.getNextRecord(SpillingAdaptiveSpanningRecordDeserializer.java:106) at org.apache.flink.runtime.
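Both traces fail at the same point: a coder's decode step asks for a filesystem by URI scheme and finds nothing registered for s3. The following is only a toy model of that symptom, not Beam's code and not a diagnosis of the root cause. Every name in it (the registry dict, register_filesystems, match_new_resource, decode_file_result, the bucket path) is hypothetical. It shows how a per-process registry that was never seeded in the process doing the decoding produces exactly this kind of error, even when another process seeded its own copy.

```python
from urllib.parse import urlparse

# Toy per-process registry: only the local scheme is known until seeded.
_REGISTRY = {"file": "LocalFileSystem"}


def register_filesystems(enabled_schemes):
    """Seed the registry, standing in for setup driven by pipeline options."""
    for scheme in enabled_schemes:
        _REGISTRY[scheme] = f"{scheme.upper()}FileSystem"


def match_new_resource(path):
    """Resolve a path to a registered filesystem by its URI scheme."""
    scheme = urlparse(path).scheme or "file"
    if scheme not in _REGISTRY:
        raise ValueError(f"No filesystem found for scheme {scheme}")
    return _REGISTRY[scheme], path


def decode_file_result(encoded):
    """Decoding needs the registry, like the coder frames in the trace."""
    return match_new_resource(encoded.decode("utf-8"))
```

In this model, calling decode_file_result on an s3 path before register_filesystems(["s3"]) has run in the same process raises the error, and the same call succeeds afterwards.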
[jira] [Created] (BEAM-8302) beam_PostCommit_XVR_Flink failing
Robert Bradshaw created BEAM-8302: - Summary: beam_PostCommit_XVR_Flink failing Key: BEAM-8302 URL: https://issues.apache.org/jira/browse/BEAM-8302 Project: Beam Issue Type: Bug Components: test-failures Reporter: Robert Bradshaw Assignee: Robert Bradshaw E.g. see https://builds.apache.org/job/beam_PostCommit_XVR_Flink/432/console -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8213) Run and report python tox tasks separately within Jenkins
[ https://issues.apache.org/jira/browse/BEAM-8213?focusedWorklogId=316922&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316922 ] ASF GitHub Bot logged work on BEAM-8213: Author: ASF GitHub Bot Created on: 23/Sep/19 18:57 Start Date: 23/Sep/19 18:57 Worklog Time Spent: 10m Work Description: chadrik commented on pull request #9642: [BEAM-8213] Split up monolithic python preCommit tests on jenkins URL: https://github.com/apache/beam/pull/9642 See the Jira issue for details: https://issues.apache.org/jira/browse/BEAM-8213# Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). 
Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](h
[jira] [Work logged] (BEAM-8146) SchemaCoder/RowCoder have no equals() function
[ https://issues.apache.org/jira/browse/BEAM-8146?focusedWorklogId=316918&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316918 ] ASF GitHub Bot logged work on BEAM-8146: Author: ASF GitHub Bot Created on: 23/Sep/19 18:46 Start Date: 23/Sep/19 18:46 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on issue #9493: [BEAM-8146] Add equals and hashCode to SchemaCoder and RowCoder URL: https://github.com/apache/beam/pull/9493#issuecomment-534232168 Run Apex ValidatesRunner This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 316918) Time Spent: 50m (was: 40m) > SchemaCoder/RowCoder have no equals() function > -- > > Key: BEAM-8146 > URL: https://issues.apache.org/jira/browse/BEAM-8146 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Affects Versions: 2.15.0 >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > SchemaCoder has no equals function, so it can't be compared in tests, like > CloudComponentsTests$DefaultCoders, which is being re-enabled in > https://github.com/apache/beam/pull/9446 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8146) SchemaCoder/RowCoder have no equals() function
[ https://issues.apache.org/jira/browse/BEAM-8146?focusedWorklogId=316917&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316917 ] ASF GitHub Bot logged work on BEAM-8146: Author: ASF GitHub Bot Created on: 23/Sep/19 18:46 Start Date: 23/Sep/19 18:46 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on issue #9493: [BEAM-8146] Add equals and hashCode to SchemaCoder and RowCoder URL: https://github.com/apache/beam/pull/9493#issuecomment-534232127 Run Flink ValidatesRunner This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 316917) Time Spent: 40m (was: 0.5h) > SchemaCoder/RowCoder have no equals() function > -- > > Key: BEAM-8146 > URL: https://issues.apache.org/jira/browse/BEAM-8146 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Affects Versions: 2.15.0 >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > SchemaCoder has no equals function, so it can't be compared in tests, like > CloudComponentsTests$DefaultCoders, which is being re-enabled in > https://github.com/apache/beam/pull/9446 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-8272) GroupIntoBatches transform for Go SDK
[ https://issues.apache.org/jira/browse/BEAM-8272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936099#comment-16936099 ] Robert Burke edited comment on BEAM-8272 at 9/23/19 6:29 PM: - Note that the implementation will necessarily be different in the Go SDK. The SDK doesn't yet support the State and Timers API, which both the Java and Python implementations use. Adding state and timers to the Go SDK is a larger task. Though, this looks like a largely streaming construct, which makes alternative implementations without State and Timers tricky, if not impossible. It also looks like it requires being able to emit "Iterables" which might be handle-able with slices instead, but otherwise the SDK doesn't yet support user side streams. was (Author: lostluck): Note that the implementation will necessarily be different in the Go SDK. The SDK doesn't yet support the State and Timers API, which both the Java and Python implementations use. > GroupIntoBatches transform for Go SDK > - > > Key: BEAM-8272 > URL: https://issues.apache.org/jira/browse/BEAM-8272 > Project: Beam > Issue Type: New Feature > Components: sdk-go >Reporter: John Patoch >Priority: Major > > Add a PTransform that batches inputs to a desired batch size. Batches will > contain only elements of a single key. > It should offer the same API as its Java counterpart: > [https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/GroupIntoBatches.java] > > And Python counterpart: > https://github.com/apache/beam/blob/c445fdfdfab4a191aa780210564199f2873f85d8/sdks/python/apache_beam/transforms/util.py#L684 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8272) GroupIntoBatches transform for Go SDK
[ https://issues.apache.org/jira/browse/BEAM-8272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936099#comment-16936099 ] Robert Burke commented on BEAM-8272: Note that the implementation will necessarily be different in the Go SDK. The SDK doesn't yet support the State and Timers API, which both the Java and Python implementations use. > GroupIntoBatches transform for Go SDK > - > > Key: BEAM-8272 > URL: https://issues.apache.org/jira/browse/BEAM-8272 > Project: Beam > Issue Type: New Feature > Components: sdk-go >Reporter: John Patoch >Priority: Major > > Add a PTransform that batches inputs to a desired batch size. Batches will > contain only elements of a single key. > It should offer the same API as its Java counterpart: > [https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/GroupIntoBatches.java] > > And Python counterpart: > https://github.com/apache/beam/blob/c445fdfdfab4a191aa780210564199f2873f85d8/sdks/python/apache_beam/transforms/util.py#L684 -- This message was sent by Atlassian Jira (v8.3.4#803005)
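The contract described in the issue (batches of a desired size, each holding elements of a single key) can be sketched on a bounded in-memory list. This is a minimal illustration in plain Python, not the Java or Python SDK implementation, which the comment above says relies on the State and Timers API. The function name group_into_batches and the choice to emit leftover partial batches at the end of input are assumptions made for the example.

```python
from collections import defaultdict


def group_into_batches(pairs, batch_size):
    """Group (key, value) pairs into per-key batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    buffers = defaultdict(list)  # key -> values buffered so far
    batches = []
    for key, value in pairs:
        buffers[key].append(value)
        if len(buffers[key]) == batch_size:
            # A full batch is emitted as soon as it fills up.
            batches.append((key, buffers.pop(key)))
    # Assumption: leftover partial batches are flushed at end of input.
    for key, values in buffers.items():
        batches.append((key, values))
    return batches
```

Here each batch is a plain list, which loosely mirrors the suggestion that Go slices might stand in for emitted "Iterables".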
[jira] [Resolved] (BEAM-8096) Allow runner to configure "subnetwork"
[ https://issues.apache.org/jira/browse/BEAM-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke resolved BEAM-8096. Fix Version/s: Not applicable Resolution: Fixed > Allow runner to configure "subnetwork" > -- > > Key: BEAM-8096 > URL: https://issues.apache.org/jira/browse/BEAM-8096 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Affects Versions: 2.15.0 >Reporter: Jack Whelpton >Assignee: Robert Burke >Priority: Major > Fix For: Not applicable > > Time Spent: 0.5h > Remaining Estimate: 0h > > When running a Dataflow job, the network can be specified using the --network > flag; however, there is no support for doing the same for the subnetwork. > This would be the go equivalent of the following Java code: > [https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/options/DataflowPipelineWorkerPoolOptions.html#getSubnetwork--|https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineWorkerPoolOptions.java#L151] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-8242) Go: unregistered Go functions fail when using -buildmode=pie
[ https://issues.apache.org/jira/browse/BEAM-8242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke resolved BEAM-8242. Fix Version/s: Not applicable Resolution: Fixed > Go: unregistered Go functions fail when using -buildmode=pie > > > Key: BEAM-8242 > URL: https://issues.apache.org/jira/browse/BEAM-8242 > Project: Beam > Issue Type: Bug > Components: sdk-go >Affects Versions: 2.15.0 > Environment: GNU/Linux >Reporter: Ian Lance Taylor >Assignee: Robert Burke >Priority: Major > Fix For: Not applicable > > Original Estimate: 0h > Time Spent: 1.5h > Remaining Estimate: 0h > > If a Go program is built with -buildmode=pie, the code that transfers an > unregistered function fails. It looks up the symbol in the symbol table, but > that is not the location of the function at execution time. This causes a > program crash when calling the function. > I have a patch for this problem that I will send shortly. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7919) Add a Python 3 test scenario for MongoDB IO
[ https://issues.apache.org/jira/browse/BEAM-7919?focusedWorklogId=316892&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316892 ] ASF GitHub Bot logged work on BEAM-7919: Author: ASF GitHub Bot Created on: 23/Sep/19 18:17 Start Date: 23/Sep/19 18:17 Worklog Time Spent: 10m Work Description: y1chi commented on issue #9639: [BEAM-7919] Add MongoDB IO integration test for py3.7 URL: https://github.com/apache/beam/pull/9639#issuecomment-534221012 Run Python MongoDBIO_IT This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 316892) Time Spent: 1.5h (was: 1h 20m) > Add a Python 3 test scenario for MongoDB IO > --- > > Key: BEAM-7919 > URL: https://issues.apache.org/jira/browse/BEAM-7919 > Project: Beam > Issue Type: Sub-task > Components: io-ideas >Reporter: Valentyn Tymofieiev >Assignee: Yichi Zhang >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > Python 2 MongoDB IO suite was added in: > https://github.com/apache/beam/commit/17bf89d6070565b715f44ecb5f6394219b94cfe6 > We should also exercise this IO in Python 3. > cc: [~chamikara] [~altay] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8300) KinesisIO.write causes NPE as the producer is null
[ https://issues.apache.org/jira/browse/BEAM-8300?focusedWorklogId=316875&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316875 ] ASF GitHub Bot logged work on BEAM-8300: Author: ASF GitHub Bot Created on: 23/Sep/19 17:56 Start Date: 23/Sep/19 17:56 Worklog Time Spent: 10m Work Description: jhalaria commented on issue #9640: [BEAM-8300]: KinesisIO.write throws NPE because producer is null URL: https://github.com/apache/beam/pull/9640#issuecomment-534212637 @iemejia - Please review. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 316875) Time Spent: 20m (was: 10m) > KinesisIO.write causes NPE as the producer is null > -- > > Key: BEAM-8300 > URL: https://issues.apache.org/jira/browse/BEAM-8300 > Project: Beam > Issue Type: Bug > Components: io-java-kinesis >Affects Versions: 2.15.0 >Reporter: Ankit Jhalaria >Assignee: Ankit Jhalaria >Priority: Minor > Fix For: Not applicable > > Time Spent: 20m > Remaining Estimate: 0h > > While using KinesisIO.write(), we encountered a NPE with the following stack > trace > {code:java} > org.apache.beam.runners.flink.translation.wrappers.streaming.io.UnboundedSourceWrapper.run(UnboundedSourceWrapper.java:297)\n\tat > > org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:93)\n\tat > > org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:57)\n\tat > > org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:97)\n\tat > > org.apache.flink.streaming.runtime.tasks.StoppableSourceStreamTask.run(StoppableSourceStreamTask.java:45)\n\tat > > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)\n\tat > org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)\n\tat > 
java.lang.Thread.run(Thread.java:748)\nCaused by: > java.lang.NullPointerException: null\n\tat > org.apache.beam.sdk.io.kinesis.KinesisIO$Write$KinesisWriterFn.flushBundle(KinesisIO.java:685)\n\tat > > org.apache.beam.sdk.io.kinesis.KinesisIO$Write$KinesisWriterFn.finishBundle(KinesisIO.java:669){code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
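The linked pull request describes its fix as adding a readObject method to initialize the transient producer. The general failure mode can be sketched in Python as an analogy only: a field that is dropped when an object is serialized stays empty after it is restored unless a restore hook recreates it. The classes below are hypothetical stand-ins, not KinesisIO code, and copy.deepcopy is used as a stand-in for a serialization round trip because it goes through the same __getstate__ and __setstate__ hooks as pickle.

```python
import copy


class Producer:
    """Hypothetical stand-in for a client that is not serialized."""

    def send(self, record):
        return f"sent:{record}"


class WriterWithoutRestore:
    def __init__(self):
        self.producer = Producer()

    def __getstate__(self):
        # Treat the producer as transient: it is not part of the saved state.
        state = self.__dict__.copy()
        state["producer"] = None
        return state

    def flush(self, record):
        return self.producer.send(record)


class WriterWithRestore(WriterWithoutRestore):
    def __setstate__(self, state):
        # Restore hook: recreate the transient field after the round trip.
        self.__dict__.update(state)
        self.producer = Producer()
```

A round-tripped WriterWithoutRestore has producer set to None, so flush fails on the missing object, while a round-tripped WriterWithRestore works because the hook rebuilds the producer.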
[jira] [Work logged] (BEAM-8233) Separate loopback and docker modes on Flink runner guide
[ https://issues.apache.org/jira/browse/BEAM-8233?focusedWorklogId=316872&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316872 ] ASF GitHub Bot logged work on BEAM-8233: Author: ASF GitHub Bot Created on: 23/Sep/19 17:50 Start Date: 23/Sep/19 17:50 Worklog Time Spent: 10m Work Description: tweise commented on issue #9605: [BEAM-8233] [BEAM-8214] [BEAM-8232] Document environment_type flag URL: https://github.com/apache/beam/pull/9605#issuecomment-534210168 And leaving some time is a good idea regardless who is tagged on the PR :) When I look for reviewers for my PRs I try to pick folks that I know are most knowledgeable with the area or have expressed interest in the topic. I have never come across a case that required to tag all committers so far :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 316872) Time Spent: 2h 50m (was: 2h 40m) > Separate loopback and docker modes on Flink runner guide > > > Key: BEAM-8233 > URL: https://issues.apache.org/jira/browse/BEAM-8233 > Project: Beam > Issue Type: Improvement > Components: runner-flink, website >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Time Spent: 2h 50m > Remaining Estimate: 0h > > Running loopback should be the "getting started" option, and docker mode > should be an "advanced" option with its own section of the Flink runner guide > with instructions and explanations (you need to build the docker container > images, you can't see your output in a local filesystem without > workarounds..) [https://beam.apache.org/documentation/runners/flink/] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8300) KinesisIO.write causes NPE as the producer is null
[ https://issues.apache.org/jira/browse/BEAM-8300?focusedWorklogId=316871&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316871 ] ASF GitHub Bot logged work on BEAM-8300: Author: ASF GitHub Bot Created on: 23/Sep/19 17:49 Start Date: 23/Sep/19 17:49 Worklog Time Spent: 10m Work Description: jhalaria commented on pull request #9640: [BEAM-8300]: KinesisIO.write throws NPE because producer is null URL: https://github.com/apache/beam/pull/9640 Added a readObject method to initialize the transient producer
[jira] [Created] (BEAM-8301) Argument inference breaks on incomparable types as defaults.
Robert Bradshaw created BEAM-8301:
-
Summary: Argument inference breaks on incomparable types as defaults.
Key: BEAM-8301
URL: https://issues.apache.org/jira/browse/BEAM-8301
Project: Beam
Issue Type: Bug
Components: sdk-py-core
Affects Versions: 2.16.0
Reporter: Robert Bradshaw
Fix For: 2.16.0

A common culprit is numpy arrays, e.g.
{code:python}
class MyDoFn(beam.DoFn):
    def process(self, element, arg=np.ndarray(...)):
        ...
{code}
This bug was introduced as part of [BEAM-7060].

-- This message was sent by Atlassian Jira (v8.3.4#803005)
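The report does not show the failing code, so the following is only one plausible way such a failure can arise, stated as an assumption: checking whether a parameter has a default by comparing it with == against a sentinel. The ArrayLike class is a hypothetical stand-in that mimics numpy's behavior (elementwise ==, no single truth value) so the sketch runs without numpy. The helper names are also invented for illustration.

```python
import inspect


class ArrayLike:
    """Mimics numpy arrays: == is elementwise and bool() is ambiguous."""

    def __init__(self, values):
        self.values = list(values)

    def __eq__(self, other):
        return ArrayLike(v == other for v in self.values)

    def __bool__(self):
        raise ValueError("truth value of an array is ambiguous")


def has_default_by_equality(param):
    # Breaks for ArrayLike: the comparison result cannot be reduced to a bool.
    return not (param.default == inspect.Parameter.empty)


def has_default_by_identity(param):
    # Identity never calls the default's __eq__, so any type is safe.
    return param.default is not inspect.Parameter.empty


def process(element, arg=ArrayLike([1, 2, 3])):
    return element, arg


PARAMS = inspect.signature(process).parameters
```

With these definitions, has_default_by_equality(PARAMS["arg"]) raises ValueError, while has_default_by_identity(PARAMS["arg"]) returns True without ever comparing the default.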
[jira] [Work logged] (BEAM-8240) Fix pipeline proto to contain worker_harness_container_image override
[ https://issues.apache.org/jira/browse/BEAM-8240?focusedWorklogId=316870&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316870 ] ASF GitHub Bot logged work on BEAM-8240: Author: ASF GitHub Bot Created on: 23/Sep/19 17:46 Start Date: 23/Sep/19 17:46 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #9629: [BEAM-8240] Sets workerHarnessContaienrImage in the default Environment of DataflowRunner URL: https://github.com/apache/beam/pull/9629#issuecomment-534208564 Thanks Luke. PTAL. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 316870) Time Spent: 4h 20m (was: 4h 10m) > Fix pipeline proto to contain worker_harness_container_image override > - > > Key: BEAM-8240 > URL: https://issues.apache.org/jira/browse/BEAM-8240 > Project: Beam > Issue Type: Bug > Components: sdk-py-harness >Reporter: Luke Cwik >Assignee: Luke Cwik >Priority: Minor > Fix For: 2.17.0 > > Time Spent: 4h 20m > Remaining Estimate: 0h > > SDK harness incorrectly identifies itself when using custom SDK container > within environment field when building pipeline proto. > > Passing in the experiment *worker_harness_container_image=YYY* doesn't > override the pipeline proto environment field and it is still being populated > with *gcr.io/cloud-dataflow/v1beta3/python-fnapi:beam-master-20190802* > > -- This message was sent by Atlassian Jira (v8.3.4#803005)