[jira] [Work logged] (BEAM-8213) Run and report python tox tasks separately within Jenkins

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8213?focusedWorklogId=317173&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317173
 ]

ASF GitHub Bot logged work on BEAM-8213:


Author: ASF GitHub Bot
Created on: 24/Sep/19 05:03
Start Date: 24/Sep/19 05:03
Worklog Time Spent: 10m 
  Work Description: yifanzou commented on issue #9642: [BEAM-8213] Split up 
monolithic python preCommit tests on jenkins
URL: https://github.com/apache/beam/pull/9642#issuecomment-534389982
 
 
   > We're still doing the same amount of work, so IIUC, assuming we get 
similar CPU-utilization in this new configuration, these 5 jobs should finish 
in the time it took the previous single job to finish, plus whatever overhead 
is required per job to bootstrap the tests.
   
   I think the concern is that we may have higher test concurrency since we 
are splitting the big python job into multiple pieces. The Jenkins job waiting 
queue could be longer. In addition, this may also lead to higher stress on GCP 
resource usage, such as CreateJobPerMinutePerUser and Compute Engine 
quotas. 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 317173)
Time Spent: 1h 40m  (was: 1.5h)

> Run and report python tox tasks separately within Jenkins
> -
>
> Key: BEAM-8213
> URL: https://issues.apache.org/jira/browse/BEAM-8213
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Chad Dombrova
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> As a python developer, the speed and comprehensibility of the jenkins 
> PreCommit job could be greatly improved.
> Here are some of the problems
> - when a lint job fails, it's not reported in the test results summary, so 
> even though the job is marked as failed, I see "Test Result (no failures)" 
> which is quite confusing
> - I have to wait for over an hour to discover the lint failed, which takes 
> about a minute to run on its own
> - The logs are a jumbled mess of all the different tasks running on top of 
> each other
> - The test results give no indication of which version of python they use.  I 
> click on Test results, then the test module, then the test class, then I see 
> 4 tests named the same thing.  I assume that the first is python 2.7, the 
> second is 3.5 and so on.   It takes 5 clicks and then reading the log output 
> to know which version of python a single error pertains to, then I need to 
> repeat for each failure.  This makes it very difficult to discover problems, 
> and deduce that they may have something to do with python version mismatches.
> I believe the solution to this is to split up the single monolithic python 
> PreCommit job into sub-jobs (possibly using a pipeline with steps).  This 
> would give us the following benefits:
> - sub job results should become available as they finish, so for example, 
> lint results should be available very early on
> - sub job results will be reported separately, and there will be a job for 
> each py2, py35, py36 and so on, so it will be clear when an error is related 
> to a particular python version
> - sub jobs without reports, like docs and lint, will have their own failure 
> status and logs, so when they fail it will be more obvious what went wrong.
> I'm happy to help out once I get some feedback on the desired way forward.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-6923) OOM errors in jobServer when using GCS artifactDir

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6923?focusedWorklogId=317151&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317151
 ]

ASF GitHub Bot logged work on BEAM-6923:


Author: ASF GitHub Bot
Created on: 24/Sep/19 04:15
Start Date: 24/Sep/19 04:15
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #9647: [BEAM-6923] 
limit number of concurrent artifact write to 8
URL: https://github.com/apache/beam/pull/9647#discussion_r327420681
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java
 ##
 @@ -77,6 +78,13 @@
   public static final String MANIFEST = "MANIFEST";
   public static final String ARTIFACTS = "artifacts";
 
+  private final Semaphore permittedConcurrentWrite;
+
+  public BeamFileSystemArtifactStagingService() {
+    super();
+    permittedConcurrentWrite = new Semaphore(8);
 
 Review comment:
   Maybe make the `8` a constant? 
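
   One way to act on this suggestion, as a minimal sketch rather than the code 
in the PR: the class below is a hypothetical stand-in for the staging service, 
and the constant name MAX_CONCURRENT_ARTIFACT_WRITES and the write / 
availablePermits helpers are assumptions made for illustration. Only the 
Semaphore field and the value 8 come from the diff above.

```java
import java.util.concurrent.Semaphore;

// Hypothetical stand-in class; not the actual BeamFileSystemArtifactStagingService.
final class BoundedArtifactWriter {
  // Assumed constant name; the diff under review hard-codes the literal 8.
  static final int MAX_CONCURRENT_ARTIFACT_WRITES = 8;

  private final Semaphore permittedConcurrentWrite =
      new Semaphore(MAX_CONCURRENT_ARTIFACT_WRITES);

  /** Runs one write while holding a permit; the permit is released even on failure. */
  void write(Runnable writeAction) throws InterruptedException {
    permittedConcurrentWrite.acquire();
    try {
      writeAction.run();
    } finally {
      permittedConcurrentWrite.release();
    }
  }

  /** Number of writes that could start right now without blocking. */
  int availablePermits() {
    return permittedConcurrentWrite.availablePermits();
  }
}
```

   Naming the limit keeps the reason for the number in one place and makes it 
easier to tune later.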
 



Issue Time Tracking
---

Worklog Id: (was: 317151)
Time Spent: 0.5h  (was: 20m)

> OOM errors in jobServer when using GCS artifactDir
> --
>
> Key: BEAM-6923
> URL: https://issues.apache.org/jira/browse/BEAM-6923
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Reporter: Lukasz Gajowy
>Assignee: Ankur Goenka
>Priority: Major
> Attachments: Instance counts.png, Paths to GC root.png, 
> Telemetries.png, beam6923-flink156.m4v, beam6923flink182.m4v, heapdump 
> size-sorted.png
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When starting jobServer with artifactDir pointing to a GCS bucket: 
> {code:java}
> ./gradlew :beam-runners-flink_2.11-job-server:runShadow 
> -PflinkMasterUrl=localhost:8081 -PartifactsDir=gs://the-bucket{code}
> and running a Java portable pipeline with the following, portability related 
> pipeline options: 
> {code:java}
> --runner=PortableRunner --jobEndpoint=localhost:8099 
> --defaultEnvironmentType=DOCKER 
> --defaultEnvironmentConfig=gcr.io//java:latest'{code}
>  
> I'm facing a series of OOM errors, like this: 
> {code:java}
> Exception in thread "grpc-default-executor-3" java.lang.OutOfMemoryError: 
> Java heap space
> at 
> com.google.api.client.googleapis.media.MediaHttpUploader.buildContentChunk(MediaHttpUploader.java:606)
> at 
> com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:408)
> at 
> com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336)
> at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:508)
> at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:432)
> at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:549)
> at 
> com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:301)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745){code}
>  
> This does not happen when I'm using a local filesystem for the artifact 
> staging location. 
>  





[jira] [Work logged] (BEAM-6923) OOM errors in jobServer when using GCS artifactDir

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6923?focusedWorklogId=317150&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317150
 ]

ASF GitHub Bot logged work on BEAM-6923:


Author: ASF GitHub Bot
Created on: 24/Sep/19 04:04
Start Date: 24/Sep/19 04:04
Worklog Time Spent: 10m 
  Work Description: angoenka commented on issue #9647: [BEAM-6923] limit 
number of concurrent artifact write to 8
URL: https://github.com/apache/beam/pull/9647#issuecomment-534378966
 
 
   R: @robertwb @pabloem @lgajowy 
 



Issue Time Tracking
---

Worklog Id: (was: 317150)
Time Spent: 20m  (was: 10m)

> OOM errors in jobServer when using GCS artifactDir
> --
>
> Key: BEAM-6923
> URL: https://issues.apache.org/jira/browse/BEAM-6923
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Reporter: Lukasz Gajowy
>Assignee: Ankur Goenka
>Priority: Major
> Attachments: Instance counts.png, Paths to GC root.png, 
> Telemetries.png, beam6923-flink156.m4v, beam6923flink182.m4v, heapdump 
> size-sorted.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>


[jira] [Work logged] (BEAM-6923) OOM errors in jobServer when using GCS artifactDir

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6923?focusedWorklogId=317149&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317149
 ]

ASF GitHub Bot logged work on BEAM-6923:


Author: ASF GitHub Bot
Created on: 24/Sep/19 04:03
Start Date: 24/Sep/19 04:03
Worklog Time Spent: 10m 
  Work Description: angoenka commented on pull request #9647: [BEAM-6923] 
limit number of concurrent artifact write to 8
URL: https://github.com/apache/beam/pull/9647
 
 
   **Please** add a meaningful description for your change here
   
   
   

[jira] [Work logged] (BEAM-5820) Vendor Calcite

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5820?focusedWorklogId=317127&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317127
 ]

ASF GitHub Bot logged work on BEAM-5820:


Author: ASF GitHub Bot
Created on: 24/Sep/19 01:56
Start Date: 24/Sep/19 01:56
Worklog Time Spent: 10m 
  Work Description: vectorijk commented on issue #9189: [BEAM-5820] vendor 
calcite
URL: https://github.com/apache/beam/pull/9189#issuecomment-534354241
 
 
   Run Java PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 317127)
Time Spent: 12h 40m  (was: 12.5h)

> Vendor Calcite
> --
>
> Key: BEAM-5820
> URL: https://issues.apache.org/jira/browse/BEAM-5820
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Assignee: Kai Jiang
>Priority: Major
>  Time Spent: 12h 40m
>  Remaining Estimate: 0h
>






[jira] [Comment Edited] (BEAM-6923) OOM errors in jobServer when using GCS artifactDir

2019-09-23 Thread Ankur Goenka (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936291#comment-16936291
 ] 

Ankur Goenka edited comment on BEAM-6923 at 9/24/19 1:40 AM:
-

Gcsutil.java sets the default buffer size for an individual file write to 
64MB when the VM memory is more than 1GB.

Artifact staging tends to upload multiple files in parallel, and each upload 
reserves 64MB, which causes this issue.

A couple of potential fixes are:
 # Limit concurrent uploads of files to a lower number.
 # Limit the GCS util buffer size per file.
 # Limit concurrent GCS connections so that the limit applies to all file uploads.

Fix 1 applies only to artifact staging, but theoretically this problem can also 
impact a pipeline which writes to a bunch of files.

Fix 2 has a performance penalty when writing to a single file.

Fix 3 applies to all files but can lead to cases where we keep a file open for 
a long time in pipeline processing.

I am in favor of 1, as the impact will be limited to artifact staging.
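
To make the arithmetic behind these options concrete, here is a rough sketch. 
The 64MB per-upload buffer is the figure quoted above; the class and method 
names, the upload counts, and the 1 GiB heap budget are hypothetical values 
chosen for illustration, and the estimate ignores everything else on the heap.

```java
// Back-of-the-envelope estimate only; names and sample values are assumptions.
final class UploadBufferEstimate {
  // 64 MB reserved per in-flight upload, as stated in the comment above.
  static final long BUFFER_BYTES_PER_UPLOAD = 64L * 1024 * 1024;

  /** Worst-case heap reserved by upload buffers if all uploads run at once. */
  static long peakBufferBytes(int concurrentUploads) {
    return concurrentUploads * BUFFER_BYTES_PER_UPLOAD;
  }

  /** Largest number of concurrent uploads whose buffers fit in the given budget. */
  static int maxUploadsFor(long heapBudgetBytes) {
    return (int) (heapBudgetBytes / BUFFER_BYTES_PER_UPLOAD);
  }

  public static void main(String[] args) {
    System.out.println(peakBufferBytes(8) / (1024 * 1024) + " MB");  // prints "512 MB"
    System.out.println(peakBufferBytes(32) / (1024 * 1024) + " MB"); // prints "2048 MB"
    System.out.println(maxUploadsFor(1L << 30));                     // prints "16"
  }
}
```

Under these assumptions, capping concurrent uploads bounds the buffer memory 
linearly, which is why option 1 addresses the OOM for artifact staging without 
shrinking the per-file buffer.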

 

cc: [~robertwb]


was (Author: angoenka):
Gcsutil.java sets the default buffer size for an individual file write to 
64MB when the VM memory is more than 1GB.

Artifact staging tends to upload multiple files in parallel, and each upload 
reserves 64MB, which causes this issue.

A couple of potential fixes are:
 # Limit concurrent uploads of files to a lower number.
 # Limit the GCS util buffer size per file.
 # Limit concurrent GCS connections so that the limit applies to all file uploads.

Fix 1 applies only to artifact staging, but theoretically this problem can also 
impact a pipeline which writes to a bunch of files.

Fix 2 has a performance penalty when writing to a single file.

I am in favor of 3, as it should not have any performance penalty and applies to 
all GCS-related file IO.

 

cc: [~robertwb]

> OOM errors in jobServer when using GCS artifactDir
> --
>
> Key: BEAM-6923
> URL: https://issues.apache.org/jira/browse/BEAM-6923
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Reporter: Lukasz Gajowy
>Assignee: Ankur Goenka
>Priority: Major
> Attachments: Instance counts.png, Paths to GC root.png, 
> Telemetries.png, beam6923-flink156.m4v, beam6923flink182.m4v, heapdump 
> size-sorted.png
>
>


[jira] [Commented] (BEAM-6923) OOM errors in jobServer when using GCS artifactDir

2019-09-23 Thread Ankur Goenka (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936291#comment-16936291
 ] 

Ankur Goenka commented on BEAM-6923:


Gcsutil.java sets the default buffer size for an individual file write to 
64MB when the VM memory is more than 1GB.

Artifact staging tends to upload multiple files in parallel, and each upload 
reserves 64MB, which causes this issue.

A couple of potential fixes are:
 # Limit concurrent uploads of files to a lower number.
 # Limit the GCS util buffer size per file.
 # Limit concurrent GCS connections so that the limit applies to all file uploads.

Fix 1 applies only to artifact staging, but theoretically this problem can also 
impact a pipeline which writes to a bunch of files.

Fix 2 has a performance penalty when writing to a single file.

I am in favor of 3, as it should not have any performance penalty and applies to 
all GCS-related file IO.

 

cc: [~robertwb]

> OOM errors in jobServer when using GCS artifactDir
> --
>
> Key: BEAM-6923
> URL: https://issues.apache.org/jira/browse/BEAM-6923
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Reporter: Lukasz Gajowy
>Assignee: Ankur Goenka
>Priority: Major
> Attachments: Instance counts.png, Paths to GC root.png, 
> Telemetries.png, beam6923-flink156.m4v, beam6923flink182.m4v, heapdump 
> size-sorted.png
>
>


[jira] [Work logged] (BEAM-8209) Document custom docker containers

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=317117&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317117
 ]

ASF GitHub Bot logged work on BEAM-8209:


Author: ASF GitHub Bot
Created on: 24/Sep/19 01:31
Start Date: 24/Sep/19 01:31
Worklog Time Spent: 10m 
  Work Description: rosetn commented on pull request #9607: [BEAM-8209] 
Custom container docs
URL: https://github.com/apache/beam/pull/9607#discussion_r327393965
 
 

 ##
 File path: website/src/documentation/runtime/environments.md
 ##
 @@ -0,0 +1,187 @@
+---
+layout: section
+title: "Runtime environments"
+section_menu: section-menu/documentation.html
+permalink: /documentation/runtime/environments/
+redirect_from:
+  - /documentation/execution-model/
+---
+
+
+# Runtime environments
+
+Any execution engine can run the Beam SDK because the SDK runtime environment 
is [containerized](https://s.apache.org/beam-fn-api-container-contract) with 
[Docker](https://www.docker.com/) and isolated from other runtime systems. This 
page describes how to build, customize, and push Beam SDK container images.
+
+## Building container images
+
+Before building Beam SDK container images:
+* Register a [Bintray](https://bintray.com/) account with a Docker repository 
named `apache`.
+* Install [Docker](https://www.docker.com/) on your workstation.
+
+To build Beam SDK container images:
+
+
+
+Navigate to your local copy of the beam (https://github.com/apache/beam)
+
+
+Run Gradle with the docker target: ./gradlew 
docker
+
+
+
+> **Note**: It may take a long time to build all of the container images. You 
can instead build the images for specific SDKs:
+>
+> ```
+> ./gradlew -p sdks/java/container docker
+> ./gradlew -p sdks/python/container docker
+> ./gradlew -p sdks/go/container docker
+> ```
+
+Run `docker images` to examine the containers. For example, if you 
successfully built the container images, the command prompt displays a response 
like:
+
+```
+REPOSITORY                                  TAG     IMAGE ID      CREATED         SIZE
+$USER-docker-apache.bintray.io/beam/python  latest  4ea515403a1a  3 minutes ago   1.27GB
+$USER-docker-apache.bintray.io/beam/java    latest  0103512f1d8f  34 minutes ago  780MB
+$USER-docker-apache.bintray.io/beam/go      latest  ce055985808a  35 minutes ago  121MB
+```
+
+Although the repository names look like URLs, the container images are 
stored locally on your workstation. After building the container images 
locally, you can [push](#pushing-container-images) them to an eponymous 
repository online.
+
+### Overriding default Docker targets
+
+The default SDK version is `latest` and the default Docker repository is the 
following Bintray location:
+
+```
+$USER-docker-apache.bintray.io/beam
+```
+
+When you [build SDK container images](#building-container-images), you can 
override the default version and location.
+
+To specify an older Python SDK version, like 2.3.0, build the container with 
the `docker-tag` option:
+
+```
+./gradlew docker -Pdocker-tag=2.3.0
+```
+
+To change the `docker` target, build the container with the 
`docker-repository-root` option:
+
+```
+./gradlew docker -Pdocker-repository-root=$LOCATION
+```
+
+## Customizing container images
+
+You can add extra dependencies or serialization files to container images so 
the execution engine doesn't need them.
 
 Review comment:
   This sentence is unclear--
   
   Do you mean that if you add extra dependencies or serialization files, then 
you don't need to supply them again later? What does it mean that "the 
execution engine doesn't need them"? What is "them" in this sentence?
 



Issue Time Tracking
---

Worklog Id: (was: 317117)
Time Spent: 1h 20m  (was: 1h 10m)

> Document custom docker containers
> -
>
> Key: BEAM-8209
> URL: https://issues.apache.org/jira/browse/BEAM-8209
> Project: Beam
>  Issue Type: Sub-task
>  Components: website
>Reporter: Cyrus Maden
>Assignee: Cyrus Maden
>Priority: Minor
> Fix For: 2.16.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8209) Document custom docker containers

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=317119&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317119
 ]

ASF GitHub Bot logged work on BEAM-8209:


Author: ASF GitHub Bot
Created on: 24/Sep/19 01:31
Start Date: 24/Sep/19 01:31
Worklog Time Spent: 10m 
  Work Description: rosetn commented on pull request #9607: [BEAM-8209] 
Custom container docs
URL: https://github.com/apache/beam/pull/9607#discussion_r327392732
 
 

 ##
 File path: website/src/documentation/runtime/environments.md
 ##
 @@ -0,0 +1,187 @@
+To build Beam SDK container images:
+
+
+
+Navigate to your local copy of the beam (https://github.com/apache/beam)
+
+
+Run Gradle with the docker target: ./gradlew 
docker
 
 Review comment:
   (I realize this comment might look weird because it gets rendered, but see 
if you can view the source) I would use code fences instead of the pre 
... tags here. I don't see the white/grey background on the Beam 
site for other code boxes, so maybe change this to remain consistent. Then, you 
need to rewrite this list in markdown
   
   1. Navigate to your local copy of beam (https://github.com/apache/beam)
   1. Run Gradle with the `docker` target: 
   
   ```
   ./gradlew docker
   ```

   
   
 



Issue Time Tracking
---

Worklog Id: (was: 317119)
Time Spent: 1.5h  (was: 1h 20m)

> Document custom docker containers
> -
>
> Key: BEAM-8209
> URL: https://issues.apache.org/jira/browse/BEAM-8209
> Project: Beam
>  Issue Type: Sub-task
>  Components: website
>Reporter: Cyrus Maden
>Assignee: Cyrus Maden
>Priority: Minor
> Fix For: 2.16.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8209) Document custom docker containers

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=317112&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317112
 ]

ASF GitHub Bot logged work on BEAM-8209:


Author: ASF GitHub Bot
Created on: 24/Sep/19 01:31
Start Date: 24/Sep/19 01:31
Worklog Time Spent: 10m 
  Work Description: rosetn commented on pull request #9607: [BEAM-8209] 
Custom container docs
URL: https://github.com/apache/beam/pull/9607#discussion_r327393049
 
 

 ##
 File path: website/src/documentation/runtime/environments.md
 ##
 @@ -0,0 +1,187 @@
+To build Beam SDK container images:
+
+
 
 Review comment:
   Missing or extra word "Navigate to your local copy of [the]... "
 

Issue Time Tracking
---

Worklog Id: (was: 317112)
Time Spent: 1h  (was: 50m)


[jira] [Work logged] (BEAM-8209) Document custom docker containers

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=317111&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317111
 ]

ASF GitHub Bot logged work on BEAM-8209:


Author: ASF GitHub Bot
Created on: 24/Sep/19 01:31
Start Date: 24/Sep/19 01:31
Worklog Time Spent: 10m 
  Work Description: rosetn commented on pull request #9607: [BEAM-8209] 
Custom container docs
URL: https://github.com/apache/beam/pull/9607#discussion_r327389718
 
 

 ##
 File path: website/src/documentation/runtime/environments.md
 ##
 @@ -0,0 +1,187 @@
+> **Note**: It may take a long time to build all of the container images. You 
can instead build the images for specific SDKs:
 
 Review comment:
   "may"->"might"
 

Issue Time Tracking
---

Worklog Id: (was: 317111)
Time Spent: 50m  (was: 40m)


[jira] [Work logged] (BEAM-8209) Document custom docker containers

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=317115&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317115
 ]

ASF GitHub Bot logged work on BEAM-8209:


Author: ASF GitHub Bot
Created on: 24/Sep/19 01:31
Start Date: 24/Sep/19 01:31
Worklog Time Spent: 10m 
  Work Description: rosetn commented on pull request #9607: [BEAM-8209] 
Custom container docs
URL: https://github.com/apache/beam/pull/9607#discussion_r327393615
 
 

 ##
 File path: website/src/documentation/runtime/environments.md
 ##
 @@ -0,0 +1,187 @@
+> **Note**: It may take a long time to build all of the container images. You 
can instead build the images for specific SDKs:
+>
+> ```
+> ./gradlew -p sdks/java/container docker
+> ./gradlew -p sdks/python/container docker
+> ./gradlew -p sdks/go/container docker
+> ```
+
+Run `docker images` to examine the containers. For example, if you 
successfully built the container images, the command prompt displays a response 
like:
+
+```
+REPOSITORY                                  TAG     IMAGE ID      CREATED         SIZE
+$USER-docker-apache.bintray.io/beam/python  latest  4ea515403a1a  3 minutes ago   1.27GB
+$USER-docker-apache.bintray.io/beam/java    latest  0103512f1d8f  34 minutes ago  780MB
+$USER-docker-apache.bintray.io/beam/go      latest  ce055985808a  35 minutes ago  121MB
+```
+
+Although the repository names look like URLs, the container images are 
stored locally on your workstation. After building the container images 
locally, you can [push](#pushing-container-images) them to an eponymous 
repository online.
+
+### Overriding default Docker targets
+
+The default SDK version is `latest` and the default Docker repository is the 
following Bintray location:
+
+```
+$USER-docker-apache.bintray.io/beam
+```
+
+When you [build SDK container images](#building-container-images), you can 
override the default version and location.
+
+To specify an older Python SDK version, like 2.3.0, build the container with 
the `docker-tag` option:
+
+```
+./gradlew docker -Pdocker-tag=2.3.0
+```
+
+To change the `docker` target, build the container with the 
`docker-repository-root` option:
+
+```
+./gradlew docker -Pdocker-repository-root=$LOCATION
+```
+
+## Customizing container images
+
+You can add extra dependencies or serialization files to container images so 
the execution engine doesn't need them.
+
+To customize a container image, either:
+* [Write a new](#writing-new-dockerfiles) 
[Dockerfile](https://docs.docker.com/engine/reference/builder/) on top of the 
original
+* [Modify](#modifying-dockerfiles) the [original 
Dockerfile](https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile)
 and reimage the container
+
+It's often easier to write a new Dockerfile, but you can customize anything, 
including the base OS, by modifying the original.
 
 Review comment:
   "It's often easier to write a new Dockerfile, but by modifying the original 
Dockerfile, you can customize anything (including the base OS)."
   
   Just a suggestion--I don't really like parentheses, but this sentence is 
confusing with too many clauses. Another possibility is to break it up.
   
   "It's often easier to write a new Dockerfile. However, by modifying the 
original Dockerfile, you can customize anything, including the base OS."
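
For readers weighing the two options, the "write a new Dockerfile" route might look roughly like the shell sketch below. It is only an illustration: `BASE_IMAGE`, `example-package`, and the `beam-python-custom` tag are hypothetical placeholders and do not come from the PR. The build command is printed rather than executed, so the sketch runs without Docker installed.

```shell
# Sketch only: BASE_IMAGE, the package name, and the image tag are hypothetical.
BASE_IMAGE="example.invalid/beam/python:latest"
BUILD_DIR="$(mktemp -d)"

# A child Dockerfile designates the original image as its parent with FROM.
printf '%s\n' \
  "FROM ${BASE_IMAGE}" \
  "RUN pip install --no-cache-dir example-package" \
  > "${BUILD_DIR}/Dockerfile"

# docker build also needs a build context path (here BUILD_DIR) after the options.
# The command is printed, not executed, so the sketch runs without Docker.
BUILD_CMD="docker build -f ${BUILD_DIR}/Dockerfile -t beam-python-custom ${BUILD_DIR}"
echo "${BUILD_CMD}"
```

Note that `docker build` expects a build context path as its final argument, in addition to the `-f` option.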
 



Issue Time Tracking
---

Worklog Id: (was: 317115)
Time Spent: 1h 10m  (was: 1h)

> Document c

[jira] [Work logged] (BEAM-8209) Document custom docker containers

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=317113&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317113
 ]

ASF GitHub Bot logged work on BEAM-8209:


Author: ASF GitHub Bot
Created on: 24/Sep/19 01:31
Start Date: 24/Sep/19 01:31
Worklog Time Spent: 10m 
  Work Description: rosetn commented on pull request #9607: [BEAM-8209] 
Custom container docs
URL: https://github.com/apache/beam/pull/9607#discussion_r327390032
 
 

 ##
 File path: website/src/documentation/runtime/environments.md
 ##
 @@ -0,0 +1,187 @@
+Although the repository names look like URLs, the container images are 
stored locally on your workstation. After building the container images 
locally, you can [push](#pushing-container-images) them to an eponymous 
repository online.
 
 Review comment:
   I suggest simplifying "an eponymous repository", maybe "a repository of the 
same name" ?
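
As a rough illustration of what "a repository of the same name" means here, the sketch below assumes the default repository root quoted in the diff. The `BINTRAY_USER`, `IMAGE`, and `PUSH_CMD` variable names are illustrative only, and the push command is printed rather than run.

```shell
# Assumption: fall back to a placeholder when $USER is unset.
BINTRAY_USER="${USER:-example-user}"

# The local image name already encodes the remote registry host and path.
IMAGE="${BINTRAY_USER}-docker-apache.bintray.io/beam/python:latest"

# Print the push command; running it would upload the image to the
# online repository that has the same name as the local one.
PUSH_CMD="docker push ${IMAGE}"
echo "${PUSH_CMD}"
```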
 

Issue Time Tracking
---

Worklog Id: (was: 317113)
Time Spent: 1h  (was: 50m)


[jira] [Work logged] (BEAM-8209) Document custom docker containers

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=317116&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317116
 ]

ASF GitHub Bot logged work on BEAM-8209:


Author: ASF GitHub Bot
Created on: 24/Sep/19 01:31
Start Date: 24/Sep/19 01:31
Worklog Time Spent: 10m 
  Work Description: rosetn commented on pull request #9607: [BEAM-8209] 
Custom container docs
URL: https://github.com/apache/beam/pull/9607#discussion_r327391188
 
 

 ##
 File path: website/src/documentation/runtime/environments.md
 ##
 @@ -0,0 +1,187 @@
+
+### Writing new Dockerfiles on top of the original {#writing-new-dockerfiles}
+
+
+
+Pull a <a href="https://console.cloud.google.com/gcr/images/apache-beam-testing/GLOBAL/beam/sdks/release">prebuilt 
SDK container image</a> for your target language and version.
+
+
+<a href="https://docs.docker.com/develop/develop-images/dockerfile_best-practices/">Write 
a new Dockerfile</a> that <a href="https://docs.docker.com/engine/reference/builder/#from">designates</a> 
the original as its <a href="https://docs.docker.com/glossary/?term=parent%20image">parent</a>
+
+
+Build a child image: docker build -f /path/to/new/Dockerfile
+
+
+
+### Modifying the original Dockerfile {#modifying-dockerfiles}
+
+1. Pull the [prebuilt SDK container 
image](https://console.cloud.google.com/gcr/images/apache-beam-testing/GLOBAL/beam/sdks/release)
 for your target language and version
 
 Review comment:
   add periods at the

[jira] [Work logged] (BEAM-8209) Document custom docker containers

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=317120&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317120
 ]

ASF GitHub Bot logged work on BEAM-8209:


Author: ASF GitHub Bot
Created on: 24/Sep/19 01:31
Start Date: 24/Sep/19 01:31
Worklog Time Spent: 10m 
  Work Description: rosetn commented on pull request #9607: [BEAM-8209] 
Custom container docs
URL: https://github.com/apache/beam/pull/9607#discussion_r327394115
 
 

 ##
 File path: website/src/documentation/runtime/environments.md
 ##
 @@ -0,0 +1,187 @@
+## Customizing container images
+
+You can add extra dependencies or serialization files to container images so 
the execution engine doesn't need them.
+
+To customize a container image, either:
 
 Review comment:
   add periods at the end of these sentences
 

Issue Time Tracking
---

Worklog Id: (was: 317120)
Time Spent: 1h 40m  (was: 1.5h)


[jira] [Work logged] (BEAM-8209) Document custom docker containers

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=317109&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317109
 ]

ASF GitHub Bot logged work on BEAM-8209:


Author: ASF GitHub Bot
Created on: 24/Sep/19 01:31
Start Date: 24/Sep/19 01:31
Worklog Time Spent: 10m 
  Work Description: rosetn commented on pull request #9607: [BEAM-8209] 
Custom container docs
URL: https://github.com/apache/beam/pull/9607#discussion_r327388352
 
 

 ##
 File path: website/src/documentation/runtime/environments.md
 ##
 @@ -0,0 +1,187 @@
+# Runtime environments
+
+Any execution engine can run the Beam SDK because the SDK runtime environment 
is [containerized](https://s.apache.org/beam-fn-api-container-contract) with 
[Docker](https://www.docker.com/) and isolated from other runtime systems. This 
page describes how to build, customize, and push Beam SDK container images.
 
 Review comment:
   This sentence is confusing and might need to be split. Did you mean:
   
   "The Beam SDK runtime environment is isolated from other runtime systems 
because the SDK runtime environment is 
[containerized](https://s.apache.org/beam-fn-api-container-contract) with 
[Docker](https://www.docker.com/). This means that any execution engine can run 
the Beam SDK."
 

Issue Time Tracking
---

Worklog Id: (was: 317109)
Time Spent: 0.5h  (was: 20m)


[jira] [Work logged] (BEAM-8209) Document custom docker containers

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=317110&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317110
 ]

ASF GitHub Bot logged work on BEAM-8209:


Author: ASF GitHub Bot
Created on: 24/Sep/19 01:31
Start Date: 24/Sep/19 01:31
Worklog Time Spent: 10m 
  Work Description: rosetn commented on pull request #9607: [BEAM-8209] 
Custom container docs
URL: https://github.com/apache/beam/pull/9607#discussion_r327387167
 
 

 ##
 File path: website/src/documentation/runtime/environments.md
 ##
 @@ -0,0 +1,187 @@
+---
+layout: section
+title: "Runtime environments"
+section_menu: section-menu/documentation.html
+permalink: /documentation/runtime/environments/
+redirect_from:
 
 Review comment:
   I don't understand this redirect--is there more context/rationale? If 
someone had previously bookmarked the Beam Execution Model page, shouldn't they 
be redirected to /documentation/runtime/model/ page now? i.e. this redirect 
should be in the other file
 

Issue Time Tracking
---

Worklog Id: (was: 317110)
Time Spent: 40m  (was: 0.5h)


[jira] [Work logged] (BEAM-8209) Document custom docker containers

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=317118&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317118
 ]

ASF GitHub Bot logged work on BEAM-8209:


Author: ASF GitHub Bot
Created on: 24/Sep/19 01:31
Start Date: 24/Sep/19 01:31
Worklog Time Spent: 10m 
  Work Description: rosetn commented on pull request #9607: [BEAM-8209] 
Custom container docs
URL: https://github.com/apache/beam/pull/9607#discussion_r327391740
 
 

 ##
 File path: website/src/documentation/runtime/environments.md
 ##
 @@ -0,0 +1,187 @@
+### Modifying the original Dockerfile {#modifying-dockerfiles}
+
+1. Pull the [prebuilt SDK container 
image](https://console.cloud.google.com/gcr/images/apache-beam-testing/GLOBAL/beam/sdks/release)
 for your target language and version
+2. Customize the 
[Dockerfile](https://

[jira] [Work logged] (BEAM-8209) Document custom docker containers

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=317114&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317114
 ]

ASF GitHub Bot logged work on BEAM-8209:


Author: ASF GitHub Bot
Created on: 24/Sep/19 01:31
Start Date: 24/Sep/19 01:31
Worklog Time Spent: 10m 
  Work Description: rosetn commented on pull request #9607: [BEAM-8209] 
Custom container docs
URL: https://github.com/apache/beam/pull/9607#discussion_r327390222
 
 

 ##
 File path: website/src/documentation/runtime/environments.md
 ##
 @@ -0,0 +1,187 @@
+---
+layout: section
+title: "Runtime environments"
+section_menu: section-menu/documentation.html
+permalink: /documentation/runtime/environments/
+redirect_from:
+  - /documentation/execution-model/
+---
+
+
+# Runtime environments
+
+Any execution engine can run the Beam SDK beacuse the SDK runtime environment 
is [containerized](https://s.apache.org/beam-fn-api-container-contract) with 
[Docker](https://www.docker.com/) and isolated from other runtime systems. This 
page describes how to build, customize, and push Beam SDK container images.
+
+## Building container images
+
+Before building Beam SDK container images:
+* Register a [Bintray](https://bintray.com/) account with a Docker repository 
named `apache`.
+* Install [Docker](https://www.docker.com/) on your workstation.
+
+To build Beam SDK container images:
+
+1. Navigate to your local copy of the [beam](https://github.com/apache/beam)
+2. Run Gradle with the docker target: `./gradlew docker`
+
+> **Note**: It may take a long time to build all of the container images. You 
can instead build the images for specific SDKs:
+>
+> ```
+> ./gradlew -p sdks/java/container docker
+> ./gradlew -p sdks/python/container docker
+> ./gradlew -p sdks/go/container docker
+> ```
+
+Run `docker images` to examine the containers. For example, if you 
successfully built the container images, the command prompt displays a response 
like:
 
 Review comment:
   "like:" -> "like the following:" or "such as the following:"
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 317114)
Time Spent: 1h 10m  (was: 1h)

> Document custom docker containers
> -
>
> Key: BEAM-8209
> URL: https://issues.apache.org/jira/browse/BEAM-8209
> Project: Beam
>  Issue Type: Sub-task
>  Components: website
>Reporter: Cyrus Maden
>Assignee: Cyrus Maden
>Priority: Minor
> Fix For: 2.16.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8209) Document custom docker containers

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=317108&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317108
 ]

ASF GitHub Bot logged work on BEAM-8209:


Author: ASF GitHub Bot
Created on: 24/Sep/19 01:31
Start Date: 24/Sep/19 01:31
Worklog Time Spent: 10m 
  Work Description: rosetn commented on pull request #9607: [BEAM-8209] 
Custom container docs
URL: https://github.com/apache/beam/pull/9607#discussion_r327387472
 
 

 ##
 File path: website/src/documentation/runtime/environments.md
 ##
 @@ -0,0 +1,187 @@
+---
+layout: section
+title: "Runtime environments"
+section_menu: section-menu/documentation.html
+permalink: /documentation/runtime/environments/
+redirect_from:
+  - /documentation/execution-model/
+---
+
+
+# Runtime environments
+
+Any execution engine can run the Beam SDK beacuse the SDK runtime environment 
is [containerized](https://s.apache.org/beam-fn-api-container-contract) with 
[Docker](https://www.docker.com/) and isolated from other runtime systems. This 
page describes how to build, customize, and push Beam SDK container images.
 
 Review comment:
   "beacuse"->"because"
 



Issue Time Tracking
---

Worklog Id: (was: 317108)
Time Spent: 0.5h  (was: 20m)

> Document custom docker containers
> -
>
> Key: BEAM-8209
> URL: https://issues.apache.org/jira/browse/BEAM-8209
> Project: Beam
>  Issue Type: Sub-task
>  Components: website
>Reporter: Cyrus Maden
>Assignee: Cyrus Maden
>Priority: Minor
> Fix For: 2.16.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8213) Run and report python tox tasks separately within Jenkins

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8213?focusedWorklogId=317104&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317104
 ]

ASF GitHub Bot logged work on BEAM-8213:


Author: ASF GitHub Bot
Created on: 24/Sep/19 01:10
Start Date: 24/Sep/19 01:10
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9642: [BEAM-8213] Split up 
monolithic python preCommit tests on jenkins
URL: https://github.com/apache/beam/pull/9642#issuecomment-534344709
 
 
   > Main concern: this change may increase precommit queue, since each job 
will require a jenkins slot, of which we have 16 VMs * 2 slots per VM. What 
required 1 slot will now require 5. 
   
   We're still doing the same amount of work, so IIUC, assuming we get similar 
CPU-utilization in this new configuration, these 5 jobs should finish in the 
time it took the previous single job to finish, plus whatever overhead is 
required per job to bootstrap the tests. The previous job was taking 75 
minutes for me, so I'm hoping that the per-job overhead is relatively small in 
comparison (e.g. if bootstrap time is 1 minute per job, adding an extra 4 
minutes for 4 more jobs is a ~5% increase). 
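
The estimate above can be checked with a short sketch. The function name and parameters are illustrative only; the 75-minute baseline, 5 jobs, and 1-minute bootstrap time come from the comment, and the 1-minute figure is itself a guess rather than a measurement.

```python
def relative_overhead(baseline_minutes, jobs, bootstrap_minutes_per_job):
    """Fractional increase in total run time when one job is split into `jobs` jobs.

    Assumes the test work itself is unchanged and each additional job only
    adds its own bootstrap time.
    """
    extra_minutes = (jobs - 1) * bootstrap_minutes_per_job
    return extra_minutes / baseline_minutes


# Numbers from the comment: 75-minute job, split into 5, ~1 minute bootstrap each.
overhead = relative_overhead(baseline_minutes=75, jobs=5, bootstrap_minutes_per_job=1)
print(f"{overhead:.1%}")  # 5.3%
```

This only covers added machine time; it says nothing about queue time when slots are scarce, which is the separate concern raised in the review.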
   
 



Issue Time Tracking
---

Worklog Id: (was: 317104)
Time Spent: 1.5h  (was: 1h 20m)

> Run and report python tox tasks separately within Jenkins
> -
>
> Key: BEAM-8213
> URL: https://issues.apache.org/jira/browse/BEAM-8213
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Chad Dombrova
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> As a python developer, the speed and comprehensibility of the jenkins 
> PreCommit job could be greatly improved.
> Here are some of the problems
> - when a lint job fails, it's not reported in the test results summary, so 
> even though the job is marked as failed, I see "Test Result (no failures)" 
> which is quite confusing
> - I have to wait for over an hour to discover the lint failed, which takes 
> about a minute to run on its own
> - The logs are a jumbled mess of all the different tasks running on top of 
> each other
> - The test results give no indication of which version of python they use.  I 
> click on Test results, then the test module, then the test class, then I see 
> 4 tests named the same thing.  I assume that the first is python 2.7, the 
> second is 3.5 and so on.   It takes 5 clicks and then reading the log output 
> to know which version of python a single error pertains to, then I need to 
> repeat for each failure.  This makes it very difficult to discover problems, 
> and deduce that they may have something to do with python version mismatches.
> I believe the solution to this is to split up the single monolithic python 
> PreCommit job into sub-jobs (possibly using a pipeline with steps).  This 
> would give us the following benefits:
> - sub job results should become available as they finish, so for example, 
> lint results should be available very early on
> - sub job results will be reported separately, and there will be a job for 
> each py2, py35, py36 and so on, so it will be clear when an error is related 
> to a particular python version
> - sub jobs without reports, like docs and lint, will have their own failure 
> status and logs, so when they fail it will be more obvious what went wrong.
> I'm happy to help out once I get some feedback on the desired way forward.





[jira] [Work logged] (BEAM-8233) Separate loopback and docker modes on Flink runner guide

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8233?focusedWorklogId=317103&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317103
 ]

ASF GitHub Bot logged work on BEAM-8233:


Author: ASF GitHub Bot
Created on: 24/Sep/19 01:08
Start Date: 24/Sep/19 01:08
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #9605: [BEAM-8233] [BEAM-8214] 
[BEAM-8232] Document environment_type flag
URL: https://github.com/apache/beam/pull/9605#issuecomment-534344275
 
 
   @tweise I resolved the jiras. (I try to go through every once in a while and 
clean out the ones I forgot to close.) I'll be sure to tag you on future 
related PRs -- and yes, we can wait longer for review, especially for this 
variety of non-pressing documentation change.
 



Issue Time Tracking
---

Worklog Id: (was: 317103)
Time Spent: 3h  (was: 2h 50m)

> Separate loopback and docker modes on Flink runner guide
> 
>
> Key: BEAM-8233
> URL: https://issues.apache.org/jira/browse/BEAM-8233
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink, website
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Running loopback should be the "getting started" option, and docker mode 
> should be an "advanced" option with its own section of the Flink runner guide 
> with instructions and explanations (you need to build the docker container 
> images, you can't see your output in a local filesystem without 
> workarounds..) [https://beam.apache.org/documentation/runners/flink/]





[jira] [Commented] (BEAM-7933) Adding timeout to JobServer grpc calls

2019-09-23 Thread Enrico Canzonieri (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936278#comment-16936278
 ] 

Enrico Canzonieri commented on BEAM-7933:
-

Yes, I'm planning to work on this. It shouldn't take me too long to get a pr 
out. I should have some time by the end of this week.

> Adding timeout to JobServer grpc calls
> --
>
> Key: BEAM-7933
> URL: https://issues.apache.org/jira/browse/BEAM-7933
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Affects Versions: 2.14.0
>Reporter: Enrico Canzonieri
>Assignee: Enrico Canzonieri
>Priority: Minor
>  Labels: portability
>
> gRPC calls to the JobServer from the Python SDK do not have timeouts. That 
> means that the call to pipeline.run() could hang forever if the JobServer is 
> not running (or fails to start).
> E.g. 
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/portable_runner.py#L307]
>  the call to Prepare() doesn't provide any timeout value and the same applies 
> to other JobServer requests.
> As part of this ticket we could add a default timeout of 60 seconds, matching 
> the default timeout for the HTTP client.
> Additionally, we could consider adding a --job-server-request-timeout to the 
> [PortableOptions|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/options/pipeline_options.py#L805]
>  class to be used in the JobServer interactions inside portable_runner.py.
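
A minimal sketch of how the proposed flag could be declared and read, using plain `argparse` rather than Beam's actual `PortableOptions` class. The flag name and the 60-second default are taken from the ticket; `build_parser` and the `int` type are assumptions made for illustration, not the eventual implementation.

```python
import argparse


def build_parser():
    """Hypothetical stand-in for the option registration in PortableOptions."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--job-server-request-timeout",
        type=int,
        default=60,  # default proposed in the ticket, in seconds
        help="Timeout in seconds for each JobServer request.",
    )
    return parser


# With no flag given, the proposed default applies.
default_opts = build_parser().parse_args([])
# An explicit value overrides it.
custom_opts = build_parser().parse_args(["--job-server-request-timeout", "10"])

print(default_opts.job_server_request_timeout)  # 60
print(custom_opts.job_server_request_timeout)   # 10
```

The parsed value would then be passed as the timeout of each JobServer request, so that a missing JobServer surfaces as an error instead of a hang.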





[jira] [Work logged] (BEAM-8299) Upgrade Jackson to version 2.9.10

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8299?focusedWorklogId=317099&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317099
 ]

ASF GitHub Bot logged work on BEAM-8299:


Author: ASF GitHub Bot
Created on: 24/Sep/19 00:49
Start Date: 24/Sep/19 00:49
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on issue #9637: 
[release-2.16.0][BEAM-8299] Upgrade Jackson to version 2.9.10
URL: https://github.com/apache/beam/pull/9637#issuecomment-534340488
 
 
   Run Java_Examples_Dataflow PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 317099)
Time Spent: 1h 20m  (was: 1h 10m)

> Upgrade Jackson to version 2.9.10
> -
>
> Key: BEAM-8299
> URL: https://issues.apache.org/jira/browse/BEAM-8299
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system, sdk-java-core
>Affects Versions: 2.15.0
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Blocker
> Fix For: 2.16.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> [Jackson 2.9.10 addresses multiple CVE 
> issues|https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.9.10] from 
> previous Jackson versions, so we need to upgrade it.





[jira] [Work logged] (BEAM-8293) Document or log file system issues with docker

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8293?focusedWorklogId=317098&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317098
 ]

ASF GitHub Bot logged work on BEAM-8293:


Author: ASF GitHub Bot
Created on: 24/Sep/19 00:49
Start Date: 24/Sep/19 00:49
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #9646: [BEAM-8293] 
prescriptive log message for artifact retrieval failure
URL: https://github.com/apache/beam/pull/9646
 
 
   Also boosted the level for higher visibility.
   R: @robertwb 
   
   
   

[jira] [Work logged] (BEAM-8213) Run and report python tox tasks separately within Jenkins

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8213?focusedWorklogId=317096&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317096
 ]

ASF GitHub Bot logged work on BEAM-8213:


Author: ASF GitHub Bot
Created on: 24/Sep/19 00:45
Start Date: 24/Sep/19 00:45
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #9642: [BEAM-8213] Split up 
monolithic python preCommit tests on jenkins
URL: https://github.com/apache/beam/pull/9642#issuecomment-534339765
 
 
   /cc @youngoli on the last comment. Daniel was pointing out long queue 
times due to the increase in the number of jobs.
 



Issue Time Tracking
---

Worklog Id: (was: 317096)
Time Spent: 1h 20m  (was: 1h 10m)

> Run and report python tox tasks separately within Jenkins
> -
>
> Key: BEAM-8213
> URL: https://issues.apache.org/jira/browse/BEAM-8213
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Chad Dombrova
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>





[jira] [Work logged] (BEAM-8213) Run and report python tox tasks separately within Jenkins

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8213?focusedWorklogId=317095&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317095
 ]

ASF GitHub Bot logged work on BEAM-8213:


Author: ASF GitHub Bot
Created on: 24/Sep/19 00:43
Start Date: 24/Sep/19 00:43
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #9642: [BEAM-8213] Split 
up monolithic python preCommit tests on jenkins
URL: https://github.com/apache/beam/pull/9642#issuecomment-534339420
 
 
   Overall, this LGTM. We need to make sure that trigger phrases work, are 
visible, and that by default all precommits run on Python PRs. 
   
   Main concern: this change may increase the precommit queue, since each job will 
require a Jenkins slot, of which we have 16 VMs * 2 slots per VM. What required 
1 slot will now require 5. @yifanzou, what's your take on this? We may want to 
increase the number of slots, and should monitor the precommit queue time after 
this is merged.
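
The slot arithmetic behind this concern, as a rough sketch. The VM and slot counts come from the comment above; the simplifying assumption that only Python precommit jobs occupy slots is mine, so the result is an upper bound rather than a prediction.

```python
VMS = 16
SLOTS_PER_VM = 2


def concurrent_prs(slots_per_pr, vms=VMS, slots_per_vm=SLOTS_PER_VM):
    """How many PRs can run their Python precommit at once, ignoring all other jobs."""
    total_slots = vms * slots_per_vm
    return total_slots // slots_per_pr


before = concurrent_prs(slots_per_pr=1)  # one monolithic job per PR
after = concurrent_prs(slots_per_pr=5)   # five split jobs per PR

print(before, after)  # 32 6
```

Under that assumption the pool goes from serving 32 PRs at once to 6, which is why monitoring the precommit queue time after the merge is suggested.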
 



Issue Time Tracking
---

Worklog Id: (was: 317095)
Time Spent: 1h 10m  (was: 1h)

> Run and report python tox tasks separately within Jenkins
> -
>
> Key: BEAM-8213
> URL: https://issues.apache.org/jira/browse/BEAM-8213
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Chad Dombrova
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>





[jira] [Work logged] (BEAM-8213) Run and report python tox tasks separately within Jenkins

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8213?focusedWorklogId=317086&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317086
 ]

ASF GitHub Bot logged work on BEAM-8213:


Author: ASF GitHub Bot
Created on: 24/Sep/19 00:32
Start Date: 24/Sep/19 00:32
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #9642: [BEAM-8213] Split 
up monolithic python preCommit tests on jenkins
URL: https://github.com/apache/beam/pull/9642#issuecomment-534337307
 
 
   To run a seed job on your PR you can say "Run Seed Jøb" (using ø to avoid 
triggering the job by this comment). 
   After the seed job finishes (~10 min), you can run the Jenkins jobs defined in 
this PR. 
   
   Note that a seed job launched on the PR will affect other test executions, so 
consider giving a heads-up on dev. The seed job runs periodically, so at some point 
we will revert to the job specs defined in master (the SOT).  
   
   
 



Issue Time Tracking
---

Worklog Id: (was: 317086)
Time Spent: 1h  (was: 50m)

> Run and report python tox tasks separately within Jenkins
> -
>
> Key: BEAM-8213
> URL: https://issues.apache.org/jira/browse/BEAM-8213
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Chad Dombrova
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>





[jira] [Resolved] (BEAM-8214) Remove vestigial docker commands from portability instructions

2019-09-23 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver resolved BEAM-8214.
---
Fix Version/s: Not applicable
   Resolution: Fixed

> Remove vestigial docker commands from portability instructions
> --
>
> Key: BEAM-8214
> URL: https://issues.apache.org/jira/browse/BEAM-8214
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
> Fix For: Not applicable
>
>
> Right now [https://beam.apache.org/roadmap/portability/] contains docker 
> commands which are useless, as we are using loopback mode.





[jira] [Resolved] (BEAM-8232) Document LOOPBACK environment type

2019-09-23 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver resolved BEAM-8232.
---
Fix Version/s: Not applicable
   Resolution: Fixed

> Document LOOPBACK environment type
> --
>
> Key: BEAM-8232
> URL: https://issues.apache.org/jira/browse/BEAM-8232
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness, website
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
> Fix For: Not applicable
>
>
> * Right now, LOOPBACK is not mentioned as a possible option for 
> environment_type [1]. It seems that it was intended for testing [2], but it's 
> useful for getting started on Beam and debugging as well, so it's worth a 
> mention.
>  * The meaning of each environment type should be documented somewhere, such 
> as on the runner pipeline options tables on the website.
>  
> [1] 
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/options/pipeline_options.py#L818-L819]
>  [2] 
> [https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L82]





[jira] [Resolved] (BEAM-8233) Separate loopback and docker modes on Flink runner guide

2019-09-23 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver resolved BEAM-8233.
---
Fix Version/s: Not applicable
   Resolution: Fixed

> Separate loopback and docker modes on Flink runner guide
> 
>
> Key: BEAM-8233
> URL: https://issues.apache.org/jira/browse/BEAM-8233
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink, website
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>





[jira] [Work logged] (BEAM-8131) Provide Kubernetes setup with Prometheus

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8131?focusedWorklogId=317079&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317079
 ]

ASF GitHub Bot logged work on BEAM-8131:


Author: ASF GitHub Bot
Created on: 24/Sep/19 00:24
Start Date: 24/Sep/19 00:24
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #9482: [BEAM-8131] 
Provide Kubernetes setup for Prometheus
URL: https://github.com/apache/beam/pull/9482#discussion_r327381413
 
 

 ##
 File path: .test-infra/metrics/prometheus/prometheus/config/rules.yml
 ##
 @@ -0,0 +1,30 @@
+
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.
+
+
+groups:
+- name: beamTests
+  rules:
+  - alert: TestRegression
+    expr: ((avg_over_time({job="beam",instance="",__name__!="push_time_seconds"}[1d])
+      - avg_over_time({job="beam",instance="",__name__!="push_time_seconds"}[6d] offset 1d))
+      / avg_over_time({job="beam",instance="",__name__!="push_time_seconds"}[6d] offset 1d))
+      > 0.2
+    labels:
+      job: beamAlert
+    annotations:
+      summary: 'Average runtime over 24 hours is 20% greater than average from six previous days'
 
 Review comment:
   Have you verified that this formula is appropriate? It may be good to 
formulate it in terms of the standard deviation? I'm not sure. Just thinking 
out loud.
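
To make the question concrete, here is a small Python sketch of the two formulations under discussion. `relative_regression` mirrors the alert expression: the average over the last day is compared against the average over the six preceding days, and the rule fires above 0.2. `zscore_regression` is one possible standard-deviation variant. Both function names, the threshold of 2 standard deviations, and the sample runtimes are assumptions for illustration and are not part of the PR.

```python
from statistics import mean, stdev


def relative_regression(recent, baseline):
    """(avg(recent) - avg(baseline)) / avg(baseline), as in the alert expr."""
    base = mean(baseline)
    return (mean(recent) - base) / base


def zscore_regression(recent, baseline):
    """How many baseline standard deviations the recent average lies above."""
    return (mean(recent) - mean(baseline)) / stdev(baseline)


if __name__ == "__main__":
    baseline = [100.0, 102.0, 98.0, 101.0, 99.0, 100.0]  # made-up runtimes, seconds
    recent = [125.0, 127.0]                              # made-up runtimes, seconds
    print(relative_regression(recent, baseline) > 0.2)   # rule from the PR
    print(zscore_regression(recent, baseline) > 2.0)     # assumed threshold
```

The relative rule uses the same 20% threshold for every test, so a noisy test may trip it by chance. The standard-deviation form scales the threshold to each test's own variability, at the cost of needing enough baseline samples.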
 



Issue Time Tracking
---

Worklog Id: (was: 317079)
Time Spent: 2h 50m  (was: 2h 40m)

> Provide Kubernetes setup with Prometheus
> 
>
> Key: BEAM-8131
> URL: https://issues.apache.org/jira/browse/BEAM-8131
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8131) Provide Kubernetes setup with Prometheus

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8131?focusedWorklogId=317081&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317081
 ]

ASF GitHub Bot logged work on BEAM-8131:


Author: ASF GitHub Bot
Created on: 24/Sep/19 00:24
Start Date: 24/Sep/19 00:24
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #9482: [BEAM-8131] 
Provide Kubernetes setup for Prometheus
URL: https://github.com/apache/beam/pull/9482#discussion_r327380673
 
 

 ##
 File path: .test-infra/metrics/prometheus/alertmanager/config/alertmanager.yml
 ##
 @@ -0,0 +1,37 @@
+
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.
+
+
 
 Review comment:
   Would you add comments with explanations for what each file does?
 



Issue Time Tracking
---

Worklog Id: (was: 317081)
Time Spent: 2h 50m  (was: 2h 40m)

> Provide Kubernetes setup with Prometheus
> 
>
> Key: BEAM-8131
> URL: https://issues.apache.org/jira/browse/BEAM-8131
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8131) Provide Kubernetes setup with Prometheus

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8131?focusedWorklogId=317080&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317080
 ]

ASF GitHub Bot logged work on BEAM-8131:


Author: ASF GitHub Bot
Created on: 24/Sep/19 00:24
Start Date: 24/Sep/19 00:24
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #9482: [BEAM-8131] 
Provide Kubernetes setup for Prometheus
URL: https://github.com/apache/beam/pull/9482#discussion_r327380834
 
 

 ##
 File path: .test-infra/metrics/docker-compose.yml
 ##
 @@ -86,9 +86,35 @@ services:
   - DB_DBNAME=beam_metrics
   - DB_DBUSERNAME=admin
   - DB_DBPWD=
+  prometheus:
 
 Review comment:
   I feel silly, but why do we need both a docker-compose configuration and a 
Kubernetes configuration?
 



Issue Time Tracking
---

Worklog Id: (was: 317080)
Time Spent: 2h 50m  (was: 2h 40m)

> Provide Kubernetes setup with Prometheus
> 
>
> Key: BEAM-8131
> URL: https://issues.apache.org/jira/browse/BEAM-8131
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8209) Document custom docker containers

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=317078&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317078
 ]

ASF GitHub Bot logged work on BEAM-8209:


Author: ASF GitHub Bot
Created on: 24/Sep/19 00:21
Start Date: 24/Sep/19 00:21
Worklog Time Spent: 10m 
  Work Description: rosetn commented on issue #9607: [BEAM-8209] Custom 
container docs
URL: https://github.com/apache/beam/pull/9607#issuecomment-534335284
 
 
   STAGED: 
http://apache-beam-website-pull-requests.storage.googleapis.com/9607/documentation/runtime/environments/index.html
 



Issue Time Tracking
---

Worklog Id: (was: 317078)
Time Spent: 20m  (was: 10m)

> Document custom docker containers
> -
>
> Key: BEAM-8209
> URL: https://issues.apache.org/jira/browse/BEAM-8209
> Project: Beam
>  Issue Type: Sub-task
>  Components: website
>Reporter: Cyrus Maden
>Assignee: Cyrus Maden
>Priority: Minor
> Fix For: 2.16.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8213) Run and report python tox tasks separately within Jenkins

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8213?focusedWorklogId=317077&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317077
 ]

ASF GitHub Bot logged work on BEAM-8213:


Author: ASF GitHub Bot
Created on: 24/Sep/19 00:19
Start Date: 24/Sep/19 00:19
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #9642: [BEAM-8213] Split up 
monolithic python preCommit tests on jenkins
URL: https://github.com/apache/beam/pull/9642#issuecomment-534334710
 
 
   I believe you need to run a seed job to get the new jobs recognized by 
Jenkins. 
   
   R: @yifanzou could help with the specifics of seed job.
   R: @tvalentyn for reviewing with respect to Python 3 jobs.
 



Issue Time Tracking
---

Worklog Id: (was: 317077)
Time Spent: 50m  (was: 40m)

> Run and report python tox tasks separately within Jenkins
> -
>
> Key: BEAM-8213
> URL: https://issues.apache.org/jira/browse/BEAM-8213
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Chad Dombrova
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> As a python developer, the speed and comprehensibility of the jenkins 
> PreCommit job could be greatly improved.
> Here are some of the problems
> - when a lint job fails, it's not reported in the test results summary, so 
> even though the job is marked as failed, I see "Test Result (no failures)" 
> which is quite confusing
> - I have to wait for over an hour to discover the lint failed, which takes 
> about a minute to run on its own
> - The logs are a jumbled mess of all the different tasks running on top of 
> each other
> - The test results give no indication of which version of python they use.  I 
> click on Test results, then the test module, then the test class, then I see 
> 4 tests named the same thing.  I assume that the first is python 2.7, the 
> second is 3.5 and so on.   It takes 5 clicks and then reading the log output 
> to know which version of python a single error pertains to, then I need to 
> repeat for each failure.  This makes it very difficult to discover problems, 
> and deduce that they may have something to do with python version mismatches.
> I believe the solution to this is to split up the single monolithic python 
> PreCommit job into sub-jobs (possibly using a pipeline with steps).  This 
> would give us the following benefits:
> - sub job results should become available as they finish, so for example, 
> lint results should be available very early on
> - sub job results will be reported separately, and there will be a job for 
> each py2, py35, py36 and so on, so it will be clear when an error is related 
> to a particular python version
> - sub jobs without reports, like docs and lint, will have their own failure 
> status and logs, so when they fail it will be more obvious what went wrong.
> I'm happy to help out once I get some feedback on the desired way forward.
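
The idea of running and reporting each task on its own can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Jenkins configuration from the PR: `tox -e <env>` is the standard tox invocation for a single environment, while the function names and the environment names in the example are hypothetical. The sketch also runs the environments one after another, whereas the proposal would schedule them as separate jobs.

```python
import subprocess


def run_tox_env(env):
    """Run one tox environment and return its exit code."""
    return subprocess.call(["tox", "-e", env])


def run_separately(envs, run=run_tox_env):
    """Run each environment on its own and record pass/fail per environment."""
    results = {}
    for env in envs:
        results[env] = "passed" if run(env) == 0 else "failed"
    return results


def report(results):
    """One line per environment, so a lint failure is visible by name."""
    return ["%s: %s" % (env, status) for env, status in results.items()]


if __name__ == "__main__":
    # Environment names are illustrative; use the ones defined in tox.ini.
    for line in report(run_separately(["lint", "docs", "py27", "py35", "py36"])):
        print(line)
```

Because each environment has its own status line, a failing lint step is named directly and the Python version behind a test failure is no longer a guess.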





[jira] [Work logged] (BEAM-8213) Run and report python tox tasks separately within Jenkins

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8213?focusedWorklogId=317076&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317076
 ]

ASF GitHub Bot logged work on BEAM-8213:


Author: ASF GitHub Bot
Created on: 24/Sep/19 00:19
Start Date: 24/Sep/19 00:19
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #9642: [BEAM-8213] Split up 
monolithic python preCommit tests on jenkins
URL: https://github.com/apache/beam/pull/9642#issuecomment-534334710
 
 
   I believe you need to run a seed job to get the new jobs recognized by 
Jenkins. 
   
   R: @yifanmai could help with the specifics of seed job.
   R: @tvalentyn for reviewing with respect to Python 3 jobs.
 



Issue Time Tracking
---

Worklog Id: (was: 317076)
Time Spent: 40m  (was: 0.5h)

> Run and report python tox tasks separately within Jenkins
> -
>
> Key: BEAM-8213
> URL: https://issues.apache.org/jira/browse/BEAM-8213
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Chad Dombrova
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> As a python developer, the speed and comprehensibility of the jenkins 
> PreCommit job could be greatly improved.
> Here are some of the problems
> - when a lint job fails, it's not reported in the test results summary, so 
> even though the job is marked as failed, I see "Test Result (no failures)" 
> which is quite confusing
> - I have to wait for over an hour to discover the lint failed, which takes 
> about a minute to run on its own
> - The logs are a jumbled mess of all the different tasks running on top of 
> each other
> - The test results give no indication of which version of python they use.  I 
> click on Test results, then the test module, then the test class, then I see 
> 4 tests named the same thing.  I assume that the first is python 2.7, the 
> second is 3.5 and so on.   It takes 5 clicks and then reading the log output 
> to know which version of python a single error pertains to, then I need to 
> repeat for each failure.  This makes it very difficult to discover problems, 
> and deduce that they may have something to do with python version mismatches.
> I believe the solution to this is to split up the single monolithic python 
> PreCommit job into sub-jobs (possibly using a pipeline with steps).  This 
> would give us the following benefits:
> - sub job results should become available as they finish, so for example, 
> lint results should be available very early on
> - sub job results will be reported separately, and there will be a job for 
> each py2, py35, py36 and so on, so it will be clear when an error is related 
> to a particular python version
> - sub jobs without reports, like docs and lint, will have their own failure 
> status and logs, so when they fail it will be more obvious what went wrong.
> I'm happy to help out once I get some feedback on the desired way forward.





[jira] [Work logged] (BEAM-8256) Set fixed number of workers for File-based IOITs

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8256?focusedWorklogId=317061&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317061
 ]

ASF GitHub Bot logged work on BEAM-8256:


Author: ASF GitHub Bot
Created on: 24/Sep/19 00:16
Start Date: 24/Sep/19 00:16
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #9596: [BEAM-8256] 
Set fixed number of workers for Java IOITs
URL: https://github.com/apache/beam/pull/9596#discussion_r327381844
 
 

 ##
 File path: .test-infra/jenkins/job_PerformanceTests_FileBasedIO_IT.groovy
 ##
 @@ -28,7 +28,9 @@ def jobs = [
 pipelineOptions: [
 bigQueryDataset: 'beam_performance',
 bigQueryTable  : 'textioit_results',
-numberOfRecords: '100'
+numberOfRecords: '100',
+maxNumWorkers  : '5',
 
 Review comment:
   So to conclude: I agree with @lgajowy 's suggestion
 



Issue Time Tracking
---

Worklog Id: (was: 317061)
Time Spent: 1h  (was: 50m)

> Set fixed number of workers for File-based IOITs
> 
>
> Key: BEAM-8256
> URL: https://issues.apache.org/jira/browse/BEAM-8256
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Michal Walenia
>Assignee: Michal Walenia
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Autoscaling is a feature of the Google Cloud Dataflow runner that adds/removes 
> worker nodes dynamically as the job runs. It can behave differently from run 
> to run, producing different test (runtime) results in consecutive runs. In 
> integration tests (such as IOITs, but this applies to others too) we don't 
> need such nondeterminism, and it's best to have a fixed number of workers for 
> every test execution. IOITs use autoscaling but they shouldn't. This issue 
> was created to disable it and set a fixed number of workers.
> Side note: autoscaling is already disabled in Nexmark and load tests of core 
> operations. 
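
A minimal sketch of what "disable autoscaling and fix the worker count" could look like as test options follows. `numWorkers` and `autoscalingAlgorithm` are Dataflow pipeline option names. The helper names and the idea of merging them into an existing option map are assumptions for illustration; the PR diff itself sets `maxNumWorkers`.

```python
def fixed_worker_options(base_options, num_workers):
    """Return a copy of the options with autoscaling turned off."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    options = dict(base_options)
    options["numWorkers"] = str(num_workers)
    # Assumed switch: NONE asks Dataflow not to autoscale the job.
    options["autoscalingAlgorithm"] = "NONE"
    return options


def to_flags(options):
    """Render the option map as sorted --key=value flags."""
    return ["--%s=%s" % (key, options[key]) for key in sorted(options)]


if __name__ == "__main__":
    base = {"numberOfRecords": "100"}  # value taken from the diff above
    print(to_flags(fixed_worker_options(base, 5)))
```

With the worker count pinned, runtime differences between runs reflect the code under test rather than scaling decisions.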





[jira] [Work logged] (BEAM-8160) Add instructions about how to set FnApi multi-threads/processes

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8160?focusedWorklogId=317060&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317060
 ]

ASF GitHub Bot logged work on BEAM-8160:


Author: ASF GitHub Bot
Created on: 24/Sep/19 00:15
Start Date: 24/Sep/19 00:15
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9628: [BEAM-8160] Add 
FnApi execution mode instruction
URL: https://github.com/apache/beam/pull/9628
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 317060)
Time Spent: 40m  (was: 0.5h)

> Add instructions about how to set FnApi multi-threads/processes
> ---
>
> Key: BEAM-8160
> URL: https://issues.apache.org/jira/browse/BEAM-8160
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py-core
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Add instructions to Beam site or Beam wiki for easy discovery.
>  





[jira] [Work logged] (BEAM-5820) Vendor Calcite

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5820?focusedWorklogId=317057&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317057
 ]

ASF GitHub Bot logged work on BEAM-5820:


Author: ASF GitHub Bot
Created on: 24/Sep/19 00:09
Start Date: 24/Sep/19 00:09
Worklog Time Spent: 10m 
  Work Description: vectorijk commented on issue #9189: [BEAM-5820] vendor 
calcite
URL: https://github.com/apache/beam/pull/9189#issuecomment-534332723
 
 
   Run Dataflow Runner Nexmark Tests
 



Issue Time Tracking
---

Worklog Id: (was: 317057)
Time Spent: 12.5h  (was: 12h 20m)

> Vendor Calcite
> --
>
> Key: BEAM-5820
> URL: https://issues.apache.org/jira/browse/BEAM-5820
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Assignee: Kai Jiang
>Priority: Major
>  Time Spent: 12.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-5820) Vendor Calcite

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5820?focusedWorklogId=317055&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317055
 ]

ASF GitHub Bot logged work on BEAM-5820:


Author: ASF GitHub Bot
Created on: 24/Sep/19 00:09
Start Date: 24/Sep/19 00:09
Worklog Time Spent: 10m 
  Work Description: vectorijk commented on issue #9189: [BEAM-5820] vendor 
calcite
URL: https://github.com/apache/beam/pull/9189#issuecomment-534332680
 
 
   Run Spark Runner Nexmark Tests
 



Issue Time Tracking
---

Worklog Id: (was: 317055)
Time Spent: 12h 20m  (was: 12h 10m)

> Vendor Calcite
> --
>
> Key: BEAM-5820
> URL: https://issues.apache.org/jira/browse/BEAM-5820
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Assignee: Kai Jiang
>Priority: Major
>  Time Spent: 12h 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-5820) Vendor Calcite

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5820?focusedWorklogId=317054&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317054
 ]

ASF GitHub Bot logged work on BEAM-5820:


Author: ASF GitHub Bot
Created on: 24/Sep/19 00:09
Start Date: 24/Sep/19 00:09
Worklog Time Spent: 10m 
  Work Description: vectorijk commented on issue #9189: [BEAM-5820] vendor 
calcite
URL: https://github.com/apache/beam/pull/9189#issuecomment-534332619
 
 
   Run Direct Runner Nexmark Tests
 



Issue Time Tracking
---

Worklog Id: (was: 317054)
Time Spent: 12h 10m  (was: 12h)

> Vendor Calcite
> --
>
> Key: BEAM-5820
> URL: https://issues.apache.org/jira/browse/BEAM-5820
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Assignee: Kai Jiang
>Priority: Major
>  Time Spent: 12h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-5820) Vendor Calcite

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5820?focusedWorklogId=317053&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317053
 ]

ASF GitHub Bot logged work on BEAM-5820:


Author: ASF GitHub Bot
Created on: 24/Sep/19 00:08
Start Date: 24/Sep/19 00:08
Worklog Time Spent: 10m 
  Work Description: vectorijk commented on issue #9189: [BEAM-5820] vendor 
calcite
URL: https://github.com/apache/beam/pull/9189#issuecomment-534332583
 
 
   Run SQL Postcommit
 



Issue Time Tracking
---

Worklog Id: (was: 317053)
Time Spent: 12h  (was: 11h 50m)

> Vendor Calcite
> --
>
> Key: BEAM-5820
> URL: https://issues.apache.org/jira/browse/BEAM-5820
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Assignee: Kai Jiang
>Priority: Major
>  Time Spent: 12h
>  Remaining Estimate: 0h
>






[jira] [Updated] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO

2019-09-23 Thread Derek He (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Derek He updated BEAM-8306:
---
Description: 
ElasticsearchIO splits a BoundedSource based on the Elasticsearch index size. We 
expect it would be more accurate to split it based on the query result size.

Currently, we have a big Elasticsearch index, but the query result contains only 
a few of the documents in the index.  ElasticsearchIO splits it into up to 
1024 BoundedSources in Google Dataflow. It takes a long time to finish 
processing the small number of Elasticsearch documents in Google Dataflow.

 

 

  was:
ElasticsearchIO splits a BoundedSource based on the Elasticsearch index size. We 
expect it would be more accurate to split it based on the query result size.

Currently, we have a big Elasticsearch index, but the query result contains only 
a few of the documents in the index. But ElasticsearchIO splits it into up to 
1024 BoundedSources in Google Dataflow. It takes a long time to finish 
processing the small number of Elasticsearch documents in Google Dataflow.

 

 


> improve estimation of data byte size reading from source in ElasticsearchIO
> ---
>
> Key: BEAM-8306
> URL: https://issues.apache.org/jira/browse/BEAM-8306
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Affects Versions: 2.14.0
>Reporter: Derek He
>Priority: Major
>
> ElasticsearchIO splits a BoundedSource based on the Elasticsearch index size. 
> We expect it would be more accurate to split it based on the query result size.
> Currently, we have a big Elasticsearch index, but the query result contains only 
> a few of the documents in the index.  ElasticsearchIO splits it into up to 
> 1024 BoundedSources in Google Dataflow. It takes a long time to finish 
> processing the small number of Elasticsearch documents in Google Dataflow.
>  
>  





[jira] [Updated] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO

2019-09-23 Thread Derek He (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Derek He updated BEAM-8306:
---
Component/s: (was: sdk-java-core)
 io-java-elasticsearch

> improve estimation of data byte size reading from source in ElasticsearchIO
> ---
>
> Key: BEAM-8306
> URL: https://issues.apache.org/jira/browse/BEAM-8306
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Affects Versions: 2.14.0
>Reporter: Derek He
>Priority: Major
>
> ElasticsearchIO splits a BoundedSource based on the Elasticsearch index size. 
> We expect it would be more accurate to split it based on the query result size.
> Currently, we have a big Elasticsearch index, but the query result contains only 
> a few of the documents in the index. But ElasticsearchIO splits it into up to 
> 1024 BoundedSources in Google Dataflow. It takes a long time to finish 
> processing the small number of Elasticsearch documents in Google Dataflow.
>  
>  





[jira] [Created] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO

2019-09-23 Thread Derek He (Jira)
Derek He created BEAM-8306:
--

 Summary: improve estimation of data byte size reading from source 
in ElasticsearchIO
 Key: BEAM-8306
 URL: https://issues.apache.org/jira/browse/BEAM-8306
 Project: Beam
  Issue Type: Improvement
  Components: sdk-java-core
Affects Versions: 2.14.0
Reporter: Derek He


ElasticsearchIO splits a BoundedSource based on the Elasticsearch index size. We 
expect it would be more accurate to split it based on the query result size.

Currently, we have a big Elasticsearch index, but the query result contains only 
a few of the documents in the index. But ElasticsearchIO splits it into up to 
1024 BoundedSources in Google Dataflow. It takes a long time to finish 
processing the small number of Elasticsearch documents in Google Dataflow.
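
The arithmetic behind this report can be sketched as follows. This is not ElasticsearchIO's actual code: the function name, the bundle size, and the byte counts are made up for illustration, and only the cap of 1024 comes from the description above. The sketch shows how the same splitting rule yields very different results depending on whether the size estimate comes from the whole index or from the query result.

```python
import math

MAX_SPLITS = 1024  # upper bound mentioned in the report


def estimated_splits(estimated_bytes, desired_bundle_bytes, max_splits=MAX_SPLITS):
    """Number of sources a size-based splitter would produce (at least 1)."""
    if desired_bundle_bytes <= 0:
        raise ValueError("desired_bundle_bytes must be positive")
    wanted = math.ceil(estimated_bytes / desired_bundle_bytes)
    return max(1, min(max_splits, wanted))


if __name__ == "__main__":
    bundle = 64 * 1024 * 1024       # 64 MiB, made-up bundle size
    index_bytes = 500 * 1024 ** 3   # 500 GiB index, made-up
    query_bytes = 10 * 1024 * 1024  # 10 MiB matching the query, made-up
    print(estimated_splits(index_bytes, bundle))  # estimate from the whole index
    print(estimated_splits(query_bytes, bundle))  # estimate from the query result
```

With these numbers the index-based estimate hits the cap while the query-based estimate needs a single source, which is the gap the issue asks to close.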

 

 





[jira] [Work logged] (BEAM-8240) Fix pipeline proto to contain worker_harness_container_image override

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8240?focusedWorklogId=317011&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317011
 ]

ASF GitHub Bot logged work on BEAM-8240:


Author: ASF GitHub Bot
Created on: 23/Sep/19 22:24
Start Date: 23/Sep/19 22:24
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #9629: 
[BEAM-8240] Sets workerHarnessContainerImage in the default Environment of 
DataflowRunner
URL: https://github.com/apache/beam/pull/9629
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 317011)
Time Spent: 4h 40m  (was: 4.5h)

> Fix pipeline proto to contain worker_harness_container_image override
> -
>
> Key: BEAM-8240
> URL: https://issues.apache.org/jira/browse/BEAM-8240
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Minor
> Fix For: 2.17.0
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> SDK harness incorrectly identifies itself when using custom SDK container 
> within environment field when building pipeline proto.
>  
> Passing in the experiment *worker_harness_container_image=YYY* doesn't 
> override the pipeline proto environment field and it is still being populated 
> with *gcr.io/cloud-dataflow/v1beta3/python-fnapi:beam-master-20190802*
>  
>  





[jira] [Created] (BEAM-8305) Cleanup external transform tests

2019-09-23 Thread Robert Bradshaw (Jira)
Robert Bradshaw created BEAM-8305:
-

 Summary: Cleanup external transform tests
 Key: BEAM-8305
 URL: https://issues.apache.org/jira/browse/BEAM-8305
 Project: Beam
  Issue Type: Bug
  Components: testing
Reporter: Robert Bradshaw


Currently apache_beam/transforms/external_test.py has several entry points, 
sometimes called directly, sometimes via nosetest, sometimes with parameters 
passed via arguments or via environment variables, and the logic is not always 
clear to follow (either within the test, or via the several gradle targets that 
reference it). We should really let this file be a unit test, and create a 
different script (sharing a common library if needed) in our integration tests. 

This was the root cause of BEAM-8302.





[jira] [Work logged] (BEAM-8302) beam_PostCommit_XVR_Flink failing

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8302?focusedWorklogId=316999&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316999
 ]

ASF GitHub Bot logged work on BEAM-8302:


Author: ASF GitHub Bot
Created on: 23/Sep/19 22:06
Start Date: 23/Sep/19 22:06
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #9644: [BEAM-8302] Fix 
PostCommit_XVR_Flink
URL: https://github.com/apache/beam/pull/9644#issuecomment-534304066
 
 
   R: @ihji
 



Issue Time Tracking
---

Worklog Id: (was: 316999)
Time Spent: 1h  (was: 50m)

> beam_PostCommit_XVR_Flink failing
> -
>
> Key: BEAM-8302
> URL: https://issues.apache.org/jira/browse/BEAM-8302
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> E.g. see https://builds.apache.org/job/beam_PostCommit_XVR_Flink/432/console





[jira] [Work logged] (BEAM-8302) beam_PostCommit_XVR_Flink failing

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8302?focusedWorklogId=316996&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316996
 ]

ASF GitHub Bot logged work on BEAM-8302:


Author: ASF GitHub Bot
Created on: 23/Sep/19 22:01
Start Date: 23/Sep/19 22:01
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #9644: [BEAM-8302] Fix 
PostCommit_XVR_Flink
URL: https://github.com/apache/beam/pull/9644#issuecomment-534302437
 
 
   Run Python 2 PostCommit
 



Issue Time Tracking
---

Worklog Id: (was: 316996)
Time Spent: 50m  (was: 40m)

> beam_PostCommit_XVR_Flink failing
> -
>
> Key: BEAM-8302
> URL: https://issues.apache.org/jira/browse/BEAM-8302
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> E.g. see https://builds.apache.org/job/beam_PostCommit_XVR_Flink/432/console





[jira] [Work logged] (BEAM-8302) beam_PostCommit_XVR_Flink failing

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8302?focusedWorklogId=316995&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316995
 ]

ASF GitHub Bot logged work on BEAM-8302:


Author: ASF GitHub Bot
Created on: 23/Sep/19 22:00
Start Date: 23/Sep/19 22:00
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #9644: [BEAM-8302] Fix 
PostCommit_XVR_Flink
URL: https://github.com/apache/beam/pull/9644#issuecomment-534302393
 
 
   Run XVR_Flink PostCommit
 



Issue Time Tracking
---

Worklog Id: (was: 316995)
Time Spent: 40m  (was: 0.5h)

> beam_PostCommit_XVR_Flink failing
> -
>
> Key: BEAM-8302
> URL: https://issues.apache.org/jira/browse/BEAM-8302
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> E.g. see https://builds.apache.org/job/beam_PostCommit_XVR_Flink/432/console





[jira] [Work logged] (BEAM-8302) beam_PostCommit_XVR_Flink failing

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8302?focusedWorklogId=316993&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316993
 ]

ASF GitHub Bot logged work on BEAM-8302:


Author: ASF GitHub Bot
Created on: 23/Sep/19 21:58
Start Date: 23/Sep/19 21:58
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #9644: [BEAM-8302] Fix 
PostCommit_XVR_Flink
URL: https://github.com/apache/beam/pull/9644#issuecomment-534301585
 
 
   Run PostCommit_XVR_Flink
 



Issue Time Tracking
---

Worklog Id: (was: 316993)
Time Spent: 0.5h  (was: 20m)

> beam_PostCommit_XVR_Flink failing
> -
>
> Key: BEAM-8302
> URL: https://issues.apache.org/jira/browse/BEAM-8302
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> E.g. see https://builds.apache.org/job/beam_PostCommit_XVR_Flink/432/console





[jira] [Work logged] (BEAM-8302) beam_PostCommit_XVR_Flink failing

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8302?focusedWorklogId=316991&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316991
 ]

ASF GitHub Bot logged work on BEAM-8302:


Author: ASF GitHub Bot
Created on: 23/Sep/19 21:57
Start Date: 23/Sep/19 21:57
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9644: [BEAM-8302] 
Fix PostCommit_XVR_Flink
URL: https://github.com/apache/beam/pull/9644
 
 
   
   
   

[jira] [Work logged] (BEAM-8302) beam_PostCommit_XVR_Flink failing

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8302?focusedWorklogId=316992&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316992
 ]

ASF GitHub Bot logged work on BEAM-8302:


Author: ASF GitHub Bot
Created on: 23/Sep/19 21:57
Start Date: 23/Sep/19 21:57
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #9644: [BEAM-8302] Fix 
PostCommit_XVR_Flink
URL: https://github.com/apache/beam/pull/9644#issuecomment-534301471
 
 
   Run Python PostCommit
 



Issue Time Tracking
---

Worklog Id: (was: 316992)
Time Spent: 20m  (was: 10m)

> beam_PostCommit_XVR_Flink failing
> -
>
> Key: BEAM-8302
> URL: https://issues.apache.org/jira/browse/BEAM-8302
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> E.g. see https://builds.apache.org/job/beam_PostCommit_XVR_Flink/432/console





[jira] [Commented] (BEAM-6923) OOM errors in jobServer when using GCS artifactDir

2019-09-23 Thread Ankur Goenka (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936210#comment-16936210
 ] 

Ankur Goenka commented on BEAM-6923:


Also able to reproduce on Linux by setting Xmx for the job server process to 1GB
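
When reproducing with a capped heap, it can help to confirm that the limit actually reached the JVM. The following is a minimal sketch, not part of the job server: the class name `HeapCheck` is hypothetical, and it only reports what `Runtime.getRuntime().maxMemory()` returns, which may be slightly below the configured `-Xmx` value depending on the garbage collector.

```java
/** Hypothetical helper: reports the maximum heap the running JVM will attempt to use. */
final class HeapCheck {
  private HeapCheck() {}

  /** Returns the JVM's maximum heap size in MiB, as reported by the runtime. */
  static long maxHeapMiB() {
    return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
  }

  public static void main(String[] args) {
    // Run with e.g. `java -Xmx1g HeapCheck` to see the effect of the flag.
    System.out.println("Max heap (MiB): " + maxHeapMiB());
  }
}
```

Running it with and without `-Xmx1g` should show whether the setting was picked up by the process under test.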

> OOM errors in jobServer when using GCS artifactDir
> --
>
> Key: BEAM-6923
> URL: https://issues.apache.org/jira/browse/BEAM-6923
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Reporter: Lukasz Gajowy
>Assignee: Ankur Goenka
>Priority: Major
> Attachments: Instance counts.png, Paths to GC root.png, 
> Telemetries.png, beam6923-flink156.m4v, beam6923flink182.m4v, heapdump 
> size-sorted.png
>
>
> When starting jobServer with artifactDir pointing to a GCS bucket: 
> {code:java}
> ./gradlew :beam-runners-flink_2.11-job-server:runShadow 
> -PflinkMasterUrl=localhost:8081 -PartifactsDir=gs://the-bucket{code}
> and running a Java portable pipeline with the following, portability related 
> pipeline options: 
> {code:java}
> --runner=PortableRunner --jobEndpoint=localhost:8099 
> --defaultEnvironmentType=DOCKER 
> --defaultEnvironmentConfig=gcr.io//java:latest'{code}
>  
> I'm facing a series of OOM errors, like this: 
> {code:java}
> Exception in thread "grpc-default-executor-3" java.lang.OutOfMemoryError: 
> Java heap space
> at 
> com.google.api.client.googleapis.media.MediaHttpUploader.buildContentChunk(MediaHttpUploader.java:606)
> at 
> com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:408)
> at 
> com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336)
> at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:508)
> at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:432)
> at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:549)
> at 
> com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:301)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745){code}
>  
> This does not happen when I'm using a local filesystem for the artifact 
> staging location. 
>  





[jira] [Commented] (BEAM-6923) OOM errors in jobServer when using GCS artifactDir

2019-09-23 Thread Ankur Goenka (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936197#comment-16936197
 ] 

Ankur Goenka commented on BEAM-6923:


ACK:

I am able to reproduce it on Mac

Environment:

Java 1.8

Flink 1.5.6

 

goenka@goenka-macbookpro:~/d/work/beam/beam$ ./gradlew 
runners:flink:1.5:job-server:runShadow -PflinkMasterUrl=localhost:8081 
-PartifactsDir="gs://clouddfe-goenka/tmp/t0"
Configuration on demand is an incubating feature.

> Task :runners:flink:1.5:job-server:runShadow
Listening for transport dt_socket at address: 5005
[main] INFO org.apache.beam.runners.fnexecution.jobsubmission.JobServerDriver - 
ArtifactStagingService started on localhost:8098
[main] INFO org.apache.beam.runners.fnexecution.jobsubmission.JobServerDriver - 
Java ExpansionService started on localhost:8097
[main] INFO org.apache.beam.runners.fnexecution.jobsubmission.JobServerDriver - 
JobService started on localhost:8099
Exception in thread "grpc-default-executor-70" java.lang.OutOfMemoryError: Java 
heap space
 at 
com.google.api.client.googleapis.media.MediaHttpUploader.buildContentChunk(MediaHttpUploader.java:606)
 at 
com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:408)
 at 
com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336)
 at 
com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:508)
 at 
com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:432)
 at 
com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:549)
 at 
com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:301)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
Exception in thread "grpc-default-executor-146" java.lang.OutOfMemoryError: 
Java heap space
 at 
com.google.api.client.googleapis.media.MediaHttpUploader.buildContentChunk(MediaHttpUploader.java:606)
 at 
com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:408)
 at 
com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336)
 at 
com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:508)
 at 
com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:432)
 at 
com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:549)
 at 
com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:301)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
Exception in thread "grpc-default-executor-50" java.lang.OutOfMemoryError: Java 
heap space
Exception in thread "grpc-default-executor-23" java.lang.OutOfMemoryError: Java 
heap space

 

> OOM errors in jobServer when using GCS artifactDir
> --
>
> Key: BEAM-6923
> URL: https://issues.apache.org/jira/browse/BEAM-6923
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Reporter: Lukasz Gajowy
>Assignee: Ankur Goenka
>Priority: Major
> Attachments: Instance counts.png, Paths to GC root.png, 
> Telemetries.png, beam6923-flink156.m4v, beam6923flink182.m4v, heapdump 
> size-sorted.png
>
>
> When starting jobServer with artifactDir pointing to a GCS bucket: 
> {code:java}
> ./gradlew :beam-runners-flink_2.11-job-server:runShadow 
> -PflinkMasterUrl=localhost:8081 -PartifactsDir=gs://the-bucket{code}
> and running a Java portable pipeline with the following, portability related 
> pipeline options: 
> {code:java}
> --runner=PortableRunner --jobEndpoint=localhost:8099 
> --defaultEnvironmentType=DOCKER 
> --defaultEnvironmentConfig=gcr.io//java:latest'{code}
>  
> I'm facing a series of OOM errors, like this: 
> {code:java}
> Exception in thread "grpc-default-executor-3" java.lang.OutOfMemoryError: 
> Java heap space
> at 
> com.google.api.client.googleapis.media.MediaHttpUploader.buildContentChunk(MediaHttpUploader.java:606)
> at 
> com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUp

[jira] [Work logged] (BEAM-8146) SchemaCoder/RowCoder have no equals() function

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8146?focusedWorklogId=316981&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316981
 ]

ASF GitHub Bot logged work on BEAM-8146:


Author: ASF GitHub Bot
Created on: 23/Sep/19 21:18
Start Date: 23/Sep/19 21:18
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on issue #9493: 
[BEAM-8146,BEAM-8204,BEAM-8205] Add equals and hashCode to SchemaCoder and 
RowCoder
URL: https://github.com/apache/beam/pull/9493#issuecomment-534289161
 
 
   Run Apex ValidatesRunner
 



Issue Time Tracking
---

Worklog Id: (was: 316981)
Time Spent: 1h 40m  (was: 1.5h)

> SchemaCoder/RowCoder have no equals() function
> --
>
> Key: BEAM-8146
> URL: https://issues.apache.org/jira/browse/BEAM-8146
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.15.0
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> SchemaCoder has no equals function, so it can't be compared in tests, like 
> CloudComponentsTests$DefaultCoders, which is being re-enabled in 
> https://github.com/apache/beam/pull/9446





[jira] [Work logged] (BEAM-8146) SchemaCoder/RowCoder have no equals() function

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8146?focusedWorklogId=316977&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316977
 ]

ASF GitHub Bot logged work on BEAM-8146:


Author: ASF GitHub Bot
Created on: 23/Sep/19 21:09
Start Date: 23/Sep/19 21:09
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on issue #9493: 
[BEAM-8146,BEAM-8204,BEAM-8205] Add equals and hashCode to SchemaCoder and 
RowCoder
URL: https://github.com/apache/beam/pull/9493#issuecomment-534285609
 
 
   Run Flink ValidatesRunner
 



Issue Time Tracking
---

Worklog Id: (was: 316977)
Time Spent: 1.5h  (was: 1h 20m)

> SchemaCoder/RowCoder have no equals() function
> --
>
> Key: BEAM-8146
> URL: https://issues.apache.org/jira/browse/BEAM-8146
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.15.0
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> SchemaCoder has no equals function, so it can't be compared in tests, like 
> CloudComponentsTests$DefaultCoders, which is being re-enabled in 
> https://github.com/apache/beam/pull/9446





[jira] [Work logged] (BEAM-8111) SchemaCoder broken on DataflowRunner

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8111?focusedWorklogId=316978&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316978
 ]

ASF GitHub Bot logged work on BEAM-8111:


Author: ASF GitHub Bot
Created on: 23/Sep/19 21:09
Start Date: 23/Sep/19 21:09
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9446: 
[BEAM-8111] Enable CloudObjectsTest$DefaultCoders
URL: https://github.com/apache/beam/pull/9446#discussion_r327330801
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/SchemaCoder.java
 ##
 @@ -100,4 +99,47 @@ public boolean consistentWithEquals() {
    public String toString() {
      return "SchemaCoder: " + rowCoder.toString();
    }
 +
 +  @Override
 +  public boolean equals(Object o) {
 +    if (this == o) {
 +      return true;
 +    }
 +    if (o == null || getClass() != o.getClass()) {
 +      return false;
 +    }
 +    SchemaCoder that = (SchemaCoder) o;
 +    return rowCoder.equals(that.rowCoder)
 +        && toRowFunction.equals(that.toRowFunction)
 +        && fromRowFunction.equals(that.fromRowFunction);
 
 Review comment:
   I have a PR up now (https://github.com/apache/beam/pull/9493) that adds 
`equals` and `hashCode` to the `fromRow` and `toRow` functions created by all 
the `GetterBasedSchemaProvider` sub-classes.
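
   To illustrate why the wrapped functions need their own `equals` and `hashCode`, here is a minimal, self-contained sketch. It is not the code from either PR: `PrefixFunction` and `SketchCoder` are hypothetical stand-ins, and plain `java.util.function.Function` is assumed in place of Beam's function types. The point is only that a coder delegating equality to its functions compares equal when those functions define value-based equality.

```java
import java.util.Objects;
import java.util.function.Function;

/** Hypothetical conversion function with value-based equality. */
final class PrefixFunction implements Function<String, String> {
  private final String prefix;

  PrefixFunction(String prefix) {
    this.prefix = prefix;
  }

  @Override
  public String apply(String input) {
    return prefix + input;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (o == null || getClass() != o.getClass()) {
      return false;
    }
    return prefix.equals(((PrefixFunction) o).prefix);
  }

  @Override
  public int hashCode() {
    return prefix.hashCode();
  }
}

/** Hypothetical stand-in for a coder that wraps two conversion functions. */
final class SketchCoder {
  private final Function<String, String> toRowFunction;
  private final Function<String, String> fromRowFunction;

  SketchCoder(Function<String, String> toRow, Function<String, String> fromRow) {
    this.toRowFunction = toRow;
    this.fromRowFunction = fromRow;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (o == null || getClass() != o.getClass()) {
      return false;
    }
    SketchCoder that = (SketchCoder) o;
    // Equality is only as good as the equals() of the wrapped functions.
    return toRowFunction.equals(that.toRowFunction)
        && fromRowFunction.equals(that.fromRowFunction);
  }

  @Override
  public int hashCode() {
    // Kept consistent with equals(): the same fields feed both methods.
    return Objects.hash(toRowFunction, fromRowFunction);
  }
}
```

   If the functions were lambdas or anonymous classes without `equals`, the comparison above would fall back to identity, and two separately built coders would not be expected to compare equal.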
 



Issue Time Tracking
---

Worklog Id: (was: 316978)
Time Spent: 4h  (was: 3h 50m)

> SchemaCoder broken on DataflowRunner
> 
>
> Key: BEAM-8111
> URL: https://issues.apache.org/jira/browse/BEAM-8111
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-java-core
>Affects Versions: 2.15.0
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Blocker
> Fix For: 2.16.0
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> https://github.com/apache/beam/commit/e65c176a9f34e45d408281e1101a2ae54cef0f6c
>  broke SchemaCoder on Dataflow. When translating a schema that uses logical 
> types from a cloud object, Dataflow encounters a runtime error.
> This means any pipelines that use SqlTransform or schema transforms will fail 
> on Dataflow in 2.15.0.





[jira] [Work logged] (BEAM-8146) SchemaCoder/RowCoder have no equals() function

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8146?focusedWorklogId=316976&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316976
 ]

ASF GitHub Bot logged work on BEAM-8146:


Author: ASF GitHub Bot
Created on: 23/Sep/19 21:08
Start Date: 23/Sep/19 21:08
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on issue #9493: 
[BEAM-8146,BEAM-8204,BEAM-8205] Add equals and hashCode to SchemaCoder and 
RowCoder
URL: https://github.com/apache/beam/pull/9493#issuecomment-534285319
 
 
   R: @reuvenlax
   
   I know you said Flink/Apex shouldn't be relying on coder equality and that 
should be the real fix for BEAM-8204 and BEAM-8205, but I think this is helpful 
for testing anyway.
 



Issue Time Tracking
---

Worklog Id: (was: 316976)
Time Spent: 1h 20m  (was: 1h 10m)

> SchemaCoder/RowCoder have no equals() function
> --
>
> Key: BEAM-8146
> URL: https://issues.apache.org/jira/browse/BEAM-8146
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.15.0
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> SchemaCoder has no equals function, so it can't be compared in tests, like 
> CloudComponentsTests$DefaultCoders, which is being re-enabled in 
> https://github.com/apache/beam/pull/9446





[jira] [Work logged] (BEAM-8240) Fix pipeline proto to contain worker_harness_container_image override

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8240?focusedWorklogId=316974&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316974
 ]

ASF GitHub Bot logged work on BEAM-8240:


Author: ASF GitHub Bot
Created on: 23/Sep/19 21:02
Start Date: 23/Sep/19 21:02
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #9629: [BEAM-8240] Sets 
workerHarnessContainerImage in the default Environment of DataflowRunner
URL: https://github.com/apache/beam/pull/9629#issuecomment-534283269
 
 
   Moved the test.
   
   Thanks.
 



Issue Time Tracking
---

Worklog Id: (was: 316974)
Time Spent: 4.5h  (was: 4h 20m)

> Fix pipeline proto to contain worker_harness_container_image override
> -
>
> Key: BEAM-8240
> URL: https://issues.apache.org/jira/browse/BEAM-8240
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Minor
> Fix For: 2.17.0
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> SDK harness incorrectly identifies itself when using custom SDK container 
> within environment field when building pipeline proto.
>  
> Passing in the experiment *worker_harness_container_image=YYY* doesn't 
> override the pipeline proto environment field and it is still being populated 
> with *gcr.io/cloud-dataflow/v1beta3/python-fnapi:beam-master-20190802*
>  
>  





[jira] [Work logged] (BEAM-7919) Add a Python 3 test scenario for MongoDB IO

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7919?focusedWorklogId=316972&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316972
 ]

ASF GitHub Bot logged work on BEAM-7919:


Author: ASF GitHub Bot
Created on: 23/Sep/19 20:50
Start Date: 23/Sep/19 20:50
Worklog Time Spent: 10m 
  Work Description: y1chi commented on issue #9639: [BEAM-7919] Add MongoDB 
IO integration test for py3.7
URL: https://github.com/apache/beam/pull/9639#issuecomment-534279086
 
 
   Run Python MongoDBIO_IT
 



Issue Time Tracking
---

Worklog Id: (was: 316972)
Time Spent: 2h 20m  (was: 2h 10m)

> Add a Python 3 test scenario for MongoDB IO
> ---
>
> Key: BEAM-7919
> URL: https://issues.apache.org/jira/browse/BEAM-7919
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-ideas
>Reporter: Valentyn Tymofieiev
>Assignee: Yichi Zhang
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Python 2 MongoDB IO suite was added in:
> https://github.com/apache/beam/commit/17bf89d6070565b715f44ecb5f6394219b94cfe6
> We should also exercise this IO in Python 3. 
> cc: [~chamikara] [~altay]





[jira] [Comment Edited] (BEAM-8204) Newly added Java ValidatesRunner tests failed on ApexRunner

2019-09-23 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936167#comment-16936167
 ] 

Brian Hulette edited comment on BEAM-8204 at 9/23/19 8:43 PM:
--

I think https://github.com/apache/beam/pull/9493 will resolve the issue with 
the ValidatesRunner test on Flink and Apex, but not for the different side 
inputs. I made a separate bug for that: BEAM-8304


was (Author: bhulette):
I think https://github.com/apache/beam/pull/9493 will resolve the issue with 
the ValidatesRunner test on Flink and Apex, but not for the different side 
inputs. I'll make a separate bug for that.

> Newly added Java ValidatesRunner tests failed on ApexRunner
> ---
>
> Key: BEAM-8204
> URL: https://issues.apache.org/jira/browse/BEAM-8204
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core, test-failures
>Reporter: Yueyang Qiu
>Assignee: Brian Hulette
>Priority: Major
>  Labels: currently-failing
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Jenkins link:
> [https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/testReport/]
>  
> Initial investigation:
> [https://github.com/apache/beam/pull/9454] and 
> [https://github.com/apache/beam/pull/9372] added new ValidatesRunner tests. 
> They have been tested on Dataflow runner, but are failing on Apex runner.
>  





[jira] [Work logged] (BEAM-8146) SchemaCoder/RowCoder have no equals() function

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8146?focusedWorklogId=316965&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316965
 ]

ASF GitHub Bot logged work on BEAM-8146:


Author: ASF GitHub Bot
Created on: 23/Sep/19 20:39
Start Date: 23/Sep/19 20:39
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on issue #9493: 
[BEAM-8146,BEAM-8204,BEAM-8205] Add equals and hashCode to SchemaCoder and 
RowCoder
URL: https://github.com/apache/beam/pull/9493#issuecomment-534274859
 
 
   Run Apex ValidatesRunner
 



Issue Time Tracking
---

Worklog Id: (was: 316965)
Time Spent: 1h  (was: 50m)

> SchemaCoder/RowCoder have no equals() function
> --
>
> Key: BEAM-8146
> URL: https://issues.apache.org/jira/browse/BEAM-8146
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.15.0
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> SchemaCoder has no equals function, so it can't be compared in tests, like 
> CloudComponentsTests$DefaultCoders, which is being re-enabled in 
> https://github.com/apache/beam/pull/9446
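
To illustrate why a missing equals() matters in tests, here is a minimal Python analogy. It is only a sketch: the change discussed in this thread is Java equals/hashCode on SchemaCoder and RowCoder, and the names PlainCoder and ComparableCoder below are hypothetical. Without value-based equality, two coders built from the same schema compare equal only if they are the same object, so a test that rebuilds a coder and compares it to the original fails.

```python
class PlainCoder:
    """Hypothetical coder with no equality method: compared by identity."""

    def __init__(self, schema):
        self.schema = schema


class ComparableCoder:
    """Hypothetical coder that compares and hashes by its schema."""

    def __init__(self, schema):
        self.schema = schema

    def __eq__(self, other):
        if not isinstance(other, ComparableCoder):
            return NotImplemented
        return self.schema == other.schema

    def __hash__(self):
        # Keep the hash consistent with __eq__: equal coders hash equally.
        return hash(self.schema)


# An illustrative schema, not taken from the Beam test suite.
schema = (("id", "INT64"), ("name", "STRING"))

plain_equal = PlainCoder(schema) == PlainCoder(schema)
comparable_equal = ComparableCoder(schema) == ComparableCoder(schema)
```

Defining the hash alongside equality keeps the two consistent, which is the same reason the PR title pairs equals with hashCode.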



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8146) SchemaCoder/RowCoder have no equals() function

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8146?focusedWorklogId=316966&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316966
 ]

ASF GitHub Bot logged work on BEAM-8146:


Author: ASF GitHub Bot
Created on: 23/Sep/19 20:39
Start Date: 23/Sep/19 20:39
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on issue #9493: 
[BEAM-8146,BEAM-8204,BEAM-8205] Add equals and hashCode to SchemaCoder and 
RowCoder
URL: https://github.com/apache/beam/pull/9493#issuecomment-534274910
 
 
   Run Flink ValidatesRunner
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 316966)
Time Spent: 1h 10m  (was: 1h)

> SchemaCoder/RowCoder have no equals() function
> --
>
> Key: BEAM-8146
> URL: https://issues.apache.org/jira/browse/BEAM-8146
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.15.0
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> SchemaCoder has no equals function, so it can't be compared in tests, like 
> CloudComponentsTests$DefaultCoders, which is being re-enabled in 
> https://github.com/apache/beam/pull/9446



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7919) Add a Python 3 test scenario for MongoDB IO

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7919?focusedWorklogId=316964&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316964
 ]

ASF GitHub Bot logged work on BEAM-7919:


Author: ASF GitHub Bot
Created on: 23/Sep/19 20:38
Start Date: 23/Sep/19 20:38
Worklog Time Spent: 10m 
  Work Description: y1chi commented on issue #9639: [BEAM-7919] Add MongoDB 
IO integration test for py3.7
URL: https://github.com/apache/beam/pull/9639#issuecomment-534274330
 
 
   Run seed job
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 316964)
Time Spent: 2h 10m  (was: 2h)

> Add a Python 3 test scenario for MongoDB IO
> ---
>
> Key: BEAM-7919
> URL: https://issues.apache.org/jira/browse/BEAM-7919
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-ideas
>Reporter: Valentyn Tymofieiev
>Assignee: Yichi Zhang
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Python 2 MongoDB IO suite was added in:
> https://github.com/apache/beam/commit/17bf89d6070565b715f44ecb5f6394219b94cfe6
> We should also exercise this IO in Python 3. 
> cc: [~chamikara] [~altay]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8304) Apex runner doesn't support multiple side inputs with different coders

2019-09-23 Thread Brian Hulette (Jira)
Brian Hulette created BEAM-8304:
---

 Summary: Apex runner doesn't support multiple side inputs with 
different coders
 Key: BEAM-8304
 URL: https://issues.apache.org/jira/browse/BEAM-8304
 Project: Beam
  Issue Type: Bug
  Components: runner-apex
Affects Versions: 2.16.0
Reporter: Brian Hulette






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8204) Newly added Java ValidatesRunner tests failed on ApexRunner

2019-09-23 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936167#comment-16936167
 ] 

Brian Hulette commented on BEAM-8204:
-

I think https://github.com/apache/beam/pull/9493 will resolve the issue with 
the ValidatesRunner test on Flink and Apex, but not for the different side 
inputs. I'll make a separate bug for that.

> Newly added Java ValidatesRunner tests failed on ApexRunner
> ---
>
> Key: BEAM-8204
> URL: https://issues.apache.org/jira/browse/BEAM-8204
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core, test-failures
>Reporter: Yueyang Qiu
>Assignee: Brian Hulette
>Priority: Major
>  Labels: currently-failing
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Jenkins link:
> [https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/testReport/]
>  
> Initial investigation:
> [https://github.com/apache/beam/pull/9454] and 
> [https://github.com/apache/beam/pull/9372] added new ValidatesRunner tests. 
> They have been tested on Dataflow runner, but are failing on Apex runner.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8301) Argument inference breaks on incomparable types as defaults.

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8301?focusedWorklogId=316952&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316952
 ]

ASF GitHub Bot logged work on BEAM-8301:


Author: ASF GitHub Bot
Created on: 23/Sep/19 20:07
Start Date: 23/Sep/19 20:07
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #9641: [BEAM-8301] Fix 
incomparable defaults.
URL: https://github.com/apache/beam/pull/9641#issuecomment-534262692
 
 
   R: @markflyhigh
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 316952)
Remaining Estimate: 0h
Time Spent: 10m

> Argument inference breaks on incomparable types as defaults.
> 
>
> Key: BEAM-8301
> URL: https://issues.apache.org/jira/browse/BEAM-8301
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.16.0
>Reporter: Robert Bradshaw
>Priority: Blocker
> Fix For: 2.16.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> A common culprit is numpy arrays, e.g.
> {code:python}
> class MyDoFn(beam.DoFn):
>   def process(self, element, arg=np.ndarray(...)):
>     ...
> {code}
> This bug was introduced as part of [BEAM-7060].
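
A minimal sketch of why such defaults are "incomparable", assuming the failure comes from evaluating an equality check on the default value as a boolean (the exact comparison in Beam is not shown in this thread). ArrayLike is a hypothetical stdlib-only stand-in for np.ndarray, whose == is elementwise, and NO_DEFAULT is a hypothetical sentinel.

```python
class _ElementwiseResult:
    """Stand-in for the array that an elementwise `==` returns."""

    def __bool__(self):
        raise ValueError("truth value of an array is ambiguous")


class ArrayLike:
    """Hypothetical stand-in for np.ndarray: `==` does not return a bool."""

    def __eq__(self, other):
        return _ElementwiseResult()


NO_DEFAULT = object()  # hypothetical sentinel meaning "no default given"


def is_missing_by_equality(default):
    # `if` forces bool() on the comparison result, which raises for arrays.
    if default == NO_DEFAULT:
        return True
    return False


def is_missing_by_identity(default):
    # Identity never calls __eq__, so any default type is safe.
    return default is NO_DEFAULT


try:
    is_missing_by_equality(ArrayLike())
    equality_failed = False
except ValueError:
    equality_failed = True

identity_result = is_missing_by_identity(ArrayLike())
```

Comparing against a sentinel with `is` sidesteps user-defined `__eq__` entirely, which is one common way to make this kind of check robust.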



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-7933) Adding timeout to JobServer grpc calls

2019-09-23 Thread Kyle Weaver (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936154#comment-16936154
 ] 

Kyle Weaver commented on BEAM-7933:
---

I think this would be a useful feature, especially for common failure modes 
such as pipeline submission. Do you still plan on implementing this [~enricoc]?

> Adding timeout to JobServer grpc calls
> --
>
> Key: BEAM-7933
> URL: https://issues.apache.org/jira/browse/BEAM-7933
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Affects Versions: 2.14.0
>Reporter: Enrico Canzonieri
>Assignee: Enrico Canzonieri
>Priority: Minor
>  Labels: portability
>
> grpc calls to the JobServer from the Python SDK do not have timeouts. That 
> means that the call to pipeline.run() could hang forever if the JobServer is 
> not running (or failing to start).
> E.g. 
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/portable_runner.py#L307]
>  the call to Prepare() doesn't provide any timeout value and the same applies 
> to other JobServer requests.
> As part of this ticket we could add a default timeout of 60 seconds as the 
> default timeout for http client.
> Additionally, we could consider adding a --job-server-request-timeout to the 
> [PortableOptions|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/options/pipeline_options.py#L805]
>  class to be used in the JobServer interactions inside portable_runner.py.
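
The proposal above can be sketched as follows. This is a self-contained illustration under stated assumptions, not the Beam implementation: it does not use grpc, slow_prepare is a hypothetical stand-in for a Prepare() call to an unresponsive JobServer, and the helper names are made up. With real gRPC Python stubs, a per-call deadline is normally passed through the stub method's timeout keyword argument instead of a worker thread.

```python
import argparse
import concurrent.futures
import time


def parse_timeout(argv):
    """Parse the request-timeout option proposed in the ticket."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--job-server-request-timeout",
        type=float,
        default=60.0,
        help="Seconds to wait for each JobServer request.",
    )
    args, _ = parser.parse_known_args(argv)
    return args.job_server_request_timeout


def call_with_timeout(fn, timeout):
    """Run fn in a worker thread and stop waiting after `timeout` seconds."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        return executor.submit(fn).result(timeout=timeout)
    finally:
        # Do not block on a stuck worker; let it finish in the background.
        executor.shutdown(wait=False)


def slow_prepare():
    # Hypothetical stand-in for a request to a JobServer that is slow to answer.
    time.sleep(0.5)
    return "prepared"


default_timeout = parse_timeout([])
short_timeout = parse_timeout(["--job-server-request-timeout", "0.05"])

try:
    call_with_timeout(slow_prepare, short_timeout)
    timed_out = False
except concurrent.futures.TimeoutError:
    timed_out = True
```

The point is that the caller gets an error after a bounded wait instead of hanging in pipeline.run().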



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8029) Using BigQueryIO.read with DIRECT_READ causes Illegal Mutation

2019-09-23 Thread Jason Bowman (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936139#comment-16936139
 ] 

Jason Bowman commented on BEAM-8029:


I'm seeing field mutation/corruption in the generic record results using 
Method.DIRECT_READ with the DataflowRunner and Beam 2.15.0, as well as this 
runtime exception from the DirectRunner, so it seems to be legit.

> Using BigQueryIO.read with DIRECT_READ causes Illegal Mutation 
> ---
>
> Key: BEAM-8029
> URL: https://issues.apache.org/jira/browse/BEAM-8029
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.14.0
>Reporter: Chris Larsen
>Priority: Major
>
>  
> Code to read from BigQuery that is causing the issue:
> {code:java}
> pipeline
>     .apply(BigQueryIO
>     .read(SchemaAndRecord::getRecord)
>     .from(options.getTableRef())
>     .withMethod(Method.DIRECT_READ)
>     .withCoder(AvroCoder.of(schema)))
> {code}
> If we remove .withMethod(Method.DIRECT_READ) then there is no issue.
>  
> The error is:
> {code:java}
> org.apache.beam.sdk.util.IllegalMutationException: PTransform 
> BigQueryIO.TypedRead/Read(BigQueryStorageTableSource) mutated value 
> {"device_id": "rpi-rpi0-thermostat", "temperature_c": 20.0, "temperature_f": 
> 52.0, "sample_time": 1564412307969368, "humidity": 74.3} after it was output 
> (new value was {"device_id": "rpi-rpi0-thermostat", "temperature_c": 20.0, 
> "temperature_f": 52.0, "sample_time": 1564412360458615, "humidity": 74.7}). 
> Values must not be mutated in any way after being output.
> at 
> org.apache.beam.runners.direct.ImmutabilityCheckingBundleFactory$ImmutabilityEnforcingBundle.commit
>  (ImmutabilityCheckingBundleFactory.java:134)
> at org.apache.beam.runners.direct.EvaluationContext.commitBundles 
> (EvaluationContext.java:210)
> at org.apache.beam.runners.direct.EvaluationContext.handleResult 
> (EvaluationContext.java:151)
> at 
> org.apache.beam.runners.direct.QuiescenceDriver$TimerIterableCompletionCallback.handleResult
>  (QuiescenceDriver.java:262)
> at org.apache.beam.runners.direct.DirectTransformExecutor.finishBundle 
> (DirectTransformExecutor.java:189)
> at org.apache.beam.runners.direct.DirectTransformExecutor.run 
> (DirectTransformExecutor.java:126)
> at java.util.concurrent.Executors$RunnableAdapter.call 
> (Executors.java:511)
> at java.util.concurrent.FutureTask.run (FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker 
> (ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run 
> (ThreadPoolExecutor.java:624)
> at java.lang.Thread.run (Thread.java:748)
> Caused by: org.apache.beam.sdk.util.IllegalMutationException: Value 
> {"device_id": "rpi-rpi0-thermostat", "temperature_c": 20.0, "temperature_f": 
> 52.0, "sample_time": 1564412307969368, "humidity": 74.3} mutated illegally, 
> new value was {"device_id": "rpi-rpi0-thermostat", "temperature_c": 20.0, 
> "temperature_f": 52.0, "sample_time": 1564412360458615, "humidity": 74.7}. 
> Encoding was 
> AiZycGktcnBpMC10aGVybW9zdGF0AgAAADRAAgAAAEpAArDVsP7jtMcFAjMzMzMzk1JA, 
> now 
> AiZycGktcnBpMC10aGVybW9zdGF0AgAAADRAAgAAAEpAAu6FuLDktMcFAs3MzMzMrFJA.
> at 
> org.apache.beam.sdk.util.MutationDetectors$CodedValueMutationDetector.illegalMutation
>  (MutationDetectors.java:153)
> at 
> org.apache.beam.sdk.util.MutationDetectors$CodedValueMutationDetector.verifyUnmodifiedThrowingCheckedExceptions
>  (MutationDetectors.java:148)
> at 
> org.apache.beam.sdk.util.MutationDetectors$CodedValueMutationDetector.verifyUnmodified
>  (MutationDetectors.java:123)
> at 
> org.apache.beam.runners.direct.ImmutabilityCheckingBundleFactory$ImmutabilityEnforcingBundle.commit
>  (ImmutabilityCheckingBundleFactory.java:124)
> at org.apache.beam.runners.direct.EvaluationContext.commitBundles 
> (EvaluationContext.java:210)
> at org.apache.beam.runners.direct.EvaluationContext.handleResult 
> (EvaluationContext.java:151)
> at 
> org.apache.beam.runners.direct.QuiescenceDriver$TimerIterableCompletionCallback.handleResult
>  (QuiescenceDriver.java:262)
> at org.apache.beam.runners.direct.DirectTransformExecutor.finishBundle 
> (DirectTransformExecutor.java:189)
> at org.apache.beam.runners.direct.DirectTransformExecutor.run 
> (DirectTransformExecutor.java:126)
> at java.util.concurrent.Executors$RunnableAdapter.call 
> (Executors.java:511)
> at java.util.concurrent.FutureTask.run (FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker 
> (ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run 
> (ThreadPoolExecutor.java:624)
> at java.lang.Thread.run (Thread.java:748){code}
>  
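
The stack trace shows the check that fires: the value's encoding is recorded when it is output and compared again when the bundle is committed. A minimal Python sketch of that idea follows, using the two sample_time/humidity pairs from the error message. EncodingMutationDetector and both reader functions are hypothetical, JSON stands in for the coder's encoding, and reusing one record object across rows is only an assumed cause, not a confirmed diagnosis of BigQueryStorageTableSource.

```python
import json


class EncodingMutationDetector:
    """Remembers a value's encoding and later checks it is unchanged."""

    def __init__(self, value):
        self._value = value
        self._encoded = self._encode(value)

    @staticmethod
    def _encode(value):
        # JSON with sorted keys stands in for the coder's byte encoding.
        return json.dumps(value, sort_keys=True)

    def verify_unmodified(self):
        if self._encode(self._value) != self._encoded:
            raise ValueError("value mutated after being output")


def read_reusing_record(rows):
    """Assumed buggy pattern: one mutable record is reused for every row."""
    record = {}
    for row in rows:
        record.clear()
        record.update(row)
        yield record


def read_copying_record(rows):
    """Safe pattern: each output is an independent copy."""
    for row in rows:
        yield dict(row)


rows = [
    {"sample_time": 1564412307969368, "humidity": 74.3},
    {"sample_time": 1564412360458615, "humidity": 74.7},
]


def mutation_detected(reader):
    # Record an encoding per output, then verify all of them at "commit" time.
    detectors = [EncodingMutationDetector(value) for value in reader(rows)]
    try:
        for detector in detectors:
            detector.verify_unmodified()
    except ValueError:
        return True
    return False
```

Under this sketch, mutation_detected(read_reusing_record) reports a mutation while mutation_detected(read_copying_record) does not, which matches the shape of the error: the first output later shows the second row's values.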




[jira] [Work logged] (BEAM-8299) Upgrade Jackson to version 2.9.10

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8299?focusedWorklogId=316938&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316938
 ]

ASF GitHub Bot logged work on BEAM-8299:


Author: ASF GitHub Bot
Created on: 23/Sep/19 19:26
Start Date: 23/Sep/19 19:26
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9636: [BEAM-8299] Upgrade 
Jackson to version 2.9.10
URL: https://github.com/apache/beam/pull/9636#issuecomment-534247500
 
 
   LGTM.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 316938)
Time Spent: 50m  (was: 40m)

> Upgrade Jackson to version 2.9.10
> -
>
> Key: BEAM-8299
> URL: https://issues.apache.org/jira/browse/BEAM-8299
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system, sdk-java-core
>Affects Versions: 2.15.0
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Blocker
> Fix For: 2.16.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> [Jackson 2.9.10 addresses multiple CVE 
> issues|https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.9.10] from 
> previous Jackson versions, so we need to upgrade it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8299) Upgrade Jackson to version 2.9.10

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8299?focusedWorklogId=316939&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316939
 ]

ASF GitHub Bot logged work on BEAM-8299:


Author: ASF GitHub Bot
Created on: 23/Sep/19 19:26
Start Date: 23/Sep/19 19:26
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #9636: [BEAM-8299] 
Upgrade Jackson to version 2.9.10
URL: https://github.com/apache/beam/pull/9636
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 316939)
Time Spent: 1h  (was: 50m)

> Upgrade Jackson to version 2.9.10
> -
>
> Key: BEAM-8299
> URL: https://issues.apache.org/jira/browse/BEAM-8299
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system, sdk-java-core
>Affects Versions: 2.15.0
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Blocker
> Fix For: 2.16.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> [Jackson 2.9.10 addresses multiple CVE 
> issues|https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.9.10] from 
> previous Jackson versions, so we need to upgrade it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8299) Upgrade Jackson to version 2.9.10

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8299?focusedWorklogId=316940&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316940
 ]

ASF GitHub Bot logged work on BEAM-8299:


Author: ASF GitHub Bot
Created on: 23/Sep/19 19:26
Start Date: 23/Sep/19 19:26
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9637: 
[release-2.16.0][BEAM-8299] Upgrade Jackson to version 2.9.10
URL: https://github.com/apache/beam/pull/9637#issuecomment-534247693
 
 
   LGTM. I've merged https://github.com/apache/beam/pull/9636; I'll let the 
release manager merge this.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 316940)
Time Spent: 1h 10m  (was: 1h)

> Upgrade Jackson to version 2.9.10
> -
>
> Key: BEAM-8299
> URL: https://issues.apache.org/jira/browse/BEAM-8299
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system, sdk-java-core
>Affects Versions: 2.15.0
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Blocker
> Fix For: 2.16.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> [Jackson 2.9.10 addresses multiple CVE 
> issues|https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.9.10] from 
> previous Jackson versions, so we need to upgrade it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8213) Run and report python tox tasks separately within Jenkins

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8213?focusedWorklogId=316937&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316937
 ]

ASF GitHub Bot logged work on BEAM-8213:


Author: ASF GitHub Bot
Created on: 23/Sep/19 19:25
Start Date: 23/Sep/19 19:25
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9642: [BEAM-8213] Split up 
monolithic python preCommit tests on jenkins
URL: https://github.com/apache/beam/pull/9642#issuecomment-534247276
 
 
   Run PythonLint PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 316937)
Time Spent: 0.5h  (was: 20m)

> Run and report python tox tasks separately within Jenkins
> -
>
> Key: BEAM-8213
> URL: https://issues.apache.org/jira/browse/BEAM-8213
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Chad Dombrova
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> As a python developer, the speed and comprehensibility of the jenkins 
> PreCommit job could be greatly improved.
> Here are some of the problems
> - when a lint job fails, it's not reported in the test results summary, so 
> even though the job is marked as failed, I see "Test Result (no failures)" 
> which is quite confusing
> - I have to wait for over an hour to discover the lint failed, which takes 
> about a minute to run on its own
> - The logs are a jumbled mess of all the different tasks running on top of 
> each other
> - The test results give no indication of which version of python they use.  I 
> click on Test results, then the test module, then the test class, then I see 
> 4 tests named the same thing.  I assume that the first is python 2.7, the 
> second is 3.5 and so on.   It takes 5 clicks and then reading the log output 
> to know which version of python a single error pertains to, then I need to 
> repeat for each failure.  This makes it very difficult to discover problems, 
> and deduce that they may have something to do with python version mismatches.
> I believe the solution to this is to split up the single monolithic python 
> PreCommit job into sub-jobs (possibly using a pipeline with steps).  This 
> would give us the following benefits:
> - sub job results should become available as they finish, so for example, 
> lint results should be available very early on
> - sub job results will be reported separately, and there will be a job for 
> each py2, py35, py36 and so on, so it will be clear when an error is related 
> to a particular python version
> - sub jobs without reports, like docs and lint, will have their own failure 
> status and logs, so when they fail it will be more obvious what went wrong.
> I'm happy to help out once I get some feedback on the desired way forward.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7919) Add a Python 3 test scenario for MongoDB IO

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7919?focusedWorklogId=316936&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316936
 ]

ASF GitHub Bot logged work on BEAM-7919:


Author: ASF GitHub Bot
Created on: 23/Sep/19 19:23
Start Date: 23/Sep/19 19:23
Worklog Time Spent: 10m 
  Work Description: y1chi commented on pull request #9639: [BEAM-7919] Add 
MongoDB IO integration test for py3.7
URL: https://github.com/apache/beam/pull/9639#discussion_r327287701
 
 

 ##
 File path: .test-infra/jenkins/job_PostCommit_Python_MongoDBIO_IT.groovy
 ##
 @@ -32,6 +32,7 @@ 
PostcommitJobBuilder.postCommitJob('beam_PostCommit_Python_MongoDBIO_IT',
 gradle {
   rootBuildScriptDir(commonJobProperties.checkoutDir)
   tasks(':sdks:python:test-suites:direct:py2:mongodbioIT')
+  tasks(':sdks:python:test-suites:direct:py37:mongodbioIT')
 
 Review comment:
   will change to that.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 316936)
Time Spent: 2h  (was: 1h 50m)

> Add a Python 3 test scenario for MongoDB IO
> ---
>
> Key: BEAM-7919
> URL: https://issues.apache.org/jira/browse/BEAM-7919
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-ideas
>Reporter: Valentyn Tymofieiev
>Assignee: Yichi Zhang
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Python 2 MongoDB IO suite was added in:
> https://github.com/apache/beam/commit/17bf89d6070565b715f44ecb5f6394219b94cfe6
> We should also exercise this IO in Python 3. 
> cc: [~chamikara] [~altay]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8213) Run and report python tox tasks separately within Jenkins

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8213?focusedWorklogId=316935&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316935
 ]

ASF GitHub Bot logged work on BEAM-8213:


Author: ASF GitHub Bot
Created on: 23/Sep/19 19:20
Start Date: 23/Sep/19 19:20
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9642: [BEAM-8213] Split up 
monolithic python preCommit tests on jenkins
URL: https://github.com/apache/beam/pull/9642#issuecomment-534245458
 
 
   R: @lgajowy
   R: @kkucharc
   R: @echauchot
   R: @robertwb 
   R: @udim 
   
   Well, I tried my hand at this, but it's not showing the new jobs, so I'm not 
sure whether there's something I need to update to make the Jenkins config pull 
from this PR, or if we've got a chicken/egg situation wrt testing whether this 
new configuration works.
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 316935)
Time Spent: 20m  (was: 10m)

> Run and report python tox tasks separately within Jenkins
> -
>
> Key: BEAM-8213
> URL: https://issues.apache.org/jira/browse/BEAM-8213
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Chad Dombrova
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> As a python developer, the speed and comprehensibility of the jenkins 
> PreCommit job could be greatly improved.
> Here are some of the problems
> - when a lint job fails, it's not reported in the test results summary, so 
> even though the job is marked as failed, I see "Test Result (no failures)" 
> which is quite confusing
> - I have to wait for over an hour to discover the lint failed, which takes 
> about a minute to run on its own
> - The logs are a jumbled mess of all the different tasks running on top of 
> each other
> - The test results give no indication of which version of python they use.  I 
> click on Test results, then the test module, then the test class, then I see 
> 4 tests named the same thing.  I assume that the first is python 2.7, the 
> second is 3.5 and so on.   It takes 5 clicks and then reading the log output 
> to know which version of python a single error pertains to, then I need to 
> repeat for each failure.  This makes it very difficult to discover problems, 
> and deduce that they may have something to do with python version mismatches.
> I believe the solution to this is to split up the single monolithic python 
> PreCommit job into sub-jobs (possibly using a pipeline with steps).  This 
> would give us the following benefits:
> - sub job results should become available as they finish, so for example, 
> lint results should be available very early on
> - sub job results will be reported separately, and there will be a job for 
> each py2, py35, py36 and so on, so it will be clear when an error is related 
> to a particular python version
> - sub jobs without reports, like docs and lint, will have their own failure 
> status and logs, so when they fail it will be more obvious what went wrong.
> I'm happy to help out once I get some feedback on the desired way forward.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8299) Upgrade Jackson to version 2.9.10

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8299?focusedWorklogId=316933&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316933
 ]

ASF GitHub Bot logged work on BEAM-8299:


Author: ASF GitHub Bot
Created on: 23/Sep/19 19:18
Start Date: 23/Sep/19 19:18
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9637: 
[release-2.16.0][BEAM-8299] Upgrade Jackson to version 2.9.10
URL: https://github.com/apache/beam/pull/9637#issuecomment-534244691
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 316933)
Time Spent: 0.5h  (was: 20m)

> Upgrade Jackson to version 2.9.10
> -
>
> Key: BEAM-8299
> URL: https://issues.apache.org/jira/browse/BEAM-8299
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system, sdk-java-core
>Affects Versions: 2.15.0
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Blocker
> Fix For: 2.16.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> [Jackson 2.9.10 addresses multiple CVE 
> issues|https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.9.10] from 
> previous Jackson versions, so we need to upgrade it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8299) Upgrade Jackson to version 2.9.10

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8299?focusedWorklogId=316934&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316934
 ]

ASF GitHub Bot logged work on BEAM-8299:


Author: ASF GitHub Bot
Created on: 23/Sep/19 19:18
Start Date: 23/Sep/19 19:18
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9637: 
[release-2.16.0][BEAM-8299] Upgrade Jackson to version 2.9.10
URL: https://github.com/apache/beam/pull/9637#issuecomment-534244737
 
 
   Run Java_Examples_Dataflow PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 316934)
Time Spent: 40m  (was: 0.5h)

> Upgrade Jackson to version 2.9.10
> -
>
> Key: BEAM-8299
> URL: https://issues.apache.org/jira/browse/BEAM-8299
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system, sdk-java-core
>Affects Versions: 2.15.0
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Blocker
> Fix For: 2.16.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> [Jackson 2.9.10 addresses multiple CVE 
> issues|https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.9.10] from 
> previous Jackson versions, so we need to upgrade it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-6923) OOM errors in jobServer when using GCS artifactDir

2019-09-23 Thread Ankur Goenka (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936132#comment-16936132
 ] 

Ankur Goenka commented on BEAM-6923:


That's strange.

I am using Linux for testing this.

Will try it on Mac as well.

> OOM errors in jobServer when using GCS artifactDir
> --
>
> Key: BEAM-6923
> URL: https://issues.apache.org/jira/browse/BEAM-6923
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Reporter: Lukasz Gajowy
>Assignee: Ankur Goenka
>Priority: Major
> Attachments: Instance counts.png, Paths to GC root.png, 
> Telemetries.png, beam6923-flink156.m4v, beam6923flink182.m4v, heapdump 
> size-sorted.png
>
>
> When starting jobServer with artifactDir pointing to a GCS bucket: 
> {code:java}
> ./gradlew :beam-runners-flink_2.11-job-server:runShadow 
> -PflinkMasterUrl=localhost:8081 -PartifactsDir=gs://the-bucket{code}
> and running a Java portable pipeline with the following portability-related 
> pipeline options: 
> {code:java}
> --runner=PortableRunner --jobEndpoint=localhost:8099 
> --defaultEnvironmentType=DOCKER 
> --defaultEnvironmentConfig=gcr.io//java:latest{code}
>  
> I'm facing a series of OOM errors, like this: 
> {code:java}
> Exception in thread "grpc-default-executor-3" java.lang.OutOfMemoryError: 
> Java heap space
> at 
> com.google.api.client.googleapis.media.MediaHttpUploader.buildContentChunk(MediaHttpUploader.java:606)
> at 
> com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:408)
> at 
> com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336)
> at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:508)
> at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:432)
> at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:549)
> at 
> com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:301)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745){code}
>  
> This does not happen when I'm using a local filesystem for the artifact 
> staging location. 
>  





[jira] [Updated] (BEAM-7045) Element counters in the Web UI graph representations for transforms for Python streaming jobs in Google Cloud Dataflow

2019-09-23 Thread Yichi Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yichi Zhang updated BEAM-7045:
--
Fix Version/s: 2.16.0

> Element counters in the Web UI graph representations for transforms for 
> Python streaming jobs in Google Cloud Dataflow
> --
>
> Key: BEAM-7045
> URL: https://issues.apache.org/jira/browse/BEAM-7045
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-dataflow, sdk-py-core
> Environment: GCP Dataflow
>Reporter: Fim
>Priority: Major
>  Labels: features, usability
> Fix For: 2.16.0
>
>
> Users don't see the element counters in transforms in the Web UI graph 
> representation when running a Python streaming job, which is expected 
> behavior according to [this Beam 
> page|https://beam.apache.org/documentation/sdks/python-streaming/#dataflowrunner-specific-features].
> The feature request is to enable the element counters in the Web UI graph 
> representations for transforms for Python streaming jobs in Google Cloud 
> Dataflow.





[jira] [Work logged] (BEAM-7919) Add a Python 3 test scenario for MongoDB IO

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7919?focusedWorklogId=316932&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316932
 ]

ASF GitHub Bot logged work on BEAM-7919:


Author: ASF GitHub Bot
Created on: 23/Sep/19 19:14
Start Date: 23/Sep/19 19:14
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #9639: [BEAM-7919] 
Add MongoDB IO integration test for py3.7
URL: https://github.com/apache/beam/pull/9639#discussion_r327283975
 
 

 ##
 File path: .test-infra/jenkins/job_PostCommit_Python_MongoDBIO_IT.groovy
 ##
 @@ -32,6 +32,7 @@ 
PostcommitJobBuilder.postCommitJob('beam_PostCommit_Python_MongoDBIO_IT',
 gradle {
   rootBuildScriptDir(commonJobProperties.checkoutDir)
   tasks(':sdks:python:test-suites:direct:py2:mongodbioIT')
+  tasks(':sdks:python:test-suites:direct:py37:mongodbioIT')
 
 Review comment:
   If we test only one minor version, I suggest Python 3.5, as this is the 
lowest version Beam supports.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 316932)
Time Spent: 1h 50m  (was: 1h 40m)

> Add a Python 3 test scenario for MongoDB IO
> ---
>
> Key: BEAM-7919
> URL: https://issues.apache.org/jira/browse/BEAM-7919
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-ideas
>Reporter: Valentyn Tymofieiev
>Assignee: Yichi Zhang
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Python 2 MongoDB IO suite was added in:
> https://github.com/apache/beam/commit/17bf89d6070565b715f44ecb5f6394219b94cfe6
> We should also exercise this IO in Python 3. 
> cc: [~chamikara] [~altay]





[jira] [Resolved] (BEAM-7045) Element counters in the Web UI graph representations for transforms for Python streaming jobs in Google Cloud Dataflow

2019-09-23 Thread Yichi Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yichi Zhang resolved BEAM-7045.
---
Resolution: Fixed

> Element counters in the Web UI graph representations for transforms for 
> Python streaming jobs in Google Cloud Dataflow
> --
>
> Key: BEAM-7045
> URL: https://issues.apache.org/jira/browse/BEAM-7045
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-dataflow, sdk-py-core
> Environment: GCP Dataflow
>Reporter: Fim
>Priority: Major
>  Labels: features, usability
> Fix For: 2.16.0
>
>
> Users don't see the element counters in transforms in the Web UI graph 
> representation when running a Python streaming job, which is expected 
> behavior according to [this Beam 
> page|https://beam.apache.org/documentation/sdks/python-streaming/#dataflowrunner-specific-features].
> The feature request is to enable the element counters in the Web UI graph 
> representations for transforms for Python streaming jobs in Google Cloud 
> Dataflow.





[jira] [Commented] (BEAM-8303) Filesystems not properly registered using FileIO.write()

2019-09-23 Thread Preston Koprivica (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936130#comment-16936130
 ] 

Preston Koprivica commented on BEAM-8303:
-

I'll defer to the experts on the priority of this issue.  Currently, I am able 
to work around it by setting FileIO.write().withIgnoreWindowing(), which is also 
the default for AvroIO 
([https://github.com/apache/beam/blob/release-2.15.0/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java#L516]),
 and I suspect other FileBasedSink APIs as well.

> Filesystems not properly registered using FileIO.write()
> 
>
> Key: BEAM-8303
> URL: https://issues.apache.org/jira/browse/BEAM-8303
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.15.0
>Reporter: Preston Koprivica
>Priority: Major
>
> I’m getting the following error when attempting to use the FileIO APIs 
> (beam-2.15.0) and integrating with AWS S3.  I have set up the PipelineOptions 
> with all the relevant AWS options, so the filesystem registry **should** be 
> properly seeded by the time the graph is compiled and executed:
> {code:java}
>  java.lang.IllegalArgumentException: No filesystem found for scheme s3
>     at 
> org.apache.beam.sdk.io.FileSystems.getFileSystemInternal(FileSystems.java:456)
>     at 
> org.apache.beam.sdk.io.FileSystems.matchNewResource(FileSystems.java:526)
>     at 
> org.apache.beam.sdk.io.FileBasedSink$FileResultCoder.decode(FileBasedSink.java:1149)
>     at 
> org.apache.beam.sdk.io.FileBasedSink$FileResultCoder.decode(FileBasedSink.java:1105)
>     at org.apache.beam.sdk.coders.Coder.decode(Coder.java:159)
>     at 
> org.apache.beam.sdk.transforms.join.UnionCoder.decode(UnionCoder.java:83)
>     at 
> org.apache.beam.sdk.transforms.join.UnionCoder.decode(UnionCoder.java:32)
>     at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:543)
>     at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:534)
>     at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:480)
>     at 
> org.apache.beam.runners.flink.translation.types.CoderTypeSerializer.deserialize(CoderTypeSerializer.java:93)
>     at 
> org.apache.flink.runtime.plugable.NonReusingDeserializationDelegate.read(NonReusingDeserializationDelegate.java:55)
>     at 
> org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.getNextRecord(SpillingAdaptiveSpanningRecordDeserializer.java:106)
>     at 
> org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:72)
>     at 
> org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:47)
>     at 
> org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73)
>     at 
> org.apache.flink.runtime.operators.FlatMapDriver.run(FlatMapDriver.java:107)
>     at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:503)
>     at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:368)
>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
>     at java.lang.Thread.run(Thread.java:748)
>  {code}
> For reference, the write code resembles this:
> {code:java}
>  FileIO.Write write = FileIO.write()
>     .via(ParquetIO.sink(schema))
>     .to(options.getOutputDir()) // will be something like: s3:///
>     .withSuffix(".parquet");
> records.apply(String.format("Write(%s)", options.getOutputDir()), write);{code}
> The issue does not appear to be related to ParquetIO.sink().  I am able to 
> reliably reproduce the issue using JSON-formatted records and TextIO.sink(), 
> as well.  Moreover, AvroIO is affected if the withWindowedWrites() option is 
> added.
> Just trying some different knobs, I went ahead and set the following option:
> {code:java}
> write = write.withNoSpilling();{code}
> This actually seemed to fix the issue, only to have it reemerge as I scaled 
> up the data set size.  The stack trace, while very similar, reads:
> {code:java}
>  java.lang.IllegalArgumentException: No filesystem found for scheme s3
>     at 
> org.apache.beam.sdk.io.FileSystems.getFileSystemInternal(FileSystems.java:456)
>     at 
> org.apache.beam.sdk.io.FileSystems.matchNewResource(FileSystems.java:526)
>     at 
> org.apache.beam.sdk.io.FileBasedSink$FileResultCoder.decode(FileBasedSink.java:1149)
>     at 
> org.apache.beam.sdk.io.FileBasedSink$FileResultCoder.decode(FileBasedSink.java:1105)
>     at org.apache.beam.sdk.coders.Coder.decode(Coder.java:159)
>     at org.apache.beam.sdk.coders.KvCoder.decode(KvCoder.java:82)
>     at org.apache.beam.sdk.coders.KvCoder.

[jira] [Work logged] (BEAM-7919) Add a Python 3 test scenario for MongoDB IO

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7919?focusedWorklogId=316930&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316930
 ]

ASF GitHub Bot logged work on BEAM-7919:


Author: ASF GitHub Bot
Created on: 23/Sep/19 19:13
Start Date: 23/Sep/19 19:13
Worklog Time Spent: 10m 
  Work Description: y1chi commented on issue #9639: [BEAM-7919] Add MongoDB 
IO integration test for py3.7
URL: https://github.com/apache/beam/pull/9639#issuecomment-534242730
 
 
   R: @tvalentyn 
 



Issue Time Tracking
---

Worklog Id: (was: 316930)
Time Spent: 1h 40m  (was: 1.5h)

> Add a Python 3 test scenario for MongoDB IO
> ---
>
> Key: BEAM-7919
> URL: https://issues.apache.org/jira/browse/BEAM-7919
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-ideas
>Reporter: Valentyn Tymofieiev
>Assignee: Yichi Zhang
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Python 2 MongoDB IO suite was added in:
> https://github.com/apache/beam/commit/17bf89d6070565b715f44ecb5f6394219b94cfe6
> We should also exercise this IO in Python 3. 
> cc: [~chamikara] [~altay]





[jira] [Created] (BEAM-8303) Filesystems not properly registered using FileIO.write()

2019-09-23 Thread Preston Koprivica (Jira)
Preston Koprivica created BEAM-8303:
---

 Summary: Filesystems not properly registered using FileIO.write()
 Key: BEAM-8303
 URL: https://issues.apache.org/jira/browse/BEAM-8303
 Project: Beam
  Issue Type: Bug
  Components: sdk-java-core
Affects Versions: 2.15.0
Reporter: Preston Koprivica


I’m getting the following error when attempting to use the FileIO APIs 
(beam-2.15.0) and integrating with AWS S3.  I have set up the PipelineOptions 
with all the relevant AWS options, so the filesystem registry **should** be 
properly seeded by the time the graph is compiled and executed:
{code:java}
 java.lang.IllegalArgumentException: No filesystem found for scheme s3
    at 
org.apache.beam.sdk.io.FileSystems.getFileSystemInternal(FileSystems.java:456)
    at org.apache.beam.sdk.io.FileSystems.matchNewResource(FileSystems.java:526)
    at 
org.apache.beam.sdk.io.FileBasedSink$FileResultCoder.decode(FileBasedSink.java:1149)
    at 
org.apache.beam.sdk.io.FileBasedSink$FileResultCoder.decode(FileBasedSink.java:1105)
    at org.apache.beam.sdk.coders.Coder.decode(Coder.java:159)
    at org.apache.beam.sdk.transforms.join.UnionCoder.decode(UnionCoder.java:83)
    at org.apache.beam.sdk.transforms.join.UnionCoder.decode(UnionCoder.java:32)
    at 
org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:543)
    at 
org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:534)
    at 
org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:480)
    at 
org.apache.beam.runners.flink.translation.types.CoderTypeSerializer.deserialize(CoderTypeSerializer.java:93)
    at 
org.apache.flink.runtime.plugable.NonReusingDeserializationDelegate.read(NonReusingDeserializationDelegate.java:55)
    at 
org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.getNextRecord(SpillingAdaptiveSpanningRecordDeserializer.java:106)
    at 
org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:72)
    at 
org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:47)
    at 
org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73)
    at 
org.apache.flink.runtime.operators.FlatMapDriver.run(FlatMapDriver.java:107)
    at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:503)
    at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:368)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
    at java.lang.Thread.run(Thread.java:748)
 {code}
For reference, the write code resembles this:
{code:java}
 FileIO.Write write = FileIO.write()
    .via(ParquetIO.sink(schema))
    .to(options.getOutputDir()) // will be something like: s3:///
    .withSuffix(".parquet");

records.apply(String.format("Write(%s)", options.getOutputDir()), write);{code}
The issue does not appear to be related to ParquetIO.sink().  I am able to 
reliably reproduce the issue using JSON-formatted records and TextIO.sink(), as 
well.  Moreover, AvroIO is affected if the withWindowedWrites() option is added.
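
The failure mode reported here can be illustrated in isolation. The following is a minimal sketch in plain Java, not Beam's implementation: `SchemeRegistrySketch`, its methods, and the bucket name are hypothetical stand-ins. It assumes, as the stack trace suggests but the report does not confirm, that the lookup runs in a JVM whose registry was never seeded with the s3 scheme.

```java
import java.net.URI;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a per-JVM scheme registry; this is not Beam's FileSystems class. */
final class SchemeRegistrySketch {
    // Scheme -> filesystem name. Each JVM (client or worker) holds its own copy.
    private final Map<String, String> fileSystems = new ConcurrentHashMap<>();

    SchemeRegistrySketch() {
        // Only the local filesystem is known until something seeds the registry.
        fileSystems.put("file", "LocalFileSystem");
    }

    /** Models seeding the registry on this JVM, for example from pipeline options. */
    void register(String scheme, String fileSystemName) {
        fileSystems.put(scheme, fileSystemName);
    }

    /** Resolves a path such as "s3://bucket/key" to the filesystem registered for its scheme. */
    String lookup(String spec) {
        String scheme = URI.create(spec).getScheme();
        if (scheme == null) {
            scheme = "file"; // Paths without a scheme are treated as local.
        }
        String fileSystem = fileSystems.get(scheme);
        if (fileSystem == null) {
            throw new IllegalArgumentException("No filesystem found for scheme " + scheme);
        }
        return fileSystem;
    }

    public static void main(String[] args) {
        SchemeRegistrySketch worker = new SchemeRegistrySketch();
        try {
            worker.lookup("s3://example-bucket/out/part-0.parquet");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Registry not yet seeded on this JVM.
        }
        worker.register("s3", "S3FileSystem");
        System.out.println(worker.lookup("s3://example-bucket/out/part-0.parquet"));
    }
}
```

In this model a lookup succeeds only in a JVM where `register` has already run, which would be consistent with the error surfacing while a Flink task deserializes records rather than when the graph is built.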

Just trying some different knobs, I went ahead and set the following option:
{code:java}
write = write.withNoSpilling();{code}
This actually seemed to fix the issue, only to have it reemerge as I scaled up 
the data set size.  The stack trace, while very similar, reads:
{code:java}
 java.lang.IllegalArgumentException: No filesystem found for scheme s3
    at 
org.apache.beam.sdk.io.FileSystems.getFileSystemInternal(FileSystems.java:456)
    at org.apache.beam.sdk.io.FileSystems.matchNewResource(FileSystems.java:526)
    at 
org.apache.beam.sdk.io.FileBasedSink$FileResultCoder.decode(FileBasedSink.java:1149)
    at 
org.apache.beam.sdk.io.FileBasedSink$FileResultCoder.decode(FileBasedSink.java:1105)
    at org.apache.beam.sdk.coders.Coder.decode(Coder.java:159)
    at org.apache.beam.sdk.coders.KvCoder.decode(KvCoder.java:82)
    at org.apache.beam.sdk.coders.KvCoder.decode(KvCoder.java:36)
    at 
org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:543)
    at 
org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:534)
    at 
org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:480)
    at 
org.apache.beam.runners.flink.translation.types.CoderTypeSerializer.deserialize(CoderTypeSerializer.java:93)
    at 
org.apache.flink.runtime.plugable.NonReusingDeserializationDelegate.read(NonReusingDeserializationDelegate.java:55)
    at 
org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.getNextRecord(SpillingAdaptiveSpanningRecordDeserializer.java:106)
    at 
org.apache.flink.runtime.

[jira] [Created] (BEAM-8302) beam_PostCommit_XVR_Flink failing

2019-09-23 Thread Robert Bradshaw (Jira)
Robert Bradshaw created BEAM-8302:
-

 Summary: beam_PostCommit_XVR_Flink failing
 Key: BEAM-8302
 URL: https://issues.apache.org/jira/browse/BEAM-8302
 Project: Beam
  Issue Type: Bug
  Components: test-failures
Reporter: Robert Bradshaw
Assignee: Robert Bradshaw


E.g. see https://builds.apache.org/job/beam_PostCommit_XVR_Flink/432/console





[jira] [Work logged] (BEAM-8213) Run and report python tox tasks separately within Jenkins

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8213?focusedWorklogId=316922&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316922
 ]

ASF GitHub Bot logged work on BEAM-8213:


Author: ASF GitHub Bot
Created on: 23/Sep/19 18:57
Start Date: 23/Sep/19 18:57
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9642: [BEAM-8213] 
Split up monolithic python preCommit tests on jenkins
URL: https://github.com/apache/beam/pull/9642
 
 
   See the Jira issue for details: 
https://issues.apache.org/jira/browse/BEAM-8213#
   
   
   

[jira] [Work logged] (BEAM-8146) SchemaCoder/RowCoder have no equals() function

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8146?focusedWorklogId=316918&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316918
 ]

ASF GitHub Bot logged work on BEAM-8146:


Author: ASF GitHub Bot
Created on: 23/Sep/19 18:46
Start Date: 23/Sep/19 18:46
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on issue #9493: [BEAM-8146] Add 
equals and hashCode to SchemaCoder and RowCoder
URL: https://github.com/apache/beam/pull/9493#issuecomment-534232168
 
 
   Run Apex ValidatesRunner
 



Issue Time Tracking
---

Worklog Id: (was: 316918)
Time Spent: 50m  (was: 40m)

> SchemaCoder/RowCoder have no equals() function
> --
>
> Key: BEAM-8146
> URL: https://issues.apache.org/jira/browse/BEAM-8146
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.15.0
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> SchemaCoder has no equals function, so it can't be compared in tests, like 
> CloudComponentsTests$DefaultCoders, which is being re-enabled in 
> https://github.com/apache/beam/pull/9446
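
Why a missing equals() blocks such comparisons can be shown with plain Java. This is a minimal sketch, assuming two hypothetical coder-like classes (`IdentityCoder` and `ValueCoder` are illustrative names, not Beam classes): without an override, Object.equals compares references, so two coders built from the same fields are unequal.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

/** Hypothetical coder-like classes for illustration; not Beam's SchemaCoder or RowCoder. */
final class EqualsSketch {
    /** No equals() override: instances are compared by reference. */
    static final class IdentityCoder {
        final List<String> fieldNames;

        IdentityCoder(List<String> fieldNames) {
            this.fieldNames = fieldNames;
        }
    }

    /** equals() and hashCode() defined over the fields that determine behaviour. */
    static final class ValueCoder {
        final List<String> fieldNames;

        ValueCoder(List<String> fieldNames) {
            this.fieldNames = fieldNames;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) {
                return true;
            }
            if (!(other instanceof ValueCoder)) {
                return false;
            }
            return fieldNames.equals(((ValueCoder) other).fieldNames);
        }

        @Override
        public int hashCode() {
            // Must agree with equals(): equal coders produce equal hash codes.
            return Objects.hash(fieldNames);
        }
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("id", "name");
        System.out.println(new IdentityCoder(fields).equals(new IdentityCoder(fields)));
        System.out.println(new ValueCoder(fields).equals(new ValueCoder(fields)));
    }
}
```

A test that builds an expected coder and compares it with the one under test can only pass in the second form.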





[jira] [Work logged] (BEAM-8146) SchemaCoder/RowCoder have no equals() function

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8146?focusedWorklogId=316917&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316917
 ]

ASF GitHub Bot logged work on BEAM-8146:


Author: ASF GitHub Bot
Created on: 23/Sep/19 18:46
Start Date: 23/Sep/19 18:46
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on issue #9493: [BEAM-8146] Add 
equals and hashCode to SchemaCoder and RowCoder
URL: https://github.com/apache/beam/pull/9493#issuecomment-534232127
 
 
   Run Flink ValidatesRunner
 



Issue Time Tracking
---

Worklog Id: (was: 316917)
Time Spent: 40m  (was: 0.5h)

> SchemaCoder/RowCoder have no equals() function
> --
>
> Key: BEAM-8146
> URL: https://issues.apache.org/jira/browse/BEAM-8146
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.15.0
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> SchemaCoder has no equals function, so it can't be compared in tests, like 
> CloudComponentsTests$DefaultCoders, which is being re-enabled in 
> https://github.com/apache/beam/pull/9446





[jira] [Comment Edited] (BEAM-8272) GroupIntoBatches transform for Go SDK

2019-09-23 Thread Robert Burke (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936099#comment-16936099
 ] 

Robert Burke edited comment on BEAM-8272 at 9/23/19 6:29 PM:
-

Note that the implementation will necessarily be different in the Go SDK. The 
SDK doesn't yet support the State and Timers API, which both the Java and 
Python implementations use. Adding state and timers to the Go SDK is a larger 
task.

However, this looks like a largely streaming construct, which makes alternative 
implementations without State and Timers tricky, if not impossible. 

It also looks like it requires being able to emit "Iterables", which might be 
handled with slices instead, but otherwise the SDK doesn't yet support user 
side streams.


was (Author: lostluck):
Note that the implementation will necessarily be different in the Go SDK. The 
SDK doesn't yet support the State and Timers API, which both the Java and 
Python implementations use.

> GroupIntoBatches transform for Go SDK
> -
>
> Key: BEAM-8272
> URL: https://issues.apache.org/jira/browse/BEAM-8272
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-go
>Reporter: John Patoch
>Priority: Major
>
> Add a PTransform that batches inputs to a desired batch size. Batches will 
> contain only elements of a single key.
> It should offer the same API as its Java counterpart:
> [https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/GroupIntoBatches.java]
>  
> And Python counterpart:
> https://github.com/apache/beam/blob/c445fdfdfab4a191aa780210564199f2873f85d8/sdks/python/apache_beam/transforms/util.py#L684
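
The batching semantics described in this issue can be sketched for a bounded, in-memory input. This is only an illustration in plain Java, not the Beam API and not a proposal for the Go SDK design: `BatchSketch.groupIntoBatches` is a hypothetical helper, it ignores windowing, state and timers, and it emits the final partial batch for each key at end of input.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of per-key batching; not a Beam transform. */
final class BatchSketch {
    /** Splits the values of each key into batches of at most batchSize elements. */
    static <K, V> Map<K, List<List<V>>> groupIntoBatches(
            List<Map.Entry<K, V>> input, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        Map<K, List<List<V>>> batchesByKey = new LinkedHashMap<>();
        for (Map.Entry<K, V> element : input) {
            List<List<V>> batches =
                    batchesByKey.computeIfAbsent(element.getKey(), k -> new ArrayList<>());
            // Open a new batch when the key has none yet or its current batch is full.
            if (batches.isEmpty() || batches.get(batches.size() - 1).size() == batchSize) {
                batches.add(new ArrayList<>());
            }
            batches.get(batches.size() - 1).add(element.getValue());
        }
        return batchesByKey;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> input = new ArrayList<>();
        input.add(new AbstractMap.SimpleEntry<>("a", 1));
        input.add(new AbstractMap.SimpleEntry<>("b", 2));
        input.add(new AbstractMap.SimpleEntry<>("a", 3));
        input.add(new AbstractMap.SimpleEntry<>("a", 4));
        System.out.println(groupIntoBatches(input, 2));
    }
}
```

Every batch holds values of one key only, and no batch exceeds the requested size. A streaming version would additionally need a way to flush partial batches, which is where state and timers come in.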





[jira] [Commented] (BEAM-8272) GroupIntoBatches transform for Go SDK

2019-09-23 Thread Robert Burke (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936099#comment-16936099
 ] 

Robert Burke commented on BEAM-8272:


Note that the implementation will necessarily be different in the Go SDK. The 
SDK doesn't yet support the State and Timers API, which both the Java and 
Python implementations use.

> GroupIntoBatches transform for Go SDK
> -
>
> Key: BEAM-8272
> URL: https://issues.apache.org/jira/browse/BEAM-8272
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-go
>Reporter: John Patoch
>Priority: Major
>
> Add a PTransform that batches inputs to a desired batch size. Batches will 
> contain only elements of a single key.
> It should offer the same API as its Java counterpart:
> [https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/GroupIntoBatches.java]
>  
> And Python counterpart:
> https://github.com/apache/beam/blob/c445fdfdfab4a191aa780210564199f2873f85d8/sdks/python/apache_beam/transforms/util.py#L684





[jira] [Resolved] (BEAM-8096) Allow runner to configure "subnetwork"

2019-09-23 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke resolved BEAM-8096.

Fix Version/s: Not applicable
   Resolution: Fixed

> Allow runner to configure "subnetwork"
> --
>
> Key: BEAM-8096
> URL: https://issues.apache.org/jira/browse/BEAM-8096
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Affects Versions: 2.15.0
>Reporter: Jack Whelpton
>Assignee: Robert Burke
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When running a Dataflow job, the network can be specified using the --network 
> flag; however, there is no support for doing the same for the subnetwork. 
> This would be the Go equivalent of the following Java code:
> [https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/options/DataflowPipelineWorkerPoolOptions.html#getSubnetwork--|https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineWorkerPoolOptions.java#L151]





[jira] [Resolved] (BEAM-8242) Go: unregistered Go functions fail when using -buildmode=pie

2019-09-23 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke resolved BEAM-8242.

Fix Version/s: Not applicable
   Resolution: Fixed

> Go: unregistered Go functions fail when using -buildmode=pie
> 
>
> Key: BEAM-8242
> URL: https://issues.apache.org/jira/browse/BEAM-8242
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Affects Versions: 2.15.0
> Environment: GNU/Linux
>Reporter: Ian Lance Taylor
>Assignee: Robert Burke
>Priority: Major
> Fix For: Not applicable
>
>   Original Estimate: 0h
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> If a Go program is built with -buildmode=pie, the code that transfers an 
> unregistered function fails.  It looks up the symbol in the symbol table, but 
> that is not the location of the function at execution time.  This causes a 
> program crash when calling the function.
> I have a patch for this problem that I will send shortly.
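
The mismatch described in this issue comes down to address arithmetic: a position-independent executable is loaded at a base chosen at run time, so a link-time address from the symbol table differs from the run-time address by a constant offset (the load bias). The sketch below only illustrates that arithmetic with made-up addresses. `PieAddressSketch` and its methods are hypothetical, it is written in Java purely for illustration, and it is not necessarily how the patch mentioned in the issue addresses the problem.

```java
/** Hypothetical illustration of load-bias arithmetic; not code from the Go SDK. */
final class PieAddressSketch {
    /** Load bias: how far the loader shifted the image from its link-time addresses. */
    static long loadBias(long runtimeAddressOfAnchor, long symbolTableAddressOfAnchor) {
        return runtimeAddressOfAnchor - symbolTableAddressOfAnchor;
    }

    /** Converts a symbol-table address into the address valid in the running process. */
    static long runtimeAddress(long symbolTableAddress, long bias) {
        return symbolTableAddress + bias;
    }

    public static void main(String[] args) {
        // Made-up values: one function whose address is known both ways acts as the anchor.
        long bias = loadBias(0x55d0c0401000L, 0x401000L);
        System.out.println(Long.toHexString(runtimeAddress(0x402340L, bias)));
    }
}
```

Calling a function through its unadjusted symbol-table address would jump to memory that does not hold the function, which matches the crash described in the issue.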





[jira] [Work logged] (BEAM-7919) Add a Python 3 test scenario for MongoDB IO

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7919?focusedWorklogId=316892&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316892
 ]

ASF GitHub Bot logged work on BEAM-7919:


Author: ASF GitHub Bot
Created on: 23/Sep/19 18:17
Start Date: 23/Sep/19 18:17
Worklog Time Spent: 10m 
  Work Description: y1chi commented on issue #9639: [BEAM-7919] Add MongoDB 
IO integration test for py3.7
URL: https://github.com/apache/beam/pull/9639#issuecomment-534221012
 
 
   Run Python MongoDBIO_IT
 



Issue Time Tracking
---

Worklog Id: (was: 316892)
Time Spent: 1.5h  (was: 1h 20m)

> Add a Python 3 test scenario for MongoDB IO
> ---
>
> Key: BEAM-7919
> URL: https://issues.apache.org/jira/browse/BEAM-7919
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-ideas
>Reporter: Valentyn Tymofieiev
>Assignee: Yichi Zhang
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Python 2 MongoDB IO suite was added in:
> https://github.com/apache/beam/commit/17bf89d6070565b715f44ecb5f6394219b94cfe6
> We should also exercise this IO in Python 3. 
> cc: [~chamikara] [~altay]





[jira] [Work logged] (BEAM-8300) KinesisIO.write causes NPE as the producer is null

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8300?focusedWorklogId=316875&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316875
 ]

ASF GitHub Bot logged work on BEAM-8300:


Author: ASF GitHub Bot
Created on: 23/Sep/19 17:56
Start Date: 23/Sep/19 17:56
Worklog Time Spent: 10m 
  Work Description: jhalaria commented on issue #9640: [BEAM-8300]: 
KinesisIO.write throws NPE because producer is null
URL: https://github.com/apache/beam/pull/9640#issuecomment-534212637
 
 
   @iemejia - Please review.
 



Issue Time Tracking
---

Worklog Id: (was: 316875)
Time Spent: 20m  (was: 10m)

> KinesisIO.write causes NPE as the producer is null
> --
>
> Key: BEAM-8300
> URL: https://issues.apache.org/jira/browse/BEAM-8300
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-kinesis
>Affects Versions: 2.15.0
>Reporter: Ankit Jhalaria
>Assignee: Ankit Jhalaria
>Priority: Minor
> Fix For: Not applicable
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> While using KinesisIO.write(), we encountered an NPE with the following stack 
> trace:
> {code:java}
> org.apache.beam.runners.flink.translation.wrappers.streaming.io.UnboundedSourceWrapper.run(UnboundedSourceWrapper.java:297)
>     at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:93)
>     at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:57)
>     at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:97)
>     at org.apache.flink.streaming.runtime.tasks.StoppableSourceStreamTask.run(StoppableSourceStreamTask.java:45)
>     at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)
>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NullPointerException: null
>     at org.apache.beam.sdk.io.kinesis.KinesisIO$Write$KinesisWriterFn.flushBundle(KinesisIO.java:685)
>     at org.apache.beam.sdk.io.kinesis.KinesisIO$Write$KinesisWriterFn.finishBundle(KinesisIO.java:669)
> {code}





[jira] [Work logged] (BEAM-8233) Separate loopback and docker modes on Flink runner guide

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8233?focusedWorklogId=316872&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316872
 ]

ASF GitHub Bot logged work on BEAM-8233:


Author: ASF GitHub Bot
Created on: 23/Sep/19 17:50
Start Date: 23/Sep/19 17:50
Worklog Time Spent: 10m 
  Work Description: tweise commented on issue #9605: [BEAM-8233] 
[BEAM-8214] [BEAM-8232] Document environment_type flag
URL: https://github.com/apache/beam/pull/9605#issuecomment-534210168
 
 
   And leaving some time is a good idea regardless of who is tagged on the PR :)
   
   When I look for reviewers for my PRs, I try to pick folks who I know are 
most knowledgeable about the area or have expressed interest in the topic. I 
have never come across a case that required tagging all committers so far :)
 



Issue Time Tracking
---

Worklog Id: (was: 316872)
Time Spent: 2h 50m  (was: 2h 40m)

> Separate loopback and docker modes on Flink runner guide
> 
>
> Key: BEAM-8233
> URL: https://issues.apache.org/jira/browse/BEAM-8233
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink, website
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Running loopback should be the "getting started" option, and docker mode 
> should be an "advanced" option with its own section of the Flink runner guide 
> with instructions and explanations (you need to build the docker container 
> images, you can't see your output in a local filesystem without 
> workarounds, etc.). [https://beam.apache.org/documentation/runners/flink/]





[jira] [Work logged] (BEAM-8300) KinesisIO.write causes NPE as the producer is null

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8300?focusedWorklogId=316871&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316871
 ]

ASF GitHub Bot logged work on BEAM-8300:


Author: ASF GitHub Bot
Created on: 23/Sep/19 17:49
Start Date: 23/Sep/19 17:49
Worklog Time Spent: 10m 
  Work Description: jhalaria commented on pull request #9640: [BEAM-8300]: 
KinesisIO.write throws NPE because producer is null
URL: https://github.com/apache/beam/pull/9640
 
 
   Added a readObject method to initialize the transient producer
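
   The fix described above is in Java. As a rough Python analogue only (none of these names come from the PR), the sketch below shows the general pattern: a field that is skipped during serialization comes back unset after deserialization unless a hook rebuilds it, which is the role `readObject` plays for a `transient` Java field. `FakeProducer` and `WriterFn` are hypothetical stand-ins, not Beam or Kinesis classes.

```python
import pickle


class FakeProducer:
  """Hypothetical stand-in for a client that must not be serialized."""

  def send(self, record):
    return 'sent:' + record


class WriterFn:
  """Hypothetical writer that owns a non-serializable producer."""

  def __init__(self):
    self.stream = 'my-stream'
    self.producer = FakeProducer()

  def __getstate__(self):
    # Drop the producer before pickling, like a Java `transient` field.
    state = self.__dict__.copy()
    del state['producer']
    return state

  def __setstate__(self, state):
    # Rebuild the dropped field on unpickling, like a Java readObject hook.
    self.__dict__.update(state)
    self.producer = FakeProducer()

  def flush(self, record):
    return self.producer.send(record)


# Round-trip through serialization, as a runner would when shipping the fn.
restored = pickle.loads(pickle.dumps(WriterFn()))
```

   Without the `__setstate__` hook, `restored.flush(...)` would fail because `producer` would be missing, which mirrors the null producer in the reported stack trace.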
   

[jira] [Created] (BEAM-8301) Argument inference breaks on incomparable types as defaults.

2019-09-23 Thread Robert Bradshaw (Jira)
Robert Bradshaw created BEAM-8301:
-

 Summary: Argument inference breaks on incomparable types as 
defaults.
 Key: BEAM-8301
 URL: https://issues.apache.org/jira/browse/BEAM-8301
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core
Affects Versions: 2.16.0
Reporter: Robert Bradshaw
 Fix For: 2.16.0


A common culprit is numpy arrays, e.g.

{code:python}
class MyDoFn(beam.DoFn):
  def process(self, element, arg=np.ndarray(...)):
    ...

{code}

This bug was introduced as part of [BEAM-7060].
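
A minimal sketch of how this kind of failure can arise, assuming (the report does not say this) that the inference code compares a default value against a sentinel with `==`. `ArrayLike`, `ElementwiseResult`, and both helper functions are hypothetical; `ArrayLike` only mimics numpy's elementwise `==`, whose result has no single truth value.

```python
import inspect

NO_DEFAULT = inspect.Parameter.empty  # sentinel used by inspect.signature


class ElementwiseResult:
  """Result of an elementwise comparison; it has no single truth value."""

  def __init__(self, flags):
    self.flags = flags

  def __bool__(self):
    raise ValueError('truth value of an elementwise comparison is ambiguous')


class ArrayLike:
  """Hypothetical stand-in for a numpy array: == is elementwise."""

  def __init__(self, values):
    self.values = list(values)

  def __eq__(self, other):
    return ElementwiseResult([v == other for v in self.values])


def has_default_by_equality(param):
  # Fragile: invokes the default's own __eq__ and then bool() on the result.
  return not (param.default == NO_DEFAULT)


def has_default_by_identity(param):
  # Robust: identity never calls user-defined comparison methods.
  return param.default is not NO_DEFAULT


def process(element, arg=ArrayLike([1, 2, 3])):
  return element


params = inspect.signature(process).parameters
arg_param = params['arg']
element_param = params['element']
```

Under that assumption, `has_default_by_equality(arg_param)` raises `ValueError`, while the identity check handles both parameters without touching the default's comparison logic.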





[jira] [Work logged] (BEAM-8240) Fix pipeline proto to contain worker_harness_container_image override

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8240?focusedWorklogId=316870&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316870
 ]

ASF GitHub Bot logged work on BEAM-8240:


Author: ASF GitHub Bot
Created on: 23/Sep/19 17:46
Start Date: 23/Sep/19 17:46
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #9629: [BEAM-8240] Sets 
workerHarnessContaienrImage in the default Environment of DataflowRunner
URL: https://github.com/apache/beam/pull/9629#issuecomment-534208564
 
 
   Thanks Luke. PTAL.
 



Issue Time Tracking
---

Worklog Id: (was: 316870)
Time Spent: 4h 20m  (was: 4h 10m)

> Fix pipeline proto to contain worker_harness_container_image override
> -
>
> Key: BEAM-8240
> URL: https://issues.apache.org/jira/browse/BEAM-8240
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Minor
> Fix For: 2.17.0
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> When a custom SDK container is used, the SDK harness identifies itself 
> incorrectly in the environment field of the pipeline proto.
>  
> Passing in the experiment *worker_harness_container_image=YYY* doesn't 
> override the pipeline proto environment field and it is still being populated 
> with *gcr.io/cloud-dataflow/v1beta3/python-fnapi:beam-master-20190802*
>  
>  




