[jira] [Work logged] (BEAM-5630) supplement Bigquery Read IT test cases and blacklist them in post-commit

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5630?focusedWorklogId=152841&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152841
 ]

ASF GitHub Bot logged work on BEAM-5630:


Author: ASF GitHub Bot
Created on: 09/Oct/18 18:23
Start Date: 09/Oct/18 18:23
Worklog Time Spent: 10m 
  Work Description: yifanzou commented on a change in pull request #6559: 
[BEAM-5630] add more tests into BigQueryIOReadIT
URL: https://github.com/apache/beam/pull/6559#discussion_r223812009
 
 

 ##
 File path: sdks/java/io/google-cloud-platform/build.gradle
 ##
 @@ -97,6 +97,7 @@ task integrationTest(type: Test) {
   outputs.upToDateWhen { false }
 
   include '**/*IT.class'
+  exclude '**/BigQueryIOReadIT.class'
 
 Review comment:
   To be more specific, this test will be skipped when jobs run against the 
DirectRunner, because of the limited JVM heap. The entire class will still run against 
the DataflowRunner in the job 
[:beam-runners-google-cloud-dataflow-java:googleCloudPlatformIntegrationTest](https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/build.gradle#L152).


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152841)
Time Spent: 3h 20m  (was: 3h 10m)

> supplement Bigquery Read IT test cases and blacklist them in post-commit
> 
>
> Key: BEAM-5630
> URL: https://issues.apache.org/jira/browse/BEAM-5630
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: yifan zou
>Assignee: yifan zou
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5036) Optimize FileBasedSink's WriteOperation.moveToOutput()

2018-10-09 Thread Tim Robertson (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16643903#comment-16643903
 ] 

Tim Robertson commented on BEAM-5036:
-

{quote}Should this be marked as a blocker for 2.8.0 ? PR is still in review.
{quote}
 
 I don't think this can block 2.8.0, given the time constraints. The [PR (now 
closed)|https://github.com/apache/beam/pull/6289] generated a lot of discussion 
but was not suitable for merging. I suggest we modify {{HDFSFileSystem}} to 
always overwrite (i.e. move the necessary bits from the [existing 
PR|https://github.com/apache/beam/pull/6289] into the HDFS implementation only); 
the solution will then be simpler and the change can be made in 
{{rename()}}.
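
The "always overwrite" idea could look roughly like the sketch below. This is only an 
illustration using Hadoop's {{org.apache.hadoop.fs.FileSystem}} API, not the actual Beam 
{{HDFSFileSystem}} code, and the {{renameOverwriting}} helper is a made-up name.

{code:java}
// Rough sketch of the "always overwrite" idea using the plain Hadoop FileSystem
// API. Not the real Beam HDFS implementation; the helper name is illustrative.
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class OverwritingRename {
  // Renames src to dst, deleting any pre-existing dst first so the rename
  // behaves like an overwrite (what a re-run of a previous job needs).
  static void renameOverwriting(FileSystem fs, Path src, Path dst) throws IOException {
    if (fs.exists(dst)) {
      // Non-recursive delete: dst is expected to be a file, not a directory.
      fs.delete(dst, false);
    }
    if (!fs.rename(src, dst)) {
      throw new IOException("Failed to rename " + src + " to " + dst);
    }
  }
}
{code}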

> Optimize FileBasedSink's WriteOperation.moveToOutput()
> --
>
> Key: BEAM-5036
> URL: https://issues.apache.org/jira/browse/BEAM-5036
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-files
>Affects Versions: 2.5.0
>Reporter: Jozef Vilcek
>Assignee: Tim Robertson
>Priority: Major
>  Time Spent: 11h
>  Remaining Estimate: 0h
>
> The moveToOutput() method in FileBasedSink.WriteOperation implements the move as 
> copy+delete. It would be better to use rename(), which can be much more 
> efficient on some filesystems.
> The filesystem must support cross-directory rename. BEAM-4861 is related to this 
> for the case of the HDFS filesystem.
> The feature was discussed here:
> http://mail-archives.apache.org/mod_mbox/beam-dev/201807.mbox/%3CCAF9t7_4Mp54pQ+vRrJrBh9Vx0=uaknupzd_qdh_qdm9vxll...@mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5467) Python Flink ValidatesRunner job fixes

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5467?focusedWorklogId=152840&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152840
 ]

ASF GitHub Bot logged work on BEAM-5467:


Author: ASF GitHub Bot
Created on: 09/Oct/18 18:19
Start Date: 09/Oct/18 18:19
Worklog Time Spent: 10m 
  Work Description: angoenka commented on issue #6532: [BEAM-5467] Use 
process SDKHarness to run flink PVR tests.
URL: https://github.com/apache/beam/pull/6532#issuecomment-428296935
 
 
   I was hoping to fit it in via the process environment factory. Does it need a 
separate environment factory to be registered?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152840)
Time Spent: 6h 20m  (was: 6h 10m)

> Python Flink ValidatesRunner job fixes
> --
>
> Key: BEAM-5467
> URL: https://issues.apache.org/jira/browse/BEAM-5467
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Minor
>  Labels: portability-flink
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> Add status to README
> Rename script and job for consistency
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5467) Python Flink ValidatesRunner job fixes

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5467?focusedWorklogId=152839&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152839
 ]

ASF GitHub Bot logged work on BEAM-5467:


Author: ASF GitHub Bot
Created on: 09/Oct/18 18:17
Start Date: 09/Oct/18 18:17
Worklog Time Spent: 10m 
  Work Description: tweise commented on issue #6532: [BEAM-5467] Use 
process SDKHarness to run flink PVR tests.
URL: https://github.com/apache/beam/pull/6532#issuecomment-428296179
 
 
   > Reusing it would be better. Can we get that code to master?
   
   Of course. We just need to agree on how we want to enable the environment 
provider/factory. Probably best as a separate PR.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152839)
Time Spent: 6h 10m  (was: 6h)

> Python Flink ValidatesRunner job fixes
> --
>
> Key: BEAM-5467
> URL: https://issues.apache.org/jira/browse/BEAM-5467
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Minor
>  Labels: portability-flink
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> Add status to README
> Rename script and job for consistency
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-5456) Update google-api-client libraries to 1.25

2018-10-09 Thread Chamikara Jayalath (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chamikara Jayalath updated BEAM-5456:
-
Fix Version/s: (was: 2.8.0)
   2.9.0

> Update google-api-client libraries to 1.25
> --
>
> Key: BEAM-5456
> URL: https://issues.apache.org/jira/browse/BEAM-5456
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Chamikara Jayalath
>Assignee: Chamikara Jayalath
>Priority: Blocker
> Fix For: 2.9.0
>
>
> This version updates authentication URLs 
> ([https://github.com/googleapis/google-api-java-client/releases]), which is 
> needed for certain features.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5456) Update google-api-client libraries to 1.25

2018-10-09 Thread Chamikara Jayalath (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16643879#comment-16643879
 ] 

Chamikara Jayalath commented on BEAM-5456:
--

Moved to 2.9.0.

> Update google-api-client libraries to 1.25
> --
>
> Key: BEAM-5456
> URL: https://issues.apache.org/jira/browse/BEAM-5456
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Chamikara Jayalath
>Assignee: Chamikara Jayalath
>Priority: Blocker
> Fix For: 2.9.0
>
>
> This version updates authentication URLs 
> ([https://github.com/googleapis/google-api-java-client/releases]), which is 
> needed for certain features.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-5427) Fix sample code (AverageFn) in Combine.java

2018-10-09 Thread Ruoyun Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruoyun Huang resolved BEAM-5427.

   Resolution: Fixed
Fix Version/s: 2.8.0

> Fix sample code (AverageFn) in Combine.java
> ---
>
> Key: BEAM-5427
> URL: https://issues.apache.org/jira/browse/BEAM-5427
> Project: Beam
>  Issue Type: Improvement
>  Components: examples-java
>Reporter: Ruoyun Huang
>Assignee: Ruoyun Huang
>Priority: Minor
> Fix For: 2.8.0
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> The sample code is missing a coder. 
> In its current state, the job run fails with a missing-coder error. 
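
For readers following along, the sketch below shows one way a CombineFn like the 
AverageFn example can declare its accumulator coder so the pipeline does not fail with a 
missing-coder error. It is an illustration of the general technique, not necessarily the 
exact fix that was merged for this issue.

{code:java}
// Illustrative only: declaring an accumulator coder on a CombineFn so the
// runner does not fail with a missing-coder error.
import java.io.Serializable;
import org.apache.beam.sdk.coders.CannotProvideCoderException;
import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.coders.CoderRegistry;
import org.apache.beam.sdk.coders.SerializableCoder;
import org.apache.beam.sdk.transforms.Combine;

public class AverageFn extends Combine.CombineFn<Integer, AverageFn.Accum, Double> {
  public static class Accum implements Serializable {
    int sum = 0;
    int count = 0;
  }

  @Override
  public Accum createAccumulator() {
    return new Accum();
  }

  @Override
  public Accum addInput(Accum accum, Integer input) {
    accum.sum += input;
    accum.count++;
    return accum;
  }

  @Override
  public Accum mergeAccumulators(Iterable<Accum> accums) {
    Accum merged = createAccumulator();
    for (Accum a : accums) {
      merged.sum += a.sum;
      merged.count += a.count;
    }
    return merged;
  }

  @Override
  public Double extractOutput(Accum accum) {
    return accum.count == 0 ? 0.0 : ((double) accum.sum) / accum.count;
  }

  // Declaring the accumulator coder explicitly is what avoids the coder
  // inference failure at pipeline construction time.
  @Override
  public Coder<Accum> getAccumulatorCoder(CoderRegistry registry, Coder<Integer> inputCoder)
      throws CannotProvideCoderException {
    return SerializableCoder.of(Accum.class);
  }
}
{code}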



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5467) Python Flink ValidatesRunner job fixes

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5467?focusedWorklogId=152835&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152835
 ]

ASF GitHub Bot logged work on BEAM-5467:


Author: ASF GitHub Bot
Created on: 09/Oct/18 18:11
Start Date: 09/Oct/18 18:11
Worklog Time Spent: 10m 
  Work Description: angoenka commented on a change in pull request #6532: 
[BEAM-5467] Use process SDKHarness to run flink PVR tests.
URL: https://github.com/apache/beam/pull/6532#discussion_r223807159
 
 

 ##
 File path: sdks/python/build.gradle
 ##
 @@ -361,3 +372,21 @@ task dependencyUpdates(dependsOn: ':dependencyUpdates') {
 }
   }
 }
+
+project.task('createProcessWorker') {
+  dependsOn ':beam-sdks-python-container:build'
+  dependsOn 'setupVirtualenv'
+  def outputFile = file("${project.buildDir}/sdk_worker.sh")
+  def workerScript = 
"${project(":beam-sdks-python-container:").buildDir.absolutePath}/target/launcher/linux_amd64/boot"
+  def text = "sh -c \". ${envdir}/bin/activate && ${workerScript} \$* \""
 
 Review comment:
   Makes sense


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152835)
Time Spent: 5h 40m  (was: 5.5h)

> Python Flink ValidatesRunner job fixes
> --
>
>     Key: BEAM-5467
> URL: https://issues.apache.org/jira/browse/BEAM-5467
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Minor
>  Labels: portability-flink
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Add status to README
> Rename script and job for consistency
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5467) Python Flink ValidatesRunner job fixes

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5467?focusedWorklogId=152834&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152834
 ]

ASF GitHub Bot logged work on BEAM-5467:


Author: ASF GitHub Bot
Created on: 09/Oct/18 18:11
Start Date: 09/Oct/18 18:11
Worklog Time Spent: 10m 
  Work Description: angoenka commented on a change in pull request #6532: 
[BEAM-5467] Use process SDKHarness to run flink PVR tests.
URL: https://github.com/apache/beam/pull/6532#discussion_r223806288
 
 

 ##
 File path: 
.test-infra/jenkins/job_PostCommit_Python_ValidatesRunner_Flink.groovy
 ##
 @@ -33,8 +33,8 @@ 
PostcommitJobBuilder.postCommitJob('beam_PostCommit_Python_VR_Flink',
   steps {
 gradle {
   rootBuildScriptDir(commonJobProperties.checkoutDir)
-  tasks(':beam-sdks-python:flinkCompatibilityMatrixBatch')
-  tasks(':beam-sdks-python:flinkCompatibilityMatrixStreaming')
+  tasks(':beam-sdks-python:flinkCompatibilityMatrixBatchProcess')
 
 Review comment:
   Makes sense. I will also move the sequence into a Gradle task.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152834)
Time Spent: 5.5h  (was: 5h 20m)

> Python Flink ValidatesRunner job fixes
> --
>
> Key: BEAM-5467
> URL: https://issues.apache.org/jira/browse/BEAM-5467
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Minor
>  Labels: portability-flink
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> Add status to README
> Rename script and job for consistency
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5467) Python Flink ValidatesRunner job fixes

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5467?focusedWorklogId=152836&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152836
 ]

ASF GitHub Bot logged work on BEAM-5467:


Author: ASF GitHub Bot
Created on: 09/Oct/18 18:11
Start Date: 09/Oct/18 18:11
Worklog Time Spent: 10m 
  Work Description: angoenka commented on a change in pull request #6532: 
[BEAM-5467] Use process SDKHarness to run flink PVR tests.
URL: https://github.com/apache/beam/pull/6532#discussion_r223806349
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/flink_runner_test.py
 ##
 @@ -34,12 +36,26 @@
   # Run as
   #
   # python -m apache_beam.runners.portability.flink_runner_test \
-  # /path/to/job_server.jar \
+  # --flink_job_server_jar=/path/to/job_server.jar \
+  # --type=Batch \
+  # --harness_type=docker \
   # [FlinkRunnerTest.test_method, ...]
-  flinkJobServerJar = sys.argv.pop(1)
-  streaming = sys.argv.pop(1).lower() == 'streaming'
 
-  # This is defined here to only be run when we invoke this file explicitly.
+  parser = argparse.ArgumentParser(add_help=True)
+  parser.add_argument('--flink_job_server_jar',
+  help='Job server jar to submit jobs.')
+  parser.add_argument('--type', default='batch',
+  help='Job type. batch or streaming')
 
 Review comment:
   sure


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152836)

> Python Flink ValidatesRunner job fixes
> --
>
> Key: BEAM-5467
> URL: https://issues.apache.org/jira/browse/BEAM-5467
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Minor
>  Labels: portability-flink
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Add status to README
> Rename script and job for consistency
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5467) Python Flink ValidatesRunner job fixes

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5467?focusedWorklogId=152838&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152838
 ]

ASF GitHub Bot logged work on BEAM-5467:


Author: ASF GitHub Bot
Created on: 09/Oct/18 18:11
Start Date: 09/Oct/18 18:11
Worklog Time Spent: 10m 
  Work Description: angoenka commented on a change in pull request #6532: 
[BEAM-5467] Use process SDKHarness to run flink PVR tests.
URL: https://github.com/apache/beam/pull/6532#discussion_r223807072
 
 

 ##
 File path: sdks/python/build.gradle
 ##
 @@ -361,3 +372,21 @@ task dependencyUpdates(dependsOn: ':dependencyUpdates') {
 }
   }
 }
+
+project.task('createProcessWorker') {
+  dependsOn ':beam-sdks-python-container:build'
+  dependsOn 'setupVirtualenv'
+  def outputFile = file("${project.buildDir}/sdk_worker.sh")
 
 Review comment:
   I will try it out


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152838)
Time Spent: 6h  (was: 5h 50m)

> Python Flink ValidatesRunner job fixes
> --
>
> Key: BEAM-5467
> URL: https://issues.apache.org/jira/browse/BEAM-5467
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Minor
>  Labels: portability-flink
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> Add status to README
> Rename script and job for consistency
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5467) Python Flink ValidatesRunner job fixes

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5467?focusedWorklogId=152837&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152837
 ]

ASF GitHub Bot logged work on BEAM-5467:


Author: ASF GitHub Bot
Created on: 09/Oct/18 18:11
Start Date: 09/Oct/18 18:11
Worklog Time Spent: 10m 
  Work Description: angoenka commented on a change in pull request #6532: 
[BEAM-5467] Use process SDKHarness to run flink PVR tests.
URL: https://github.com/apache/beam/pull/6532#discussion_r223806539
 
 

 ##
 File path: sdks/python/build.gradle
 ##
 @@ -327,24 +327,35 @@ task hdfsIntegrationTest(dependsOn: 'installGcpTest') {
   }
 }
 
+class CompatibilityMatrixConfig {
+  String type
 
 Review comment:
   Will make it a boolean


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152837)
Time Spent: 5h 50m  (was: 5h 40m)

> Python Flink ValidatesRunner job fixes
> --
>
> Key: BEAM-5467
> URL: https://issues.apache.org/jira/browse/BEAM-5467
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Minor
>  Labels: portability-flink
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> Add status to README
> Rename script and job for consistency
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5036) Optimize FileBasedSink's WriteOperation.moveToOutput()

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5036?focusedWorklogId=152833&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152833
 ]

ASF GitHub Bot logged work on BEAM-5036:


Author: ASF GitHub Bot
Created on: 09/Oct/18 18:09
Start Date: 09/Oct/18 18:09
Worklog Time Spent: 10m 
  Work Description: timrobertson100 closed pull request #6289: [BEAM-5036] 
Optimize the FileBasedSink WriteOperation.moveToOutput()
URL: https://github.com/apache/beam/pull/6289
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileBasedSink.java 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileBasedSink.java
index 92b2382e365..66a94ae087b 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileBasedSink.java
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileBasedSink.java
@@ -757,9 +757,14 @@ final void moveToOutputFiles(
 "Will copy temporary file {} to final location {}", 
entry.getKey(), entry.getValue());
   }
   // During a failure case, files may have been deleted in an earlier 
step. Thus
-  // we ignore missing files here.
-  FileSystems.copy(srcFiles, dstFiles, 
StandardMoveOptions.IGNORE_MISSING_FILES);
-  removeTemporaryFiles(srcFiles);
+  // we ignore missing files here. It is possible that files already exist 
in the
+  // destination and we wish to replace them (e.g. a previous job run)
+  FileSystems.rename(
+  srcFiles,
+  dstFiles,
+  StandardMoveOptions.IGNORE_MISSING_FILES,
+  StandardMoveOptions.REPLACE_EXISTING);
+  removeTemporaryFiles(srcFiles); // removes temp folder if applicable
 }
 
 /**
diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileSystems.java 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileSystems.java
index c2977d0527c..eb22a76f33e 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileSystems.java
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileSystems.java
@@ -36,6 +36,7 @@
 import java.io.IOException;
 import java.nio.channels.ReadableByteChannel;
 import java.nio.channels.WritableByteChannel;
+import java.nio.file.FileAlreadyExistsException;
 import java.util.ArrayList;
 import java.util.Collection;
 import java.util.Collections;
@@ -298,6 +299,12 @@ public static void copy(
*
* It doesn't support renaming globs.
*
+   * If the underlying file system reports that a target file already 
exists and moveOptions
+   * contains {@code StandardMoveOptions.REPLACE_EXISTING} then all target 
files that existed prior
+   * to calling rename will be deleted and the rename retried. When a retry is 
attempted then
+   * missing files from the source will be ignored. Some filesystem 
implementations will always
+   * overwrite.
+   *
* @param srcResourceIds the references of the source resources
* @param destResourceIds the references of the destination resources
*/
@@ -310,10 +317,11 @@ public static void rename(
   return;
 }
 
+Set<MoveOptions> options = Sets.newHashSet(moveOptions);
+
 List<ResourceId> srcToRename = srcResourceIds;
 List<ResourceId> destToRename = destResourceIds;
-if (Sets.newHashSet(moveOptions)
-.contains(MoveOptions.StandardMoveOptions.IGNORE_MISSING_FILES)) {
+if 
(options.contains(MoveOptions.StandardMoveOptions.IGNORE_MISSING_FILES)) {
   KV<List<ResourceId>, List<ResourceId>> existings =
   filterMissingFiles(srcResourceIds, destResourceIds);
   srcToRename = existings.getKey();
@@ -322,8 +330,71 @@ public static void rename(
 if (srcToRename.isEmpty()) {
   return;
 }
-getFileSystemInternal(srcToRename.iterator().next().getScheme())
-.rename(srcToRename, destToRename);
+
+boolean replaceExisting =
+options.contains(MoveOptions.StandardMoveOptions.REPLACE_EXISTING) ? 
true : false;
+rename(
+getFileSystemInternal(srcToRename.iterator().next().getScheme()),
+srcToRename,
+destToRename,
+replaceExisting);
+  }
+
+  /**
+   * Executes a rename of the src which all must exist using the provided 
filesystem.
+   *
+   * If replaceExisting is enabled and filesystem throws {@code 
FileAlreadyExistsException} then
+   * an attempt to delete the destination is made and the rename is retried. 
Some filesystem
+   * implementations may apply this automatically without throwing.
+   *
+   * @param fileSystem The filesystem in use
+   * @param srcResourceIds The source resources to move
+   * @param destResourceIds The destinations for the sources to move to (must 
be same length as
+   * srcResourceIds)

[jira] [Updated] (BEAM-5036) Optimize FileBasedSink's WriteOperation.moveToOutput()

2018-10-09 Thread Tim Robertson (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Robertson updated BEAM-5036:

Fix Version/s: (was: 2.8.0)

> Optimize FileBasedSink's WriteOperation.moveToOutput()
> --
>
> Key: BEAM-5036
> URL: https://issues.apache.org/jira/browse/BEAM-5036
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-files
>Affects Versions: 2.5.0
>Reporter: Jozef Vilcek
>Assignee: Tim Robertson
>Priority: Major
>  Time Spent: 10h 50m
>  Remaining Estimate: 0h
>
> The moveToOutput() method in FileBasedSink.WriteOperation implements the move as 
> copy+delete. It would be better to use rename(), which can be much more 
> efficient on some filesystems.
> The filesystem must support cross-directory rename. BEAM-4861 is related to this 
> for the case of the HDFS filesystem.
> The feature was discussed here:
> http://mail-archives.apache.org/mod_mbox/beam-dev/201807.mbox/%3CCAF9t7_4Mp54pQ+vRrJrBh9Vx0=uaknupzd_qdh_qdm9vxll...@mail.gmail.com%3E
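
As a rough illustration of what the issue asks for, the sketch below shows the 
caller-side shape of a rename-based move using Beam's FileSystems API; the 
StandardMoveOptions names come from the closed pull request quoted above, and the 
paths are placeholders.

{code:java}
// Sketch of a move implemented as a rename, roughly the behaviour this issue
// asks moveToOutputFiles() to use. The tempFile/finalFile specs are placeholders.
import java.io.IOException;
import java.util.Collections;
import java.util.List;
import org.apache.beam.sdk.io.FileSystems;
import org.apache.beam.sdk.io.fs.MoveOptions.StandardMoveOptions;
import org.apache.beam.sdk.io.fs.ResourceId;

class RenameMove {
  static void moveByRename(String tempFile, String finalFile) throws IOException {
    List<ResourceId> src =
        Collections.singletonList(FileSystems.matchNewResource(tempFile, false /* isDirectory */));
    List<ResourceId> dst =
        Collections.singletonList(FileSystems.matchNewResource(finalFile, false /* isDirectory */));
    // On filesystems with cheap renames this is a metadata-only operation,
    // instead of the copy followed by delete that the sink currently performs.
    FileSystems.rename(src, dst, StandardMoveOptions.IGNORE_MISSING_FILES);
  }
}
{code}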



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5315) Finish Python 3 porting for io module

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5315?focusedWorklogId=152831&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152831
 ]

ASF GitHub Bot logged work on BEAM-5315:


Author: ASF GitHub Bot
Created on: 09/Oct/18 18:07
Start Date: 09/Oct/18 18:07
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #6590: [BEAM-5315] 
Partially port io
URL: https://github.com/apache/beam/pull/6590#issuecomment-428292725
 
 
   Looks like precommit failed due to a flaky test. I'll ask Jenkins to retest 
and try to reproduce the failure locally.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152831)
Time Spent: 3.5h  (was: 3h 20m)

> Finish Python 3 porting for io module
> -
>
> Key: BEAM-5315
> URL: https://issues.apache.org/jira/browse/BEAM-5315
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Simon
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5315) Finish Python 3 porting for io module

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5315?focusedWorklogId=152832&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152832
 ]

ASF GitHub Bot logged work on BEAM-5315:


Author: ASF GitHub Bot
Created on: 09/Oct/18 18:07
Start Date: 09/Oct/18 18:07
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #6590: [BEAM-5315] 
Partially port io
URL: https://github.com/apache/beam/pull/6590#issuecomment-428292778
 
 
   retest this please


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152832)
Time Spent: 3h 40m  (was: 3.5h)

> Finish Python 3 porting for io module
> -
>
> Key: BEAM-5315
> URL: https://issues.apache.org/jira/browse/BEAM-5315
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Simon
>Priority: Major
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5630) supplement Bigquery Read IT test cases and blacklist them in post-commit

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5630?focusedWorklogId=152830&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152830
 ]

ASF GitHub Bot logged work on BEAM-5630:


Author: ASF GitHub Bot
Created on: 09/Oct/18 18:07
Start Date: 09/Oct/18 18:07
Worklog Time Spent: 10m 
  Work Description: yifanzou commented on a change in pull request #6559: 
[BEAM-5630] add more tests into BigQueryIOReadIT
URL: https://github.com/apache/beam/pull/6559#discussion_r223806946
 
 

 ##
 File path: sdks/java/io/google-cloud-platform/build.gradle
 ##
 @@ -97,6 +97,7 @@ task integrationTest(type: Test) {
   outputs.upToDateWhen { false }
 
   include '**/*IT.class'
+  exclude '**/BigQueryIOReadIT.class'
 
 Review comment:
   https://discuss.gradle.org/t/how-to-exclude-a-single-test-method/20860


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152830)
Time Spent: 3h 10m  (was: 3h)

> supplement Bigquery Read IT test cases and blacklist them in post-commit
> 
>
> Key: BEAM-5630
> URL: https://issues.apache.org/jira/browse/BEAM-5630
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: yifan zou
>Assignee: yifan zou
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (BEAM-5514) BigQueryIO doesn't handle quotaExceeded errors properly

2018-10-09 Thread Chamikara Jayalath (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16643834#comment-16643834
 ] 

Chamikara Jayalath edited comment on BEAM-5514 at 10/9/18 6:06 PM:
---

I'm trying to determine the priority at which this should be addressed.

[~reuvenlax] any reason why we rely on workitems retries instead of retrying BQ 
streaming write requests with exponential backoff ?


was (Author: chamikara):
I'm trying to determine the priority at which this should be addressed.

 

[~reuvenlax] any reason why do rely on workitems retries instead of retrying BQ 
streaming write requests with exponential backoff ?

> BigQueryIO doesn't handle quotaExceeded errors properly
> ---
>
> Key: BEAM-5514
> URL: https://issues.apache.org/jira/browse/BEAM-5514
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Kevin Peterson
>Assignee: Reuven Lax
>Priority: Major
>
> When exceeding a streaming quota for BigQuery insertAll requests, BigQuery 
> returns a 403 with reason "quotaExceeded".
> The current implementation of BigQueryIO does not consider this to be a rate 
> limited exception, and therefore does not perform exponential backoff 
> properly, leading to repeated calls to BQ.
> The actual error is in the 
> [ApiErrorExtractor|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L739]
>  class, which is called from 
> [BigQueryServicesImpl|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/util/src/main/java/com/google/cloud/hadoop/util/ApiErrorExtractor.java#L263]
>  to determine how to retry the failure.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5630) supplement Bigquery Read IT test cases and blacklist them in post-commit

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5630?focusedWorklogId=152829&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152829
 ]

ASF GitHub Bot logged work on BEAM-5630:


Author: ASF GitHub Bot
Created on: 09/Oct/18 18:06
Start Date: 09/Oct/18 18:06
Worklog Time Spent: 10m 
  Work Description: yifanzou commented on a change in pull request #6559: 
[BEAM-5630] add more tests into BigQueryIOReadIT
URL: https://github.com/apache/beam/pull/6559#discussion_r223806777
 
 

 ##
 File path: sdks/java/io/google-cloud-platform/build.gradle
 ##
 @@ -97,6 +97,7 @@ task integrationTest(type: Test) {
   outputs.upToDateWhen { false }
 
   include '**/*IT.class'
+  exclude '**/BigQueryIOReadIT.class'
 
 Review comment:
   Unfortunately, no. The 'exclude' here only works on file patterns, so we can 
only exclude the entire class. I found an incubating Gradle 
[TestFilter](https://docs.gradle.org/current/javadoc/org/gradle/api/tasks/testing/TestFilter.html)
 that is able to execute specific test methods, but it doesn't work with 
'exclude' to skip individual test methods.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152829)
Time Spent: 3h  (was: 2h 50m)

> supplement Bigquery Read IT test cases and blacklist them in post-commit
> 
>
> Key: BEAM-5630
> URL: https://issues.apache.org/jira/browse/BEAM-5630
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: yifan zou
>Assignee: yifan zou
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-5514) BigQueryIO doesn't handle quotaExceeded errors properly

2018-10-09 Thread Chamikara Jayalath (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chamikara Jayalath reassigned BEAM-5514:


Assignee: Reuven Lax  (was: Chamikara Jayalath)

> BigQueryIO doesn't handle quotaExceeded errors properly
> ---
>
> Key: BEAM-5514
> URL: https://issues.apache.org/jira/browse/BEAM-5514
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Kevin Peterson
>Assignee: Reuven Lax
>Priority: Major
>
> When exceeding a streaming quota for BigQuery insertAll requests, BigQuery 
> returns a 403 with reason "quotaExceeded".
> The current implementation of BigQueryIO does not consider this to be a rate 
> limited exception, and therefore does not perform exponential backoff 
> properly, leading to repeated calls to BQ.
> The actual error is in the 
> [ApiErrorExtractor|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L739]
>  class, which is called from 
> [BigQueryServicesImpl|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/util/src/main/java/com/google/cloud/hadoop/util/ApiErrorExtractor.java#L263]
>  to determine how to retry the failure.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5514) BigQueryIO doesn't handle quotaExceeded errors properly

2018-10-09 Thread Chamikara Jayalath (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16643834#comment-16643834
 ] 

Chamikara Jayalath commented on BEAM-5514:
--

I'm trying to determine the priority at which this should be addressed.

 

[~reuvenlax] any reason why do rely on workitems retries instead of retrying BQ 
streaming write requests with exponential backoff ?
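
A minimal sketch of the alternative raised above, retrying the streaming insert itself 
with exponential backoff when the 403 carries a "quotaExceeded" reason. This is 
illustrative only (it is not the existing BigQueryServicesImpl retry logic), and the 
Insert callback stands in for the real insertAll call.

{code:java}
// Illustrative sketch only: back off and retry on a 403 "quotaExceeded",
// instead of relying on work-item retries. Not the existing retry code.
import com.google.api.client.googleapis.json.GoogleJsonError;
import com.google.api.client.googleapis.json.GoogleJsonResponseException;
import java.io.IOException;

class QuotaBackoffSketch {
  /** Placeholder for the BigQuery streaming insertAll call being retried. */
  interface Insert {
    void run() throws IOException;
  }

  static boolean isQuotaExceeded(IOException e) {
    if (e instanceof GoogleJsonResponseException) {
      GoogleJsonError details = ((GoogleJsonResponseException) e).getDetails();
      if (details != null && details.getErrors() != null) {
        for (GoogleJsonError.ErrorInfo info : details.getErrors()) {
          if ("quotaExceeded".equals(info.getReason())
              || "rateLimitExceeded".equals(info.getReason())) {
            return true;
          }
        }
      }
    }
    return false;
  }

  static void insertWithBackoff(Insert insert) throws IOException, InterruptedException {
    long backoffMillis = 1000;
    final int maxAttempts = 8;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        insert.run();
        return;
      } catch (IOException e) {
        // Only retry here when the error is quota/rate related; otherwise rethrow.
        if (!isQuotaExceeded(e) || attempt == maxAttempts) {
          throw e;
        }
        Thread.sleep(backoffMillis);
        backoffMillis *= 2; // exponential backoff between attempts
      }
    }
  }
}
{code}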

> BigQueryIO doesn't handle quotaExceeded errors properly
> ---
>
> Key: BEAM-5514
> URL: https://issues.apache.org/jira/browse/BEAM-5514
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Kevin Peterson
>Assignee: Chamikara Jayalath
>Priority: Major
>
> When exceeding a streaming quota for BigQuery insertAll requests, BigQuery 
> returns a 403 with reason "quotaExceeded".
> The current implementation of BigQueryIO does not consider this to be a rate 
> limited exception, and therefore does not perform exponential backoff 
> properly, leading to repeated calls to BQ.
> The actual error is in the 
> [ApiErrorExtractor|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L739]
>  class, which is called from 
> [BigQueryServicesImpl|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/util/src/main/java/com/google/cloud/hadoop/util/ApiErrorExtractor.java#L263]
>  to determine how to retry the failure.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-5294) [beam_Release_Gradle_NightlySnapshot] Failing due to website test.

2018-10-09 Thread Scott Wegner (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Wegner resolved BEAM-5294.

   Resolution: Fixed
Fix Version/s: Not applicable

> [beam_Release_Gradle_NightlySnapshot] Failing due to website test.
> --
>
> Key: BEAM-5294
> URL: https://issues.apache.org/jira/browse/BEAM-5294
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Boyuan Zhang
>Assignee: Alan Myrvold
>Priority: Major
> Fix For: Not applicable
>
>
> Build link: 
> [https://builds.apache.org/job/beam_Release_Gradle_NightlySnapshot/]
> [https://builds.apache.org/job/beam_Release_Gradle_NightlySnapshot/185/]
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-5522) beam_PostRelease_NightlySnapshot timed out

2018-10-09 Thread Scott Wegner (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Wegner resolved BEAM-5522.

   Resolution: Cannot Reproduce
Fix Version/s: Not applicable

> beam_PostRelease_NightlySnapshot timed out
> --
>
> Key: BEAM-5522
> URL: https://issues.apache.org/jira/browse/BEAM-5522
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures, testing
>Reporter: Ahmet Altay
>Assignee: Alan Myrvold
>Priority: Major
> Fix For: Not applicable
>
>
> https://builds.apache.org/job/beam_PostRelease_NightlySnapshot/383/console



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5407) [beam_PostCommit_Go_GradleBuild][testE2ETopWikiPages][RolledBack] Breaks post commit

2018-10-09 Thread Scott Wegner (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16643833#comment-16643833
 ] 

Scott Wegner commented on BEAM-5407:


Is this resolved? Can this be closed?

> [beam_PostCommit_Go_GradleBuild][testE2ETopWikiPages][RolledBack] Breaks post 
> commit
> 
>
> Key: BEAM-5407
> URL: https://issues.apache.org/jira/browse/BEAM-5407
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Batkhuyag Batsaikhan
>Assignee: Pablo Estrada
>Priority: Major
>
> Failing job url: 
> https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/1482/testReport/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-5534) AutocompleteIT and WikiTopSessionsIT fail in google testing

2018-10-09 Thread Scott Wegner (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Wegner closed BEAM-5534.
--
   Resolution: Fixed
Fix Version/s: Not applicable

> AutocompleteIT and WikiTopSessionsIT fail in google testing
> ---
>
> Key: BEAM-5534
> URL: https://issues.apache.org/jira/browse/BEAM-5534
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-5125) beam_PostCommit_Java_GradleBuild org.apache.beam.runners.flink PortableExecutionTest testExecution_1_ flaky

2018-10-09 Thread Scott Wegner (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Wegner closed BEAM-5125.
--
   Resolution: Cannot Reproduce
Fix Version/s: Not applicable

> beam_PostCommit_Java_GradleBuild org.apache.beam.runners.flink 
> PortableExecutionTest testExecution_1_ flaky
> ---
>
> Key: BEAM-5125
> URL: https://issues.apache.org/jira/browse/BEAM-5125
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Mikhail Gryzykhin
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Test fails in both post-commit and pre-commit runs. It fails more often in pre-commits.
> Pre-commit history: 
> [https://builds.apache.org/job/beam_PreCommit_Java_Phrase/180/testReport/junit/org.apache.beam.runners.flink/PortableExecutionTest/testExecution_1_/history/]
> Post-commit history: 
> [https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/1223/testReport/junit/org.apache.beam.runners.flink/PortableExecutionTest/testExecution_1_/history/?start=75]
> Sample job:
> https://builds.apache.org/job/beam_PreCommit_Java_Phrase/180/testReport/junit/org.apache.beam.runners.flink/PortableExecutionTest/testExecution_1_/
> Log:
> java.lang.AssertionError: job state expected: but was: at 
> org.junit.Assert.fail(Assert.java:88) at 
> org.junit.Assert.failNotEquals(Assert.java:834) at 
> org.junit.Assert.assertEquals(Assert.java:118) at 
> org.apache.beam.runners.flink.PortableExecutionTest.testExecution(PortableExecutionTest.java:177)
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-5367) [beam_Release_Gradle_NightlySnapshot] is broken due to apache/beam website test failure

2018-10-09 Thread Scott Wegner (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Wegner resolved BEAM-5367.

Resolution: Fixed

> [beam_Release_Gradle_NightlySnapshot] is broken due to apache/beam website 
> test failure
> ---
>
> Key: BEAM-5367
> URL: https://issues.apache.org/jira/browse/BEAM-5367
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Batkhuyag Batsaikhan
>Assignee: Batkhuyag Batsaikhan
>Priority: Major
> Fix For: 2.8.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The apache/website test is breaking the nightly build. Since we don't have a stable 
> apache/beam website, we should remove it from the build for now. Until we 
> have an actual plan to fully migrate from the ASF beam-site, we should disable 
> the apache/website checks in the Beam build.
> Failing job url: [https://scans.gradle.com/s/uxb6mdigqj4n4/console-log#L23465]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5367) [beam_Release_Gradle_NightlySnapshot] is broken due to apache/beam website test failure

2018-10-09 Thread Scott Wegner (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16643830#comment-16643830
 ] 

Scott Wegner commented on BEAM-5367:


This is now fixed; the latest runs have been successful: 
https://builds.apache.org/job/beam_PostRelease_NightlySnapshot/395/

> [beam_Release_Gradle_NightlySnapshot] is broken due to apache/beam website 
> test failure
> ---
>
> Key: BEAM-5367
> URL: https://issues.apache.org/jira/browse/BEAM-5367
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Batkhuyag Batsaikhan
>Assignee: Batkhuyag Batsaikhan
>Priority: Major
> Fix For: 2.8.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The apache/website test is breaking the nightly build. Since we don't have a stable 
> apache/beam website, we should remove it from the build for now. Until we 
> have an actual plan to fully migrate from the ASF beam-site, we should disable 
> the apache/website checks in the Beam build.
> Failing job url: [https://scans.gradle.com/s/uxb6mdigqj4n4/console-log#L23465]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-5243) beam_Release_Gradle_NightlySnapshot InvocationError py27-cython/bin/python setup.py nosetests

2018-10-09 Thread Scott Wegner (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Wegner updated BEAM-5243:
---
Labels:   (was: currently-failing)

> beam_Release_Gradle_NightlySnapshot InvocationError py27-cython/bin/python 
> setup.py nosetests
> -
>
> Key: BEAM-5243
> URL: https://issues.apache.org/jira/browse/BEAM-5243
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, test-failures
>Reporter: Andrew Pilloud
>Assignee: Ahmet Altay
>Priority: Major
>
> It isn't clear to me what exactly failed; the logs are full of stack traces.
>  [https://builds.apache.org/job/beam_Release_Gradle_NightlySnapshot/151/]
> [https://builds.apache.org/job/beam_PostCommit_Python_Verify/5844/]
>  
>  *01:00:38* ERROR: InvocationError for command 
> '/home/jenkins/jenkins-slave/workspace/beam_Release_Gradle_NightlySnapshot/src/sdks/python/target/.tox/py27-cython/bin/python
>  setup.py nosetests' (exited with code -11)*01:00:38* 
> ___ summary 
> *01:00:38* ERROR: py27-cython: commands 
> failed*01:00:38*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5665) [beam_PreCommit_Website_Commit] [:testWebsite] External link http://www.atrato.io/blog/2017/04/08/apache-apex-cli/ failed: response code 0 means something's wrong.

2018-10-09 Thread Scott Wegner (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16643826#comment-16643826
 ] 

Scott Wegner commented on BEAM-5665:


http://www.atrato.io/ still 404s. If this doesn't resolve by next week, we 
should remove the dead link.

> [beam_PreCommit_Website_Commit] [:testWebsite]  External link 
> http://www.atrato.io/blog/2017/04/08/apache-apex-cli/ failed: response code 0 
> means something's wrong.
> 
>
> Key: BEAM-5665
> URL: https://issues.apache.org/jira/browse/BEAM-5665
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Scott Wegner
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> _Use this form to file an issue for test failure:_
>  * [Jenkins 
> Job|https://builds.apache.org/job/beam_PreCommit_Website_Commit/249/]
>  * [Gradle Build 
> Scan|https://scans.gradle.com/s/2sqxa7hlvdeum/console-log?task=:beam-website:testWebsite#L12]
>  * [Test source 
> code|https://github.com/apache/beam-site/blob/asf-site/Rakefile]
> Initial investigation:
> It seems http://www.atrato.io/ is down or no longer available. We should 
> disable the test for now and perhaps remove the dead link if it no longer 
> works.
> 
> _After you've filled out the above details, please [assign the issue to an 
> individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist].
>  Assignee should [treat test failures as 
> high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test],
>  helping to fix the issue or find a more appropriate owner. See [Apache Beam 
> Post-Commit 
> Policies|https://beam.apache.org/contribute/postcommits-policies]._



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-5661) [beam_PostCommit_Py_ValCont] [:beam-sdks-python-container:docker] no such file or directory

2018-10-09 Thread Scott Wegner (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Wegner updated BEAM-5661:
---
Labels:   (was: currently-failing)

> [beam_PostCommit_Py_ValCont] [:beam-sdks-python-container:docker] no such 
> file or directory
> ---
>
> Key: BEAM-5661
> URL: https://issues.apache.org/jira/browse/BEAM-5661
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Scott Wegner
>Assignee: Henning Rohde
>Priority: Major
>
> _Use this form to file an issue for test failure:_
>  * [Jenkins Job|https://builds.apache.org/job/beam_PostCommit_Py_ValCont/902/]
>  * [Gradle Build 
> Scan|https://scans.gradle.com/s/pmjbkhxaeryx4/console-log?task=:beam-sdks-python-container:docker#L2]
>  * [Test source 
> code|https://github.com/apache/beam/blob/845f8d0abcc5a8d7f93457c27aff0feeb1a867d5/sdks/python/container/build.gradle#L61]
> Initial investigation:
> {{failed to get digest 
> sha256:4ee4ea2f0113e98b49d8e376ce847feb374ddf2b8ea775502459d8a1b8a3eaed: open 
> /var/lib/docker/image/aufs/imagedb/content/sha256/4ee4ea2f0113e98b49d8e376ce847feb374ddf2b8ea775502459d8a1b8a3eaed:
>  no such file or directory}}
> 
> _After you've filled out the above details, please [assign the issue to an 
> individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist].
>  Assignee should [treat test failures as 
> high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test],
>  helping to fix the issue or find a more appropriate owner. See [Apache Beam 
> Post-Commit 
> Policies|https://beam.apache.org/contribute/postcommits-policies]._



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-5665) [beam_PreCommit_Website_Commit] [:testWebsite] External link http://www.atrato.io/blog/2017/04/08/apache-apex-cli/ failed: response code 0 means something's wrong.

2018-10-09 Thread Scott Wegner (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Wegner updated BEAM-5665:
---
Labels:   (was: currently-failing)

> [beam_PreCommit_Website_Commit] [:testWebsite]  External link 
> http://www.atrato.io/blog/2017/04/08/apache-apex-cli/ failed: response code 0 
> means something's wrong.
> 
>
> Key: BEAM-5665
> URL: https://issues.apache.org/jira/browse/BEAM-5665
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Scott Wegner
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> _Use this form to file an issue for test failure:_
>  * [Jenkins 
> Job|https://builds.apache.org/job/beam_PreCommit_Website_Commit/249/]
>  * [Gradle Build 
> Scan|https://scans.gradle.com/s/2sqxa7hlvdeum/console-log?task=:beam-website:testWebsite#L12]
>  * [Test source 
> code|https://github.com/apache/beam-site/blob/asf-site/Rakefile]
> Initial investigation:
> It seems http://www.atrato.io/ is down or no longer available. We should 
> disable the test for now and perhaps remove the dead link if it no longer 
> works.
> 
> _After you've filled out the above details, please [assign the issue to an 
> individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist].
>  Assignee should [treat test failures as 
> high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test],
>  helping to fix the issue or find a more appropriate owner. See [Apache Beam 
> Post-Commit 
> Policies|https://beam.apache.org/contribute/postcommits-policies]._



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-5662) [beam_PostCommit_Website_Publish] [:testWebsite] External link http://wiki.apache.org/incubator/BeamProposal failed: got a time out

2018-10-09 Thread Scott Wegner (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Wegner updated BEAM-5662:
---
Labels:   (was: currently-failing)

> [beam_PostCommit_Website_Publish] [:testWebsite] External link 
> http://wiki.apache.org/incubator/BeamProposal failed: got a time out
> ---
>
> Key: BEAM-5662
> URL: https://issues.apache.org/jira/browse/BEAM-5662
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Scott Wegner
>Assignee: Scott Wegner
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> _Use this form to file an issue for test failure:_
>  * [Jenkins 
> Job|https://builds.apache.org/job/beam_PostCommit_Website_Publish/94/]
>  * [Gradle Build 
> Scan|https://scans.gradle.com/s/h4mayefon7v7q/console-log?task=:beam-website:testWebsite#L12]
>  * [Test source 
> code|https://github.com/apache/beam/blob/845f8d0abcc5a8d7f93457c27aff0feeb1a867d5/website/Rakefile#L6]
> Initial investigation:
> The failed link is http://wiki.apache.org/incubator/BeamProposal 
> When I visit this link, it works for me. This is likely a flake.
> 
> _After you've filled out the above details, please [assign the issue to an 
> individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist].
>  Assignee should [treat test failures as 
> high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test],
>  helping to fix the issue or find a more appropriate owner. See [Apache Beam 
> Post-Commit 
> Policies|https://beam.apache.org/contribute/postcommits-policies]._



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5662) [beam_PostCommit_Website_Publish] [:testWebsite] External link http://wiki.apache.org/incubator/BeamProposal failed: got a time out

2018-10-09 Thread Scott Wegner (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16643825#comment-16643825
 ] 

Scott Wegner commented on BEAM-5662:


I haven't seen this in a while, but I'll keep it open for a bit just in case.

> [beam_PostCommit_Website_Publish] [:testWebsite] External link 
> http://wiki.apache.org/incubator/BeamProposal failed: got a time out
> ---
>
> Key: BEAM-5662
> URL: https://issues.apache.org/jira/browse/BEAM-5662
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Scott Wegner
>Assignee: Scott Wegner
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> _Use this form to file an issue for test failure:_
>  * [Jenkins 
> Job|https://builds.apache.org/job/beam_PostCommit_Website_Publish/94/]
>  * [Gradle Build 
> Scan|https://scans.gradle.com/s/h4mayefon7v7q/console-log?task=:beam-website:testWebsite#L12]
>  * [Test source 
> code|https://github.com/apache/beam/blob/845f8d0abcc5a8d7f93457c27aff0feeb1a867d5/website/Rakefile#L6]
> Initial investigation:
> The failed link is http://wiki.apache.org/incubator/BeamProposal 
> When I visit this link, it works for me. This is likely a flake.
> 
> _After you've filled out the above details, please [assign the issue to an 
> individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist].
>  Assignee should [treat test failures as 
> high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test],
>  helping to fix the issue or find a more appropriate owner. See [Apache Beam 
> Post-Commit 
> Policies|https://beam.apache.org/contribute/postcommits-policies]._



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5680) [beam_PostCommit_Website_Publish] [:publishWebsite] git push 'Authentication failed'

2018-10-09 Thread Scott Wegner (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16643824#comment-16643824
 ] 

Scott Wegner commented on BEAM-5680:


The latest run succeeded, so this appears to be fixed: 
https://builds.apache.org/job/beam_PostCommit_Website_Publish/146/

> [beam_PostCommit_Website_Publish] [:publishWebsite] git push 'Authentication 
> failed'
> 
>
> Key: BEAM-5680
> URL: https://issues.apache.org/jira/browse/BEAM-5680
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures, website
>Reporter: Scott Wegner
>Assignee: Scott Wegner
>Priority: Major
>  Labels: currently-failing
> Fix For: Not applicable
>
>
> _Use this form to file an issue for test failure:_
>  * [Jenkins 
> Job|https://builds.apache.org/job/beam_PostCommit_Website_Publish/131/]
>  * [Gradle Build 
> Scan|https://scans.gradle.com/s/wkk4k3jp3l5ve/console-log?task=:beam-website:publishWebsite]
>  * [Test source 
> code|https://github.com/apache/beam/blob/a19183b05f0271f0a927aafcd778235335b7d269/website/build.gradle#L221]
> Initial investigation:
> Failing consistently since 10/7 5am until now ([test 
> history|https://builds.apache.org/job/beam_PostCommit_Website_Publish/buildTimeTrend]).
> 
> _After you've filled out the above details, please [assign the issue to an 
> individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist].
>  Assignee should [treat test failures as 
> high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test],
>  helping to fix the issue or find a more appropriate owner. See [Apache Beam 
> Post-Commit 
> Policies|https://beam.apache.org/contribute/postcommits-policies]._



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-5680) [beam_PostCommit_Website_Publish] [:publishWebsite] git push 'Authentication failed'

2018-10-09 Thread Scott Wegner (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Wegner resolved BEAM-5680.

   Resolution: Fixed
Fix Version/s: Not applicable

> [beam_PostCommit_Website_Publish] [:publishWebsite] git push 'Authentication 
> failed'
> 
>
> Key: BEAM-5680
> URL: https://issues.apache.org/jira/browse/BEAM-5680
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures, website
>Reporter: Scott Wegner
>Assignee: Scott Wegner
>Priority: Major
>  Labels: currently-failing
> Fix For: Not applicable
>
>
> _Use this form to file an issue for test failure:_
>  * [Jenkins 
> Job|https://builds.apache.org/job/beam_PostCommit_Website_Publish/131/]
>  * [Gradle Build 
> Scan|https://scans.gradle.com/s/wkk4k3jp3l5ve/console-log?task=:beam-website:publishWebsite]
>  * [Test source 
> code|https://github.com/apache/beam/blob/a19183b05f0271f0a927aafcd778235335b7d269/website/build.gradle#L221]
> Initial investigation:
> Failing consistently since 10/7 5am until now ([test 
> history|https://builds.apache.org/job/beam_PostCommit_Website_Publish/buildTimeTrend]).
> 
> _After you've filled out the above details, please [assign the issue to an 
> individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist].
>  Assignee should [treat test failures as 
> high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test],
>  helping to fix the issue or find a more appropriate owner. See [Apache Beam 
> Post-Commit 
> Policies|https://beam.apache.org/contribute/postcommits-policies]._



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5634) Bring Dataflow Java Worker Code into Beam

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5634?focusedWorklogId=152828=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152828
 ]

ASF GitHub Bot logged work on BEAM-5634:


Author: ASF GitHub Bot
Created on: 09/Oct/18 17:56
Start Date: 09/Oct/18 17:56
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on a change in pull request #6561: 
[BEAM-5634] Bring dataflow java worker code into beam
URL: https://github.com/apache/beam/pull/6561#discussion_r223803403
 
 

 ##
 File path: runners/google-cloud-dataflow-java/worker/build.gradle
 ##
 @@ -0,0 +1,215 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/**/
+// Apply BeamModulePlugin
+
+// Reuse project_root/buildSrc in this build.gradle file to reduce the
+// maintenance burden and simplify this file. See BeamModulePlugin for
+// documentation on default build tasks and properties that are enabled in
+// addition to natures that will be applied to worker.
+apply plugin: org.apache.beam.gradle.BeamModulePlugin
+
+group = "org.apache.beam.runners.dataflow"
+
+/**/
+// Apply Java nature with customized configurations
+
+// Set a specific version of 'com.google.apis:google-api-services-dataflow'
+// by adding -Pdataflow.version= in Gradle command. Otherwise,
+// 'google_clients_version' defined in BeamModulePlugin will be used as 
default.
+def DATAFLOW_VERSION = "dataflow.version"
+
+// To build FnAPI or legacy worker.
+// Use -PisLegacyWorker in the Gradle command to build the legacy worker;
+// otherwise, the FnAPI worker is built by default.
+def is_legacy_worker = {
+  return project.hasProperty("isLegacyWorker")
+}
+
+// Get full dependency of 'com.google.apis:google-api-services-dataflow'
+def google_api_services_dataflow = project.hasProperty(DATAFLOW_VERSION) ? 
"com.google.apis:google-api-services-dataflow:" + getProperty(DATAFLOW_VERSION) 
: library.java.google_api_services_dataflow
+
+// Returns a string representing the relocated path to be used with the shadow
+// plugin when given a suffix such as "com.".
+def getWorkerRelocatedPath = { String suffix ->
+  return ("org.apache.beam.runners.dataflow.worker.repackaged."
+  + suffix)
+}
+
+// Following listed dependencies will be shaded only in fnapi worker, not 
legacy
+// worker
+def sdk_provided_dependencies = [
+  "org.apache.beam:beam-runners-google-cloud-dataflow-java:$version",
+  "org.apache.beam:beam-sdks-java-core:$version",
+  
"org.apache.beam:beam-sdks-java-extensions-google-cloud-platform-core:$version",
+  "org.apache.beam:beam-sdks-java-io-google-cloud-platform:$version",
+  google_api_services_dataflow,
+  library.java.avro,
+  library.java.google_api_client,
+  library.java.google_http_client,
+  library.java.google_http_client_jackson,
+  library.java.jackson_annotations,
+  library.java.jackson_core,
+  library.java.jackson_databind,
+  library.java.joda_time,
+]
+
+// Exclude unneeded dependencies when building jar
+def excluded_dependencies = [
+  "com.google.auto.service:auto-service",  // Provided scope added from 
applyJavaNature
+  "com.google.auto.value:auto-value",  // Provided scope added from 
applyJavaNature
+  "org.codehaus.jackson:jackson-core-asl", // Exclude an old version of 
jackson-core-asl introduced by google-http-client-jackson
+  "org.objenesis:objenesis",   // Transitive dependency 
introduced from Beam
+  "org.tukaani:xz",// Transitive dependency 
introduced from Beam
+  library.java.commons_compress,   // Transitive dependency 
introduced from Beam
+  library.java.error_prone_annotations,// Provided scope added in 
worker
+  library.java.hamcrest_core,  // Test only
+  library.java.hamcrest_library,   // Test 

[jira] [Commented] (BEAM-5688) [beam_PreCommit_Website_Stage_GCS_Cron] [stageWebsite] Fails on githubPullRequestId assert

2018-10-09 Thread Scott Wegner (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643817#comment-16643817
 ] 

Scott Wegner commented on BEAM-5688:


[~udim] fyi

> [beam_PreCommit_Website_Stage_GCS_Cron] [stageWebsite] Fails on 
> githubPullRequestId assert
> --
>
> Key: BEAM-5688
> URL: https://issues.apache.org/jira/browse/BEAM-5688
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Scott Wegner
>Assignee: Alan Myrvold
>Priority: Major
>  Labels: currently-failing
>
> _Use this form to file an issue for test failure:_
>  * [Jenkins 
> Job|https://builds.apache.org/job/beam_PreCommit_Website_Stage_GCS_Cron/20/]
>  * [Gradle Build Scan|https://gradle.com/s/7mqwgjegf5hge]
>  * [Test source 
> code|https://github.com/apache/beam/blob/a19183b05f0271f0a927aafcd778235335b7d269/website/build.gradle#L234]
> Initial investigation:
> This is a problem with how the website gradle scripts are implemented to 
> accept a githubPullRequestId. The Cron job will not have an associated PR, 
> so this currently fails.
> 
> _After you've filled out the above details, please [assign the issue to an 
> individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist].
>  Assignee should [treat test failures as 
> high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test],
>  helping to fix the issue or find a more appropriate owner. See [Apache Beam 
> Post-Commit 
> Policies|https://beam.apache.org/contribute/postcommits-policies]._
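
Regarding the investigation note above (the staging script asserts on a
{{githubPullRequestId}} property that cron-triggered runs never supply), a
hypothetical Gradle guard along the following lines would let such runs fall
back to a default destination. The property handling and log wording are made
up for illustration and are not taken from website/build.gradle:

{code:groovy}
// Hypothetical sketch: tolerate a missing githubPullRequestId instead of asserting on it.
def pullRequestId = project.findProperty('githubPullRequestId')
if (pullRequestId == null) {
  // Cron-triggered runs have no associated PR; stage to the default location.
  logger.lifecycle('No githubPullRequestId supplied; staging to the default location.')
} else {
  assert pullRequestId.toString().isInteger() : 'githubPullRequestId must be numeric'
}
{code}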



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-5691) testWebsite failed owing to several url 404 not found

2018-10-09 Thread Scott Wegner (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Wegner resolved BEAM-5691.

   Resolution: Duplicate
Fix Version/s: Not applicable

> testWebsite failed owing to several url 404 not found
> -
>
> Key: BEAM-5691
> URL: https://issues.apache.org/jira/browse/BEAM-5691
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Boyuan Zhang
>Priority: Major
> Fix For: Not applicable
>
>
> test log: 
> https://builds.apache.org/job/beam_PreCommit_Website_Commit/262/console



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-5683) [beam_PostCommit_Py_VR_Dataflow] [test_multiple_empty_outputs] Fails due to pip download flake

2018-10-09 Thread Scott Wegner (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Wegner updated BEAM-5683:
---
Summary: [beam_PostCommit_Py_VR_Dataflow] [test_multiple_empty_outputs] 
Fails due to pip download flake  (was: [beam_PostCommit_Py_VR_Dataflow] 
[test_multiple_empty_outputs] Failure summary)

> [beam_PostCommit_Py_VR_Dataflow] [test_multiple_empty_outputs] Fails due to 
> pip download flake
> --
>
> Key: BEAM-5683
> URL: https://issues.apache.org/jira/browse/BEAM-5683
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness, test-failures
>Reporter: Scott Wegner
>Assignee: Ankur Goenka
>Priority: Major
>  Labels: currently-failing
>
> _Use this form to file an issue for test failure:_
>  * [Jenkins 
> Job|https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/1289/]
>  * [Gradle Build 
> Scan|https://scans.gradle.com/s/hjmzvh4ylhs6y/console-log?task=:beam-sdks-python:validatesRunnerBatchTests]
>  * [Test source 
> code|https://github.com/apache/beam/blob/303a4275eb0a323761e1a4dec6a22fde9863acf8/sdks/python/apache_beam/runners/portability/stager.py#L390]
> Initial investigation:
> Seems to be failing on pip download.
> ==
> ERROR: test_multiple_empty_outputs 
> (apache_beam.transforms.ptransform_test.PTransformTest)
> --
> Traceback (most recent call last):
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/transforms/ptransform_test.py",
>  line 277, in test_multiple_empty_outputs
> pipeline.run()
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/testing/test_pipeline.py",
>  line 104, in run
> result = super(TestPipeline, self).run(test_runner_api)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/pipeline.py",
>  line 403, in run
> self.to_runner_api(), self.runner, self._options).run(False)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/pipeline.py",
>  line 416, in run
> return self.runner.run_pipeline(self)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py",
>  line 50, in run_pipeline
> self.result = super(TestDataflowRunner, self).run_pipeline(pipeline)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py",
>  line 389, in run_pipeline
> self.dataflow_client.create_job(self.job), self)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/utils/retry.py",
>  line 184, in wrapper
> return fun(*args, **kwargs)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py",
>  line 490, in create_job
> self.create_job_description(job)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py",
>  line 519, in create_job_description
> resources = self._stage_resour
> ces(job.options)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py",
>  line 452, in _stage_resources
> staging_location=google_cloud_options.staging_location)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/portability/stager.py",
>  line 161, in stage_job_resources
> requirements_cache_path)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/portability/stager.py",
>  line 411, in _populate_requirements_cache
> processes.check_call(cmd_args)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/utils/processes.py",
>  line 46, in check_call
> return subprocess.check_call(*args, **kwargs)
>   File "/usr/lib/python2.7/subprocess.py"

[jira] [Assigned] (BEAM-5637) Python support for custom dataflow worker jar

2018-10-09 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde reassigned BEAM-5637:
---

Assignee: Ruoyun Huang

> Python support for custom dataflow worker jar
> -
>
> Key: BEAM-5637
> URL: https://issues.apache.org/jira/browse/BEAM-5637
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Henning Rohde
>Assignee: Ruoyun Huang
>Priority: Major
>
> One of the slightly subtle aspects is that we would need to ignore one of the 
> staged jars for portable Python jobs. That requires a change to the Python 
> boot code: 
> https://github.com/apache/beam/blob/66d7c865b7267f388ee60752891a9141fad43774/sdks/python/container/boot.go#L104



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5114) Create example uber jars for supported runners

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5114?focusedWorklogId=152825=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152825
 ]

ASF GitHub Bot logged work on BEAM-5114:


Author: ASF GitHub Bot
Created on: 09/Oct/18 17:50
Start Date: 09/Oct/18 17:50
Worklog Time Spent: 10m 
  Work Description: stale[bot] commented on issue #6191: [BEAM-5114] Create 
example uber jars
URL: https://github.com/apache/beam/pull/6191#issuecomment-428286959
 
 
   This pull request has been marked as stale due to 60 days of inactivity. It 
will be closed in 1 week if no further activity occurs. If you think that’s 
incorrect or this pull request requires a review, please simply write any 
comment. If closed, you can revive the PR at any time and @mention a reviewer 
or discuss it on the d...@beam.apache.org list. Thank you for your 
contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152825)
Time Spent: 1h 50m  (was: 1h 40m)

> Create example uber jars for supported runners
> --
>
> Key: BEAM-5114
> URL: https://issues.apache.org/jira/browse/BEAM-5114
> Project: Beam
>  Issue Type: New Feature
>  Components: examples-java
>Reporter: Ben Sidhom
>Assignee: Ben Sidhom
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Producing these artifacts results in several benefits
>  * Gives an example of how to package user code for different runners
>  * Makes ad-hoc testing of runner changes against real user pipelines easier
>  * Enables integration testing end-to-end pipelines against different runner 
> services



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5467) Python Flink ValidatesRunner job fixes

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5467?focusedWorklogId=152823=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152823
 ]

ASF GitHub Bot logged work on BEAM-5467:


Author: ASF GitHub Bot
Created on: 09/Oct/18 17:42
Start Date: 09/Oct/18 17:42
Worklog Time Spent: 10m 
  Work Description: tweise commented on issue #6532: [BEAM-5467] Use 
process SDKHarness to run flink PVR tests.
URL: https://github.com/apache/beam/pull/6532#issuecomment-428284446
 
 
   @angoenka the current solution of using the container boot binary prevents 
this from running on another architecture. Can we come up with a solution that 
would also work locally on macOS and other platforms? We had already solved 
this for Lyft.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152823)
Time Spent: 5h 20m  (was: 5h 10m)

> Python Flink ValidatesRunner job fixes
> --
>
> Key: BEAM-5467
> URL: https://issues.apache.org/jira/browse/BEAM-5467
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Minor
>  Labels: portability-flink
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Add status to README
> Rename script and job for consistency
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5660) Add dataflow java worker unit tests into precommit

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5660?focusedWorklogId=152822=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152822
 ]

ASF GitHub Bot logged work on BEAM-5660:


Author: ASF GitHub Bot
Created on: 09/Oct/18 17:40
Start Date: 09/Oct/18 17:40
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on issue #6613: [BEAM-5660] Add both 
dataflow legacy worker and fn-api worker into JavaPreCommit
URL: https://github.com/apache/beam/pull/6613#issuecomment-428283701
 
 
   Re @herohde, I don't think they are related. All of the failures are URL 404 
Not Found errors. Filed JIRA: https://issues.apache.org/jira/browse/BEAM-5691


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152822)
Time Spent: 0.5h  (was: 20m)

> Add dataflow java worker unit tests into precommit
> --
>
> Key: BEAM-5660
> URL: https://issues.apache.org/jira/browse/BEAM-5660
> Project: Beam
>  Issue Type: Task
>  Components: runner-dataflow
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-5691) testWebsite failed owing to several url 404 not found

2018-10-09 Thread Boyuan Zhang (JIRA)
Boyuan Zhang created BEAM-5691:
--

 Summary: testWebsite failed owing to several url 404 not found
 Key: BEAM-5691
 URL: https://issues.apache.org/jira/browse/BEAM-5691
 Project: Beam
  Issue Type: Bug
  Components: test-failures
Reporter: Boyuan Zhang


test log: 
https://builds.apache.org/job/beam_PreCommit_Website_Commit/262/console



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5684) Need a test that verifies Flattening / not-flattening of BQ nested records

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5684?focusedWorklogId=152821=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152821
 ]

ASF GitHub Bot logged work on BEAM-5684:


Author: ASF GitHub Bot
Created on: 09/Oct/18 17:38
Start Date: 09/Oct/18 17:38
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #6609: [BEAM-5684] Adding a 
BQNestedRecords Test
URL: https://github.com/apache/beam/pull/6609#issuecomment-428283083
 
 
   Run Java PostCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152821)
Time Spent: 1h 20m  (was: 1h 10m)

> Need a test that verifies Flattening / not-flattening of BQ nested records
> --
>
> Key: BEAM-5684
> URL: https://issues.apache.org/jira/browse/BEAM-5684
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5690) Issue with GroupByKey in BeamSql using SparkRunner

2018-10-09 Thread Xu Mingmin (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643769#comment-16643769
 ] 

Xu Mingmin commented on BEAM-5690:
--

Is this the error specifically? There seem to be duplicated {{0}} counts here:
{code:java}
{"count_col1":1,"Col3":"string6","WST":"2018-10-09  09-55-00   
+","WET":"2018-10-09  09-56-00   +"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
+","WET":"2018-10-09  09-56-00   +"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
+","WET":"2018-10-09  09-56-00   +"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
+","WET":"2018-10-09  09-56-00   +"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
+","WET":"2018-10-09  09-56-00   +"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
+","WET":"2018-10-09  09-56-00   +"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
+","WET":"2018-10-09  09-56-00 0}{code}

> Issue with GroupByKey in BeamSql using SparkRunner
> --
>
> Key: BEAM-5690
> URL: https://issues.apache.org/jira/browse/BEAM-5690
> Project: Beam
>  Issue Type: Task
>  Components: runner-spark
>Reporter: Kenneth Knowles
>Priority: Major
>
> Reported on user@
> {quote}We are trying to setup a pipeline with using BeamSql and the trigger 
> used is default (AfterWatermark crosses the window). 
> Below is the pipeline:
>   
>KafkaSource (KafkaIO) 
>---> Windowing (FixedWindow 1min)
>---> BeamSql
>---> KafkaSink (KafkaIO)
>  
> We are using Spark Runner for this. 
> The BeamSql query is:
> {code}select Col3, count(*) as count_col1 from PCOLLECTION GROUP BY Col3{code}
> We are grouping by Col3 which is a string. It can hold values string[0-9]. 
>  
> The records are getting emitted out at 1 min to kafka sink, but the output 
> record in kafka is not as expected.
> Below is the output observed: (WST and WET are indicators for window start 
> time and window end time)
> {code}
> {"count_col1":1,"Col3":"string5","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":3,"Col3":"string7","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":2,"Col3":"string8","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":1,"Col3":"string2","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":1,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00 0}
> {code}
> {quote}
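
A minimal Java sketch of the pipeline shape described in the quoted report, for
orientation only. The schema, the sample values, and the use of {{Create}} in
place of the KafkaIO source and sink are illustrative assumptions, and
{{Create#withRowSchema}} is assumed to be available in the Beam version in use:

{code:java}
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.extensions.sql.SqlTransform;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.Row;
import org.joda.time.Duration;

public class BeamSqlWindowedCountSketch {
  public static void main(String[] args) {
    Schema schema = Schema.builder().addStringField("Col3").build();
    Pipeline p = Pipeline.create();

    // Create stands in for the KafkaIO source in the report; the window size and
    // SQL statement match the description above. A KafkaIO sink would follow the
    // SqlTransform in the reported pipeline.
    p.apply(Create.of(
                Row.withSchema(schema).addValues("string6").build(),
                Row.withSchema(schema).addValues("string7").build())
            .withRowSchema(schema))
        .apply(Window.<Row>into(FixedWindows.of(Duration.standardMinutes(1))))
        .apply(SqlTransform.query(
            "select Col3, count(*) as count_col1 from PCOLLECTION GROUP BY Col3"));

    p.run().waitUntilFinish();
  }
}
{code}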



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5624) Avro IO does not work with avro-python3 package out-of-the-box on Python 3, several tests fail with AttributeError (module 'avro.schema' has no attribute 'parse')

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5624?focusedWorklogId=152815=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152815
 ]

ASF GitHub Bot logged work on BEAM-5624:


Author: ASF GitHub Bot
Created on: 09/Oct/18 17:19
Start Date: 09/Oct/18 17:19
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on a change in pull request #6616: 
[BEAM-5624] Fix avro.schema parser for py3
URL: https://github.com/apache/beam/pull/6616#discussion_r223790260
 
 

 ##
 File path: sdks/python/apache_beam/io/avroio_test.py
 ##
 @@ -25,10 +25,15 @@
 from builtins import range
 
 import avro.datafile
-import avro.schema
 from avro.datafile import DataFileWriter
 from avro.io import DatumWriter
 import hamcrest as hc
+# pylint: disable=wrong-import-order, wrong-import-position, ungrouped-imports
+try:
+  from avro.schema import Parse
 
 Review comment:
   +1 to @aaltay's comment. Also we will need a similar change in other places 
in Beam where we use avro.schema.parse. See `apache_beam/io/avroio.py`, 
`apache_beam/examples/avro_bitcoin.py`. This can be done in another PR if you 
prefer.
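   
   A minimal sketch of the version-guarded import under discussion; the alias
name and the inline comments are illustrative rather than the PR's actual
wording:

{code:python}
try:
  # avro-python3 (the Python 3 distribution of Avro) exposes the parser as Parse.
  from avro.schema import Parse as parse_avro_schema
except ImportError:
  # The Python 2 avro package exposes it as parse.
  from avro.schema import parse as parse_avro_schema

SCHEMA = parse_avro_schema(
    '{"type": "record", "name": "Record",'
    ' "fields": [{"name": "number", "type": "int"}]}')
{code}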
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152815)
Time Spent: 0.5h  (was: 20m)

> Avro IO does not work with avro-python3 package out-of-the-box on Python 3, 
> several tests fail with AttributeError (module 'avro.schema' has no attribute 
> 'parse') 
> ---
>
> Key: BEAM-5624
> URL: https://issues.apache.org/jira/browse/BEAM-5624
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Simon
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> ==
> ERROR: Failure: AttributeError (module 'avro.schema' has no attribute 'parse')
> --
> Traceback (most recent call last):
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/site-packages/nose/failure.py",
>  line 39, in runTest
> raise self.exc_val.with_traceback(self.tb)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/site-packages/nose/loader.py",
>  line 418, in loadTestsFromName
> addr.filename, addr.module)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/site-packages/nose/importer.py",
>  line 47, in importFromPath
> return self.importFromDir(dir_path, fqname)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/site-packages/nose/importer.py",
>  line 94, in importFromDir
> mod = load_module(part_fqname, fh, filename, desc)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/imp.py",
>  line 234, in load_module
> return load_source(name, filename, file)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/imp.py",
>  line 172, in load_source
> module = _load(spec)
>   File "", line 693, in _load
>   File "", line 673, in _load_unlocked
>   File "", line 673, in exec_module
>   File "", line 222, in _call_with_frames_removed
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/avroio_test.py",
>  line 54, in 
> class TestAvro(unittest.TestCase):
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/avroio_test.py",
>  line 89, in TestAvro
> SCHEMA = avro.schema.parse('''
> AttributeError: module 'avro.schema' has no attribute 'parse'
> Note that we use a different implementation of avro/avro-python3 package 
> depending on Python version. We are also evaluating potential replacement of 
> avro with fastavro.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5624) Avro IO does not work with avro-python3 package out-of-the-box on Python 3, several tests fail with AttributeError (module 'avro.schema' has no attribute 'parse')

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5624?focusedWorklogId=152813=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152813
 ]

ASF GitHub Bot logged work on BEAM-5624:


Author: ASF GitHub Bot
Created on: 09/Oct/18 17:12
Start Date: 09/Oct/18 17:12
Worklog Time Spent: 10m 
  Work Description: aaltay commented on a change in pull request #6616: 
[BEAM-5624] Fix avro.schema parser for py3
URL: https://github.com/apache/beam/pull/6616#discussion_r223788054
 
 

 ##
 File path: sdks/python/apache_beam/io/avroio_test.py
 ##
 @@ -25,10 +25,15 @@
 from builtins import range
 
 import avro.datafile
-import avro.schema
 from avro.datafile import DataFileWriter
 from avro.io import DatumWriter
 import hamcrest as hc
+# pylint: disable=wrong-import-order, wrong-import-position, ungrouped-imports
+try:
+  from avro.schema import Parse
 
 Review comment:
   Could you add a comment here about which versions of avro support which 
version of parse? (We can use this information to remove this block later on.)


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152813)
Time Spent: 20m  (was: 10m)

> Avro IO does not work with avro-python3 package out-of-the-box on Python 3, 
> several tests fail with AttributeError (module 'avro.schema' has no attribute 
> 'parse') 
> ---
>
> Key: BEAM-5624
> URL: https://issues.apache.org/jira/browse/BEAM-5624
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Simon
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> ==
> ERROR: Failure: AttributeError (module 'avro.schema' has no attribute 'parse')
> --
> Traceback (most recent call last):
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/site-packages/nose/failure.py",
>  line 39, in runTest
> raise self.exc_val.with_traceback(self.tb)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/site-packages/nose/loader.py",
>  line 418, in loadTestsFromName
> addr.filename, addr.module)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/site-packages/nose/importer.py",
>  line 47, in importFromPath
> return self.importFromDir(dir_path, fqname)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/site-packages/nose/importer.py",
>  line 94, in importFromDir
> mod = load_module(part_fqname, fh, filename, desc)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/imp.py",
>  line 234, in load_module
> return load_source(name, filename, file)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/imp.py",
>  line 172, in load_source
> module = _load(spec)
>   File "", line 693, in _load
>   File "", line 673, in _load_unlocked
>   File "", line 673, in exec_module
>   File "", line 222, in _call_with_frames_removed
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/avroio_test.py",
>  line 54, in 
> class TestAvro(unittest.TestCase):
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/avroio_test.py",
>  line 89, in TestAvro
> SCHEMA = avro.schema.parse('''
> AttributeError: module 'avro.schema' has no attribute 'parse'
> Note that we use a different implementation of avro/avro-python3 package 
> depending on Python version. We are also evaluating potential replacement of 
> avro with fastavro.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5254) Add Samza Runner translator registrar and refactor config generation

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5254?focusedWorklogId=152814=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152814
 ]

ASF GitHub Bot logged work on BEAM-5254:


Author: ASF GitHub Bot
Created on: 09/Oct/18 17:12
Start Date: 09/Oct/18 17:12
Worklog Time Spent: 10m 
  Work Description: xinyuiscool commented on issue #6292: [BEAM-5254] Add 
Samza Runner translator registrar and refactor config
URL: https://github.com/apache/beam/pull/6292#issuecomment-428274726
 
 
   Thanks for merging it! I might have a few more coming in the next few days 
for Samza Runner.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152814)
Time Spent: 1.5h  (was: 1h 20m)

> Add Samza Runner translator registrar and refactor config generation
> 
>
> Key: BEAM-5254
> URL: https://issues.apache.org/jira/browse/BEAM-5254
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-samza
>Reporter: Xinyu Liu
>Assignee: Xinyu Liu
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Add a registrar for transform translators in Samza Runner so we allow 
> customized translators. Also refactors the config generation part so it can 
> be extended outside open source beam.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-1081) annotations should support custom messages and classes

2018-10-09 Thread Ahmet Altay (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-1081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643748#comment-16643748
 ] 

Ahmet Altay commented on BEAM-1081:
---

Adding tests would be good. There is also the "1. ability to customize message" 
part of the original issue.

> annotations should support custom messages and classes
> --
>
> Key: BEAM-1081
> URL: https://issues.apache.org/jira/browse/BEAM-1081
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Priority: Minor
>  Labels: newbie, starter
>
> Update 
> https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/utils/annotations.py
>  to add 2 new features:
> 1. ability to customize message
> 2. ability to tag classes (not only functions)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5624) Avro IO does not work with avro-python3 package out-of-the-box on Python 3, several tests fail with AttributeError (module 'avro.schema' has no attribute 'parse')

2018-10-09 Thread Valentyn Tymofieiev (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643746#comment-16643746
 ] 

Valentyn Tymofieiev commented on BEAM-5624:
---

This issue is due to a small API change (see: 
https://github.com/apache/beam/pull/6616). 
However, there are some troubling reports of bad experiences with 
avro-python3; see [1,2].

That said, we may want to migrate to fastavro sooner rather than later; FYI 
[~udim] [~chamikara] [~altay].

[1] https://github.com/common-workflow-language/cwltool/issues/524
[2] https://issues.apache.org/jira/browse/AVRO-2046 

> Avro IO does not work with avro-python3 package out-of-the-box on Python 3, 
> several tests fail with AttributeError (module 'avro.schema' has no attribute 
> 'parse') 
> ---
>
> Key: BEAM-5624
> URL: https://issues.apache.org/jira/browse/BEAM-5624
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Simon
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> ==
> ERROR: Failure: AttributeError (module 'avro.schema' has no attribute 'parse')
> --
> Traceback (most recent call last):
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/site-packages/nose/failure.py",
>  line 39, in runTest
> raise self.exc_val.with_traceback(self.tb)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/site-packages/nose/loader.py",
>  line 418, in loadTestsFromName
> addr.filename, addr.module)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/site-packages/nose/importer.py",
>  line 47, in importFromPath
> return self.importFromDir(dir_path, fqname)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/site-packages/nose/importer.py",
>  line 94, in importFromDir
> mod = load_module(part_fqname, fh, filename, desc)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/imp.py",
>  line 234, in load_module
> return load_source(name, filename, file)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/imp.py",
>  line 172, in load_source
> module = _load(spec)
>   File "", line 693, in _load
>   File "", line 673, in _load_unlocked
>   File "", line 673, in exec_module
>   File "", line 222, in _call_with_frames_removed
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/avroio_test.py",
>  line 54, in 
> class TestAvro(unittest.TestCase):
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/avroio_test.py",
>  line 89, in TestAvro
> SCHEMA = avro.schema.parse('''
> AttributeError: module 'avro.schema' has no attribute 'parse'
> Note that we use a different implementation of avro/avro-python3 package 
> depending on Python version. We are also evaluating potential replacement of 
> avro with fastavro.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4553) Implement a Graphite sink for the metrics pusher

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4553?focusedWorklogId=152779=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152779
 ]

ASF GitHub Bot logged work on BEAM-4553:


Author: ASF GitHub Bot
Created on: 09/Oct/18 16:46
Start Date: 09/Oct/18 16:46
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on a change in pull request 
#6569: [BEAM-4553] Implement graphite sink for MetricsPusher and refactor 
MetricsHttpSink test
URL: https://github.com/apache/beam/pull/6569#discussion_r223779339
 
 

 ##
 File path: 
runners/extensions-java/metrics/src/main/java/org/apache/beam/runners/extensions/metrics/MetricsGraphiteSink.java
 ##
 @@ -0,0 +1,305 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.extensions.metrics;
+
+import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
+import java.io.BufferedWriter;
+import java.io.OutputStreamWriter;
+import java.net.InetAddress;
+import java.net.Socket;
+import java.nio.charset.Charset;
+import java.util.Locale;
+import java.util.regex.Pattern;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.metrics.DistributionResult;
+import org.apache.beam.sdk.metrics.GaugeResult;
+import org.apache.beam.sdk.metrics.MetricQueryResults;
+import org.apache.beam.sdk.metrics.MetricResult;
+import org.apache.beam.sdk.metrics.MetricsSink;
+import org.apache.beam.sdk.options.PipelineOptions;
+
+/**
+ * Sink to push metrics to Graphite. Graphite requires a timestamp. So metrics 
are reported with the
+ * timestamp (seconds from epoch) when the push to the sink was done (except 
with gauges that
+ * already have a timestamp value). The graphite metric name will be in the 
form of
+ * 
beam.metricType.metricNamespace.metricName.[committed|attempted].metricValueType
 For example:
+ * {@code beam.counter.throughput.nbRecords.attempted.value} Or {@code
+ * beam.distribution.throughput.nbRecordsPerSec.attempted.mean}
+ */
+public class MetricsGraphiteSink implements MetricsSink {
+  private static final Charset UTF_8 = Charset.forName("UTF-8");
+  private static final Pattern WHITESPACE = Pattern.compile("[\\s]+");
+  private static final String SPACE_REPLACEMENT = "_";
+  private final String address;
+  private final int port;
+  private final Charset charset;
+
+  public MetricsGraphiteSink(PipelineOptions pipelineOptions) {
+this.address = pipelineOptions.getMetricsGraphiteHost();
+this.port = pipelineOptions.getMetricsGraphitePort();
+this.charset = UTF_8;
+  }
+
+  @Experimental(Experimental.Kind.METRICS)
+  @Override
+  public void writeMetrics(MetricQueryResults metricQueryResults) throws 
Exception {
+final long metricTimestamp = System.currentTimeMillis() / 1000L;
+Socket socket = new Socket(InetAddress.getByName(address), port);
+BufferedWriter writer =
+new BufferedWriter(new OutputStreamWriter(socket.getOutputStream(), 
charset));
+StringBuilder messagePayload = new StringBuilder();
+Iterable> counters = metricQueryResults.getCounters();
+Iterable> gauges = 
metricQueryResults.getGauges();
+Iterable> distributions =
+metricQueryResults.getDistributions();
+
+for (MetricResult counter : counters) {
+  // if committed metrics are not supported, exception is thrown and we 
don't append the message
+  try {
+messagePayload.append(createCounterGraphiteMessage(metricTimestamp, 
counter, true));
+  } catch (UnsupportedOperationException e) {
+if (!e.getMessage().contains("committed metrics")) {
+  throw e;
+}
+  }
+  messagePayload.append(createCounterGraphiteMessage(metricTimestamp, 
counter, false));
+}
+
+for (MetricResult gauge : gauges) {
+  try {
+messagePayload.append(createGaugeGraphiteMessage(gauge, true));
+  } catch (UnsupportedOperationException e) {
+if (!e.getMessage().contains("committed metrics")) {
+  throw e;
+}
+  }
+  messagePayload.ap
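
   For orientation, a hedged sketch of how a pipeline might be pointed at this
sink. The {{metricsSink}} option comes from Beam's {{MetricsOptions}}; the
graphite host/port setters are assumed to live next to the getters used in the
constructor above, so treat their exact names and location as assumptions of
this sketch rather than part of the PR:

{code:java}
import org.apache.beam.runners.extensions.metrics.MetricsGraphiteSink;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.metrics.MetricsOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class GraphiteMetricsWiringSketch {
  public static void main(String[] args) {
    MetricsOptions options = PipelineOptionsFactory.create().as(MetricsOptions.class);
    options.setMetricsSink(MetricsGraphiteSink.class);
    options.setMetricsGraphiteHost("graphite.example.com");  // setter name assumed
    options.setMetricsGraphitePort(2003);  // Graphite's default plaintext port; setter name assumed
    Pipeline pipeline = Pipeline.create(options);
    pipeline.run();
  }
}
{code}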

[jira] [Work logged] (BEAM-4553) Implement a Graphite sink for the metrics pusher

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4553?focusedWorklogId=152777=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152777
 ]

ASF GitHub Bot logged work on BEAM-4553:


Author: ASF GitHub Bot
Created on: 09/Oct/18 16:42
Start Date: 09/Oct/18 16:42
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on a change in pull request 
#6569: [BEAM-4553] Implement graphite sink for MetricsPusher and refactor 
MetricsHttpSink test
URL: https://github.com/apache/beam/pull/6569#discussion_r223777983
 
 

 ##
 File path: 
runners/extensions-java/metrics/src/main/java/org/apache/beam/runners/extensions/metrics/MetricsGraphiteSink.java
 ##
 @@ -0,0 +1,305 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.extensions.metrics;
+
+import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
+import java.io.BufferedWriter;
+import java.io.OutputStreamWriter;
+import java.net.InetAddress;
+import java.net.Socket;
+import java.nio.charset.Charset;
+import java.util.Locale;
+import java.util.regex.Pattern;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.metrics.DistributionResult;
+import org.apache.beam.sdk.metrics.GaugeResult;
+import org.apache.beam.sdk.metrics.MetricQueryResults;
+import org.apache.beam.sdk.metrics.MetricResult;
+import org.apache.beam.sdk.metrics.MetricsSink;
+import org.apache.beam.sdk.options.PipelineOptions;
+
+/**
+ * Sink to push metrics to Graphite. Graphite requires a timestamp. So metrics 
are reported with the
+ * timestamp (seconds from epoch) when the push to the sink was done (except 
with gauges that
+ * already have a timestamp value). The graphite metric name will be in the 
form of
+ * 
beam.metricType.metricNamespace.metricName.[committed|attempted].metricValueType
 For example:
+ * {@code beam.counter.throughput.nbRecords.attempted.value} Or {@code
+ * beam.distribution.throughput.nbRecordsPerSec.attempted.mean}
+ */
+public class MetricsGraphiteSink implements MetricsSink {
+  private static final Charset UTF_8 = Charset.forName("UTF-8");
+  private static final Pattern WHITESPACE = Pattern.compile("[\\s]+");
+  private static final String SPACE_REPLACEMENT = "_";
+  private final String address;
+  private final int port;
+  private final Charset charset;
+
+  public MetricsGraphiteSink(PipelineOptions pipelineOptions) {
+this.address = pipelineOptions.getMetricsGraphiteHost();
+this.port = pipelineOptions.getMetricsGraphitePort();
+this.charset = UTF_8;
+  }
+
+  @Experimental(Experimental.Kind.METRICS)
+  @Override
+  public void writeMetrics(MetricQueryResults metricQueryResults) throws 
Exception {
+final long metricTimestamp = System.currentTimeMillis() / 1000L;
+Socket socket = new Socket(InetAddress.getByName(address), port);
+BufferedWriter writer =
+new BufferedWriter(new OutputStreamWriter(socket.getOutputStream(), 
charset));
+StringBuilder messagePayload = new StringBuilder();
+Iterable> counters = metricQueryResults.getCounters();
+Iterable> gauges = 
metricQueryResults.getGauges();
+Iterable> distributions =
+metricQueryResults.getDistributions();
+
+for (MetricResult counter : counters) {
+  // if committed metrics are not supported, exception is thrown and we 
don't append the message
+  try {
+messagePayload.append(createCounterGraphiteMessage(metricTimestamp, 
counter, true));
+  } catch (UnsupportedOperationException e) {
+if (!e.getMessage().contains("committed metrics")) {
+  throw e;
+}
+  }
+  messagePayload.append(createCounterGraphiteMessage(metricTimestamp, 
counter, false));
+}
+
+for (MetricResult gauge : gauges) {
+  try {
+messagePayload.append(createGaugeGraphiteMessage(gauge, true));
+  } catch (UnsupportedOperationException e) {
+if (!e.getMessage().contains("committed metrics")) {
+  throw e;
+}
+  }
+  messagePayload.ap

[jira] [Work logged] (BEAM-4553) Implement a Graphite sink for the metrics pusher

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4553?focusedWorklogId=152771=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152771
 ]

ASF GitHub Bot logged work on BEAM-4553:


Author: ASF GitHub Bot
Created on: 09/Oct/18 16:36
Start Date: 09/Oct/18 16:36
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on a change in pull request 
#6569: [BEAM-4553] Implement graphite sink for MetricsPusher and refactor 
MetricsHttpSink test
URL: https://github.com/apache/beam/pull/6569#discussion_r223776184
 
 

 ##
 File path: 
runners/extensions-java/metrics/src/main/java/org/apache/beam/runners/extensions/metrics/MetricsGraphiteSink.java
 ##
 @@ -0,0 +1,305 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.extensions.metrics;
+
+import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
+import java.io.BufferedWriter;
+import java.io.OutputStreamWriter;
+import java.net.InetAddress;
+import java.net.Socket;
+import java.nio.charset.Charset;
+import java.util.Locale;
+import java.util.regex.Pattern;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.metrics.DistributionResult;
+import org.apache.beam.sdk.metrics.GaugeResult;
+import org.apache.beam.sdk.metrics.MetricQueryResults;
+import org.apache.beam.sdk.metrics.MetricResult;
+import org.apache.beam.sdk.metrics.MetricsSink;
+import org.apache.beam.sdk.options.PipelineOptions;
+
+/**
+ * Sink to push metrics to Graphite. Graphite requires a timestamp. So metrics 
are reported with the
+ * timestamp (seconds from epoch) when the push to the sink was done (except 
with gauges that
+ * already have a timestamp value). The graphite metric name will be in the 
form of
+ * 
beam.metricType.metricNamespace.metricName.[committed|attempted].metricValueType
 For example:
+ * {@code beam.counter.throughput.nbRecords.attempted.value} Or {@code
+ * beam.distribution.throughput.nbRecordsPerSec.attempted.mean}
+ */
+public class MetricsGraphiteSink implements MetricsSink {
+  private static final Charset UTF_8 = Charset.forName("UTF-8");
+  private static final Pattern WHITESPACE = Pattern.compile("[\\s]+");
+  private static final String SPACE_REPLACEMENT = "_";
 
 Review comment:
   If it's used only once then we could just use a character, but I'm ok with a 
constant.
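
   For illustration, the two spellings being weighed (the {{metricName}}
variable is hypothetical; the actual call site is outside the quoted excerpt):

{code:java}
// With the named constant:
String sanitized = WHITESPACE.matcher(metricName).replaceAll(SPACE_REPLACEMENT);
// With an inline literal, if the replacement is only used once:
String sanitizedInline = WHITESPACE.matcher(metricName).replaceAll("_");
{code}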


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152771)
Time Spent: 3h  (was: 2h 50m)

> Implement a Graphite sink for the metrics pusher
> 
>
> Key: BEAM-4553
> URL: https://issues.apache.org/jira/browse/BEAM-4553
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-extensions-metrics
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Today only a REST Http sink that sends raw json metrics using POST request to 
> a http server is available. It is more a POC sink. It would be good to code 
> the first real metrics sink. Some of the most popular is Graphite.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5315) Finish Python 3 porting for io module

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5315?focusedWorklogId=152762=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152762
 ]

ASF GitHub Bot logged work on BEAM-5315:


Author: ASF GitHub Bot
Created on: 09/Oct/18 16:28
Start Date: 09/Oct/18 16:28
Worklog Time Spent: 10m 
  Work Description: splovyt commented on issue #6590: [BEAM-5315] Partially 
port io
URL: https://github.com/apache/beam/pull/6590#issuecomment-428259931
 
 
   @tvalentyn I have rebased, although the checks seem to be hanging. PTAL and 
please merge if approved (I am at an event for the next two days). Thanks once again 
for the review.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152762)
Time Spent: 3h 20m  (was: 3h 10m)

> Finish Python 3 porting for io module
> -
>
> Key: BEAM-5315
> URL: https://issues.apache.org/jira/browse/BEAM-5315
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Simon
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5660) Add dataflow java worker unit tests into precommit

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5660?focusedWorklogId=152760=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152760
 ]

ASF GitHub Bot logged work on BEAM-5660:


Author: ASF GitHub Bot
Created on: 09/Oct/18 16:27
Start Date: 09/Oct/18 16:27
Worklog Time Spent: 10m 
  Work Description: herohde commented on issue #6613: [BEAM-5660] Add both 
dataflow legacy worker and fn-api worker into JavaPreCommit
URL: https://github.com/apache/beam/pull/6613#issuecomment-428259602
 
 
   @boyuanzz Is the website failure unrelated?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152760)
Time Spent: 20m  (was: 10m)

> Add dataflow java worker unit tests into precommit
> --
>
> Key: BEAM-5660
> URL: https://issues.apache.org/jira/browse/BEAM-5660
> Project: Beam
>  Issue Type: Task
>  Components: runner-dataflow
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5687) Checkpointing in portable pipelines does not work

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5687?focusedWorklogId=152755=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152755
 ]

ASF GitHub Bot logged work on BEAM-5687:


Author: ASF GitHub Bot
Created on: 09/Oct/18 16:20
Start Date: 09/Oct/18 16:20
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #6617: 
[BEAM-5687] Fix checkpointing of FlinkRunner for portable pipelines 
URL: https://github.com/apache/beam/pull/6617#discussion_r223770717
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkExecutionEnvironments.java
 ##
 @@ -203,6 +203,9 @@ static StreamExecutionEnvironment 
createStreamExecutionEnvironment(
 .getCheckpointConfig()
 .setMinPauseBetweenCheckpoints(minPauseBetweenCheckpoints);
   }
+} else {
+  // checkpointing is disabled, we can allow shutting down sources when 
they're done
 
 Review comment:
   https://issues.apache.org/jira/browse/FLINK-2491


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152755)
Time Spent: 0.5h  (was: 20m)

> Checkpointing in portable pipelines does not work
> -
>
> Key: BEAM-5687
> URL: https://issues.apache.org/jira/browse/BEAM-5687
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>  Labels: portability-flink
> Fix For: 2.9.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Checkpoints fail:
> {noformat}
> AsynchronousException{java.lang.Exception: Could not materialize checkpoint 2 
> for operator Source: Custom Source -> 9TestInput.None/beam:env:docker:v1:0 -> 
> ToKeyedWorkItem (1/1).}
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointExceptionHandler.tryHandleCheckpointException(StreamTask.java:1154)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.handleExecutionException(StreamTask.java:948)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:885)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.Exception: Could not materialize checkpoint 2 for 
> operator Source: Custom Source -> 9TestInput.None/beam:env:docker:v1:0 -> 
> ToKeyedWorkItem (1/1).
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.handleExecutionException(StreamTask.java:943)
>   ... 6 more
> Caused by: java.util.concurrent.ExecutionException: 
> java.lang.NullPointerException
>   at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>   at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>   at 
> org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:53)
>   at 
> org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.(OperatorSnapshotFinalizer.java:53)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:854)
>   ... 5 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.beam.runners.flink.translation.types.CoderTypeSerializer$CoderTypeSerializerConfigSnapshot.(CoderTypeSerializer.java:162)
>   at 
> org.apache.beam.runners.flink.translation.types.CoderTypeSerializer.snapshotConfiguration(CoderTypeSerializer.java:136)
>   at 
> org.apache.flink.runtime.state.RegisteredOperatorBackendStateMetaInfo.snapshot(RegisteredOperatorBackendStateMetaInfo.java:93)
>   at 
> org.apache.flink.runtime.state.DefaultOperatorStateBackend$1.performOperation(DefaultOperatorStateBackend.java:394)
>   at 
> org.apache.flink.runtime.state.DefaultOperatorStateBackend$1.performOperation(DefaultOperatorStateBackend.java:352)
>   at 
> org.apache.flink.runtime.io.async.Abstrac

[jira] [Work logged] (BEAM-5687) Checkpointing in portable pipelines does not work

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5687?focusedWorklogId=152754=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152754
 ]

ASF GitHub Bot logged work on BEAM-5687:


Author: ASF GitHub Bot
Created on: 09/Oct/18 16:19
Start Date: 09/Oct/18 16:19
Worklog Time Spent: 10m 
  Work Description: tweise commented on a change in pull request #6617: 
[BEAM-5687] Fix checkpointing of FlinkRunner for portable pipelines 
URL: https://github.com/apache/beam/pull/6617#discussion_r223770313
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkExecutionEnvironments.java
 ##
 @@ -203,6 +203,9 @@ static StreamExecutionEnvironment 
createStreamExecutionEnvironment(
 .getCheckpointConfig()
 .setMinPauseBetweenCheckpoints(minPauseBetweenCheckpoints);
   }
+} else {
+  // checkpointing is disabled, we can allow shutting down sources when 
they're done
 
 Review comment:
   Is there a Flink JIRA for this?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152754)
Time Spent: 20m  (was: 10m)

> Checkpointing in portable pipelines does not work
> -
>
> Key: BEAM-5687
> URL: https://issues.apache.org/jira/browse/BEAM-5687
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>  Labels: portability-flink
> Fix For: 2.9.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Checkpoints fail:
> {noformat}
> AsynchronousException{java.lang.Exception: Could not materialize checkpoint 2 
> for operator Source: Custom Source -> 9TestInput.None/beam:env:docker:v1:0 -> 
> ToKeyedWorkItem (1/1).}
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointExceptionHandler.tryHandleCheckpointException(StreamTask.java:1154)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.handleExecutionException(StreamTask.java:948)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:885)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.Exception: Could not materialize checkpoint 2 for 
> operator Source: Custom Source -> 9TestInput.None/beam:env:docker:v1:0 -> 
> ToKeyedWorkItem (1/1).
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.handleExecutionException(StreamTask.java:943)
>   ... 6 more
> Caused by: java.util.concurrent.ExecutionException: 
> java.lang.NullPointerException
>   at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>   at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>   at 
> org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:53)
>   at 
> org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.(OperatorSnapshotFinalizer.java:53)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:854)
>   ... 5 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.beam.runners.flink.translation.types.CoderTypeSerializer$CoderTypeSerializerConfigSnapshot.(CoderTypeSerializer.java:162)
>   at 
> org.apache.beam.runners.flink.translation.types.CoderTypeSerializer.snapshotConfiguration(CoderTypeSerializer.java:136)
>   at 
> org.apache.flink.runtime.state.RegisteredOperatorBackendStateMetaInfo.snapshot(RegisteredOperatorBackendStateMetaInfo.java:93)
>   at 
> org.apache.flink.runtime.state.DefaultOperatorStateBackend$1.performOperation(DefaultOperatorStateBackend.java:394)
>   at 
> org.apache.flink.runtime.state.DefaultOperatorStateBackend$1.performOperation(DefaultOperatorStateBackend.java:352)
>   at 
> org.apache.flink.runtime.io.async.AbstractAsyncCallableWithR

[jira] [Work logged] (BEAM-5687) Checkpointing in portable pipelines does not work

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5687?focusedWorklogId=152759=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152759
 ]

ASF GitHub Bot logged work on BEAM-5687:


Author: ASF GitHub Bot
Created on: 09/Oct/18 16:25
Start Date: 09/Oct/18 16:25
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #6617: 
[BEAM-5687] Fix checkpointing of FlinkRunner for portable pipelines 
URL: https://github.com/apache/beam/pull/6617#discussion_r223772311
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkExecutionEnvironments.java
 ##
 @@ -203,6 +203,9 @@ static StreamExecutionEnvironment 
createStreamExecutionEnvironment(
 .getCheckpointConfig()
 .setMinPauseBetweenCheckpoints(minPauseBetweenCheckpoints);
   }
+} else {
+  // checkpointing is disabled, we can allow shutting down sources when 
they're done
 
 Review comment:
   It's also in `ImpulseSourceFunction` but doesn't hurt to add it here as well.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152759)
Time Spent: 50m  (was: 40m)

> Checkpointing in portable pipelines does not work
> -
>
> Key: BEAM-5687
> URL: https://issues.apache.org/jira/browse/BEAM-5687
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>  Labels: portability-flink
> Fix For: 2.9.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Checkpoints fail:
> {noformat}
> AsynchronousException{java.lang.Exception: Could not materialize checkpoint 2 
> for operator Source: Custom Source -> 9TestInput.None/beam:env:docker:v1:0 -> 
> ToKeyedWorkItem (1/1).}
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointExceptionHandler.tryHandleCheckpointException(StreamTask.java:1154)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.handleExecutionException(StreamTask.java:948)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:885)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.Exception: Could not materialize checkpoint 2 for 
> operator Source: Custom Source -> 9TestInput.None/beam:env:docker:v1:0 -> 
> ToKeyedWorkItem (1/1).
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.handleExecutionException(StreamTask.java:943)
>   ... 6 more
> Caused by: java.util.concurrent.ExecutionException: 
> java.lang.NullPointerException
>   at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>   at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>   at 
> org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:53)
>   at 
> org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.(OperatorSnapshotFinalizer.java:53)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:854)
>   ... 5 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.beam.runners.flink.translation.types.CoderTypeSerializer$CoderTypeSerializerConfigSnapshot.(CoderTypeSerializer.java:162)
>   at 
> org.apache.beam.runners.flink.translation.types.CoderTypeSerializer.snapshotConfiguration(CoderTypeSerializer.java:136)
>   at 
> org.apache.flink.runtime.state.RegisteredOperatorBackendStateMetaInfo.snapshot(RegisteredOperatorBackendStateMetaInfo.java:93)
>   at 
> org.apache.flink.runtime.state.DefaultOperatorStateBackend$1.performOperation(DefaultOperatorStateBackend.java:394)
>   at 
> org.apache.flink.runtime.state.DefaultOperatorStateBackend$1.performOperation(DefaultOperatorStateBackend.java

[jira] [Work logged] (BEAM-5687) Checkpointing in portable pipelines does not work

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5687?focusedWorklogId=152758=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152758
 ]

ASF GitHub Bot logged work on BEAM-5687:


Author: ASF GitHub Bot
Created on: 09/Oct/18 16:23
Start Date: 09/Oct/18 16:23
Worklog Time Spent: 10m 
  Work Description: tweise commented on a change in pull request #6617: 
[BEAM-5687] Fix checkpointing of FlinkRunner for portable pipelines 
URL: https://github.com/apache/beam/pull/6617#discussion_r223771625
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkExecutionEnvironments.java
 ##
 @@ -203,6 +203,9 @@ static StreamExecutionEnvironment 
createStreamExecutionEnvironment(
 .getCheckpointConfig()
 .setMinPauseBetweenCheckpoints(minPauseBetweenCheckpoints);
   }
+} else {
+  // checkpointing is disabled, we can allow shutting down sources when 
they're done
 
 Review comment:
   great, let's add that link as a comment


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152758)
Time Spent: 40m  (was: 0.5h)

> Checkpointing in portable pipelines does not work
> -
>
> Key: BEAM-5687
> URL: https://issues.apache.org/jira/browse/BEAM-5687
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>  Labels: portability-flink
> Fix For: 2.9.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Checkpoints fail:
> {noformat}
> AsynchronousException{java.lang.Exception: Could not materialize checkpoint 2 
> for operator Source: Custom Source -> 9TestInput.None/beam:env:docker:v1:0 -> 
> ToKeyedWorkItem (1/1).}
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointExceptionHandler.tryHandleCheckpointException(StreamTask.java:1154)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.handleExecutionException(StreamTask.java:948)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:885)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.Exception: Could not materialize checkpoint 2 for 
> operator Source: Custom Source -> 9TestInput.None/beam:env:docker:v1:0 -> 
> ToKeyedWorkItem (1/1).
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.handleExecutionException(StreamTask.java:943)
>   ... 6 more
> Caused by: java.util.concurrent.ExecutionException: 
> java.lang.NullPointerException
>   at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>   at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>   at 
> org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:53)
>   at 
> org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.(OperatorSnapshotFinalizer.java:53)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:854)
>   ... 5 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.beam.runners.flink.translation.types.CoderTypeSerializer$CoderTypeSerializerConfigSnapshot.(CoderTypeSerializer.java:162)
>   at 
> org.apache.beam.runners.flink.translation.types.CoderTypeSerializer.snapshotConfiguration(CoderTypeSerializer.java:136)
>   at 
> org.apache.flink.runtime.state.RegisteredOperatorBackendStateMetaInfo.snapshot(RegisteredOperatorBackendStateMetaInfo.java:93)
>   at 
> org.apache.flink.runtime.state.DefaultOperatorStateBackend$1.performOperation(DefaultOperatorStateBackend.java:394)
>   at 
> org.apache.flink.runtime.state.DefaultOperatorStateBackend$1.performOperation(DefaultOperatorStateBackend.java:352)
>   at 
> org.apache.flink.runtime.io.async.Abstrac

[jira] [Work logged] (BEAM-5687) Checkpointing in portable pipelines does not work

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5687?focusedWorklogId=152749=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152749
 ]

ASF GitHub Bot logged work on BEAM-5687:


Author: ASF GitHub Bot
Created on: 09/Oct/18 16:15
Start Date: 09/Oct/18 16:15
Worklog Time Spent: 10m 
  Work Description: mxm opened a new pull request #6617: [BEAM-5687] Fix 
checkpointing of FlinkRunner for portable pipelines 
URL: https://github.com/apache/beam/pull/6617
 
 
   ###  [BEAM-5687] Fix checkpointing of FlinkRunner for portable pipelines
   
   This provides the input WindowValue Coder to ExecutableStageDoFnOperator which
   ensures that the buffered elements can be checkpointed correctly.
   
   ### [BEAM-3727] Do not shutdown Impulse sources to enable checkpointing
   
   Flink's checkpointing won't work properly after sources have finished: they need
   to be up and running for as long as checkpoints should be taken. This was
   already the case for the non-portable UnboundedSourceWrapper, but it needs to be
   extended to Impulse transforms as well.
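   As a minimal illustration of this keep-alive behaviour, a hypothetical Flink
   source along these lines emits its single element and then idles rather than
   returning while checkpointing is enabled (the class and flag names are
   illustrative, not the PR's actual ImpulseSourceFunction):
   
   {code:java}
import org.apache.flink.streaming.api.functions.source.SourceFunction;

// Illustrative only: emits one impulse element, then parks while
// checkpointing is enabled so the task never reaches FINISHED.
public class ImpulseLikeSource implements SourceFunction<byte[]> {
  private final boolean keepSourceAlive; // true when checkpointing is enabled
  private volatile boolean running = true;

  public ImpulseLikeSource(boolean keepSourceAlive) {
    this.keepSourceAlive = keepSourceAlive;
  }

  @Override
  public void run(SourceContext<byte[]> ctx) throws Exception {
    synchronized (ctx.getCheckpointLock()) {
      ctx.collect(new byte[0]); // the single impulse element
    }
    while (keepSourceAlive && running) {
      Thread.sleep(100L); // stay alive so checkpoints keep succeeding
    }
  }

  @Override
  public void cancel() {
    running = false;
  }
}
{code}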
   
   ### [BEAM-3727] Allow sources to shutdown when checkpointing is disabled 
   
   
   CC @tweise @angoenka 
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/)
 | --- | --- | --- | --- | --- | ---
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)
 | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)
  [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/)
 | --- | --- | ---
   
   
   
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152749)
Time Spent: 10m
Remaining Estimate: 0h

> Checkpointing in portable pipelines does not work
> -
>
> Key: BEAM-5687
> URL: https://issues.apache.org/jira/browse/BEAM-5687
> Project: Beam
>  Issue Type: Bug

[jira] [Commented] (BEAM-5690) Issue with GroupByKey in BeamSql using SparkRunner

2018-10-09 Thread Kenneth Knowles (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643682#comment-16643682
 ] 

Kenneth Knowles commented on BEAM-5690:
---

CC [~kedin] [~apilloud] [~xumingming] [~mingmxu] [~amaliujia]

Since it is not reproduced in the Flink runner or Direct runner, the SQL 
implementation of GROUP BY is probably triggering some latent bug in the Spark 
runner's streaming mode.

> Issue with GroupByKey in BeamSql using SparkRunner
> --
>
> Key: BEAM-5690
> URL: https://issues.apache.org/jira/browse/BEAM-5690
> Project: Beam
>  Issue Type: Task
>  Components: runner-spark
>Reporter: Kenneth Knowles
>Priority: Major
>
> Reported on user@
> {quote}We are trying to set up a pipeline using BeamSql, and the trigger used 
> is the default (AfterWatermark crosses the window). 
> Below is the pipeline:
>   
>KafkaSource (KafkaIO) 
>---> Windowing (FixedWindow 1min)
>---> BeamSql
>---> KafkaSink (KafkaIO)
>  
> We are using Spark Runner for this. 
> The BeamSql query is:
> {code}select Col3, count(*) as count_col1 from PCOLLECTION GROUP BY Col3{code}
> We are grouping by Col3 which is a string. It can hold values string[0-9]. 
>  
> The records are getting emitted out at 1 min to kafka sink, but the output 
> record in kafka is not as expected.
> Below is the output observed: (WST and WET are indicators for window start 
> time and window end time)
> {code}
> {"count_col1":1,"Col3":"string5","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":3,"Col3":"string7","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":2,"Col3":"string8","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":1,"Col3":"string2","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":1,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00 0}
> {code}
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-5690) Issue with GroupByKey in BeamSql using SparkRunner

2018-10-09 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-5690:
-

Assignee: (was: Amit Sela)

> Issue with GroupByKey in BeamSql using SparkRunner
> --
>
> Key: BEAM-5690
> URL: https://issues.apache.org/jira/browse/BEAM-5690
> Project: Beam
>  Issue Type: Task
>  Components: runner-spark
>Reporter: Kenneth Knowles
>Priority: Major
>
> Reported on user@
> {quote}We are trying to set up a pipeline using BeamSql, and the trigger used 
> is the default (AfterWatermark crosses the window). 
> Below is the pipeline:
>   
>KafkaSource (KafkaIO) 
>---> Windowing (FixedWindow 1min)
>---> BeamSql
>---> KafkaSink (KafkaIO)
>  
> We are using Spark Runner for this. 
> The BeamSql query is:
> {code}select Col3, count(*) as count_col1 from PCOLLECTION GROUP BY Col3{code}
> We are grouping by Col3 which is a string. It can hold values string[0-9]. 
>  
> The records are getting emitted out at 1 min to kafka sink, but the output 
> record in kafka is not as expected.
> Below is the output observed: (WST and WET are indicators for window start 
> time and window end time)
> {code}
> {"count_col1":1,"Col3":"string5","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":3,"Col3":"string7","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":2,"Col3":"string8","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":1,"Col3":"string2","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":1,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00 0}
> {code}
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5254) Add Samza Runner translator registrar and refactor config generation

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5254?focusedWorklogId=152737=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152737
 ]

ASF GitHub Bot logged work on BEAM-5254:


Author: ASF GitHub Bot
Created on: 09/Oct/18 16:00
Start Date: 09/Oct/18 16:00
Worklog Time Spent: 10m 
  Work Description: akedin closed pull request #6292: [BEAM-5254] Add Samza 
Runner translator registrar and refactor config
URL: https://github.com/apache/beam/pull/6292
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/runners/samza/src/main/java/org/apache/beam/runners/samza/SamzaRunner.java 
b/runners/samza/src/main/java/org/apache/beam/runners/samza/SamzaRunner.java
index 6e67e385756..bba10ddd962 100644
--- a/runners/samza/src/main/java/org/apache/beam/runners/samza/SamzaRunner.java
+++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/SamzaRunner.java
@@ -33,7 +33,6 @@
 import org.apache.beam.sdk.values.PValue;
 import org.apache.samza.application.StreamApplication;
 import org.apache.samza.config.Config;
-import org.apache.samza.config.MapConfig;
 import org.apache.samza.metrics.MetricsRegistryMap;
 import org.apache.samza.operators.ContextManager;
 import org.apache.samza.operators.StreamGraph;
@@ -76,20 +75,18 @@ public SamzaPipelineResult run(Pipeline pipeline) {
 
 // Add a dummy source for use in special cases (TestStream, empty flatten)
 final PValue dummySource = pipeline.apply("Dummy Input Source", 
Create.of("dummy"));
-
 final Map idMap = PViewToIdMapper.buildIdMap(pipeline);
-final Map config = ConfigBuilder.buildConfig(pipeline, 
options, idMap);
 
-final SamzaExecutionContext executionContext = new SamzaExecutionContext();
+final ConfigBuilder configBuilder = new ConfigBuilder(options);
+SamzaPipelineTranslator.createConfig(pipeline, idMap, configBuilder);
+final ApplicationRunner runner = 
ApplicationRunner.fromConfig(configBuilder.build());
 
-final ApplicationRunner runner = ApplicationRunner.fromConfig(new 
MapConfig(config));
+final SamzaExecutionContext executionContext = new SamzaExecutionContext();
 
 final StreamApplication app =
 new StreamApplication() {
   @Override
   public void init(StreamGraph streamGraph, Config config) {
-// TODO: we should probably not be creating the execution context 
this early since it needs
-// to be shipped off to various tasks.
 streamGraph.withContextManager(
 new ContextManager() {
   @Override
diff --git 
a/runners/samza/src/main/java/org/apache/beam/runners/samza/adapter/BoundedSourceSystem.java
 
b/runners/samza/src/main/java/org/apache/beam/runners/samza/adapter/BoundedSourceSystem.java
index 1bef011a34f..f653cfc934b 100644
--- 
a/runners/samza/src/main/java/org/apache/beam/runners/samza/adapter/BoundedSourceSystem.java
+++ 
b/runners/samza/src/main/java/org/apache/beam/runners/samza/adapter/BoundedSourceSystem.java
@@ -69,28 +69,6 @@
  */
 // TODO: instrumentation for the consumer
 public class BoundedSourceSystem {
-  /**
-   * Returns the configuration required to instantiate a consumer for the 
given {@link
-   * BoundedSource}.
-   *
-   * @param id a unique id for the source. Must use only valid characters for 
a system name in
-   * Samza.
-   * @param source the source
-   * @param coder a coder to deserialize messages received by the source's 
consumer
-   * @param  the type of object produced by the source consumer
-   */
-  public static  Map createConfigFor(
-  String id, BoundedSource source, Coder> coder, 
String stepName) {
-final Map config = new HashMap<>();
-final String streamPrefix = "systems." + id;
-config.put(streamPrefix + ".samza.factory", 
BoundedSourceSystem.Factory.class.getName());
-config.put(streamPrefix + ".source", 
Base64Serializer.serializeUnchecked(source));
-config.put(streamPrefix + ".coder", 
Base64Serializer.serializeUnchecked(coder));
-config.put(streamPrefix + ".stepName", stepName);
-config.put("streams." + id + ".samza.system", id);
-config.put("streams." + id + ".samza.bounded", "true");
-return config;
-  }
 
   private static  List> split(
   BoundedSource source, SamzaPipelineOptions pipelineOptions) throws 
Exception {
@@ -414,8 +392,7 @@ private void enqueueUninterruptibly(IncomingMessageEnvelope 
envelope) {
 
   /**
* A {@link SystemFactory} that produces a {@link BoundedSourceSystem} for a 
particular {@link
-   * BoundedSource} reg

[jira] [Created] (BEAM-5690) Issue with GroupByKey in BeamSql using SparkRunner

2018-10-09 Thread Kenneth Knowles (JIRA)
Kenneth Knowles created BEAM-5690:
-

 Summary: Issue with GroupByKey in BeamSql using SparkRunner
 Key: BEAM-5690
 URL: https://issues.apache.org/jira/browse/BEAM-5690
 Project: Beam
  Issue Type: Task
  Components: runner-spark
Reporter: Kenneth Knowles
Assignee: Amit Sela


Reported on user@

{quote}We are trying to set up a pipeline using BeamSql, and the trigger used is 
the default (AfterWatermark crosses the window). 
Below is the pipeline:
  
   KafkaSource (KafkaIO) 
   ---> Windowing (FixedWindow 1min)
   ---> BeamSql
   ---> KafkaSink (KafkaIO)
 
We are using Spark Runner for this. 
The BeamSql query is:
{code}select Col3, count(*) as count_col1 from PCOLLECTION GROUP BY Col3{code}

We are grouping by Col3 which is a string. It can hold values string[0-9]. 
 
The records are getting emitted out at 1 min to kafka sink, but the output 
record in kafka is not as expected.
Below is the output observed: (WST and WET are indicators for window start time 
and window end time)

{code}
{"count_col1":1,"Col3":"string5","WST":"2018-10-09  09-55-00   
+","WET":"2018-10-09  09-56-00   +"}
{"count_col1":3,"Col3":"string7","WST":"2018-10-09  09-55-00   
+","WET":"2018-10-09  09-56-00   +"}
{"count_col1":2,"Col3":"string8","WST":"2018-10-09  09-55-00   
+","WET":"2018-10-09  09-56-00   +"}
{"count_col1":1,"Col3":"string2","WST":"2018-10-09  09-55-00   
+","WET":"2018-10-09  09-56-00   +"}
{"count_col1":1,"Col3":"string6","WST":"2018-10-09  09-55-00   
+","WET":"2018-10-09  09-56-00   +"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
+","WET":"2018-10-09  09-56-00   +"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
+","WET":"2018-10-09  09-56-00   +"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
+","WET":"2018-10-09  09-56-00   +"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
+","WET":"2018-10-09  09-56-00   +"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
+","WET":"2018-10-09  09-56-00   +"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
+","WET":"2018-10-09  09-56-00 0}
{code}
{quote}
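
For reference, a self-contained Java sketch of the pipeline shape described 
above, with the Kafka read/write stages replaced by an in-memory source so it 
can run on the DirectRunner; the schema and field values are assumptions, not 
the reporter's code:

{code:java}
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.extensions.sql.SqlTransform;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;
import org.joda.time.Duration;

public class BeamSqlGroupBySketch {
  public static void main(String[] args) {
    Schema schema = Schema.builder().addStringField("Col3").build();
    Pipeline p = Pipeline.create();

    // In the report this PCollection comes from KafkaIO; Create.of stands in here.
    PCollection<Row> rows = p.apply(
        Create.of(
                Row.withSchema(schema).addValue("string5").build(),
                Row.withSchema(schema).addValue("string7").build(),
                Row.withSchema(schema).addValue("string7").build())
            .withRowSchema(schema));

    // Fixed 1-minute windows followed by the reported GROUP BY query.
    PCollection<Row> counts = rows
        .apply(Window.<Row>into(FixedWindows.of(Duration.standardMinutes(1))))
        .apply(SqlTransform.query(
            "select Col3, count(*) as count_col1 from PCOLLECTION GROUP BY Col3"));

    // The report writes the result back to Kafka via KafkaIO; omitted here.
    p.run().waitUntilFinish();
  }
}
{code}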




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5326) SDK support for custom dataflow worker jar

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5326?focusedWorklogId=152733=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152733
 ]

ASF GitHub Bot logged work on BEAM-5326:


Author: ASF GitHub Bot
Created on: 09/Oct/18 15:51
Start Date: 09/Oct/18 15:51
Worklog Time Spent: 10m 
  Work Description: herohde commented on a change in pull request #6615: 
[BEAM-5326] Shim main class and fix Go artifact naming mismatch for c…
URL: https://github.com/apache/beam/pull/6615#discussion_r223760656
 
 

 ##
 File path: 
runners/google-cloud-dataflow-java/worker/src/main/java/com/google/cloud/dataflow/worker/DataflowRunnerHarness.java
 ##
 @@ -0,0 +1,26 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package com.google.cloud.dataflow.worker;
+
+/** Temporary redirect for 
org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness. */
 
 Review comment:
   Opened BEAM-5686 for when we can remove it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152733)
Time Spent: 1h 10m  (was: 1h)

> SDK support for custom dataflow worker jar
> --
>
> Key: BEAM-5326
> URL: https://issues.apache.org/jira/browse/BEAM-5326
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Henning Rohde
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Doc: 
> https://docs.google.com/document/d/1-m-GzkYWIODKOEl1ZSUNXYbcGRvRr3QkasfHsJxbuoA/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5326) SDK support for custom dataflow worker jar

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5326?focusedWorklogId=152734=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152734
 ]

ASF GitHub Bot logged work on BEAM-5326:


Author: ASF GitHub Bot
Created on: 09/Oct/18 15:51
Start Date: 09/Oct/18 15:51
Worklog Time Spent: 10m 
  Work Description: herohde commented on issue #6615: [BEAM-5326] Shim main 
class and fix Go artifact naming mismatch for c…
URL: https://github.com/apache/beam/pull/6615#issuecomment-428246844
 
 
   Thanks @boyuanzz. PTAL


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152734)
Time Spent: 1h 20m  (was: 1h 10m)

> SDK support for custom dataflow worker jar
> --
>
> Key: BEAM-5326
> URL: https://issues.apache.org/jira/browse/BEAM-5326
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Henning Rohde
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Doc: 
> https://docs.google.com/document/d/1-m-GzkYWIODKOEl1ZSUNXYbcGRvRr3QkasfHsJxbuoA/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-5686) Remove DataflowRunnerHarness shim again

2018-10-09 Thread Henning Rohde (JIRA)
Henning Rohde created BEAM-5686:
---

 Summary: Remove DataflowRunnerHarness shim again
 Key: BEAM-5686
 URL: https://issues.apache.org/jira/browse/BEAM-5686
 Project: Beam
  Issue Type: Task
  Components: runner-dataflow
Reporter: Henning Rohde
Assignee: Henning Rohde






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5326) SDK support for custom dataflow worker jar

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5326?focusedWorklogId=152732=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152732
 ]

ASF GitHub Bot logged work on BEAM-5326:


Author: ASF GitHub Bot
Created on: 09/Oct/18 15:50
Start Date: 09/Oct/18 15:50
Worklog Time Spent: 10m 
  Work Description: herohde commented on a change in pull request #6615: 
[BEAM-5326] Shim main class and fix Go artifact naming mismatch for c…
URL: https://github.com/apache/beam/pull/6615#discussion_r223760415
 
 

 ##
 File path: sdks/go/pkg/beam/runners/dataflow/dataflow.go
 ##
 @@ -149,10 +149,12 @@ func Execute(ctx context.Context, p *beam.Pipeline) 
error {
return fmt.Errorf("failed to generate model pipeline: %v", err)
}
 
-   id := atomic.AddInt32(, 1)
-   modelURL := gcsx.Join(*stagingLocation, fmt.Sprintf("model-%v-%v", id, 
time.Now().UnixNano()))
-   workerURL := gcsx.Join(*stagingLocation, fmt.Sprintf("worker-%v-%v", 
id, time.Now().UnixNano()))
-   jarURL := gcsx.Join(*stagingLocation, 
fmt.Sprintf("dataflow-worker-%v-%v.jar", id, time.Now().UnixNano()))
+   // NOTE(herohde) 10/8/2018: the last segment of the names must be 
"worker" and "dataflow-worker.jar".
 
 Review comment:
   Opened BEAM-5689


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152732)
Time Spent: 1h  (was: 50m)

> SDK support for custom dataflow worker jar
> --
>
> Key: BEAM-5326
> URL: https://issues.apache.org/jira/browse/BEAM-5326
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Henning Rohde
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Doc: 
> https://docs.google.com/document/d/1-m-GzkYWIODKOEl1ZSUNXYbcGRvRr3QkasfHsJxbuoA/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5688) [beam_PreCommit_Website_Stage_GCS_Cron] [stageWebsite] Fails on githubPullRequestId assert

2018-10-09 Thread Scott Wegner (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643659#comment-16643659
 ] 

Scott Wegner commented on BEAM-5688:


I have https://github.com/apache/beam/pull/6608 out to fix this, but the 
current version doesn't work.

> [beam_PreCommit_Website_Stage_GCS_Cron] [stageWebsite] Fails on 
> githubPullRequestId assert
> --
>
> Key: BEAM-5688
> URL: https://issues.apache.org/jira/browse/BEAM-5688
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Scott Wegner
>Assignee: Alan Myrvold
>Priority: Major
>  Labels: currently-failing
>
> _Use this form to file an issue for test failure:_
>  * [Jenkins 
> Job|https://builds.apache.org/job/beam_PreCommit_Website_Stage_GCS_Cron/20/]
>  * [Gradle Build Scan|https://gradle.com/s/7mqwgjegf5hge]
>  * [Test source 
> code|https://github.com/apache/beam/blob/a19183b05f0271f0a927aafcd778235335b7d269/website/build.gradle#L234]
> Initial investigation:
> This is a problem with how the website gradle scripts are implemented to 
> accept a githubPullRequestId. The Cron job will not have an associated PR, 
> so this currently fails.
> 
> _After you've filled out the above details, please [assign the issue to an 
> individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist].
>  Assignee should [treat test failures as 
> high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test],
>  helping to fix the issue or find a more appropriate owner. See [Apache Beam 
> Post-Commit 
> Policies|https://beam.apache.org/contribute/postcommits-policies]._



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5683) [beam_PostCommit_Py_VR_Dataflow] [test_multiple_empty_outputs] Failure summary

2018-10-09 Thread Pablo Estrada (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643661#comment-16643661
 ] 

Pablo Estrada commented on BEAM-5683:
-

I'll take a look in a bit

> [beam_PostCommit_Py_VR_Dataflow] [test_multiple_empty_outputs] Failure summary
> --
>
> Key: BEAM-5683
> URL: https://issues.apache.org/jira/browse/BEAM-5683
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness, test-failures
>Reporter: Scott Wegner
>Assignee: Ankur Goenka
>Priority: Major
>  Labels: currently-failing
>
> _Use this form to file an issue for test failure:_
>  * [Jenkins 
> Job|https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/1289/]
>  * [Gradle Build 
> Scan|https://scans.gradle.com/s/hjmzvh4ylhs6y/console-log?task=:beam-sdks-python:validatesRunnerBatchTests]
>  * [Test source 
> code|https://github.com/apache/beam/blob/303a4275eb0a323761e1a4dec6a22fde9863acf8/sdks/python/apache_beam/runners/portability/stager.py#L390]
> Initial investigation:
> Seems to be failing on pip download.
> ==
> ERROR: test_multiple_empty_outputs 
> (apache_beam.transforms.ptransform_test.PTransformTest)
> --
> Traceback (most recent call last):
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/transforms/ptransform_test.py",
>  line 277, in test_multiple_empty_outputs
> pipeline.run()
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/testing/test_pipeline.py",
>  line 104, in run
> result = super(TestPipeline, self).run(test_runner_api)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/pipeline.py",
>  line 403, in run
> self.to_runner_api(), self.runner, self._options).run(False)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/pipeline.py",
>  line 416, in run
> return self.runner.run_pipeline(self)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py",
>  line 50, in run_pipeline
> self.result = super(TestDataflowRunner, self).run_pipeline(pipeline)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py",
>  line 389, in run_pipeline
> self.dataflow_client.create_job(self.job), self)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/utils/retry.py",
>  line 184, in wrapper
> return fun(*args, **kwargs)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py",
>  line 490, in create_job
> self.create_job_description(job)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py",
>  line 519, in create_job_description
> resources = self._stage_resources(job.options)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py",
>  line 452, in _stage_resources
> staging_location=google_cloud_options.staging_location)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/portability/stager.py",
>  line 161, in stage_job_resources
> requirements_cache_path)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/portability/stager.py",
>  line 411, in _populate_requirements_cache
> processes.check_call(cmd_args)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/utils/processes.py",
>  line 46, in check_call
> return subprocess.check_call(*args, **kwargs)
>   File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
> raise CalledProcessError(retcode, cmd)
> CalledProcessError: Command 
> '['/home/jenkins/jenkins-slave/workspace/bea

[jira] [Created] (BEAM-5689) Remove artifact naming constraint for portable Dataflow job

2018-10-09 Thread Henning Rohde (JIRA)
Henning Rohde created BEAM-5689:
---

 Summary: Remove artifact naming constraint for portable Dataflow 
job
 Key: BEAM-5689
 URL: https://issues.apache.org/jira/browse/BEAM-5689
 Project: Beam
  Issue Type: Task
  Components: runner-dataflow
Reporter: Henning Rohde
Assignee: Henning Rohde


Artifact names/keys are not preserved in Dataflow. Remove the below workarounds 
when they are.

 * Go Dataflow runner
 * Java and Python container boot code (probably)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-5688) [beam_PreCommit_Website_Stage_GCS_Cron] [stageWebsite] Fails on githubPullRequestId assert

2018-10-09 Thread Scott Wegner (JIRA)
Scott Wegner created BEAM-5688:
--

 Summary: [beam_PreCommit_Website_Stage_GCS_Cron] [stageWebsite] 
Fails on githubPullRequestId assert
 Key: BEAM-5688
 URL: https://issues.apache.org/jira/browse/BEAM-5688
 Project: Beam
  Issue Type: Bug
  Components: test-failures
Reporter: Scott Wegner
Assignee: Alan Myrvold


_Use this form to file an issue for test failure:_
 * [Jenkins 
Job|https://builds.apache.org/job/beam_PreCommit_Website_Stage_GCS_Cron/20/]
 * [Gradle Build Scan|https://gradle.com/s/7mqwgjegf5hge]
 * [Test source 
code|https://github.com/apache/beam/blob/a19183b05f0271f0a927aafcd778235335b7d269/website/build.gradle#L234]

Initial investigation:

This is a problem with how the website gradle scripts are implemented to accept 
a githubPullRequestId. The Cron job will not have an associated PR, so this 
currently fails.


_After you've filled out the above details, please [assign the issue to an 
individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist].
 Assignee should [treat test failures as 
high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test],
 helping to fix the issue or find a more appropriate owner. See [Apache Beam 
Post-Commit Policies|https://beam.apache.org/contribute/postcommits-policies]._



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-5687) Checkpointing in portable pipelines does not work

2018-10-09 Thread Maximilian Michels (JIRA)
Maximilian Michels created BEAM-5687:


 Summary: Checkpointing in portable pipelines does not work
 Key: BEAM-5687
 URL: https://issues.apache.org/jira/browse/BEAM-5687
 Project: Beam
  Issue Type: Bug
  Components: runner-flink
Reporter: Maximilian Michels
Assignee: Maximilian Michels
 Fix For: 2.9.0


Checkpoints fail:

{noformat}
AsynchronousException{java.lang.Exception: Could not materialize checkpoint 2 
for operator Source: Custom Source -> 9TestInput.None/beam:env:docker:v1:0 -> 
ToKeyedWorkItem (1/1).}
at 
org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointExceptionHandler.tryHandleCheckpointException(StreamTask.java:1154)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.handleExecutionException(StreamTask.java:948)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:885)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.Exception: Could not materialize checkpoint 2 for operator 
Source: Custom Source -> 9TestInput.None/beam:env:docker:v1:0 -> 
ToKeyedWorkItem (1/1).
at 
org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.handleExecutionException(StreamTask.java:943)
... 6 more
Caused by: java.util.concurrent.ExecutionException: 
java.lang.NullPointerException
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at 
org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:53)
at 
org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.(OperatorSnapshotFinalizer.java:53)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:854)
... 5 more
Caused by: java.lang.NullPointerException
at 
org.apache.beam.runners.flink.translation.types.CoderTypeSerializer$CoderTypeSerializerConfigSnapshot.(CoderTypeSerializer.java:162)
at 
org.apache.beam.runners.flink.translation.types.CoderTypeSerializer.snapshotConfiguration(CoderTypeSerializer.java:136)
at 
org.apache.flink.runtime.state.RegisteredOperatorBackendStateMetaInfo.snapshot(RegisteredOperatorBackendStateMetaInfo.java:93)
at 
org.apache.flink.runtime.state.DefaultOperatorStateBackend$1.performOperation(DefaultOperatorStateBackend.java:394)
at 
org.apache.flink.runtime.state.DefaultOperatorStateBackend$1.performOperation(DefaultOperatorStateBackend.java:352)
at 
org.apache.flink.runtime.io.async.AbstractAsyncCallableWithResources.call(AbstractAsyncCallableWithResources.java:75)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:50)
... 7 more
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-5281) There are several deadlinks in beam-site, please removed.

2018-10-09 Thread Scott Wegner (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Wegner resolved BEAM-5281.

   Resolution: Duplicate
Fix Version/s: Not applicable

Dupe of BEAM-5681; there's something broken in the pre-commit scripts.

> There are several deadlinks in beam-site, please removed.
> -
>
> Key: BEAM-5281
> URL: https://issues.apache.org/jira/browse/BEAM-5281
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Boyuan Zhang
>Assignee: Melissa Pashniak
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Deadlinks in beam-site cause nightly build failed: 
> https://scans.gradle.com/s/nzwfwj6iqlgrg/console-log?task=:beam-website:testWebsite#L13



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5326) SDK support for custom dataflow worker jar

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5326?focusedWorklogId=152728=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152728
 ]

ASF GitHub Bot logged work on BEAM-5326:


Author: ASF GitHub Bot
Created on: 09/Oct/18 15:41
Start Date: 09/Oct/18 15:41
Worklog Time Spent: 10m 
  Work Description: herohde commented on a change in pull request #6615: 
[BEAM-5326] Shim main class and fix Go artifact naming mismatch for c…
URL: https://github.com/apache/beam/pull/6615#discussion_r223756962
 
 

 ##
 File path: 
runners/google-cloud-dataflow-java/worker/src/main/java/com/google/cloud/dataflow/worker/DataflowRunnerHarness.java
 ##
 @@ -0,0 +1,26 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package com.google.cloud.dataflow.worker;
+
+/** Temporary redirect for 
org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness. */
 
 Review comment:
   I'll track that separately. The condition is internal to Dataflow.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152728)
Time Spent: 50m  (was: 40m)

> SDK support for custom dataflow worker jar
> --
>
> Key: BEAM-5326
> URL: https://issues.apache.org/jira/browse/BEAM-5326
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Henning Rohde
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Doc: 
> https://docs.google.com/document/d/1-m-GzkYWIODKOEl1ZSUNXYbcGRvRr3QkasfHsJxbuoA/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5624) Avro IO does not work with avro-python3 package out-of-the-box on Python 3, several tests fail with AttributeError (module 'avro.schema' has no attribute 'parse')

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5624?focusedWorklogId=152710=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152710
 ]

ASF GitHub Bot logged work on BEAM-5624:


Author: ASF GitHub Bot
Created on: 09/Oct/18 15:20
Start Date: 09/Oct/18 15:20
Worklog Time Spent: 10m 
  Work Description: splovyt opened a new pull request #6616: [BEAM-5624] 
Fix avro.schema parser for py3
URL: https://github.com/apache/beam/pull/6616
 
 
   Fix for the following error mentioned in BEAM-5624:
   _AttributeError (module 'avro.schema' has no attribute 'parse')_
   
   This is part of a series of PRs with the goal of making Apache Beam PY3 
compatible. The proposal outlining the approach is documented 
[here](https://s.apache.org/beam-python-3).
   
   @tvalentyn @Fematich @charlesccychen @aaltay @Juta @manuzhang 
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/)
 | --- | --- | --- | --- | --- | ---
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)
 | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)
  [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/)
 | --- | --- | ---
   
   
   
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152710)
Time Spent: 10m
Remaining Estimate: 0h

> Avro IO does not work with avro-python3 package out-of-the-box on Python 3, 
> several tests fail with AttributeError (module 'avro.schema' has no attribute 
> 'parse') 
> ---
>
> Key: BEAM-5624
> URL: https://issues.apache.org/jira/browse/BEAM-5624
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>A

[jira] [Commented] (BEAM-5683) [beam_PostCommit_Py_VR_Dataflow] [test_multiple_empty_outputs] Failure summary

2018-10-09 Thread Scott Wegner (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643606#comment-16643606
 ] 

Scott Wegner commented on BEAM-5683:


[~pabloem] / [~robertwb] can either of you help out?

bq. Can we access pip subprocess logs?

> [beam_PostCommit_Py_VR_Dataflow] [test_multiple_empty_outputs] Failure summary
> --
>
> Key: BEAM-5683
> URL: https://issues.apache.org/jira/browse/BEAM-5683
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness, test-failures
>Reporter: Scott Wegner
>Assignee: Ankur Goenka
>Priority: Major
>  Labels: currently-failing
>
> _Use this form to file an issue for test failure:_
>  * [Jenkins 
> Job|https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/1289/]
>  * [Gradle Build 
> Scan|https://scans.gradle.com/s/hjmzvh4ylhs6y/console-log?task=:beam-sdks-python:validatesRunnerBatchTests]
>  * [Test source 
> code|https://github.com/apache/beam/blob/303a4275eb0a323761e1a4dec6a22fde9863acf8/sdks/python/apache_beam/runners/portability/stager.py#L390]
> Initial investigation:
> Seems to be failing on pip download.
> ==
> ERROR: test_multiple_empty_outputs 
> (apache_beam.transforms.ptransform_test.PTransformTest)
> --
> Traceback (most recent call last):
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/transforms/ptransform_test.py",
>  line 277, in test_multiple_empty_outputs
> pipeline.run()
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/testing/test_pipeline.py",
>  line 104, in run
> result = super(TestPipeline, self).run(test_runner_api)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/pipeline.py",
>  line 403, in run
> self.to_runner_api(), self.runner, self._options).run(False)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/pipeline.py",
>  line 416, in run
> return self.runner.run_pipeline(self)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py",
>  line 50, in run_pipeline
> self.result = super(TestDataflowRunner, self).run_pipeline(pipeline)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py",
>  line 389, in run_pipeline
> self.dataflow_client.create_job(self.job), self)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/utils/retry.py",
>  line 184, in wrapper
> return fun(*args, **kwargs)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py",
>  line 490, in create_job
> self.create_job_description(job)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py",
>  line 519, in create_job_description
> resources = self._stage_resources(job.options)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py",
>  line 452, in _stage_resources
> staging_location=google_cloud_options.staging_location)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/portability/stager.py",
>  line 161, in stage_job_resources
> requirements_cache_path)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/portability/stager.py",
>  line 411, in _populate_requirements_cache
> processes.check_call(cmd_args)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/utils/processes.py",
>  line 46, in check_call
> return subprocess.check_call(*args, **kwargs)
>   File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
> raise CalledProcessError(retcode, cmd)
> CalledProcessErr

[jira] [Work logged] (BEAM-4553) Implement a Graphite sink for the metrics pusher

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4553?focusedWorklogId=152708=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152708
 ]

ASF GitHub Bot logged work on BEAM-4553:


Author: ASF GitHub Bot
Created on: 09/Oct/18 15:15
Start Date: 09/Oct/18 15:15
Worklog Time Spent: 10m 
  Work Description: echauchot commented on a change in pull request #6569: 
[BEAM-4553] Implement graphite sink for MetricsPusher and refactor 
MetricsHttpSink test
URL: https://github.com/apache/beam/pull/6569#discussion_r223745814
 
 

 ##
 File path: 
runners/extensions-java/metrics/src/main/java/org/apache/beam/runners/extensions/metrics/MetricsGraphiteSink.java
 ##
 @@ -0,0 +1,305 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.extensions.metrics;
+
+import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
+import java.io.BufferedWriter;
+import java.io.OutputStreamWriter;
+import java.net.InetAddress;
+import java.net.Socket;
+import java.nio.charset.Charset;
+import java.util.Locale;
+import java.util.regex.Pattern;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.metrics.DistributionResult;
+import org.apache.beam.sdk.metrics.GaugeResult;
+import org.apache.beam.sdk.metrics.MetricQueryResults;
+import org.apache.beam.sdk.metrics.MetricResult;
+import org.apache.beam.sdk.metrics.MetricsSink;
+import org.apache.beam.sdk.options.PipelineOptions;
+
+/**
+ * Sink to push metrics to Graphite. Graphite requires a timestamp. So metrics 
are reported with the
+ * timestamp (seconds from epoch) when the push to the sink was done (except 
with gauges that
+ * already have a timestamp value). The graphite metric name will be in the 
form of
+ * 
beam.metricType.metricNamespace.metricName.[committed|attempted].metricValueType
 For example:
+ * {@code beam.counter.throughput.nbRecords.attempted.value} Or {@code
+ * beam.distribution.throughput.nbRecordsPerSec.attempted.mean}
+ */
+public class MetricsGraphiteSink implements MetricsSink {
+  private static final Charset UTF_8 = Charset.forName("UTF-8");
+  private static final Pattern WHITESPACE = Pattern.compile("[\\s]+");
+  private static final String SPACE_REPLACEMENT = "_";
+  private final String address;
+  private final int port;
+  private final Charset charset;
+
+  public MetricsGraphiteSink(PipelineOptions pipelineOptions) {
+this.address = pipelineOptions.getMetricsGraphiteHost();
+this.port = pipelineOptions.getMetricsGraphitePort();
+this.charset = UTF_8;
+  }
+
+  @Experimental(Experimental.Kind.METRICS)
+  @Override
+  public void writeMetrics(MetricQueryResults metricQueryResults) throws 
Exception {
+final long metricTimestamp = System.currentTimeMillis() / 1000L;
+Socket socket = new Socket(InetAddress.getByName(address), port);
+BufferedWriter writer =
+new BufferedWriter(new OutputStreamWriter(socket.getOutputStream(), 
charset));
+StringBuilder messagePayload = new StringBuilder();
+Iterable<MetricResult<Long>> counters = metricQueryResults.getCounters();
+Iterable<MetricResult<GaugeResult>> gauges = metricQueryResults.getGauges();
+Iterable<MetricResult<DistributionResult>> distributions =
+metricQueryResults.getDistributions();
+
+for (MetricResult<Long> counter : counters) {
+  // if committed metrics are not supported, exception is thrown and we 
don't append the message
+  try {
+messagePayload.append(createCounterGraphiteMessage(metricTimestamp, 
counter, true));
+  } catch (UnsupportedOperationException e) {
+if (!e.getMessage().contains("committed metrics")) {
+  throw e;
+}
+  }
+  messagePayload.append(createCounterGraphiteMessage(metricTimestamp, 
counter, false));
+}
+
+for (MetricResult<GaugeResult> gauge : gauges) {
+  try {
+messagePayload.append(createGaugeGraphiteMessage(gauge, true));
+  } catch (UnsupportedOperationException e) {
+if (!e.getMessage().contains("committed metrics")) {
+  throw e;
+}
+  }
+  messagePayload.ap
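
For context on the serialization the sink performs: Graphite's plaintext protocol expects one 
line per data point in the form "metric.path value timestamp\n". Below is a minimal sketch of 
how a counter line following the naming scheme in the Javadoc above could be built; the helper 
name and body are assumptions for illustration only, since the actual createCounterGraphiteMessage 
implementation is truncated in the excerpt.

    import java.util.Locale;

    // Sketch only (not the PR's code): formats one Graphite plaintext line for an
    // attempted counter value, using the naming scheme described in the Javadoc above.
    class GraphiteLineSketch {
      static String counterLine(String namespace, String name, long value, long tsSeconds) {
        String path =
            String.format(Locale.US, "beam.counter.%s.%s.attempted.value", namespace, name);
        return path + " " + value + " " + tsSeconds + "\n";
      }
    }

For example, counterLine("throughput", "nbRecords", 42, 1539100000) yields
"beam.counter.throughput.nbRecords.attempted.value 42 1539100000\n".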

[jira] [Work logged] (BEAM-4553) Implement a Graphite sink for the metrics pusher

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4553?focusedWorklogId=152707=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152707
 ]

ASF GitHub Bot logged work on BEAM-4553:


Author: ASF GitHub Bot
Created on: 09/Oct/18 15:15
Start Date: 09/Oct/18 15:15
Worklog Time Spent: 10m 
  Work Description: echauchot commented on a change in pull request #6569: 
[BEAM-4553] Implement graphite sink for MetricsPusher and refactor 
MetricsHttpSink test
URL: https://github.com/apache/beam/pull/6569#discussion_r223745814
 
 

 ##
 File path: 
runners/extensions-java/metrics/src/main/java/org/apache/beam/runners/extensions/metrics/MetricsGraphiteSink.java
 ##
 @@ -0,0 +1,305 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.extensions.metrics;
+
+import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
+import java.io.BufferedWriter;
+import java.io.OutputStreamWriter;
+import java.net.InetAddress;
+import java.net.Socket;
+import java.nio.charset.Charset;
+import java.util.Locale;
+import java.util.regex.Pattern;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.metrics.DistributionResult;
+import org.apache.beam.sdk.metrics.GaugeResult;
+import org.apache.beam.sdk.metrics.MetricQueryResults;
+import org.apache.beam.sdk.metrics.MetricResult;
+import org.apache.beam.sdk.metrics.MetricsSink;
+import org.apache.beam.sdk.options.PipelineOptions;
+
+/**
+ * Sink to push metrics to Graphite. Graphite requires a timestamp. So metrics 
are reported with the
+ * timestamp (seconds from epoch) when the push to the sink was done (except 
with gauges that
+ * already have a timestamp value). The graphite metric name will be in the 
form of
+ * 
beam.metricType.metricNamespace.metricName.[committed|attempted].metricValueType
 For example:
+ * {@code beam.counter.throughput.nbRecords.attempted.value} Or {@code
+ * beam.distribution.throughput.nbRecordsPerSec.attempted.mean}
+ */
+public class MetricsGraphiteSink implements MetricsSink {
+  private static final Charset UTF_8 = Charset.forName("UTF-8");
+  private static final Pattern WHITESPACE = Pattern.compile("[\\s]+");
+  private static final String SPACE_REPLACEMENT = "_";
+  private final String address;
+  private final int port;
+  private final Charset charset;
+
+  public MetricsGraphiteSink(PipelineOptions pipelineOptions) {
+this.address = pipelineOptions.getMetricsGraphiteHost();
+this.port = pipelineOptions.getMetricsGraphitePort();
+this.charset = UTF_8;
+  }
+
+  @Experimental(Experimental.Kind.METRICS)
+  @Override
+  public void writeMetrics(MetricQueryResults metricQueryResults) throws 
Exception {
+final long metricTimestamp = System.currentTimeMillis() / 1000L;
+Socket socket = new Socket(InetAddress.getByName(address), port);
+BufferedWriter writer =
+new BufferedWriter(new OutputStreamWriter(socket.getOutputStream(), 
charset));
+StringBuilder messagePayload = new StringBuilder();
+Iterable<MetricResult<Long>> counters = metricQueryResults.getCounters();
+Iterable<MetricResult<GaugeResult>> gauges = metricQueryResults.getGauges();
+Iterable<MetricResult<DistributionResult>> distributions =
+metricQueryResults.getDistributions();
+
+for (MetricResult<Long> counter : counters) {
+  // if committed metrics are not supported, exception is thrown and we 
don't append the message
+  try {
+messagePayload.append(createCounterGraphiteMessage(metricTimestamp, 
counter, true));
+  } catch (UnsupportedOperationException e) {
+if (!e.getMessage().contains("committed metrics")) {
+  throw e;
+}
+  }
+  messagePayload.append(createCounterGraphiteMessage(metricTimestamp, 
counter, false));
+}
+
+for (MetricResult<GaugeResult> gauge : gauges) {
+  try {
+messagePayload.append(createGaugeGraphiteMessage(gauge, true));
+  } catch (UnsupportedOperationException e) {
+if (!e.getMessage().contains("committed metrics")) {
+  throw e;
+}
+  }
+  messagePayload.ap

[jira] [Work logged] (BEAM-4553) Implement a Graphite sink for the metrics pusher

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4553?focusedWorklogId=152705=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152705
 ]

ASF GitHub Bot logged work on BEAM-4553:


Author: ASF GitHub Bot
Created on: 09/Oct/18 15:11
Start Date: 09/Oct/18 15:11
Worklog Time Spent: 10m 
  Work Description: echauchot commented on a change in pull request #6569: 
[BEAM-4553] Implement graphite sink for MetricsPusher and refactor 
MetricsHttpSink test
URL: https://github.com/apache/beam/pull/6569#discussion_r223744438
 
 

 ##
 File path: 
runners/extensions-java/metrics/src/main/java/org/apache/beam/runners/extensions/metrics/MetricsGraphiteSink.java
 ##
 @@ -0,0 +1,305 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.extensions.metrics;
+
+import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
+import java.io.BufferedWriter;
+import java.io.OutputStreamWriter;
+import java.net.InetAddress;
+import java.net.Socket;
+import java.nio.charset.Charset;
+import java.util.Locale;
+import java.util.regex.Pattern;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.metrics.DistributionResult;
+import org.apache.beam.sdk.metrics.GaugeResult;
+import org.apache.beam.sdk.metrics.MetricQueryResults;
+import org.apache.beam.sdk.metrics.MetricResult;
+import org.apache.beam.sdk.metrics.MetricsSink;
+import org.apache.beam.sdk.options.PipelineOptions;
+
+/**
+ * Sink to push metrics to Graphite. Graphite requires a timestamp. So metrics 
are reported with the
+ * timestamp (seconds from epoch) when the push to the sink was done (except 
with gauges that
+ * already have a timestamp value). The graphite metric name will be in the 
form of
+ * 
beam.metricType.metricNamespace.metricName.[committed|attempted].metricValueType
 For example:
+ * {@code beam.counter.throughput.nbRecords.attempted.value} Or {@code
+ * beam.distribution.throughput.nbRecordsPerSec.attempted.mean}
+ */
+public class MetricsGraphiteSink implements MetricsSink {
+  private static final Charset UTF_8 = Charset.forName("UTF-8");
+  private static final Pattern WHITESPACE = Pattern.compile("[\\s]+");
+  private static final String SPACE_REPLACEMENT = "_";
+  private final String address;
+  private final int port;
+  private final Charset charset;
+
+  public MetricsGraphiteSink(PipelineOptions pipelineOptions) {
+this.address = pipelineOptions.getMetricsGraphiteHost();
+this.port = pipelineOptions.getMetricsGraphitePort();
+this.charset = UTF_8;
+  }
+
+  @Experimental(Experimental.Kind.METRICS)
+  @Override
+  public void writeMetrics(MetricQueryResults metricQueryResults) throws 
Exception {
+final long metricTimestamp = System.currentTimeMillis() / 1000L;
+Socket socket = new Socket(InetAddress.getByName(address), port);
+BufferedWriter writer =
+new BufferedWriter(new OutputStreamWriter(socket.getOutputStream(), 
charset));
+StringBuilder messagePayload = new StringBuilder();
+Iterable<MetricResult<Long>> counters = metricQueryResults.getCounters();
+Iterable<MetricResult<GaugeResult>> gauges = metricQueryResults.getGauges();
+Iterable<MetricResult<DistributionResult>> distributions =
+metricQueryResults.getDistributions();
+
+for (MetricResult<Long> counter : counters) {
+  // if committed metrics are not supported, exception is thrown and we 
don't append the message
+  try {
+messagePayload.append(createCounterGraphiteMessage(metricTimestamp, 
counter, true));
+  } catch (UnsupportedOperationException e) {
+if (!e.getMessage().contains("committed metrics")) {
+  throw e;
+}
+  }
+  messagePayload.append(createCounterGraphiteMessage(metricTimestamp, 
counter, false));
+}
+
+for (MetricResult<GaugeResult> gauge : gauges) {
+  try {
+messagePayload.append(createGaugeGraphiteMessage(gauge, true));
+  } catch (UnsupportedOperationException e) {
+if (!e.getMessage().contains("committed metrics")) {
+  throw e;
+}
+  }
+  messagePayload.ap

[jira] [Work logged] (BEAM-4553) Implement a Graphite sink for the metrics pusher

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4553?focusedWorklogId=152704=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152704
 ]

ASF GitHub Bot logged work on BEAM-4553:


Author: ASF GitHub Bot
Created on: 09/Oct/18 15:10
Start Date: 09/Oct/18 15:10
Worklog Time Spent: 10m 
  Work Description: echauchot commented on a change in pull request #6569: 
[BEAM-4553] Implement graphite sink for MetricsPusher and refactor 
MetricsHttpSink test
URL: https://github.com/apache/beam/pull/6569#discussion_r223743972
 
 

 ##
 File path: 
runners/extensions-java/metrics/src/test/java/org/apache/beam/runners/extensions/metrics/MetricsGraphiteSinkTest.java
 ##
 @@ -0,0 +1,108 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.extensions.metrics;
+
+import static org.junit.Assert.assertTrue;
+
+import java.io.IOException;
+import java.net.ServerSocket;
+import org.apache.beam.sdk.metrics.MetricQueryResults;
+import org.apache.beam.sdk.options.PipelineOptions;
+import org.apache.beam.sdk.options.PipelineOptionsFactory;
+import org.junit.AfterClass;
+import org.junit.Before;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+/** Test class for MetricsGraphiteSink. */
+public class MetricsGraphiteSinkTest {
+  private static NetworkMockServer graphiteServer;
+  private static int port;
+
+  @BeforeClass
+  public static void beforeClass() throws IOException, InterruptedException {
+// get free local port
+ServerSocket serverSocket = new ServerSocket(0);
+port = serverSocket.getLocalPort();
+serverSocket.close();
+graphiteServer = new NetworkMockServer(port);
+Thread.sleep(200);
+graphiteServer.clear();
+graphiteServer.start();
+  }
+
+  @Before
+  public void before() {
+graphiteServer.clear();
+  }
+
+  @AfterClass
+  public static void afterClass() throws IOException {
+graphiteServer.stop();
+  }
+
+  @Test
+  public void testWriteMetricsWithCommittedSupported() throws Exception {
+MetricQueryResults metricQueryResults = new CustomMetricQueryResults(true);
+PipelineOptions pipelineOptions = PipelineOptionsFactory.create();
+pipelineOptions.setMetricsGraphitePort(port);
+pipelineOptions.setMetricsGraphiteHost("127.0.0.1");
+MetricsGraphiteSink metricsGraphiteSink = new 
MetricsGraphiteSink(pipelineOptions);
+metricsGraphiteSink.writeMetrics(metricQueryResults);
+Thread.sleep(2000L);
 
 Review comment:
   Yes, because when we write messages to the socket, `NetworkMockServer` reads the 
socket on a different thread and adds the messages to an ArrayList so that they can 
be read in the assert. On a heavily loaded Jenkins server, the test might fail 
because `NetworkMockServer` does not have enough time to add the messages to the 
ArrayList.
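
A more deterministic alternative to the fixed Thread.sleep would be to poll the mock server 
until the expected messages have been collected, with a timeout. The following is a minimal 
sketch under the assumption that NetworkMockServer exposes the collected messages through a 
getMessages() accessor; that accessor name is a guess and is not shown in the excerpt.

    // Sketch only: wait until the mock server has collected the expected number of
    // messages, or fail after a timeout. getMessages() is an assumed accessor.
    static void waitForMessages(NetworkMockServer server, int expected, long timeoutMillis)
        throws InterruptedException {
      long deadline = System.currentTimeMillis() + timeoutMillis;
      while (server.getMessages().size() < expected) {
        if (System.currentTimeMillis() > deadline) {
          throw new AssertionError("Timed out waiting for " + expected + " Graphite messages");
        }
        Thread.sleep(50);
      }
    }

The test could then call waitForMessages(graphiteServer, expectedCount, 2000L) instead of 
sleeping unconditionally for two seconds.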


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152704)
Time Spent: 2h 10m  (was: 2h)

> Implement a Graphite sink for the metrics pusher
> 
>
> Key: BEAM-4553
>     URL: https://issues.apache.org/jira/browse/BEAM-4553
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-extensions-metrics
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Today only a REST Http sink that sends raw json metrics using POST requests to 
> an http server is available. It is more of a POC sink. It would be good to code 
> the first real metrics sink. One of the most popular is Graphite.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5315) Finish Python 3 porting for io module

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5315?focusedWorklogId=152696=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152696
 ]

ASF GitHub Bot logged work on BEAM-5315:


Author: ASF GitHub Bot
Created on: 09/Oct/18 15:01
Start Date: 09/Oct/18 15:01
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on a change in pull request #6590: 
[BEAM-5315] Partially port io
URL: https://github.com/apache/beam/pull/6590#discussion_r223733107
 
 

 ##
 File path: sdks/python/apache_beam/io/filebasedsink_test.py
 ##
 @@ -75,6 +76,10 @@ def _create_temp_file(self, name='', suffix=''):
 
 class MyFileBasedSink(filebasedsink.FileBasedSink):
 
+  @unittest.skipIf(sys.version_info[0] == 3 and
+   os.environ.get('RUN_SKIPPED_PY3_TESTS') != '1',
+   'This test still needs to be fixed on Python 3.'
+   'TODO: BEAM-5627')
 
 Review comment:
   Let's add: TODO: BEAM-5627, BEAM-5618


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152696)
Time Spent: 3h 10m  (was: 3h)

> Finish Python 3 porting for io module
> -
>
> Key: BEAM-5315
> URL: https://issues.apache.org/jira/browse/BEAM-5315
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Simon
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5315) Finish Python 3 porting for io module

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5315?focusedWorklogId=152697=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152697
 ]

ASF GitHub Bot logged work on BEAM-5315:


Author: ASF GitHub Bot
Created on: 09/Oct/18 15:01
Start Date: 09/Oct/18 15:01
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on a change in pull request #6590: 
[BEAM-5315] Partially port io
URL: https://github.com/apache/beam/pull/6590#discussion_r223731816
 
 

 ##
 File path: sdks/python/apache_beam/io/hadoopfilesystem_test.py
 ##
 @@ -21,7 +21,9 @@
 
 import io
 import logging
+import os
 
 Review comment:
   We recently merged https://github.com/apache/beam/pull/6587. All tests in 
this file are now passing.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152697)
Time Spent: 3h 10m  (was: 3h)

> Finish Python 3 porting for io module
> -
>
> Key: BEAM-5315
> URL: https://issues.apache.org/jira/browse/BEAM-5315
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Simon
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4553) Implement a Graphite sink for the metrics pusher

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4553?focusedWorklogId=152693=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152693
 ]

ASF GitHub Bot logged work on BEAM-4553:


Author: ASF GitHub Bot
Created on: 09/Oct/18 14:54
Start Date: 09/Oct/18 14:54
Worklog Time Spent: 10m 
  Work Description: echauchot commented on a change in pull request #6569: 
[BEAM-4553] Implement graphite sink for MetricsPusher and refactor 
MetricsHttpSink test
URL: https://github.com/apache/beam/pull/6569#discussion_r223736905
 
 

 ##
 File path: 
runners/extensions-java/metrics/src/main/java/org/apache/beam/runners/extensions/metrics/MetricsGraphiteSink.java
 ##
 @@ -0,0 +1,305 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.extensions.metrics;
+
+import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
+import java.io.BufferedWriter;
+import java.io.OutputStreamWriter;
+import java.net.InetAddress;
+import java.net.Socket;
+import java.nio.charset.Charset;
+import java.util.Locale;
+import java.util.regex.Pattern;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.metrics.DistributionResult;
+import org.apache.beam.sdk.metrics.GaugeResult;
+import org.apache.beam.sdk.metrics.MetricQueryResults;
+import org.apache.beam.sdk.metrics.MetricResult;
+import org.apache.beam.sdk.metrics.MetricsSink;
+import org.apache.beam.sdk.options.PipelineOptions;
+
+/**
+ * Sink to push metrics to Graphite. Graphite requires a timestamp. So metrics 
are reported with the
+ * timestamp (seconds from epoch) when the push to the sink was done (except 
with gauges that
+ * already have a timestamp value). The graphite metric name will be in the 
form of
+ * 
beam.metricType.metricNamespace.metricName.[committed|attempted].metricValueType
 For example:
+ * {@code beam.counter.throughput.nbRecords.attempted.value} Or {@code
+ * beam.distribution.throughput.nbRecordsPerSec.attempted.mean}
+ */
+public class MetricsGraphiteSink implements MetricsSink {
+  private static final Charset UTF_8 = Charset.forName("UTF-8");
+  private static final Pattern WHITESPACE = Pattern.compile("[\\s]+");
+  private static final String SPACE_REPLACEMENT = "_";
+  private final String address;
+  private final int port;
+  private final Charset charset;
+
+  public MetricsGraphiteSink(PipelineOptions pipelineOptions) {
+this.address = pipelineOptions.getMetricsGraphiteHost();
+this.port = pipelineOptions.getMetricsGraphitePort();
+this.charset = UTF_8;
+  }
+
+  @Experimental(Experimental.Kind.METRICS)
+  @Override
+  public void writeMetrics(MetricQueryResults metricQueryResults) throws 
Exception {
+final long metricTimestamp = System.currentTimeMillis() / 1000L;
+Socket socket = new Socket(InetAddress.getByName(address), port);
+BufferedWriter writer =
+new BufferedWriter(new OutputStreamWriter(socket.getOutputStream(), 
charset));
+StringBuilder messagePayload = new StringBuilder();
+Iterable<MetricResult<Long>> counters = metricQueryResults.getCounters();
+Iterable<MetricResult<GaugeResult>> gauges = metricQueryResults.getGauges();
+Iterable<MetricResult<DistributionResult>> distributions =
+metricQueryResults.getDistributions();
+
+for (MetricResult<Long> counter : counters) {
+  // if committed metrics are not supported, exception is thrown and we 
don't append the message
+  try {
+messagePayload.append(createCounterGraphiteMessage(metricTimestamp, 
counter, true));
+  } catch (UnsupportedOperationException e) {
+if (!e.getMessage().contains("committed metrics")) {
+  throw e;
+}
+  }
+  messagePayload.append(createCounterGraphiteMessage(metricTimestamp, 
counter, false));
+}
+
+for (MetricResult<GaugeResult> gauge : gauges) {
+  try {
+messagePayload.append(createGaugeGraphiteMessage(gauge, true));
+  } catch (UnsupportedOperationException e) {
+if (!e.getMessage().contains("committed metrics")) {
+  throw e;
+}
+  }
+  messagePayload.ap

[jira] [Work logged] (BEAM-4553) Implement a Graphite sink for the metrics pusher

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4553?focusedWorklogId=152692=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152692
 ]

ASF GitHub Bot logged work on BEAM-4553:


Author: ASF GitHub Bot
Created on: 09/Oct/18 14:48
Start Date: 09/Oct/18 14:48
Worklog Time Spent: 10m 
  Work Description: echauchot commented on a change in pull request #6569: 
[BEAM-4553] Implement graphite sink for MetricsPusher and refactor 
MetricsHttpSink test
URL: https://github.com/apache/beam/pull/6569#discussion_r223714684
 
 

 ##
 File path: 
runners/extensions-java/metrics/src/main/java/org/apache/beam/runners/extensions/metrics/MetricsGraphiteSink.java
 ##
 @@ -0,0 +1,305 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.extensions.metrics;
+
+import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
+import java.io.BufferedWriter;
+import java.io.OutputStreamWriter;
+import java.net.InetAddress;
+import java.net.Socket;
+import java.nio.charset.Charset;
+import java.util.Locale;
+import java.util.regex.Pattern;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.metrics.DistributionResult;
+import org.apache.beam.sdk.metrics.GaugeResult;
+import org.apache.beam.sdk.metrics.MetricQueryResults;
+import org.apache.beam.sdk.metrics.MetricResult;
+import org.apache.beam.sdk.metrics.MetricsSink;
+import org.apache.beam.sdk.options.PipelineOptions;
+
+/**
+ * Sink to push metrics to Graphite. Graphite requires a timestamp. So metrics 
are reported with the
+ * timestamp (seconds from epoch) when the push to the sink was done (except 
with gauges that
+ * already have a timestamp value). The graphite metric name will be in the 
form of
+ * 
beam.metricType.metricNamespace.metricName.[committed|attempted].metricValueType
 For example:
+ * {@code beam.counter.throughput.nbRecords.attempted.value} Or {@code
+ * beam.distribution.throughput.nbRecordsPerSec.attempted.mean}
+ */
+public class MetricsGraphiteSink implements MetricsSink {
+  private static final Charset UTF_8 = Charset.forName("UTF-8");
+  private static final Pattern WHITESPACE = Pattern.compile("[\\s]+");
+  private static final String SPACE_REPLACEMENT = "_";
+  private final String address;
+  private final int port;
+  private final Charset charset;
+
+  public MetricsGraphiteSink(PipelineOptions pipelineOptions) {
+this.address = pipelineOptions.getMetricsGraphiteHost();
+this.port = pipelineOptions.getMetricsGraphitePort();
+this.charset = UTF_8;
+  }
+
+  @Experimental(Experimental.Kind.METRICS)
+  @Override
+  public void writeMetrics(MetricQueryResults metricQueryResults) throws 
Exception {
+final long metricTimestamp = System.currentTimeMillis() / 1000L;
+Socket socket = new Socket(InetAddress.getByName(address), port);
+BufferedWriter writer =
+new BufferedWriter(new OutputStreamWriter(socket.getOutputStream(), 
charset));
+StringBuilder messagePayload = new StringBuilder();
+Iterable<MetricResult<Long>> counters = metricQueryResults.getCounters();
+Iterable<MetricResult<GaugeResult>> gauges = metricQueryResults.getGauges();
+Iterable<MetricResult<DistributionResult>> distributions =
+metricQueryResults.getDistributions();
+
+for (MetricResult<Long> counter : counters) {
 
 Review comment:
   I thought about it, but the difference between the loops is the generic type, so 
with type erasure I would need to pass a class to the common method and switch on 
that class in the code. I prefer the 3 loops to that kind of code. WDYT, anything 
to suggest?
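
To illustrate the alternative being discussed: a single generic loop would still need a 
runtime hint about which metric kind it handles, because the element type of MetricResult<T> 
is erased at runtime. A rough sketch follows; the names and shape are assumptions, not code 
from the PR.

    import org.apache.beam.sdk.metrics.MetricResult;

    // Sketch only: one shared loop, but the erased type parameter forces an explicit
    // "kind" argument and a switch, which the three separate loops in the PR avoid.
    class ErasureSketch {
      enum MetricKind { COUNTER, GAUGE, DISTRIBUTION }

      static <T> void appendAll(StringBuilder payload, Iterable<MetricResult<T>> results,
          MetricKind kind, long timestamp) {
        for (MetricResult<T> result : results) {
          // A real implementation would dispatch to the kind-specific formatter here;
          // T alone cannot be inspected at runtime, hence the extra parameter.
          payload.append(kind.name()).append('.').append(result.getName())
              .append(' ').append(timestamp).append('\n');
        }
      }
    }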


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152692)
Time Spent: 1h 40m  (was: 1.5h)

> Implement a Graphite sink for the metrics pusher
> --

[jira] [Work logged] (BEAM-4553) Implement a Graphite sink for the metrics pusher

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4553?focusedWorklogId=152680=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152680
 ]

ASF GitHub Bot logged work on BEAM-4553:


Author: ASF GitHub Bot
Created on: 09/Oct/18 14:04
Start Date: 09/Oct/18 14:04
Worklog Time Spent: 10m 
  Work Description: echauchot commented on a change in pull request #6569: 
[BEAM-4553] Implement graphite sink for MetricsPusher and refactor 
MetricsHttpSink test
URL: https://github.com/apache/beam/pull/6569#discussion_r223714684
 
 

 ##
 File path: 
runners/extensions-java/metrics/src/main/java/org/apache/beam/runners/extensions/metrics/MetricsGraphiteSink.java
 ##
 @@ -0,0 +1,305 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.extensions.metrics;
+
+import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
+import java.io.BufferedWriter;
+import java.io.OutputStreamWriter;
+import java.net.InetAddress;
+import java.net.Socket;
+import java.nio.charset.Charset;
+import java.util.Locale;
+import java.util.regex.Pattern;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.metrics.DistributionResult;
+import org.apache.beam.sdk.metrics.GaugeResult;
+import org.apache.beam.sdk.metrics.MetricQueryResults;
+import org.apache.beam.sdk.metrics.MetricResult;
+import org.apache.beam.sdk.metrics.MetricsSink;
+import org.apache.beam.sdk.options.PipelineOptions;
+
+/**
+ * Sink to push metrics to Graphite. Graphite requires a timestamp. So metrics 
are reported with the
+ * timestamp (seconds from epoch) when the push to the sink was done (except 
with gauges that
+ * already have a timestamp value). The graphite metric name will be in the 
form of
+ * 
beam.metricType.metricNamespace.metricName.[committed|attempted].metricValueType
 For example:
+ * {@code beam.counter.throughput.nbRecords.attempted.value} Or {@code
+ * beam.distribution.throughput.nbRecordsPerSec.attempted.mean}
+ */
+public class MetricsGraphiteSink implements MetricsSink {
+  private static final Charset UTF_8 = Charset.forName("UTF-8");
+  private static final Pattern WHITESPACE = Pattern.compile("[\\s]+");
+  private static final String SPACE_REPLACEMENT = "_";
+  private final String address;
+  private final int port;
+  private final Charset charset;
+
+  public MetricsGraphiteSink(PipelineOptions pipelineOptions) {
+this.address = pipelineOptions.getMetricsGraphiteHost();
+this.port = pipelineOptions.getMetricsGraphitePort();
+this.charset = UTF_8;
+  }
+
+  @Experimental(Experimental.Kind.METRICS)
+  @Override
+  public void writeMetrics(MetricQueryResults metricQueryResults) throws 
Exception {
+final long metricTimestamp = System.currentTimeMillis() / 1000L;
+Socket socket = new Socket(InetAddress.getByName(address), port);
+BufferedWriter writer =
+new BufferedWriter(new OutputStreamWriter(socket.getOutputStream(), 
charset));
+StringBuilder messagePayload = new StringBuilder();
+Iterable<MetricResult<Long>> counters = metricQueryResults.getCounters();
+Iterable<MetricResult<GaugeResult>> gauges = metricQueryResults.getGauges();
+Iterable<MetricResult<DistributionResult>> distributions =
+metricQueryResults.getDistributions();
+
+for (MetricResult<Long> counter : counters) {
 
 Review comment:
   I thought about it, but the difference between the loops is the generic type, so 
with type erasure I would need to pass a class to the common method and switch on 
that class in the code. That would be even less maintainable.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152680)
Time Spent: 1.5h  (was: 1h 20m)

> Implement a Graphite sink for the metrics pusher
> ---

[jira] [Work logged] (BEAM-5467) Python Flink ValidatesRunner job fixes

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5467?focusedWorklogId=152679=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152679
 ]

ASF GitHub Bot logged work on BEAM-5467:


Author: ASF GitHub Bot
Created on: 09/Oct/18 14:04
Start Date: 09/Oct/18 14:04
Worklog Time Spent: 10m 
  Work Description: tweise commented on a change in pull request #6532: 
[BEAM-5467] Use process SDKHarness to run flink PVR tests.
URL: https://github.com/apache/beam/pull/6532#discussion_r223714650
 
 

 ##
 File path: sdks/python/build.gradle
 ##
 @@ -361,3 +372,21 @@ task dependencyUpdates(dependsOn: ':dependencyUpdates') {
 }
   }
 }
+
+project.task('createProcessWorker') {
+  dependsOn ':beam-sdks-python-container:build'
+  dependsOn 'setupVirtualenv'
+  def outputFile = file("${project.buildDir}/sdk_worker.sh")
+  def workerScript = 
"${project(":beam-sdks-python-container:").buildDir.absolutePath}/target/launcher/linux_amd64/boot"
+  def text = "sh -c \". ${envdir}/bin/activate && ${workerScript} \$* \""
 
 Review comment:
   `sdkWorkerCommand`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152679)
Time Spent: 5h 10m  (was: 5h)

> Python Flink ValidatesRunner job fixes
> --
>
>     Key: BEAM-5467
> URL: https://issues.apache.org/jira/browse/BEAM-5467
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Minor
>  Labels: portability-flink
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> Add status to README
> Rename script and job for consistency
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4553) Implement a Graphite sink for the metrics pusher

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4553?focusedWorklogId=152678=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152678
 ]

ASF GitHub Bot logged work on BEAM-4553:


Author: ASF GitHub Bot
Created on: 09/Oct/18 13:53
Start Date: 09/Oct/18 13:53
Worklog Time Spent: 10m 
  Work Description: echauchot commented on a change in pull request #6569: 
[BEAM-4553] Implement graphite sink for MetricsPusher and refactor 
MetricsHttpSink test
URL: https://github.com/apache/beam/pull/6569#discussion_r223709815
 
 

 ##
 File path: 
runners/extensions-java/metrics/src/main/java/org/apache/beam/runners/extensions/metrics/MetricsGraphiteSink.java
 ##
 @@ -0,0 +1,305 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.extensions.metrics;
+
+import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
+import java.io.BufferedWriter;
+import java.io.OutputStreamWriter;
+import java.net.InetAddress;
+import java.net.Socket;
+import java.nio.charset.Charset;
+import java.util.Locale;
+import java.util.regex.Pattern;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.metrics.DistributionResult;
+import org.apache.beam.sdk.metrics.GaugeResult;
+import org.apache.beam.sdk.metrics.MetricQueryResults;
+import org.apache.beam.sdk.metrics.MetricResult;
+import org.apache.beam.sdk.metrics.MetricsSink;
+import org.apache.beam.sdk.options.PipelineOptions;
+
+/**
+ * Sink to push metrics to Graphite. Graphite requires a timestamp. So metrics 
are reported with the
+ * timestamp (seconds from epoch) when the push to the sink was done (except 
with gauges that
+ * already have a timestamp value). The graphite metric name will be in the 
form of
+ * 
beam.metricType.metricNamespace.metricName.[committed|attempted].metricValueType
 For example:
+ * {@code beam.counter.throughput.nbRecords.attempted.value} Or {@code
+ * beam.distribution.throughput.nbRecordsPerSec.attempted.mean}
+ */
+public class MetricsGraphiteSink implements MetricsSink {
+  private static final Charset UTF_8 = Charset.forName("UTF-8");
+  private static final Pattern WHITESPACE = Pattern.compile("[\\s]+");
+  private static final String SPACE_REPLACEMENT = "_";
 
 Review comment:
   It was just to give it a name, not for deduplication
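
For context, Graphite metric paths cannot contain whitespace, so the WHITESPACE pattern and 
SPACE_REPLACEMENT constant are presumably used to sanitize metric name components before they 
are joined into a path. A minimal sketch of that use follows; the sanitizing helper itself is 
not visible in the truncated excerpt, so this is an assumption.

    import java.util.regex.Pattern;

    // Sketch only: collapse any whitespace in a metric name component to underscores,
    // mirroring the WHITESPACE / SPACE_REPLACEMENT constants quoted above.
    class NameSanitizerSketch {
      private static final Pattern WHITESPACE = Pattern.compile("[\\s]+");
      private static final String SPACE_REPLACEMENT = "_";

      static String sanitize(String component) {
        return WHITESPACE.matcher(component).replaceAll(SPACE_REPLACEMENT);
      }
    }

For example, sanitize("nb records per sec") returns "nb_records_per_sec".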


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152678)
Time Spent: 1h 20m  (was: 1h 10m)

> Implement a Graphite sink for the metrics pusher
> 
>
> Key: BEAM-4553
> URL: https://issues.apache.org/jira/browse/BEAM-4553
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-extensions-metrics
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Today only a REST Http sink that sends raw json metrics using POST requests to 
> an http server is available. It is more of a POC sink. It would be good to code 
> the first real metrics sink. One of the most popular is Graphite.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4553) Implement a Graphite sink for the metrics pusher

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4553?focusedWorklogId=152677=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152677
 ]

ASF GitHub Bot logged work on BEAM-4553:


Author: ASF GitHub Bot
Created on: 09/Oct/18 13:53
Start Date: 09/Oct/18 13:53
Worklog Time Spent: 10m 
  Work Description: echauchot commented on a change in pull request #6569: 
[BEAM-4553] Implement graphite sink for MetricsPusher and refactor 
MetricsHttpSink test
URL: https://github.com/apache/beam/pull/6569#discussion_r223709815
 
 

 ##
 File path: 
runners/extensions-java/metrics/src/main/java/org/apache/beam/runners/extensions/metrics/MetricsGraphiteSink.java
 ##
 @@ -0,0 +1,305 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.extensions.metrics;
+
+import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
+import java.io.BufferedWriter;
+import java.io.OutputStreamWriter;
+import java.net.InetAddress;
+import java.net.Socket;
+import java.nio.charset.Charset;
+import java.util.Locale;
+import java.util.regex.Pattern;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.metrics.DistributionResult;
+import org.apache.beam.sdk.metrics.GaugeResult;
+import org.apache.beam.sdk.metrics.MetricQueryResults;
+import org.apache.beam.sdk.metrics.MetricResult;
+import org.apache.beam.sdk.metrics.MetricsSink;
+import org.apache.beam.sdk.options.PipelineOptions;
+
+/**
+ * Sink to push metrics to Graphite. Graphite requires a timestamp. So metrics 
are reported with the
+ * timestamp (seconds from epoch) when the push to the sink was done (except 
with gauges that
+ * already have a timestamp value). The graphite metric name will be in the 
form of
+ * 
beam.metricType.metricNamespace.metricName.[committed|attempted].metricValueType
 For example:
+ * {@code beam.counter.throughput.nbRecords.attempted.value} Or {@code
+ * beam.distribution.throughput.nbRecordsPerSec.attempted.mean}
+ */
+public class MetricsGraphiteSink implements MetricsSink {
+  private static final Charset UTF_8 = Charset.forName("UTF-8");
+  private static final Pattern WHITESPACE = Pattern.compile("[\\s]+");
+  private static final String SPACE_REPLACEMENT = "_";
 
 Review comment:
   It was just to give it a name, not for deduplication.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152677)
Time Spent: 1h 10m  (was: 1h)

> Implement a Graphite sink for the metrics pusher
> 
>
> Key: BEAM-4553
> URL: https://issues.apache.org/jira/browse/BEAM-4553
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-extensions-metrics
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Today only a REST Http sink that sends raw json metrics using POST requests to 
> an http server is available. It is more of a POC sink. It would be good to code 
> the first real metrics sink. One of the most popular is Graphite.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5467) Python Flink ValidatesRunner job fixes

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5467?focusedWorklogId=152649=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152649
 ]

ASF GitHub Bot logged work on BEAM-5467:


Author: ASF GitHub Bot
Created on: 09/Oct/18 12:09
Start Date: 09/Oct/18 12:09
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #6532: 
[BEAM-5467] Use process SDKHarness to run flink PVR tests.
URL: https://github.com/apache/beam/pull/6532#discussion_r223670431
 
 

 ##
 File path: sdks/python/build.gradle
 ##
 @@ -327,24 +327,35 @@ task hdfsIntegrationTest(dependsOn: 'installGcpTest') {
   }
 }
 
+class CompatibilityMatrixConfig {
+  String type
 
 Review comment:
   Add a comment that this can be Batch or Streaming? Or convert this to a 
boolean `isStreaming`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152649)
Time Spent: 4.5h  (was: 4h 20m)

> Python Flink ValidatesRunner job fixes
> --
>
> Key: BEAM-5467
> URL: https://issues.apache.org/jira/browse/BEAM-5467
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Minor
>  Labels: portability-flink
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Add status to README
> Rename script and job for consistency
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5467) Python Flink ValidatesRunner job fixes

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5467?focusedWorklogId=152652=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152652
 ]

ASF GitHub Bot logged work on BEAM-5467:


Author: ASF GitHub Bot
Created on: 09/Oct/18 12:09
Start Date: 09/Oct/18 12:09
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #6532: 
[BEAM-5467] Use process SDKHarness to run flink PVR tests.
URL: https://github.com/apache/beam/pull/6532#discussion_r223667225
 
 

 ##
 File path: sdks/python/build.gradle
 ##
 @@ -327,24 +327,35 @@ task hdfsIntegrationTest(dependsOn: 'installGcpTest') {
   }
 }
 
+class CompatibilityMatrixConfig {
+  String type
+  String harnessType = "Docker"
+}
+
 def flinkCompatibilityMatrix = {
-  def type = it
-  def name = 'flinkCompatibilityMatrix' + type
+  def config = it ? it as CompatibilityMatrixConfig : new 
CompatibilityMatrixConfig()
+  def type = config.type
+  def harnessType = config.harnessType
+  def name = 'flinkCompatibilityMatrix' + type + harnessType
   tasks.create(name: name) {
 dependsOn 'setupVirtualenv'
-dependsOn ':beam-sdks-python-container:docker'
+dependsOn 'createProcessWorker'
 dependsOn ':beam-runners-flink_2.11-job-server:shadowJar'
+if (type.toLowerCase() == 'docker')
+  dependsOn ':beam-sdks-python-container:docker'
 doLast {
   exec {
 executable 'sh'
-args '-c', ". ${envdir}/bin/activate && pip install -e . && python -m 
apache_beam.runners.portability.flink_runner_test 
${project(":beam-runners-flink_2.11-job-server:").shadowJar.archivePath} 
${type}"
+args '-c', ". ${envdir}/bin/activate && pip install -e . && python -m 
apache_beam.runners.portability.flink_runner_test 
--flink_job_server_jar=${project(":beam-runners-flink_2.11-job-server:").shadowJar.archivePath}
 --type=${type} --harness_type=${harnessType}"
   }
 }
   }
 }
 
-flinkCompatibilityMatrix('Batch')
-flinkCompatibilityMatrix('Streaming')
+flinkCompatibilityMatrix(type: 'Batch')
+flinkCompatibilityMatrix(type:'Streaming')
+flinkCompatibilityMatrix(type: 'Batch', harnessType: 'Process')
+flinkCompatibilityMatrix(type:'Streaming', harnessType: 'Process')
 
 Review comment:
   Missing space after colon


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152652)
Time Spent: 4h 40m  (was: 4.5h)

> Python Flink ValidatesRunner job fixes
> ------
>
> Key: BEAM-5467
> URL: https://issues.apache.org/jira/browse/BEAM-5467
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Minor
>  Labels: portability-flink
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Add status to README
> Rename script and job for consistency
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5467) Python Flink ValidatesRunner job fixes

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5467?focusedWorklogId=152653=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152653
 ]

ASF GitHub Bot logged work on BEAM-5467:


Author: ASF GitHub Bot
Created on: 09/Oct/18 12:09
Start Date: 09/Oct/18 12:09
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #6532: 
[BEAM-5467] Use process SDKHarness to run flink PVR tests.
URL: https://github.com/apache/beam/pull/6532#discussion_r223669759
 
 

 ##
 File path: sdks/python/build.gradle
 ##
 @@ -361,3 +372,21 @@ task dependencyUpdates(dependsOn: ':dependencyUpdates') {
 }
   }
 }
+
+project.task('createProcessWorker') {
+  dependsOn ':beam-sdks-python-container:build'
+  dependsOn 'setupVirtualenv'
+  def outputFile = file("${project.buildDir}/sdk_worker.sh")
 
 Review comment:
   outputFile => sdkWorkerFile.
   
   Wonder if we can just use relative paths and skip the generation code?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152653)
Time Spent: 4h 40m  (was: 4.5h)

> Python Flink ValidatesRunner job fixes
> --
>
> Key: BEAM-5467
> URL: https://issues.apache.org/jira/browse/BEAM-5467
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Minor
>  Labels: portability-flink
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Add status to README
> Rename script and job for consistency
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5467) Python Flink ValidatesRunner job fixes

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5467?focusedWorklogId=152656=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152656
 ]

ASF GitHub Bot logged work on BEAM-5467:


Author: ASF GitHub Bot
Created on: 09/Oct/18 12:09
Start Date: 09/Oct/18 12:09
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #6532: 
[BEAM-5467] Use process SDKHarness to run flink PVR tests.
URL: https://github.com/apache/beam/pull/6532#discussion_r223669060
 
 

 ##
 File path: sdks/python/build.gradle
 ##
 @@ -361,3 +372,21 @@ task dependencyUpdates(dependsOn: ':dependencyUpdates') {
 }
   }
 }
+
+project.task('createProcessWorker') {
+  dependsOn ':beam-sdks-python-container:build'
+  dependsOn 'setupVirtualenv'
+  def outputFile = file("${project.buildDir}/sdk_worker.sh")
+  def workerScript = 
"${project(":beam-sdks-python-container:").buildDir.absolutePath}/target/launcher/linux_amd64/boot"
+  def text = "sh -c \". ${envdir}/bin/activate && ${workerScript} \$* \""
+  outputs.file outputFile
+  doLast{
 
 Review comment:
   space after `doLast`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152656)
Time Spent: 5h  (was: 4h 50m)

> Python Flink ValidatesRunner job fixes
> --
>
>     Key: BEAM-5467
> URL: https://issues.apache.org/jira/browse/BEAM-5467
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Minor
>  Labels: portability-flink
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> Add status to README
> Rename script and job for consistency
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5467) Python Flink ValidatesRunner job fixes

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5467?focusedWorklogId=152648=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152648
 ]

ASF GitHub Bot logged work on BEAM-5467:


Author: ASF GitHub Bot
Created on: 09/Oct/18 12:09
Start Date: 09/Oct/18 12:09
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #6532: 
[BEAM-5467] Use process SDKHarness to run flink PVR tests.
URL: https://github.com/apache/beam/pull/6532#discussion_r223670096
 
 

 ##
 File path: sdks/python/build.gradle
 ##
 @@ -327,24 +327,35 @@ task hdfsIntegrationTest(dependsOn: 'installGcpTest') {
   }
 }
 
+class CompatibilityMatrixConfig {
+  String type
+  String harnessType = "Docker"
 
 Review comment:
   Make this an enum?
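
   A short Groovy sketch of the enum alternative; the enum and field names are assumptions for illustration, not the final code in the PR:

   enum SdkHarnessType {
     DOCKER, PROCESS
   }

   class CompatibilityMatrixConfig {
     String type
     SdkHarnessType harnessType = SdkHarnessType.DOCKER
   }

   // Task names can still be derived from the enum value.
   def config = new CompatibilityMatrixConfig(type: 'Batch', harnessType: SdkHarnessType.PROCESS)
   def name = 'flinkCompatibilityMatrix' + config.type + config.harnessType.name().toLowerCase().capitalize()
   assert name == 'flinkCompatibilityMatrixBatchProcess'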


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152648)
Time Spent: 4.5h  (was: 4h 20m)

> Python Flink ValidatesRunner job fixes
> --
>
> Key: BEAM-5467
> URL: https://issues.apache.org/jira/browse/BEAM-5467
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Minor
>  Labels: portability-flink
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Add status to README
> Rename script and job for consistency
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5467) Python Flink ValidatesRunner job fixes

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5467?focusedWorklogId=152654=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152654
 ]

ASF GitHub Bot logged work on BEAM-5467:


Author: ASF GitHub Bot
Created on: 09/Oct/18 12:09
Start Date: 09/Oct/18 12:09
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #6532: 
[BEAM-5467] Use process SDKHarness to run flink PVR tests.
URL: https://github.com/apache/beam/pull/6532#discussion_r223669841
 
 

 ##
 File path: sdks/python/build.gradle
 ##
 @@ -361,3 +372,21 @@ task dependencyUpdates(dependsOn: ':dependencyUpdates') {
 }
   }
 }
+
+project.task('createProcessWorker') {
+  dependsOn ':beam-sdks-python-container:build'
+  dependsOn 'setupVirtualenv'
+  def outputFile = file("${project.buildDir}/sdk_worker.sh")
+  def workerScript = 
"${project(":beam-sdks-python-container:").buildDir.absolutePath}/target/launcher/linux_amd64/boot"
+  def text = "sh -c \". ${envdir}/bin/activate && ${workerScript} \$* \""
 
 Review comment:
   Could we rename this to `sdkWorkerFileCode`?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152654)
Time Spent: 4h 50m  (was: 4h 40m)

> Python Flink ValidatesRunner job fixes
> --
>
>     Key: BEAM-5467
> URL: https://issues.apache.org/jira/browse/BEAM-5467
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Minor
>  Labels: portability-flink
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> Add status to README
> Rename script and job for consistency
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5467) Python Flink ValidatesRunner job fixes

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5467?focusedWorklogId=152655=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152655
 ]

ASF GitHub Bot logged work on BEAM-5467:


Author: ASF GitHub Bot
Created on: 09/Oct/18 12:09
Start Date: 09/Oct/18 12:09
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #6532: 
[BEAM-5467] Use process SDKHarness to run flink PVR tests.
URL: https://github.com/apache/beam/pull/6532#discussion_r223667213
 
 

 ##
 File path: sdks/python/build.gradle
 ##
 @@ -327,24 +327,35 @@ task hdfsIntegrationTest(dependsOn: 'installGcpTest') {
   }
 }
 
+class CompatibilityMatrixConfig {
+  String type
+  String harnessType = "Docker"
+}
+
 def flinkCompatibilityMatrix = {
-  def type = it
-  def name = 'flinkCompatibilityMatrix' + type
+  def config = it ? it as CompatibilityMatrixConfig : new 
CompatibilityMatrixConfig()
+  def type = config.type
+  def harnessType = config.harnessType
+  def name = 'flinkCompatibilityMatrix' + type + harnessType
   tasks.create(name: name) {
 dependsOn 'setupVirtualenv'
-dependsOn ':beam-sdks-python-container:docker'
+dependsOn 'createProcessWorker'
 dependsOn ':beam-runners-flink_2.11-job-server:shadowJar'
+if (type.toLowerCase() == 'docker')
+  dependsOn ':beam-sdks-python-container:docker'
 doLast {
   exec {
 executable 'sh'
-args '-c', ". ${envdir}/bin/activate && pip install -e . && python -m 
apache_beam.runners.portability.flink_runner_test 
${project(":beam-runners-flink_2.11-job-server:").shadowJar.archivePath} 
${type}"
+args '-c', ". ${envdir}/bin/activate && pip install -e . && python -m 
apache_beam.runners.portability.flink_runner_test 
--flink_job_server_jar=${project(":beam-runners-flink_2.11-job-server:").shadowJar.archivePath}
 --type=${type} --harness_type=${harnessType}"
   }
 }
   }
 }
 
-flinkCompatibilityMatrix('Batch')
-flinkCompatibilityMatrix('Streaming')
+flinkCompatibilityMatrix(type: 'Batch')
+flinkCompatibilityMatrix(type:'Streaming')
 
 Review comment:
   Missing space after colon


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152655)

> Python Flink ValidatesRunner job fixes
> ------
>
> Key: BEAM-5467
> URL: https://issues.apache.org/jira/browse/BEAM-5467
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Minor
>  Labels: portability-flink
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> Add status to README
> Rename script and job for consistency
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5467) Python Flink ValidatesRunner job fixes

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5467?focusedWorklogId=152651=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152651
 ]

ASF GitHub Bot logged work on BEAM-5467:


Author: ASF GitHub Bot
Created on: 09/Oct/18 12:09
Start Date: 09/Oct/18 12:09
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #6532: 
[BEAM-5467] Use process SDKHarness to run flink PVR tests.
URL: https://github.com/apache/beam/pull/6532#discussion_r223671137
 
 

 ##
 File path: 
.test-infra/jenkins/job_PostCommit_Python_ValidatesRunner_Flink.groovy
 ##
 @@ -33,8 +33,8 @@ 
PostcommitJobBuilder.postCommitJob('beam_PostCommit_Python_VR_Flink',
   steps {
 gradle {
   rootBuildScriptDir(commonJobProperties.checkoutDir)
-  tasks(':beam-sdks-python:flinkCompatibilityMatrixBatch')
-  tasks(':beam-sdks-python:flinkCompatibilityMatrixStreaming')
+  tasks(':beam-sdks-python:flinkCompatibilityMatrixBatchProcess')
 
 Review comment:
   +1 Appending arguments to the task name is not particularly readable; explicit 
parameters would be clearer. The task `flinkCompatibilityMatrix` is already 
parameterized, so we only need to change the syntax here and remove the 
task-generation code.
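
   A hedged sketch of what a parameterized invocation could look like in the job DSL, assuming the Gradle side reads project properties such as -Pstreaming and -PharnessType (the property names are assumptions, not taken from the PR):

   gradle {
     rootBuildScriptDir(commonJobProperties.checkoutDir)
     // one parameterized task instead of one generated task per type/harness combination
     tasks(':beam-sdks-python:flinkCompatibilityMatrix')
     switches('-Pstreaming=false')
     switches('-PharnessType=Process')
   }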


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152651)

> Python Flink ValidatesRunner job fixes
> --
>
> Key: BEAM-5467
> URL: https://issues.apache.org/jira/browse/BEAM-5467
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Minor
>  Labels: portability-flink
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Add status to README
> Rename script and job for consistency
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5467) Python Flink ValidatesRunner job fixes

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5467?focusedWorklogId=152650=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152650
 ]

ASF GitHub Bot logged work on BEAM-5467:


Author: ASF GitHub Bot
Created on: 09/Oct/18 12:09
Start Date: 09/Oct/18 12:09
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #6532: 
[BEAM-5467] Use process SDKHarness to run flink PVR tests.
URL: https://github.com/apache/beam/pull/6532#discussion_r223668109
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/flink_runner_test.py
 ##
 @@ -34,12 +36,26 @@
   # Run as
   #
   # python -m apache_beam.runners.portability.flink_runner_test \
-  # /path/to/job_server.jar \
+  # --flink_job_server_jar=/path/to/job_server.jar \
+  # --type=Batch \
+  # --harness_type=docker \
   # [FlinkRunnerTest.test_method, ...]
-  flinkJobServerJar = sys.argv.pop(1)
-  streaming = sys.argv.pop(1).lower() == 'streaming'
 
-  # This is defined here to only be run when we invoke this file explicitly.
+  parser = argparse.ArgumentParser(add_help=True)
+  parser.add_argument('--flink_job_server_jar',
+  help='Job server jar to submit jobs.')
+  parser.add_argument('--type', default='batch',
+  help='Job type. batch or streaming')
 
 Review comment:
   Would make this a boolean parameter called `streaming`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152650)

> Python Flink ValidatesRunner job fixes
> --
>
> Key: BEAM-5467
> URL: https://issues.apache.org/jira/browse/BEAM-5467
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Minor
>  Labels: portability-flink
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Add status to README
> Rename script and job for consistency
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

