[jira] [Work logged] (BEAM-4176) Java: Portable batch runner passes all ValidatesRunner tests that non-portable runner passes

2018-10-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4176?focusedWorklogId=154120&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154120
 ]

ASF GitHub Bot logged work on BEAM-4176:


Author: ASF GitHub Bot
Created on: 13/Oct/18 18:40
Start Date: 13/Oct/18 18:40
Worklog Time Spent: 10m 
  Work Description: tweise commented on a change in pull request #6592: 
[BEAM-4176] Enable Post Commit JAVA PVR tests for Flink
URL: https://github.com/apache/beam/pull/6592#discussion_r224968563
 
 

 ##
 File path: 
buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
 ##
 @@ -1498,6 +1498,7 @@ artifactId=${project.name}
 testClassesDirs = 
project.files(project.project(":beam-sdks-java-core").sourceSets.test.output.classesDirs,
 project.project(":beam-runners-core-java").sourceSets.test.output.classesDirs)
 maxParallelForks config.parallelism
 useJUnit(config.testCategories)
+dependsOn ':beam-sdks-java-container:docker'
 
 Review comment:
   Is this later going to change to use the process environment?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 154120)
Time Spent: 33h 40m  (was: 33.5h)

> Java: Portable batch runner passes all ValidatesRunner tests that 
> non-portable runner passes
> 
>
> Key: BEAM-4176
> URL: https://issues.apache.org/jira/browse/BEAM-4176
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Ben Sidhom
>Assignee: Ankur Goenka
>Priority: Major
> Attachments: 81VxNWtFtke.png, Screen Shot 2018-08-14 at 4.18.31 
> PM.png, Screen Shot 2018-09-03 at 11.07.38 AM.png
>
>  Time Spent: 33h 40m
>  Remaining Estimate: 0h
>
> We need this as a sanity check that runner execution is correct.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5708) Support caching of SDKHarness environments in flink

2018-10-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5708?focusedWorklogId=154119&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154119
 ]

ASF GitHub Bot logged work on BEAM-5708:


Author: ASF GitHub Bot
Created on: 13/Oct/18 18:21
Start Date: 13/Oct/18 18:21
Worklog Time Spent: 10m 
  Work Description: tweise closed pull request #6638: [BEAM-5708] Cache 
environment in portable flink runner
URL: https://github.com/apache/beam/pull/6638
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy 
b/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
index 290f72399db..0ba980b27e5 100644
--- a/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
+++ b/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
@@ -1484,6 +1484,7 @@ artifactId=${project.name}
   def beamTestPipelineOptions = [
 
"--runner=org.apache.beam.runners.reference.testing.TestPortableRunner",
 "--jobServerDriver=${config.jobServerDriver}",
+"--environmentCacheMillis=1",
   ]
   if (config.jobServerConfig) {
 
beamTestPipelineOptions.add("--jobServerConfig=${config.jobServerConfig}")
diff --git 
a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/ReferenceCountingFlinkExecutableStageContextFactory.java
 
b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/ReferenceCountingFlinkExecutableStageContextFactory.java
index 988a94826fb..bb2b9dcbe16 100644
--- 
a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/ReferenceCountingFlinkExecutableStageContextFactory.java
+++ 
b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/ReferenceCountingFlinkExecutableStageContextFactory.java
@@ -24,12 +24,17 @@
 import java.util.concurrent.ConcurrentHashMap;
 import java.util.concurrent.Executors;
 import java.util.concurrent.ScheduledExecutorService;
+import java.util.concurrent.TimeUnit;
 import java.util.concurrent.atomic.AtomicInteger;
+import org.apache.beam.runners.core.construction.PipelineOptionsTranslation;
 import org.apache.beam.runners.core.construction.graph.ExecutableStage;
 import org.apache.beam.runners.fnexecution.control.StageBundleFactory;
 import org.apache.beam.runners.fnexecution.provisioning.JobInfo;
 import org.apache.beam.sdk.fn.function.ThrowingFunction;
+import org.apache.beam.sdk.options.PipelineOptions;
+import org.apache.beam.sdk.options.PortablePipelineOptions;
 import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.java.ExecutionEnvironment;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
@@ -104,9 +109,30 @@ private void scheduleRelease(JobInfo jobInfo) {
 WrappedContext wrapper = getCache().get(jobInfo.jobId());
 Preconditions.checkState(
 wrapper != null, "Releasing context for unknown job: " + jobInfo.jobId());
-// Do not release this asynchronously, as the releasing could fail due to the classloader not being
-// available anymore after the tasks have been removed from the execution engine.
-release(wrapper);
+
+PipelineOptions pipelineOptions =
+PipelineOptionsTranslation.fromProto(jobInfo.pipelineOptions());
+int environmentCacheTTLMillis =
+pipelineOptions.as(PortablePipelineOptions.class).getEnvironmentCacheMillis();
+if (environmentCacheTTLMillis > 0) {
+  if (this.getClass().getClassLoader() != ExecutionEnvironment.class.getClassLoader()) {
+LOG.warn(
+"{} is not loaded on parent Flink classloader. "
++ "Falling back to synchronous environment release for job {}.",
+this.getClass(),
+jobInfo.jobId());
+release(wrapper);
+  } else {
+// Schedule task to clean the container later.
+// Ensure that this class is loaded in the parent Flink classloader.
+getExecutor()
+.schedule(() -> release(wrapper), environmentCacheTTLMillis, TimeUnit.MILLISECONDS);
+  }
+} else {
+  // Do not release this asynchronously, as the releasing could fail due to the classloader not
+  // being available anymore after the tasks have been removed from the execution engine.
+  release(wrapper);
+}
   }
 
   private ConcurrentHashMap<String, WrappedContext> getCache() {
diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PortablePipelineOptions.java
 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PortablePipelineOptions.java
index 5107c389bb1..a8dfa8e0665 100644
---

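For readers skimming the diff above: the caching behaviour it introduces boils down to "release the SDK harness environment later if a positive TTL is configured, otherwise release it right away". Below is a minimal, self-contained Java sketch of that pattern using only the JDK's ScheduledExecutorService; the class and method names are illustrative and are not the Beam API.

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class DelayedReleaseSketch {
  private final ScheduledExecutorService executor = Executors.newScheduledThreadPool(1);

  // Mirrors the branch in the diff: with a positive TTL the release is
  // deferred so a quickly following bundle can reuse the cached environment;
  // otherwise it happens synchronously.
  public void scheduleRelease(Runnable release, long environmentCacheMillis) {
    if (environmentCacheMillis > 0) {
      executor.schedule(release, environmentCacheMillis, TimeUnit.MILLISECONDS);
    } else {
      // Synchronous release; the classloader may not be available later.
      release.run();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    DelayedReleaseSketch sketch = new DelayedReleaseSketch();
    sketch.scheduleRelease(() -> System.out.println("environment released"), 100);
    Thread.sleep(200);
    sketch.executor.shutdown();
  }
}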
[jira] [Work logged] (BEAM-5442) PortableRunner swallows custom options for Runner

2018-10-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5442?focusedWorklogId=154104&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154104
 ]

ASF GitHub Bot logged work on BEAM-5442:


Author: ASF GitHub Bot
Created on: 13/Oct/18 17:07
Start Date: 13/Oct/18 17:07
Worklog Time Spent: 10m 
  Work Description: tweise closed pull request #6683: [BEAM-5442] Revert 
#6675 "Revert PRs #6557 #6589 #6600"
URL: https://github.com/apache/beam/pull/6683
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 154104)
Time Spent: 9h 50m  (was: 9h 40m)

> PortableRunner swallows custom options for Runner
> -
>
> Key: BEAM-5442
> URL: https://issues.apache.org/jira/browse/BEAM-5442
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core, sdk-py-core
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>  Labels: portability, portability-flink
> Fix For: 2.9.0
>
>  Time Spent: 9h 50m
>  Remaining Estimate: 0h
>
> The PortableRunner doesn't pass custom PipelineOptions to the executing 
> Runner.
> Example: {{--parallelism=4}} won't be forwarded to the FlinkRunner.
> (The option is just removed during proto translation without any warning)
> We should allow some form of customization through the options, even for the 
> PortableRunner. 
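As a rough illustration of the round trip the description refers to, the sketch below feeds a Flink-specific option through the portable options translation and back. It assumes the runners-core-construction-java and Flink runner artifacts are on the classpath and is meant to show where the report says the option gets lost, not to reproduce the bug exactly.

import org.apache.beam.runners.core.construction.PipelineOptionsTranslation;
import org.apache.beam.runners.flink.FlinkPipelineOptions;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class OptionsRoundTrip {
  public static void main(String[] args) {
    FlinkPipelineOptions options =
        PipelineOptionsFactory.fromArgs("--parallelism=4").as(FlinkPipelineOptions.class);

    // Translate to the portable proto representation and back, roughly what
    // happens when a pipeline is submitted through the portability layer.
    PipelineOptions roundTripped =
        PipelineOptionsTranslation.fromProto(PipelineOptionsTranslation.toProto(options));

    // Per the report, runner-specific options such as --parallelism were
    // silently dropped somewhere along this path.
    System.out.println(roundTripped.as(FlinkPipelineOptions.class).getParallelism());
  }
}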



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5097) Increment counter for "small words" in go SDK example

2018-10-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5097?focusedWorklogId=154100&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154100
 ]

ASF GitHub Bot logged work on BEAM-5097:


Author: ASF GitHub Bot
Created on: 13/Oct/18 14:07
Start Date: 13/Oct/18 14:07
Worklog Time Spent: 10m 
  Work Description: stale[bot] closed pull request #6157: [BEAM-5097][WIP] 
Add counter to combine example in go sdk
URL: https://github.com/apache/beam/pull/6157
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/go/examples/cookbook/combine/combine.go 
b/sdks/go/examples/cookbook/combine/combine.go
index 7e24aa1fb30..1950a687d24 100644
--- a/sdks/go/examples/cookbook/combine/combine.go
+++ b/sdks/go/examples/cookbook/combine/combine.go
@@ -63,11 +63,16 @@ type extractFn struct {
MinLength int `json:"min_length"`
 }
 
+// A global context for simplicity.
+var ctx = context.Background()
+
 func (f *extractFn) ProcessElement(row WordRow, emit func(string, string)) {
+   small_words := beam.NewCounter("example.namespace", "small_words")
if len(row.Word) >= f.MinLength {
emit(row.Word, row.Corpus)
+   } else {
+   small_words.Inc(ctx, 1)
}
-   // TODO(herohde) 7/14/2017: increment counter for "small words"
 }
 
 // TODO(herohde) 7/14/2017: the choice of a string (instead of []string) for 
the


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 154100)
Time Spent: 1h 40m  (was: 1.5h)

> Increment counter for "small words" in go SDK example
> -
>
> Key: BEAM-5097
> URL: https://issues.apache.org/jira/browse/BEAM-5097
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: holdenk
>Assignee: holdenk
>Priority: Trivial
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Increment counter for "small words" in go SDK example
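For comparison only (the issue targets the Go example shown in the diff above): the same "small words" counter pattern in the Beam Java SDK looks roughly like the sketch below. The DoFn shape, namespace, and length threshold are illustrative, not taken from the repository.

import org.apache.beam.sdk.metrics.Counter;
import org.apache.beam.sdk.metrics.Metrics;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;

class ExtractSmallWordsFn extends DoFn<String, KV<String, String>> {
  private final Counter smallWords = Metrics.counter("example.namespace", "small_words");
  private final int minLength = 9;

  @ProcessElement
  public void processElement(@Element String word, OutputReceiver<KV<String, String>> out) {
    if (word.length() >= minLength) {
      out.output(KV.of(word, "corpus"));
    } else {
      // Count words that are too short to be emitted.
      smallWords.inc();
    }
  }
}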



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5097) Increment counter for "small words" in go SDK example

2018-10-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5097?focusedWorklogId=154099&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154099
 ]

ASF GitHub Bot logged work on BEAM-5097:


Author: ASF GitHub Bot
Created on: 13/Oct/18 14:07
Start Date: 13/Oct/18 14:07
Worklog Time Spent: 10m 
  Work Description: stale[bot] commented on issue #6157: [BEAM-5097][WIP] 
Add counter to combine example in go sdk
URL: https://github.com/apache/beam/pull/6157#issuecomment-429545012
 
 
   This pull request has been closed due to lack of activity. If you think that 
is incorrect, or the pull request requires review, you can revive the PR at any 
time.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 154099)
Time Spent: 1.5h  (was: 1h 20m)

> Increment counter for "small words" in go SDK example
> -
>
> Key: BEAM-5097
> URL: https://issues.apache.org/jira/browse/BEAM-5097
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: holdenk
>Assignee: holdenk
>Priority: Trivial
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Increment counter for "small words" in go SDK example



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5626) Several IO tests fail in Python 3 with RuntimeError('dictionary changed size during iteration',)}

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5626?focusedWorklogId=154082&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154082
 ]

ASF GitHub Bot logged work on BEAM-5626:


Author: ASF GitHub Bot
Created on: 13/Oct/18 04:25
Start Date: 13/Oct/18 04:25
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on a change in pull request #6587: 
[BEAM-5626] Fix hadoop filesystem test for py3.
URL: https://github.com/apache/beam/pull/6587#discussion_r224949182
 
 

 ##
 File path: sdks/python/apache_beam/io/hadoopfilesystem_test.py
 ##
 @@ -214,6 +214,11 @@ def setUp(self):
   url = self.fs.join(self.tmpdir, filename)
   self.fs.create(url).close()
 
+try:# Python 2
 
 Review comment:
   @tvalentyn I came across this python2->python3 doc from python.org (link: "python-2-3 Difference").
   
   Interesting read. The section "Use feature detection instead of version detection" talks about relying on sys.version_info. I figured maybe we can weigh in on the point this doc brings up before we apply the change everywhere.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 154082)
Time Spent: 5h 40m  (was: 5.5h)

> Several IO tests fail in Python 3 with RuntimeError('dictionary changed size 
> during iteration',)}
> -
>
> Key: BEAM-5626
> URL: https://issues.apache.org/jira/browse/BEAM-5626
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Ruoyun Huang
>Priority: Major
> Fix For: 2.8.0
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
>  ERROR: test_delete_dir 
> (apache_beam.io.hadoopfilesystem_test.HadoopFileSystemTest)
> --
> Traceback (most recent call last):
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/hadoopfilesystem_test.py",
>  line 506, in test_delete_dir
>  self.fs.delete([url_t1])
>File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/hadoopfilesystem.py",
>  line 370, in delete
>  raise BeamIOError("Delete operation failed", exceptions)
>  apache_beam.io.filesystem.BeamIOError: Delete operation failed with 
> exceptions {'hdfs://test_dir/new_dir1': RuntimeError('dictionary changed size 
> during iteration',   )}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5626) Several IO tests fail in Python 3 with RuntimeError('dictionary changed size during iteration',)}

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5626?focusedWorklogId=154083&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154083
 ]

ASF GitHub Bot logged work on BEAM-5626:


Author: ASF GitHub Bot
Created on: 13/Oct/18 04:28
Start Date: 13/Oct/18 04:28
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on a change in pull request #6587: 
[BEAM-5626] Fix hadoop filesystem test for py3.
URL: https://github.com/apache/beam/pull/6587#discussion_r224949182
 
 

 ##
 File path: sdks/python/apache_beam/io/hadoopfilesystem_test.py
 ##
 @@ -214,6 +214,11 @@ def setUp(self):
   url = self.fs.join(self.tmpdir, filename)
   self.fs.create(url).close()
 
+try:# Python 2
 
 Review comment:
   @tvalentyn I came across this python2->python3 doc from python.org (link: "python-2-3 Difference").
   
   Interesting read. The section "Use feature detection instead of version detection" talks about relying on sys.version_info, though that point does make our use of it here look like a negative.
   
   Just FYI.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 154083)
Time Spent: 5h 50m  (was: 5h 40m)

> Several IO tests fail in Python 3 with RuntimeError('dictionary changed size 
> during iteration',)}
> -
>
> Key: BEAM-5626
> URL: https://issues.apache.org/jira/browse/BEAM-5626
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Ruoyun Huang
>Priority: Major
> Fix For: 2.8.0
>
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
>  ERROR: test_delete_dir 
> (apache_beam.io.hadoopfilesystem_test.HadoopFileSystemTest)
> --
> Traceback (most recent call last):
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/hadoopfilesystem_test.py",
>  line 506, in test_delete_dir
>  self.fs.delete([url_t1])
>File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/hadoopfilesystem.py",
>  line 370, in delete
>  raise BeamIOError("Delete operation failed", exceptions)
>  apache_beam.io.filesystem.BeamIOError: Delete operation failed with 
> exceptions {'hdfs://test_dir/new_dir1': RuntimeError('dictionary changed size 
> during iteration',   )}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5626) Several IO tests fail in Python 3 with RuntimeError('dictionary changed size during iteration',)}

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5626?focusedWorklogId=154081&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154081
 ]

ASF GitHub Bot logged work on BEAM-5626:


Author: ASF GitHub Bot
Created on: 13/Oct/18 04:24
Start Date: 13/Oct/18 04:24
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on a change in pull request #6587: 
[BEAM-5626] Fix hadoop filesystem test for py3.
URL: https://github.com/apache/beam/pull/6587#discussion_r224949182
 
 

 ##
 File path: sdks/python/apache_beam/io/hadoopfilesystem_test.py
 ##
 @@ -214,6 +214,11 @@ def setUp(self):
   url = self.fs.join(self.tmpdir, filename)
   self.fs.create(url).close()
 
+try:# Python 2
 
 Review comment:
   I came across this python2->python3 doc from python.org: "python-2-3 Difference".
   
   Interesting read. The section "Use feature detection instead of version detection" talks about relying on sys.version_info; I figured maybe we can weigh in on the point this doc brings up before we apply the change everywhere.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 154081)
Time Spent: 5.5h  (was: 5h 20m)

> Several IO tests fail in Python 3 with RuntimeError('dictionary changed size 
> during iteration',)}
> -
>
> Key: BEAM-5626
> URL: https://issues.apache.org/jira/browse/BEAM-5626
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Ruoyun Huang
>Priority: Major
> Fix For: 2.8.0
>
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
>  ERROR: test_delete_dir 
> (apache_beam.io.hadoopfilesystem_test.HadoopFileSystemTest)
> --
> Traceback (most recent call last):
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/hadoopfilesystem_test.py",
>  line 506, in test_delete_dir
>  self.fs.delete([url_t1])
>File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/hadoopfilesystem.py",
>  line 370, in delete
>  raise BeamIOError("Delete operation failed", exceptions)
>  apache_beam.io.filesystem.BeamIOError: Delete operation failed with 
> exceptions {'hdfs://test_dir/new_dir1': RuntimeError('dictionary changed size 
> during iteration',   )}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5636) Java support for custom dataflow worker jar

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5636?focusedWorklogId=154080&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154080
 ]

ASF GitHub Bot logged work on BEAM-5636:


Author: ASF GitHub Bot
Created on: 13/Oct/18 02:44
Start Date: 13/Oct/18 02:44
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on issue #6665: [BEAM-5636] Java 
support for custom dataflow worker jar
URL: https://github.com/apache/beam/pull/6665#issuecomment-429505015
 
 
   @herohde this PR is ready to merge.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 154080)
Time Spent: 1h 20m  (was: 1h 10m)

> Java support for custom dataflow worker jar
> ---
>
> Key: BEAM-5636
> URL: https://issues.apache.org/jira/browse/BEAM-5636
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Henning Rohde
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> One of the slightly subtle aspects is that we would need to ignore one of the 
> staged jars for portable Java jobs. That requires a change to the Java boot 
> code: 
> https://github.com/apache/beam/blob/66d7c865b7267f388ee60752891a9141fad43774/sdks/java/container/boot.go#L107



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=154078&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154078
 ]

ASF GitHub Bot logged work on BEAM-5637:


Author: ASF GitHub Bot
Created on: 13/Oct/18 01:29
Start Date: 13/Oct/18 01:29
Worklog Time Spent: 10m 
  Work Description: HuangLED edited a comment on issue #6680: [BEAM-5637] 
Python support for custom dataflow worker jar
URL: https://github.com/apache/beam/pull/6680#issuecomment-429468108
 
 
   R:  @herohde 
   cc: @boyuanzz @pabloem 
   
   Addressed.  Also, option definition moved to WorkerOptions per Pablo's 
suggestion. 
   
   Thanks to Boyuan for pointing out the right place for error message. 
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 154078)
Time Spent: 2.5h  (was: 2h 20m)

> Python support for custom dataflow worker jar
> -
>
> Key: BEAM-5637
> URL: https://issues.apache.org/jira/browse/BEAM-5637
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Henning Rohde
>Assignee: Ruoyun Huang
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> One of the slightly subtle aspects is that we would need to ignore one of the 
> staged jars for portable Python jobs. That requires a change to the Python 
> boot code: 
> https://github.com/apache/beam/blob/66d7c865b7267f388ee60752891a9141fad43774/sdks/python/container/boot.go#L104



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5707) Add a portable Flink streaming synthetic source for testing

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5707?focusedWorklogId=154077&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154077
 ]

ASF GitHub Bot logged work on BEAM-5707:


Author: ASF GitHub Bot
Created on: 13/Oct/18 01:03
Start Date: 13/Oct/18 01:03
Worklog Time Spent: 10m 
  Work Description: mwylde commented on a change in pull request #6637: 
[BEAM-5707] Add a periodic, streaming impulse source for Flink portable 
pipelines
URL: https://github.com/apache/beam/pull/6637#discussion_r224944539
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkStreamingPortablePipelineTranslator.java
 ##
 @@ -406,6 +417,56 @@ private void translateImpulse(
 
context.addDataStream(Iterables.getOnlyElement(pTransform.getOutputsMap().values()),
 source);
   }
 
+  @AutoService(NativeTransforms.IsNativeTransform.class)
+  public static class IsFlinkNativeTransform implements NativeTransforms.IsNativeTransform {
+@Override
+public boolean test(RunnerApi.PTransform pTransform) {
+  return STREAMING_IMPULSE_TRANSFORM_URL.equals(PTransformTranslation.urnForTransformOrNull(pTransform));
+}
+  }
+
+  private void translateStreamingImpulse(
+  String id, RunnerApi.Pipeline pipeline, StreamingTranslationContext context) {
+RunnerApi.PTransform pTransform = pipeline.getComponents().getTransformsOrThrow(id);
+
+ObjectMapper objectMapper = new ObjectMapper();
+
+int intervalMillis;
+int messageCount;
+try {
+  JsonNode config = objectMapper.readTree(pTransform.getSpec().getPayload().toByteArray());
+  intervalMillis = config.path("interval_ms").asInt(100);
+  messageCount = config.path("message_count").asInt(0);
+} catch (IOException e) {
+throw new RuntimeException("Failed to parse configuration for streaming impulse", e);
+}
+
+DataStreamSource<WindowedValue<byte[]>> source =
+context
+.getExecutionEnvironment()
+.addSource(
+new RichParallelSourceFunction<WindowedValue<byte[]>>() {
+  private AtomicBoolean cancelled = new AtomicBoolean(false);
+  private AtomicLong count = new AtomicLong();
+
+  @Override
+  public void run(SourceContext<WindowedValue<byte[]>> ctx) throws Exception {
+while (!cancelled.get() && (messageCount == 0 || count.getAndIncrement() < messageCount)) {
+  ctx.collect(WindowedValue.valueInGlobalWindow(new byte[] {}));
+  Thread.sleep(intervalMillis);
 
 Review comment:
   👍 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 154077)
Time Spent: 2h 20m  (was: 2h 10m)

> Add a portable Flink streaming synthetic source for testing
> ---
>
> Key: BEAM-5707
> URL: https://issues.apache.org/jira/browse/BEAM-5707
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Micah Wylde
>Assignee: Aljoscha Krettek
>Priority: Minor
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Currently there are no built-in streaming sources for portable pipelines. 
> This makes it hard to test streaming functionality in the Python SDK.
> It would be very useful to add a periodic impulse source that (with some 
> configurable frequency) outputs an empty byte array, which can then be 
> transformed as desired inside the python pipeline. More context in this 
> [mailing list 
> discussion|https://lists.apache.org/thread.html/b44a648ab1d0cb200d8bfe4b280e9dad6368209c4725609cbfbbe410@%3Cdev.beam.apache.org%3E].
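The review diffs above configure this source from a small JSON payload ("interval_ms", "message_count") carried in the PTransform spec. Below is a standalone sketch of just that parsing step, assuming only jackson-databind on the classpath; this is not pipeline code and the payload values are made up.

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.nio.charset.StandardCharsets;

public class StreamingImpulsePayloadSketch {
  public static void main(String[] args) throws Exception {
    ObjectMapper mapper = new ObjectMapper();
    byte[] payload = "{\"interval_ms\": 500, \"message_count\": 10}".getBytes(StandardCharsets.UTF_8);

    JsonNode config = mapper.readTree(payload);
    int intervalMillis = config.path("interval_ms").asInt(100); // default: emit every 100 ms
    int messageCount = config.path("message_count").asInt(0);   // default 0: emit forever
    System.out.println(intervalMillis + " ms, " + messageCount + " messages");
  }
}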



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5621) Several tests fail on Python 3 with TypeError: unorderable types: str() < int()

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5621?focusedWorklogId=154075&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154075
 ]

ASF GitHub Bot logged work on BEAM-5621:


Author: ASF GitHub Bot
Created on: 13/Oct/18 00:54
Start Date: 13/Oct/18 00:54
Worklog Time Spent: 10m 
  Work Description: tvalentyn edited a comment on issue #6602: [BEAM-5621] 
Fix unorderable types in python 3
URL: https://github.com/apache/beam/pull/6602#issuecomment-429497899
 
 
   I was not able to reproduce the issue where negative numbers are represented 
as `long` type I mentioned in 
https://github.com/apache/beam/pull/6602#issuecomment-429468709 using a fresh 
installation of apache-beam==2.7.0, so I instead opened 
https://issues.apache.org/jira/browse/BEAM-5744, to investigate if this is a 
regression in 2.8.0.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 154075)
Time Spent: 3h  (was: 2h 50m)

> Several tests fail on Python 3 with TypeError: unorderable types: str() < 
> int()
> ---
>
> Key: BEAM-5621
> URL: https://issues.apache.org/jira/browse/BEAM-5621
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Juta Staes
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> ==
> ERROR: test_remove_duplicates 
> (apache_beam.transforms.ptransform_test.PTransformTest)
> --
> Traceback (most recent call last):
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py",
>  line 677, in process
> self.do_fn_invoker.invoke_process(windowed_value)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py",
>  line 414, in invoke_process
> windowed_value, self.process_method(windowed_value.value))
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/core.py",
>  line 1068, in <lambda>
> wrapper = lambda x: [fn(x)]
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/testing/util.py",
>  line 115, in _equal
> sorted_expected = sorted(expected)
> TypeError: unorderable types: str() < int()



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5707) Add a portable Flink streaming synthetic source for testing

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5707?focusedWorklogId=154074&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154074
 ]

ASF GitHub Bot logged work on BEAM-5707:


Author: ASF GitHub Bot
Created on: 13/Oct/18 00:54
Start Date: 13/Oct/18 00:54
Worklog Time Spent: 10m 
  Work Description: mwylde commented on a change in pull request #6637: 
[BEAM-5707] Add a periodic, streaming impulse source for Flink portable 
pipelines
URL: https://github.com/apache/beam/pull/6637#discussion_r224944203
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkStreamingPortablePipelineTranslator.java
 ##
 @@ -406,6 +417,56 @@ private void translateImpulse(
 
context.addDataStream(Iterables.getOnlyElement(pTransform.getOutputsMap().values()),
 source);
   }
 
+  @AutoService(NativeTransforms.IsNativeTransform.class)
+  public static class IsFlinkNativeTransform implements NativeTransforms.IsNativeTransform {
+@Override
+public boolean test(RunnerApi.PTransform pTransform) {
+  return STREAMING_IMPULSE_TRANSFORM_URL.equals(PTransformTranslation.urnForTransformOrNull(pTransform));
+}
+  }
+
+  private void translateStreamingImpulse(
+  String id, RunnerApi.Pipeline pipeline, StreamingTranslationContext context) {
+RunnerApi.PTransform pTransform = pipeline.getComponents().getTransformsOrThrow(id);
+
+ObjectMapper objectMapper = new ObjectMapper();
+
+int intervalMillis;
+int messageCount;
+try {
+  JsonNode config = objectMapper.readTree(pTransform.getSpec().getPayload().toByteArray());
+  intervalMillis = config.path("interval_ms").asInt(100);
+  messageCount = config.path("message_count").asInt(0);
+} catch (IOException e) {
+throw new RuntimeException("Failed to parse configuration for streaming impulse", e);
+}
+
+DataStreamSource<WindowedValue<byte[]>> source =
+context
+.getExecutionEnvironment()
+.addSource(
+new RichParallelSourceFunction<WindowedValue<byte[]>>() {
+  private AtomicBoolean cancelled = new AtomicBoolean(false);
+  private AtomicLong count = new AtomicLong();
 
 Review comment:
   Good to know that this will never be called concurrently. I'll change this 
to a long.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 154074)
Time Spent: 2h 10m  (was: 2h)

> Add a portable Flink streaming synthetic source for testing
> ---
>
> Key: BEAM-5707
> URL: https://issues.apache.org/jira/browse/BEAM-5707
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Micah Wylde
>Assignee: Aljoscha Krettek
>Priority: Minor
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Currently there are no built-in streaming sources for portable pipelines. 
> This makes it hard to test streaming functionality in the Python SDK.
> It would be very useful to add a periodic impulse source that (with some 
> configurable frequency) outputs an empty byte array, which can then be 
> transformed as desired inside the python pipeline. More context in this 
> [mailing list 
> discussion|https://lists.apache.org/thread.html/b44a648ab1d0cb200d8bfe4b280e9dad6368209c4725609cbfbbe410@%3Cdev.beam.apache.org%3E].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5621) Several tests fail on Python 3 with TypeError: unorderable types: str() < int()

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5621?focusedWorklogId=154073&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154073
 ]

ASF GitHub Bot logged work on BEAM-5621:


Author: ASF GitHub Bot
Created on: 13/Oct/18 00:53
Start Date: 13/Oct/18 00:53
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #6602: [BEAM-5621] Fix 
unorderable types in python 3
URL: https://github.com/apache/beam/pull/6602#issuecomment-429497899
 
 
   I was not able to reproduce 
https://github.com/apache/beam/pull/6602#issuecomment-429468709 using a fresh 
installation of apache-beam==2.7.0, so I instead opened 
https://issues.apache.org/jira/browse/BEAM-5744, to investigate if this is a 
regression in 2.8.0.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 154073)
Time Spent: 2h 50m  (was: 2h 40m)

> Several tests fail on Python 3 with TypeError: unorderable types: str() < 
> int()
> ---
>
> Key: BEAM-5621
> URL: https://issues.apache.org/jira/browse/BEAM-5621
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Juta Staes
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> ==
> ERROR: test_remove_duplicates 
> (apache_beam.transforms.ptransform_test.PTransformTest)
> --
> Traceback (most recent call last):
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py",
>  line 677, in process
> self.do_fn_invoker.invoke_process(windowed_value)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py",
>  line 414, in invoke_process
> windowed_value, self.process_method(windowed_value.value))
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/core.py",
>  line 1068, in <lambda>
> wrapper = lambda x: [fn(x)]
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/testing/util.py",
>  line 115, in _equal
> sorted_expected = sorted(expected)
> TypeError: unorderable types: str() < int()



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5707) Add a portable Flink streaming synthetic source for testing

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5707?focusedWorklogId=154072&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154072
 ]

ASF GitHub Bot logged work on BEAM-5707:


Author: ASF GitHub Bot
Created on: 13/Oct/18 00:51
Start Date: 13/Oct/18 00:51
Worklog Time Spent: 10m 
  Work Description: mwylde commented on a change in pull request #6637: 
[BEAM-5707] Add a periodic, streaming impulse source for Flink portable 
pipelines
URL: https://github.com/apache/beam/pull/6637#discussion_r224944097
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkStreamingPortablePipelineTranslator.java
 ##
 @@ -406,6 +417,56 @@ private void translateImpulse(
 
context.addDataStream(Iterables.getOnlyElement(pTransform.getOutputsMap().values()),
 source);
   }
 
+  @AutoService(NativeTransforms.IsNativeTransform.class)
+  public static class IsFlinkNativeTransform implements NativeTransforms.IsNativeTransform {
+@Override
+public boolean test(RunnerApi.PTransform pTransform) {
+  return STREAMING_IMPULSE_TRANSFORM_URL.equals(PTransformTranslation.urnForTransformOrNull(pTransform));
+}
+  }
+
+  private void translateStreamingImpulse(
+  String id, RunnerApi.Pipeline pipeline, StreamingTranslationContext context) {
+RunnerApi.PTransform pTransform = pipeline.getComponents().getTransformsOrThrow(id);
+
+ObjectMapper objectMapper = new ObjectMapper();
+
+int intervalMillis;
+int messageCount;
+try {
+  JsonNode config = objectMapper.readTree(pTransform.getSpec().getPayload().toByteArray());
+  intervalMillis = config.path("interval_ms").asInt(100);
+  messageCount = config.path("message_count").asInt(0);
+} catch (IOException e) {
+throw new RuntimeException("Failed to parse configuration for streaming impulse", e);
+}
+
+DataStreamSource<WindowedValue<byte[]>> source =
+context
+.getExecutionEnvironment()
+.addSource(
+new RichParallelSourceFunction<WindowedValue<byte[]>>() {
 
 Review comment:
   👍 I've moved it to 
org.apache.beam.runners.flink.translation.wrappers.streaming.io


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 154072)
Time Spent: 2h  (was: 1h 50m)

> Add a portable Flink streaming synthetic source for testing
> ---
>
> Key: BEAM-5707
> URL: https://issues.apache.org/jira/browse/BEAM-5707
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Micah Wylde
>Assignee: Aljoscha Krettek
>Priority: Minor
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Currently there are no built-in streaming sources for portable pipelines. 
> This makes it hard to test streaming functionality in the Python SDK.
> It would be very useful to add a periodic impulse source that (with some 
> configurable frequency) outputs an empty byte array, which can then be 
> transformed as desired inside the python pipeline. More context in this 
> [mailing list 
> discussion|https://lists.apache.org/thread.html/b44a648ab1d0cb200d8bfe4b280e9dad6368209c4725609cbfbbe410@%3Cdev.beam.apache.org%3E].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5621) Several tests fail on Python 3 with TypeError: unorderable types: str() < int()

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5621?focusedWorklogId=154071&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154071
 ]

ASF GitHub Bot logged work on BEAM-5621:


Author: ASF GitHub Bot
Created on: 13/Oct/18 00:49
Start Date: 13/Oct/18 00:49
Worklog Time Spent: 10m 
  Work Description: tvalentyn edited a comment on issue #6602: [BEAM-5621] 
Fix unorderable types in python 3
URL: https://github.com/apache/beam/pull/6602#issuecomment-429478105
 
 
   Unfortunately I think 
https://github.com/apache/beam/pull/6602#issuecomment-429468709 may be a very 
unintuitive change, so we need to roll it back and either fix the underlying 
issue with typing of negative numbers or proceed with a different solution 
here. We would need to cherry-pick the change into the release branch, so I'll 
mark BEAM-5621 as release blocker until cherry-pick is in.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 154071)
Time Spent: 2h 40m  (was: 2.5h)

> Several tests fail on Python 3 with TypeError: unorderable types: str() < 
> int()
> ---
>
> Key: BEAM-5621
> URL: https://issues.apache.org/jira/browse/BEAM-5621
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Juta Staes
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> ==
> ERROR: test_remove_duplicates 
> (apache_beam.transforms.ptransform_test.PTransformTest)
> --
> Traceback (most recent call last):
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py",
>  line 677, in process
> self.do_fn_invoker.invoke_process(windowed_value)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py",
>  line 414, in invoke_process
> windowed_value, self.process_method(windowed_value.value))
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/core.py",
>  line 1068, in <lambda>
> wrapper = lambda x: [fn(x)]
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/testing/util.py",
>  line 115, in _equal
> sorted_expected = sorted(expected)
> TypeError: unorderable types: str() < int()



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5714) RedisIO emit error of EXEC without MULTI

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5714?focusedWorklogId=154070&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154070
 ]

ASF GitHub Bot logged work on BEAM-5714:


Author: ASF GitHub Bot
Created on: 13/Oct/18 00:47
Start Date: 13/Oct/18 00:47
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #6651: [BEAM-5714] Fix 
RedisIO EXEC without MULTI error
URL: https://github.com/apache/beam/pull/6651#issuecomment-429497451
 
 
   cc: @vvarma might have an opinion


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 154070)
Time Spent: 40m  (was: 0.5h)

> RedisIO emit error of EXEC without MULTI
> 
>
> Key: BEAM-5714
> URL: https://issues.apache.org/jira/browse/BEAM-5714
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-redis
>Affects Versions: 2.7.0
>Reporter: K.K. POON
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> RedisIO raises an "EXEC without MULTI" error after SETting a batch of records.
>  
> Looking at the source code, I guess a `pipeline.multi();` call is missing 
> after exec() of the last batch.
> [https://github.com/apache/beam/blob/master/sdks/java/io/redis/src/main/java/org/apache/beam/sdk/io/redis/RedisIO.java#L555]
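A rough sketch of the batching pattern the description refers to, written directly against the Jedis client (this is not the RedisIO code; key names and batch size are made up): every EXEC must be paired with its own preceding MULTI, so after flushing a batch a new MULTI has to be opened before further writes are queued.

import redis.clients.jedis.Jedis;
import redis.clients.jedis.Pipeline;

public class RedisBatchSketch {
  private static final int BATCH_SIZE = 1000;

  public static void main(String[] args) {
    try (Jedis jedis = new Jedis("localhost", 6379)) {
      Pipeline pipeline = jedis.pipelined();
      pipeline.multi();
      int buffered = 0;
      for (int i = 0; i < 2500; i++) {
        pipeline.set("key-" + i, "value-" + i);
        buffered++;
        if (buffered >= BATCH_SIZE) {
          pipeline.exec();
          pipeline.sync();
          // Re-open a transaction for the next batch; skipping this is what
          // produces "ERR EXEC without MULTI" on the following exec().
          pipeline.multi();
          buffered = 0;
        }
      }
      pipeline.exec(); // flush the final partial batch
      pipeline.sync();
    }
  }
}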



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5707) Add a portable Flink streaming synthetic source for testing

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5707?focusedWorklogId=154069&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154069
 ]

ASF GitHub Bot logged work on BEAM-5707:


Author: ASF GitHub Bot
Created on: 13/Oct/18 00:42
Start Date: 13/Oct/18 00:42
Worklog Time Spent: 10m 
  Work Description: mwylde commented on a change in pull request #6637: 
[BEAM-5707] Add a periodic, streaming impulse source for Flink portable 
pipelines
URL: https://github.com/apache/beam/pull/6637#discussion_r224943673
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PTransformTranslation.java
 ##
 @@ -63,6 +64,8 @@
   public static final String GROUP_BY_KEY_TRANSFORM_URN =
   getUrn(StandardPTransforms.Primitives.GROUP_BY_KEY);
   public static final String IMPULSE_TRANSFORM_URN = 
getUrn(StandardPTransforms.Primitives.IMPULSE);
+  public static final String STREAMING_IMPULSE_TRANSFORM_URL = 
"flink:transform:streaming_impulse:v1";
 
 Review comment:
   👍 moved to FlinkStreamingPortablePipelineTranslator


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 154069)
Time Spent: 1h 50m  (was: 1h 40m)

> Add a portable Flink streaming synthetic source for testing
> ---
>
> Key: BEAM-5707
> URL: https://issues.apache.org/jira/browse/BEAM-5707
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Micah Wylde
>Assignee: Aljoscha Krettek
>Priority: Minor
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Currently there are no built-in streaming sources for portable pipelines. 
> This makes it hard to test streaming functionality in the Python SDK.
> It would be very useful to add a periodic impulse source that (with some 
> configurable frequency) outputs an empty byte array, which can then be 
> transformed as desired inside the python pipeline. More context in this 
> [mailing list 
> discussion|https://lists.apache.org/thread.html/b44a648ab1d0cb200d8bfe4b280e9dad6368209c4725609cbfbbe410@%3Cdev.beam.apache.org%3E].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=154064&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154064
 ]

ASF GitHub Bot logged work on BEAM-4374:


Author: ASF GitHub Bot
Created on: 12/Oct/18 23:58
Start Date: 12/Oct/18 23:58
Worklog Time Spent: 10m 
  Work Description: pabloem closed pull request #6205: [BEAM-4374] 
Implementing a subset of the new metrics framework in python.
URL: https://github.com/apache/beam/pull/6205
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/model/fn-execution/src/main/proto/beam_fn_api.proto 
b/model/fn-execution/src/main/proto/beam_fn_api.proto
index a0795a7c285..915686de6b3 100644
--- a/model/fn-execution/src/main/proto/beam_fn_api.proto
+++ b/model/fn-execution/src/main/proto/beam_fn_api.proto
@@ -40,6 +40,7 @@ option java_outer_classname = "BeamFnApi";
 
 import "beam_runner_api.proto";
 import "endpoints.proto";
+import "google/protobuf/descriptor.proto";
 import "google/protobuf/timestamp.proto";
 import "google/protobuf/wrappers.proto";
 
@@ -250,11 +251,16 @@ message ProcessBundleRequest {
 message ProcessBundleResponse {
   // (Optional) If metrics reporting is supported by the SDK, this represents
   // the final metrics to record for this bundle.
+  // DEPRECATED
   Metrics metrics = 1;
 
   // (Optional) Specifies that the bundle has been split since the last
   // ProcessBundleProgressResponse was sent.
   BundleSplit split = 2;
+
+  // (Required) The list of metrics or other MonitoredState
+  // collected while processing this bundle.
+  repeated MonitoringInfo monitoring_infos = 3;
 }
 
 // A request to report progress information for a given bundle.
@@ -275,9 +281,9 @@ message MonitoringInfo {
   // Sub types like field formats - int64, double, string.
   // Aggregation methods - SUM, LATEST, TOP-N, BOTTOM-N, DISTRIBUTION
   // valid values are:
-  // beam:metrics:[SumInt64|LatestInt64|Top-NInt64|Bottom-NInt64|
-  // SumDouble|LatestDouble|Top-NDouble|Bottom-NDouble|DistributionInt64|
-  // DistributionDouble|MonitoringDataTable]
+  // beam:metrics:[sum_int_64|latest_int_64|top_n_int_64|bottom_n_int_64|
+  // sum_double|latest_double|top_n_double|bottom_n_double|
+  // distribution_int_64|distribution_double|monitoring_data_table
   string type = 2;
 
   // The Metric or monitored state.
@@ -302,6 +308,45 @@ message MonitoringInfo {
   // Some systems such as Stackdriver will be able to aggregate the metrics
   // using a subset of the provided labels
+  map<string, string> labels = 5;
+
+  // The walltime of the most recent update.
+  // Useful for aggregation for Latest types such as LatestInt64.
+  google.protobuf.Timestamp timestamp = 6;
+}
+
+message MonitoringInfoUrns {
+  enum Enum {
+USER_COUNTER_URN_PREFIX = 0 [(org.apache.beam.model.pipeline.v1.beam_urn) =
+"beam:metric:user"];
+
+ELEMENT_COUNT = 1 [(org.apache.beam.model.pipeline.v1.beam_urn) =
+"beam:metric:element_count:v1"];
+
+START_BUNDLE_MSECS = 2 [(org.apache.beam.model.pipeline.v1.beam_urn) =
+  "beam:metric:pardo_execution_time:start_bundle_msecs:v1"];
+
+PROCESS_BUNDLE_MSECS = 3 [(org.apache.beam.model.pipeline.v1.beam_urn) =
+"beam:metric:pardo_execution_time:process_bundle_msecs:v1"];
+
+FINISH_BUNDLE_MSECS = 4 [(org.apache.beam.model.pipeline.v1.beam_urn) =
+"beam:metric:pardo_execution_time:finish_bundle_msecs:v1"];
+
+TOTAL_MSECS = 5 [(org.apache.beam.model.pipeline.v1.beam_urn) =
+"beam:metric:ptransform_execution_time:total_msecs:v1"];
+  }
+}
+
+message MonitoringInfoTypeUrns {
+  enum Enum {
+SUM_INT64_TYPE = 0 [(org.apache.beam.model.pipeline.v1.beam_urn) =
+"beam:metrics:sum_int_64"];
+
+DISTRIBUTION_INT64_TYPE = 1 [(org.apache.beam.model.pipeline.v1.beam_urn) =
+"beam:metrics:distribution_int_64"];
+
+LATEST_INT64_TYPE = 2 [(org.apache.beam.model.pipeline.v1.beam_urn) =
+  "beam:metrics:latest_int_64"];
+  }
 }
 
 message Metric {
@@ -525,12 +570,16 @@ message Metrics {
 }
 
 message ProcessBundleProgressResponse {
-  // (Required)
+  // DEPRECATED (Required)
   Metrics metrics = 1;
 
   // (Optional) Specifies that the bundle has been split since the last
   // ProcessBundleProgressResponse was sent.
   BundleSplit split = 2;
+
+  // (Required) The list of metrics or other MonitoredState
+  // collected while processing this bundle.
+  repeated MonitoringInfo monitoring_infos = 3;
 }
 
 message ProcessBundleSplitRequest {
@@ -795,7 +844,6 @@ message LogEntry {
 enum Enum {
   // Unspecified level information. Will be logged at the TRACE level.
   UNSPECIFIED = 0;
-  // Trace

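As a hypothetical illustration of the schema in the proto diff above, constructing one MonitoringInfo from Java might look roughly like the snippet below. The generated builder methods are assumed from the proto field names and have not been checked against a specific Beam release; the urn, type, and label values are taken from the comments in the diff, while the label key and value are made up.

import org.apache.beam.model.fnexecution.v1.BeamFnApi.MonitoringInfo;

public class MonitoringInfoSketch {
  public static void main(String[] args) {
    MonitoringInfo info =
        MonitoringInfo.newBuilder()
            .setUrn("beam:metric:element_count:v1") // from MonitoringInfoUrns above
            .setType("beam:metrics:sum_int_64")     // from MonitoringInfoTypeUrns above
            .putLabels("PTRANSFORM", "my_transform") // labels map added in this change
            .build();
    System.out.println(info);
  }
}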
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=154063&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154063
 ]

ASF GitHub Bot logged work on BEAM-4374:


Author: ASF GitHub Bot
Created on: 12/Oct/18 23:58
Start Date: 12/Oct/18 23:58
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #6205: [BEAM-4374] 
Implementing a subset of the new metrics framework in python.
URL: https://github.com/apache/beam/pull/6205#issuecomment-429492504
 
 
   Okay, as this looks good, I'll go ahead and merge.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 154063)
Time Spent: 8h 20m  (was: 8h 10m)

> Update existing metrics in the FN API to use new Metric Schema
> --
>
> Key: BEAM-4374
> URL: https://issues.apache.org/jira/browse/BEAM-4374
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Alex Amato
>Priority: Major
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h
>
> Update existing metrics to use the new proto and cataloging schema defined in:
> [_https://s.apache.org/beam-fn-api-metrics_]
>  * Check in new protos
>  * Define catalog file for metrics
>  * Port existing metrics to use this new format, based on catalog 
> names+metadata



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5636) Java support for custom dataflow worker jar

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5636?focusedWorklogId=154062&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154062
 ]

ASF GitHub Bot logged work on BEAM-5636:


Author: ASF GitHub Bot
Created on: 12/Oct/18 23:54
Start Date: 12/Oct/18 23:54
Worklog Time Spent: 10m 
  Work Description: herohde commented on issue #6665: [BEAM-5636] Java 
support for custom dataflow worker jar
URL: https://github.com/apache/beam/pull/6665#issuecomment-429492075
 
 
   @boyuanzz Btw, please also update the doc describing this change to include 
Java.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 154062)
Time Spent: 1h 10m  (was: 1h)

> Java support for custom dataflow worker jar
> ---
>
> Key: BEAM-5636
> URL: https://issues.apache.org/jira/browse/BEAM-5636
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Henning Rohde
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> One of the slightly subtle aspects is that we would need to ignore one of the 
> staged jars for portable Java jobs. That requires a change to the Java boot 
> code: 
> https://github.com/apache/beam/blob/66d7c865b7267f388ee60752891a9141fad43774/sdks/java/container/boot.go#L107



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5636) Java support for custom dataflow worker jar

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5636?focusedWorklogId=154061&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154061
 ]

ASF GitHub Bot logged work on BEAM-5636:


Author: ASF GitHub Bot
Created on: 12/Oct/18 23:40
Start Date: 12/Oct/18 23:40
Worklog Time Spent: 10m 
  Work Description: herohde commented on a change in pull request #6665: 
[BEAM-5636] Java support for custom dataflow worker jar
URL: https://github.com/apache/beam/pull/6665#discussion_r224939002
 
 

 ##
 File path: sdks/java/container/boot.go
 ##
 @@ -103,7 +103,17 @@ func main() {
filepath.Join(jarsDir, "slf4j-jdk14.jar"),
filepath.Join(jarsDir, "beam-sdks-java-harness.jar"),
}
+
+   var hasWorkerExperiment = strings.Contains(options, 
"use_staged_dataflow_worker_jar")
for _, md := range artifacts {
+   if hasWorkerExperiment {
+   if strings.HasPrefix(md.Name, 
"beam-runners-google-cloud-dataflow-java-fn-api-worker") {
+   continue
+   }
+   if strings.HasPrefix(md.Name, "dataflow-worker.jar") {
 
 Review comment:
   Small comment: this can be == instead.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 154061)
Time Spent: 1h  (was: 50m)

> Java support for custom dataflow worker jar
> ---
>
> Key: BEAM-5636
> URL: https://issues.apache.org/jira/browse/BEAM-5636
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Henning Rohde
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> One of the slightly subtle aspects is that we would need to ignore one of the 
> staged jars for portable Java jobs. That requires a change to the Java boot 
> code: 
> https://github.com/apache/beam/blob/66d7c865b7267f388ee60752891a9141fad43774/sdks/java/container/boot.go#L107



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3587) User reports TextIO failure in FlinkRunner on master

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3587?focusedWorklogId=154059&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154059
 ]

ASF GitHub Bot logged work on BEAM-3587:


Author: ASF GitHub Bot
Created on: 12/Oct/18 23:36
Start Date: 12/Oct/18 23:36
Worklog Time Spent: 10m 
  Work Description: swegner closed pull request #384: [BEAM-3587] Add a 
note to Gradle shadowJar for merge service files
URL: https://github.com/apache/beam-site/pull/384
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/src/documentation/runners/flink.md 
b/src/documentation/runners/flink.md
index 6dc6e7b69d..df64ba46f9 100644
--- a/src/documentation/runners/flink.md
+++ b/src/documentation/runners/flink.md
@@ -51,7 +51,7 @@ For more information, the [Flink 
Documentation](https://ci.apache.org/projects/f
 ```java
 <dependency>
   <groupId>org.apache.beam</groupId>
-  <artifactId>beam-runners-flink_2.10</artifactId>
+  <artifactId>beam-runners-flink_2.11</artifactId>
   <version>{{ site.release_latest }}</version>
   <scope>runtime</scope>
 </dependency>
@@ -81,6 +81,62 @@ $ mvn exec:java 
-Dexec.mainClass=org.apache.beam.examples.WordCount \
 If you have a Flink `JobManager` running on your local machine you can give 
`localhost:6123` for
 `flinkMaster`.
 
+Under the hood, to create your shaded jar (containing your pipeline and the Flink runner dependencies), you have to use the `maven-shade-plugin`:
+
+```java
+<dependency>
+  <groupId>org.apache.beam</groupId>
+  <artifactId>beam-runners-flink_2.10</artifactId>
+  <version>{{ site.release_latest }}</version>
+</dependency>
+```
+
+```java
+<plugin>
+  <groupId>org.apache.maven.plugins</groupId>
+  <artifactId>maven-shade-plugin</artifactId>
+  <version>${maven-shade-plugin.version}</version>
+  <configuration>
+    <createDependencyReducedPom>false</createDependencyReducedPom>
+    <filters>
+      <filter>
+        <artifact>*:*</artifact>
+        <excludes>
+          <exclude>META-INF/*.SF</exclude>
+          <exclude>META-INF/*.DSA</exclude>
+          <exclude>META-INF/*.RSA</exclude>
+        </excludes>
+      </filter>
+    </filters>
+  </configuration>
+  <executions>
+    <execution>
+      <phase>package</phase>
+      <goals>
+        <goal>shade</goal>
+      </goals>
+      <configuration>
+        <shadedArtifactAttached>true</shadedArtifactAttached>
+        <shadedClassifierName>shaded</shadedClassifierName>
+        <transformers>
+          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
+        </transformers>
+      </configuration>
+    </execution>
+  </executions>
+</plugin>
+```
+
+Then, the Maven build will create the shaded jar.
+
+If you prefer to use Gradle, you can achieve the same using `shadowJar`:
+
+```java
+shadowJar {
+mergeServiceFiles()
+}
+```
+
 ## Pipeline options for the Flink Runner
 
 When executing your pipeline with the Flink Runner, you can set these pipeline 
options.
diff --git a/src/documentation/runners/spark.md 
b/src/documentation/runners/spark.md
index 1502f242c0..4b4479e0e3 100644
--- a/src/documentation/runners/spark.md
+++ b/src/documentation/runners/spark.md
@@ -37,7 +37,7 @@ You can add a dependency on the latest version of the Spark 
runner by adding to
 
 ### Deploying Spark with your application
 
-In some cases, such as running in local mode/Standalone, your (self-contained) 
application would be required to pack Spark by explicitly adding the following 
dependencies in your pom.xml:
+Most of the time (running in local mode/Standalone or using `spark-submit`), 
your (self-contained) application would be required to pack Spark by explicitly 
adding the following dependencies in your pom.xml:
 ```java
 <dependency>
   <groupId>org.apache.spark</groupId>
@@ -94,6 +94,17 @@ After running mvn package, run ls 
target and you shoul
 beam-examples-1.0.0-shaded.jar
 ```
 
+If you are using Gradle, you have to use `shadowJar` to create the shaded jar, enabling `mergeServiceFiles()`:
+```java
+shadowJar {
+transform(AppendingTransformer) {
+resource = 'reference.conf'
+}
+relocate 'com.google.protobuf', 'shaded.protobuf'
+mergeServiceFiles()
+}
+```
+
 To run against a Standalone cluster simply run:
 ```
 spark-submit --class com.beam.examples.BeamPipeline --master spark://HOST:PORT 
target/beam-examples-1.0.0-shaded.jar --runner=SparkRunner


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 154059)
Time Spent: 1h 20m  (was: 1h 10m)

> User reports TextIO failure in FlinkRunne

[jira] [Work logged] (BEAM-3587) User reports TextIO failure in FlinkRunner on master

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3587?focusedWorklogId=154058&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154058
 ]

ASF GitHub Bot logged work on BEAM-3587:


Author: ASF GitHub Bot
Created on: 12/Oct/18 23:36
Start Date: 12/Oct/18 23:36
Worklog Time Spent: 10m 
  Work Description: swegner commented on issue #384: [BEAM-3587] Add a note 
to Gradle shadowJar for merge service files
URL: https://github.com/apache/beam-site/pull/384#issuecomment-429489867
 
 
   I've migrated the changes for this pull request onto the migrated website 
sources in the `apache/beam` repository: 
https://github.com/swegner/beam/tree/migrated-pr-384
   
   To pull the migrated changes into your local git client, run:
   
   ```
   git remote add swegner g...@github.com:swegner/beam.git && git fetch swegner
   git checkout -B bigqueryio swegner/migrated-pr-384
   ```
   
   You can then push the changes to your own branch and [open a new pull 
request](https://github.com/apache/beam/compare?expand=1) against the 
apache/beam repository.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 154058)
Time Spent: 1h 10m  (was: 1h)

> User reports TextIO failure in FlinkRunner on master
> 
>
> Key: BEAM-3587
> URL: https://issues.apache.org/jira/browse/BEAM-3587
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Kenneth Knowles
>Assignee: Jean-Baptiste Onofré
>Priority: Minor
> Fix For: Not applicable
>
> Attachments: screen1.png, screen2.png
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Reported here: 
> [https://lists.apache.org/thread.html/47b16c94032392782505415e010970fd2a9480891c55c2f7b5de92bd@%3Cuser.beam.apache.org%3E]
> "I'm trying to run a pipeline containing just a TextIO.read() step on a Flink 
> cluster, using the latest Beam git revision (ff37337). The job fails to start 
> with the Exception:
>   {{java.lang.UnsupportedOperationException: The transform  is currently not 
> supported.}}
> It does work with Beam 2.2.0 though. All code, logs, and reproduction steps  
> [https://github.com/pelletier/beam-flink-example]"
> My initial thoughts: I have a guess that this has to do with switching to 
> running from a portable pipeline representation, and it looks like there's a 
> non-composite transform with an empty URN and it threw a bad error message. 
> We can try to root cause but may also mitigate short-term by removing the 
> round-trip through pipeline proto for now.
> What is curious is that the ValidatesRunner and WordCountIT are working - 
> they only run on a local Flink, yet this seems to be a translation issue that 
> would occur for local or distributed runs.
> We need to certainly run this repro on the RC if we don't totally get to the 
> bottom of it quickly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-1081) annotations should support custom messages and classes

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-1081?focusedWorklogId=154060&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154060
 ]

ASF GitHub Bot logged work on BEAM-1081:


Author: ASF GitHub Bot
Created on: 12/Oct/18 23:36
Start Date: 12/Oct/18 23:36
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #6670: [BEAM-1081] 
Annotations custom message support and classes tests.
URL: https://github.com/apache/beam/pull/6670#issuecomment-429489882
 
 
   @jglezt it looks like there are test issues, could you look at those?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 154060)
Time Spent: 50m  (was: 40m)

> annotations should support custom messages and classes
> --
>
> Key: BEAM-1081
> URL: https://issues.apache.org/jira/browse/BEAM-1081
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Priority: Minor
>  Labels: newbie, starter
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Update 
> https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/utils/annotations.py
>  to add 2 new features:
> 1. ability to customize message
> 2. ability to tag classes (not only functions)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5636) Java support for custom dataflow worker jar

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5636?focusedWorklogId=154057&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154057
 ]

ASF GitHub Bot logged work on BEAM-5636:


Author: ASF GitHub Bot
Created on: 12/Oct/18 23:35
Start Date: 12/Oct/18 23:35
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on a change in pull request #6665: 
[BEAM-5636] Java support for custom dataflow worker jar
URL: https://github.com/apache/beam/pull/6665#discussion_r224938513
 
 

 ##
 File path: sdks/java/container/boot.go
 ##
 @@ -103,7 +103,17 @@ func main() {
     filepath.Join(jarsDir, "slf4j-jdk14.jar"),
     filepath.Join(jarsDir, "beam-sdks-java-harness.jar"),
   }
+
+  var has_worker_experiment = strings.Contains(options, "use_staged_dataflow_worker_jar")
 
 Review comment:
   Fixed, thanks~


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 154057)
Time Spent: 50m  (was: 40m)

> Java support for custom dataflow worker jar
> ---
>
> Key: BEAM-5636
> URL: https://issues.apache.org/jira/browse/BEAM-5636
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Henning Rohde
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> One of the slightly subtle aspects is that we would need to ignore one of the 
> staged jars for portable Java jobs. That requires a change to the Java boot 
> code: 
> https://github.com/apache/beam/blob/66d7c865b7267f388ee60752891a9141fad43774/sdks/java/container/boot.go#L107



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5636) Java support for custom dataflow worker jar

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5636?focusedWorklogId=154055&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154055
 ]

ASF GitHub Bot logged work on BEAM-5636:


Author: ASF GitHub Bot
Created on: 12/Oct/18 23:17
Start Date: 12/Oct/18 23:17
Worklog Time Spent: 10m 
  Work Description: herohde commented on a change in pull request #6665: 
[BEAM-5636] Java support for custom dataflow worker jar
URL: https://github.com/apache/beam/pull/6665#discussion_r224936531
 
 

 ##
 File path: sdks/java/container/boot.go
 ##
 @@ -103,7 +103,17 @@ func main() {
     filepath.Join(jarsDir, "slf4j-jdk14.jar"),
     filepath.Join(jarsDir, "beam-sdks-java-harness.jar"),
   }
+
+  var has_worker_experiment = strings.Contains(options, "use_staged_dataflow_worker_jar")
 
 Review comment:
   nit: go uses camelCase for local variables. You also need to run "gofmt -w 
." to fix the indentation.
 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 154055)
Time Spent: 40m  (was: 0.5h)

> Java support for custom dataflow worker jar
> ---
>
> Key: BEAM-5636
> URL: https://issues.apache.org/jira/browse/BEAM-5636
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Henning Rohde
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> One of the slightly subtle aspects is that we would need to ignore one of the 
> staged jars for portable Java jobs. That requires a change to the Java boot 
> code: 
> https://github.com/apache/beam/blob/66d7c865b7267f388ee60752891a9141fad43774/sdks/java/container/boot.go#L107



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5708) Support caching of SDKHarness environments in flink

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5708?focusedWorklogId=154054&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154054
 ]

ASF GitHub Bot logged work on BEAM-5708:


Author: ASF GitHub Bot
Created on: 12/Oct/18 23:07
Start Date: 12/Oct/18 23:07
Worklog Time Spent: 10m 
  Work Description: tweise commented on issue #6638: [BEAM-5708] Cache 
environment in portable flink runner
URL: https://github.com/apache/beam/pull/6638#issuecomment-429485973
 
 
   @angoenka did you test the fallback case?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 154054)
Time Spent: 1h 50m  (was: 1h 40m)

> Support caching of SDKHarness environments in flink
> ---
>
> Key: BEAM-5708
> URL: https://issues.apache.org/jira/browse/BEAM-5708
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Cache and reuse environment to improve performance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5636) Java support for custom dataflow worker jar

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5636?focusedWorklogId=154050&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154050
 ]

ASF GitHub Bot logged work on BEAM-5636:


Author: ASF GitHub Bot
Created on: 12/Oct/18 22:27
Start Date: 12/Oct/18 22:27
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on issue #6665: [BEAM-5636] Java 
support for custom dataflow worker jar
URL: https://github.com/apache/beam/pull/6665#issuecomment-429479809
 
 
   Re: @herohde Please take another look at this. Thanks~


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 154050)
Time Spent: 0.5h  (was: 20m)

> Java support for custom dataflow worker jar
> ---
>
> Key: BEAM-5636
> URL: https://issues.apache.org/jira/browse/BEAM-5636
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Henning Rohde
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> One of the slightly subtle aspects is that we would need to ignore one of the 
> staged jars for portable Java jobs. That requires a change to the Java boot 
> code: 
> https://github.com/apache/beam/blob/66d7c865b7267f388ee60752891a9141fad43774/sdks/java/container/boot.go#L107



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5621) Several tests fail on Python 3 with TypeError: unorderable types: str() < int()

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5621?focusedWorklogId=154049&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154049
 ]

ASF GitHub Bot logged work on BEAM-5621:


Author: ASF GitHub Bot
Created on: 12/Oct/18 22:17
Start Date: 12/Oct/18 22:17
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #6602: [BEAM-5621] Fix 
unorderable types in python 3
URL: https://github.com/apache/beam/pull/6602#issuecomment-429478105
 
 
   Unfortunately I think 
https://github.com/apache/beam/pull/6602#issuecomment-429468709 may be a very 
unintuitive change, so we need to roll it back and either fix the underlying 
issue with typing of negative numbers or proceed with a different solution 
here. We would need to cherry-pick the change into the release branch, so I'll 
mark BEAM-5615 as release blocker until cherry-pick is in.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 154049)
Time Spent: 2.5h  (was: 2h 20m)

> Several tests fail on Python 3 with TypeError: unorderable types: str() < 
> int()
> ---
>
> Key: BEAM-5621
> URL: https://issues.apache.org/jira/browse/BEAM-5621
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Juta Staes
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> ==
> ERROR: test_remove_duplicates 
> (apache_beam.transforms.ptransform_test.PTransformTest)
> --
> Traceback (most recent call last):
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py",
>  line 677, in process
> self.do_fn_invoker.invoke_process(windowed_value)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py",
>  line 414, in invoke_process
> windowed_value, self.process_method(windowed_value.value))
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/core.py",
>  line 1068, in 
> wrapper = lambda x: [fn(x)]
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/testing/util.py",
>  line 115, in _equal
> sorted_expected = sorted(expected)
> TypeError: unorderable types: str() < int()
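The traceback above boils down to `sorted()` being applied to a list that mixes strings and integers, which Python 3 refuses to order. For illustration only (a standalone sketch, not Beam code and not necessarily the approach taken in the linked PR): the failure, plus one possible type-aware sort key that restores a deterministic order. The helper name `_typed_key` is invented for the example.

```python
# Standalone sketch: reproduce the Python 3 error and show one possible
# workaround (sort with a type-aware key instead of relying on '<').
mixed = ['a', 1, 'b', 2]

try:
    sorted(mixed)  # Python 3: TypeError, mixed str/int values cannot be ordered
except TypeError as e:
    print('plain sorted() fails:', e)

def _typed_key(value):
    # Order first by type name, then by the value's repr, so any mix of
    # ints and strings gets a deterministic total order.
    return (type(value).__name__, repr(value))

print(sorted(mixed, key=_typed_key))  # [1, 2, 'a', 'b']
```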



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5653) Dataflow FnApi Worker overrides some of Coders due to coder ID generation collision.

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5653?focusedWorklogId=154048&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154048
 ]

ASF GitHub Bot logged work on BEAM-5653:


Author: ASF GitHub Bot
Created on: 12/Oct/18 22:16
Start Date: 12/Oct/18 22:16
Worklog Time Spent: 10m 
  Work Description: Ardagan commented on issue #6649: [BEAM-5653] Fix 
overriding coders due to duplicate coderId generation
URL: https://github.com/apache/beam/pull/6649#issuecomment-429477876
 
 
   And it's green!


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 154048)
Time Spent: 4h 10m  (was: 4h)
Remaining Estimate: 67h 50m  (was: 68h)

> Dataflow FnApi Worker overrides some of Coders due to coder ID generation 
> collision.
> 
>
> Key: BEAM-5653
> URL: https://issues.apache.org/jira/browse/BEAM-5653
> Project: Beam
>  Issue Type: Test
>  Components: java-fn-execution
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Blocker
> Fix For: 2.8.0
>
>   Original Estimate: 72h
>  Time Spent: 4h 10m
>  Remaining Estimate: 67h 50m
>
> Due to one of the latest refactorings, there is a bug in the Java FnApi Worker: it 
> overrides Coders in the ProcessBundleDescriptor sent to the SDK Harness, which 
> causes jobs to fail.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=154047&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154047
 ]

ASF GitHub Bot logged work on BEAM-4374:


Author: ASF GitHub Bot
Created on: 12/Oct/18 22:10
Start Date: 12/Oct/18 22:10
Worklog Time Spent: 10m 
  Work Description: ajamato commented on issue #6205: [BEAM-4374] 
Implementing a subset of the new metrics framework in python.
URL: https://github.com/apache/beam/pull/6205#issuecomment-429476625
 
 
   Squashed all the commits,
   FYI I imported this PR and internal google tests are also passing.
   
   @robertwb, happy to iterate more on your suggestions but we would like to 
submit this PR, and finish up this iteration


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 154047)
Time Spent: 8h 10m  (was: 8h)

> Update existing metrics in the FN API to use new Metric Schema
> --
>
> Key: BEAM-4374
> URL: https://issues.apache.org/jira/browse/BEAM-4374
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Alex Amato
>Priority: Major
>  Time Spent: 8h 10m
>  Remaining Estimate: 0h
>
> Update existing metrics to use the new proto and cataloging schema defined in:
> [_https://s.apache.org/beam-fn-api-metrics_]
>  * Check in new protos
>  * Define catalog file for metrics
>  * Port existing metrics to use this new format, based on catalog 
> names+metadata



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-1251) Python 3 Support

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-1251?focusedWorklogId=154046&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154046
 ]

ASF GitHub Bot logged work on BEAM-1251:


Author: ASF GitHub Bot
Created on: 12/Oct/18 22:00
Start Date: 12/Oct/18 22:00
Worklog Time Spent: 10m 
  Work Description: swegner commented on issue #6679: [BEAM-1251] Add a 
link to Python 3 Conversion Quick Start Guide to the list of ongoing efforts on 
Beam site.
URL: https://github.com/apache/beam/pull/6679#issuecomment-429474553
 
 
   Yup! 
   
   If you pop open the "All checks have passed" bar, you'll see a link to the 
"Website_Stage_GCS" job 
[results](https://builds.apache.org/job/beam_PreCommit_Website_Stage_GCS_Commit/79/),
 which contains a link to your staged changes for review: 
http://apache-beam-website-pull-requests.storage.googleapis.com/6679/index.html
   
   (I'm brainstorming a way to make that link more prominent on GitHub; let me 
know if you have ideas)


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 154046)
Time Spent: 22h 50m  (was: 22h 40m)

> Python 3 Support
> 
>
> Key: BEAM-1251
> URL: https://issues.apache.org/jira/browse/BEAM-1251
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Eyad Sibai
>Assignee: Robbe
>Priority: Major
>  Time Spent: 22h 50m
>  Remaining Estimate: 0h
>
> I have been trying to use google datalab with Python 3. As far as I can see, 
> several packages that google datalab depends on do not support Python 3 yet. 
> This is one of them.
> https://github.com/GoogleCloudPlatform/DataflowPythonSDK/issues/6



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5734) RedisIO: finishBundle calls Jedis.exec without checking if there are operations in the pipeline

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5734?focusedWorklogId=154045&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154045
 ]

ASF GitHub Bot logged work on BEAM-5734:


Author: ASF GitHub Bot
Created on: 12/Oct/18 21:56
Start Date: 12/Oct/18 21:56
Worklog Time Spent: 10m 
  Work Description: casidiablo commented on a change in pull request #6682: 
[BEAM-5734] RedisIO: only call Jedis.exec() on finishBundle if there is 
something to send
URL: https://github.com/apache/beam/pull/6682#discussion_r224925261
 
 

 ##
 File path: 
sdks/java/io/redis/src/test/java/org/apache/beam/sdk/io/redis/RedisIOTest.java
 ##
 @@ -86,7 +86,7 @@ public void testBulkRead() throws Exception {
   @Test
   public void testWriteReadUsingDefaultAppendMethod() throws Exception {
     ArrayList<KV<String, String>> data = new ArrayList<>();
-    for (int i = 0; i < 100; i++) {
+    for (int i = 0; i < 8000; i++) {
 
 Review comment:
   If the test had used this value instead, unit tests would have detected the 
issue.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 154045)
Time Spent: 20m  (was: 10m)

> RedisIO: finishBundle calls Jedis.exec without checking if there are 
> operations in the pipeline
> ---
>
> Key: BEAM-5734
> URL: https://issues.apache.org/jira/browse/BEAM-5734
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-redis
>Reporter: Cristian
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> It throws:
>  
> {code:java}
> org.apache.beam.sdk.Pipeline$PipelineExecutionException: 
> redis.clients.jedis.exceptions.JedisDataException: EXEC without MULTI
> at 
> org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:332)
> at 
> org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:302)
> at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:197)
> at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:64)
> at org.apache.beam.sdk.Pipeline.run(Pipeline.java:313)
> at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:350)
> at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:331)
> at 
> org.apache.beam.sdk.io.redis.RedisIOTest.testWriteReadUsingDefaultAppendMethod(RedisIOTest.java:100)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319)
> at org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319)
> at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:106)
> at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
> at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestC

[jira] [Work logged] (BEAM-5734) RedisIO: finishBundle calls Jedis.exec without checking if there are operations in the pipeline

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5734?focusedWorklogId=154044&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154044
 ]

ASF GitHub Bot logged work on BEAM-5734:


Author: ASF GitHub Bot
Created on: 12/Oct/18 21:55
Start Date: 12/Oct/18 21:55
Worklog Time Spent: 10m 
  Work Description: casidiablo opened a new pull request #6682: [BEAM-5734] 
RedisIO: only call Jedis.exec() on finishBundle if there is something to send
URL: https://github.com/apache/beam/pull/6682
 
 
   This fixes a bug in the RedisIO.write sink: `finishBundle()` calls Jedis' 
`pipeline.exec()` without checking whether there is actually anything to flush.
   
   That results in this exception being thrown:
   
   ```
   org.apache.beam.sdk.Pipeline$PipelineExecutionException: 
redis.clients.jedis.exceptions.JedisDataException: EXEC without MULTI
at 
org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:332)
at 
org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:302)
at 
org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:197)
at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:64)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:313)
at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:350)
at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:331)
at 
org.apache.beam.sdk.io.redis.RedisIOTest.testWriteReadUsingDefaultAppendMethod(RedisIOTest.java:100)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at 
org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319)
at 
org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:106)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
at 
org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:66)
at 
org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
at 
org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
at 
org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
at 
org.gradle.ap

[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=154043&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154043
 ]

ASF GitHub Bot logged work on BEAM-5637:


Author: ASF GitHub Bot
Created on: 12/Oct/18 21:52
Start Date: 12/Oct/18 21:52
Worklog Time Spent: 10m 
  Work Description: HuangLED edited a comment on issue #6680: [BEAM-5637] 
Python support for custom dataflow worker jar
URL: https://github.com/apache/beam/pull/6680#issuecomment-429468108
 
 
   R:  @herohde 
   cc: @boyuanzz @pabloem 
   
   Addressed.  Also, option definition moved to WorkerOptions. 
   
   Thanks to Boyuan for pointing out the right place for error message. 
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 154043)
Time Spent: 2h 20m  (was: 2h 10m)

> Python support for custom dataflow worker jar
> -
>
> Key: BEAM-5637
> URL: https://issues.apache.org/jira/browse/BEAM-5637
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Henning Rohde
>Assignee: Ruoyun Huang
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> One of the slightly subtle aspects is that we would need to ignore one of the 
> staged jars for portable Python jobs. That requires a change to the Python 
> boot code: 
> https://github.com/apache/beam/blob/66d7c865b7267f388ee60752891a9141fad43774/sdks/python/container/boot.go#L104



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5621) Several tests fail on Python 3 with TypeError: unorderable types: str() < int()

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5621?focusedWorklogId=154042&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154042
 ]

ASF GitHub Bot logged work on BEAM-5621:


Author: ASF GitHub Bot
Created on: 12/Oct/18 21:47
Start Date: 12/Oct/18 21:47
Worklog Time Spent: 10m 
  Work Description: tvalentyn edited a comment on issue #6602: [BEAM-5621] 
Fix unorderable types in python 3
URL: https://github.com/apache/beam/pull/6602#issuecomment-429468709
 
 
   Another somewhat related observation:
   The following snippet fails on Python 2 after this PR (in the Direct Runner), but 
will pass in Python 3, where there is no distinction between `int` and `long`.
   
   ```
   p = TestPipeline(options=PipelineOptions(pipeline_args))
   input_data = p | beam.Create([1, -2])   # This becomes [1, -2L]!  (Unrelated to this PR).
   expected_result = [-2, 1]  
   assert_that(input_data, equal_to(expected_result)) 
   ```
   ```
   apache_beam.testing.util.BeamAssertException: Failed assert: [-2, 1] == [1, -2L] [while running 'assert_that/Match']
   ```
   Do we know why negatives are represented as longs?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 154042)
Time Spent: 2h 20m  (was: 2h 10m)

> Several tests fail on Python 3 with TypeError: unorderable types: str() < 
> int()
> ---
>
> Key: BEAM-5621
> URL: https://issues.apache.org/jira/browse/BEAM-5621
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Juta Staes
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> ==
> ERROR: test_remove_duplicates 
> (apache_beam.transforms.ptransform_test.PTransformTest)
> --
> Traceback (most recent call last):
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py",
>  line 677, in process
> self.do_fn_invoker.invoke_process(windowed_value)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py",
>  line 414, in invoke_process
> windowed_value, self.process_method(windowed_value.value))
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/core.py",
>  line 1068, in 
> wrapper = lambda x: [fn(x)]
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/testing/util.py",
>  line 115, in _equal
> sorted_expected = sorted(expected)
> TypeError: unorderable types: str() < int()



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5653) Dataflow FnApi Worker overrides some of Coders due to coder ID generation collision.

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5653?focusedWorklogId=154039&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154039
 ]

ASF GitHub Bot logged work on BEAM-5653:


Author: ASF GitHub Bot
Created on: 12/Oct/18 21:39
Start Date: 12/Oct/18 21:39
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #6649: [BEAM-5653] Fix 
overriding coders due to duplicate coderId generation
URL: https://github.com/apache/beam/pull/6649#issuecomment-429470182
 
 
   I merged the purported fix for that, so you should have better luck.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 154039)
Time Spent: 4h  (was: 3h 50m)
Remaining Estimate: 68h  (was: 68h 10m)

> Dataflow FnApi Worker overrides some of Coders due to coder ID generation 
> collision.
> 
>
> Key: BEAM-5653
> URL: https://issues.apache.org/jira/browse/BEAM-5653
> Project: Beam
>  Issue Type: Test
>  Components: java-fn-execution
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Blocker
> Fix For: 2.8.0
>
>   Original Estimate: 72h
>  Time Spent: 4h
>  Remaining Estimate: 68h
>
> Due to one of the latest refactorings, there is a bug in the Java FnApi Worker: it 
> overrides Coders in the ProcessBundleDescriptor sent to the SDK Harness, which 
> causes jobs to fail.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5621) Several tests fail on Python 3 with TypeError: unorderable types: str() < int()

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5621?focusedWorklogId=154037&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154037
 ]

ASF GitHub Bot logged work on BEAM-5621:


Author: ASF GitHub Bot
Created on: 12/Oct/18 21:33
Start Date: 12/Oct/18 21:33
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #6602: [BEAM-5621] Fix 
unorderable types in python 3
URL: https://github.com/apache/beam/pull/6602#issuecomment-429468709
 
 
   Another somewhat related observation:
   The following snippet fails on Python 2 after this PR (in the Direct Runner), but 
will pass in Python 3, where there is no distinction between `int` and `long`.
   
   ```
   p = TestPipeline(options=PipelineOptions(pipeline_args))
   input_data = p | beam.Create([1, -2])   # This becomes [1, -2L]!  (Unrelated to this PR).
   expected_result = [-2, 1]  
   assert_that(input_data, equal_to(expected_result)) 
   ```
   Do we know why negatives are represented as longs?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 154037)
Time Spent: 2h 10m  (was: 2h)

> Several tests fail on Python 3 with TypeError: unorderable types: str() < 
> int()
> ---
>
> Key: BEAM-5621
> URL: https://issues.apache.org/jira/browse/BEAM-5621
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Juta Staes
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> ==
> ERROR: test_remove_duplicates 
> (apache_beam.transforms.ptransform_test.PTransformTest)
> --
> Traceback (most recent call last):
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py",
>  line 677, in process
> self.do_fn_invoker.invoke_process(windowed_value)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py",
>  line 414, in invoke_process
> windowed_value, self.process_method(windowed_value.value))
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/core.py",
>  line 1068, in 
> wrapper = lambda x: [fn(x)]
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/testing/util.py",
>  line 115, in _equal
> sorted_expected = sorted(expected)
> TypeError: unorderable types: str() < int()



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=154036&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154036
 ]

ASF GitHub Bot logged work on BEAM-5637:


Author: ASF GitHub Bot
Created on: 12/Oct/18 21:30
Start Date: 12/Oct/18 21:30
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on issue #6680: [BEAM-5637] Python 
support for custom dataflow worker jar
URL: https://github.com/apache/beam/pull/6680#issuecomment-429468108
 
 
   R: @boyuanzz @herohde @pabloem 
   
   Addressed.  Also, option definition moved to WorkerOptions. 
   
   Thanks to Boyuan for pointing out the right place for error message. 
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 154036)
Time Spent: 2h 10m  (was: 2h)

> Python support for custom dataflow worker jar
> -
>
> Key: BEAM-5637
> URL: https://issues.apache.org/jira/browse/BEAM-5637
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Henning Rohde
>Assignee: Ruoyun Huang
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> One of the slightly subtle aspects is that we would need to ignore one of the 
> staged jars for portable Python jobs. That requires a change to the Python 
> boot code: 
> https://github.com/apache/beam/blob/66d7c865b7267f388ee60752891a9141fad43774/sdks/python/container/boot.go#L104



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5636) Java support for custom dataflow worker jar

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5636?focusedWorklogId=154028&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154028
 ]

ASF GitHub Bot logged work on BEAM-5636:


Author: ASF GitHub Bot
Created on: 12/Oct/18 21:00
Start Date: 12/Oct/18 21:00
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on a change in pull request #6665: 
[BEAM-5636] Java support for custom dataflow worker jar
URL: https://github.com/apache/beam/pull/6665#discussion_r224913769
 
 

 ##
 File path: sdks/java/container/boot.go
 ##
 @@ -104,6 +104,9 @@ func main() {
     filepath.Join(jarsDir, "beam-sdks-java-harness.jar"),
   }
   for _, md := range artifacts {
+    if strings.HasPrefix(md.Name, "beam-runners-google-cloud-dataflow-java-fn-api-worker") {
 
 Review comment:
   The purpose here is that if the Java worker jar is present in the artifacts, it 
should not be included in the SDK harness classpath, so it seems we don't need to 
check the experiment. wdyt?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 154028)
Time Spent: 20m  (was: 10m)

> Java support for custom dataflow worker jar
> ---
>
> Key: BEAM-5636
> URL: https://issues.apache.org/jira/browse/BEAM-5636
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Henning Rohde
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> One of the slightly subtle aspects is that we would need to ignore one of the 
> staged jars for portable Java jobs. That requires a change to the Java boot 
> code: 
> https://github.com/apache/beam/blob/66d7c865b7267f388ee60752891a9141fad43774/sdks/java/container/boot.go#L107



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=154027&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154027
 ]

ASF GitHub Bot logged work on BEAM-5637:


Author: ASF GitHub Bot
Created on: 12/Oct/18 20:59
Start Date: 12/Oct/18 20:59
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on issue #6680: [BEAM-5637] Python 
support for custom dataflow worker jar
URL: https://github.com/apache/beam/pull/6680#issuecomment-429460451
 
 
   Run Python PostCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 154027)
Time Spent: 2h  (was: 1h 50m)

> Python support for custom dataflow worker jar
> -
>
> Key: BEAM-5637
> URL: https://issues.apache.org/jira/browse/BEAM-5637
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Henning Rohde
>Assignee: Ruoyun Huang
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> One of the slightly subtle aspects is that we would need to ignore one of the 
> staged jars for portable Python jobs. That requires a change to the Python 
> boot code: 
> https://github.com/apache/beam/blob/66d7c865b7267f388ee60752891a9141fad43774/sdks/python/container/boot.go#L104



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5653) Dataflow FnApi Worker overrides some of Coders due to coder ID generation collision.

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5653?focusedWorklogId=154025&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154025
 ]

ASF GitHub Bot logged work on BEAM-5653:


Author: ASF GitHub Bot
Created on: 12/Oct/18 20:49
Start Date: 12/Oct/18 20:49
Worklog Time Spent: 10m 
  Work Description: Ardagan commented on issue #6649: [BEAM-5653] Fix 
overriding coders due to duplicate coderId generation
URL: https://github.com/apache/beam/pull/6649#issuecomment-429457986
 
 
   Precommits fail due to a flaky test; see 
[BEAM-5709](https://issues.apache.org/jira/browse/BEAM-5709)
   Succeeded precommit run: 
https://builds.apache.org/job/beam_PreCommit_Java_Phrase/318/
   Failing precommit run: 
https://builds.apache.org/job/beam_PreCommit_Java_Phrase/319/
   
   I accidentally started precommit twice in a row.
   
   Can we merge this? I feel this will be safer than trying to get a green precommit.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 154025)
Time Spent: 3h 50m  (was: 3h 40m)
Remaining Estimate: 68h 10m  (was: 68h 20m)

> Dataflow FnApi Worker overrides some of Coders due to coder ID generation 
> collision.
> 
>
> Key: BEAM-5653
> URL: https://issues.apache.org/jira/browse/BEAM-5653
> Project: Beam
>  Issue Type: Test
>  Components: java-fn-execution
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Blocker
> Fix For: 2.8.0
>
>   Original Estimate: 72h
>  Time Spent: 3h 50m
>  Remaining Estimate: 68h 10m
>
> Due to one of the latest refactorings, there is a bug in the Java FnApi Worker: it 
> overrides Coders in the ProcessBundleDescriptor sent to the SDK Harness, which 
> causes jobs to fail.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=154024&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154024
 ]

ASF GitHub Bot logged work on BEAM-5637:


Author: ASF GitHub Bot
Created on: 12/Oct/18 20:47
Start Date: 12/Oct/18 20:47
Worklog Time Spent: 10m 
  Work Description: HuangLED opened a new pull request #6680: [BEAM-5637] 
Python support for custom dataflow worker jar
URL: https://github.com/apache/beam/pull/6680
 
 
   Python support for a custom worker jar (as a staged file). 
   
   Tested the positive and negative cases by starting actual jobs.
   
   PreCommit passes locally. 
   
   
   
   Follow this checklist to help us incorporate your contribution quickly and 
easily:
   
- [X ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   It will help us expedite review of your Pull Request if you tag someone 
(e.g. `@username`) to look at it.
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/)
 | --- | --- | --- | --- | --- | ---
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)
 | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)
  [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/)
 | --- | --- | ---
   
   
   
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 154024)
Time Spent: 1h 50m  (was: 1h 40m)

> Python support for custom dataflow worker jar
> -
>
> Key: BEAM-5637
> URL: https://issues.apache.org/jira/browse/BEAM-5637
> Project: Beam
>  Iss

[jira] [Work logged] (BEAM-5636) Java support for custom dataflow worker jar

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5636?focusedWorklogId=154022&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154022
 ]

ASF GitHub Bot logged work on BEAM-5636:


Author: ASF GitHub Bot
Created on: 12/Oct/18 20:46
Start Date: 12/Oct/18 20:46
Worklog Time Spent: 10m 
  Work Description: herohde commented on a change in pull request #6665: 
[BEAM-5636] Java support for custom dataflow worker jar
URL: https://github.com/apache/beam/pull/6665#discussion_r224910454
 
 

 ##
 File path: sdks/java/container/boot.go
 ##
 @@ -104,6 +104,9 @@ func main() {
filepath.Join(jarsDir, "beam-sdks-java-harness.jar"),
}
for _, md := range artifacts {
+   if strings.HasPrefix(md.Name, 
"beam-runners-google-cloud-dataflow-java-fn-api-worker") {
 
 Review comment:
   We should only make this check if the experiment is set. Also, the name will 
change to "dataflow-worker.jar" when the artifact bug is fixed.
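   
   For illustration, a conceptual sketch of the gating being asked for (Python pseudocode only; the actual boot code in sdks/java/container/boot.go is written in Go, and the experiment name is taken from the related BEAM-5637 change, so treat both as assumptions):
   
   ```
   # Conceptual sketch, not the real boot code: special-case the staged
   # worker jar only when the corresponding experiment is set.
   def jars_to_load(artifact_names, experiments):
       use_custom_worker = 'use_staged_dataflow_worker_jar' in experiments
       skipped_prefix = 'beam-runners-google-cloud-dataflow-java-fn-api-worker'
       kept = []
       for name in artifact_names:
           if use_custom_worker and name.startswith(skipped_prefix):
               # What the real boot code does with the matched jar is not
               # shown in the quoted diff; here we simply skip it.
               continue
           kept.append(name)
       return kept
   ```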


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 154022)
Time Spent: 10m
Remaining Estimate: 0h

> Java support for custom dataflow worker jar
> ---
>
> Key: BEAM-5636
> URL: https://issues.apache.org/jira/browse/BEAM-5636
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Henning Rohde
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> One of the slightly subtle aspects is that we would need to ignore one of the 
> staged jars for portable Java jobs. That requires a change to the Java boot 
> code: 
> https://github.com/apache/beam/blob/66d7c865b7267f388ee60752891a9141fad43774/sdks/java/container/boot.go#L107



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=154021&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154021
 ]

ASF GitHub Bot logged work on BEAM-5637:


Author: ASF GitHub Bot
Created on: 12/Oct/18 20:44
Start Date: 12/Oct/18 20:44
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on a change in pull request #6667: 
[BEAM-5637] Python support for custom dataflow worker jar
URL: https://github.com/apache/beam/pull/6667#discussion_r224910055
 
 

 ##
 File path: sdks/python/apache_beam/options/pipeline_options.py
 ##
 @@ -674,7 +674,12 @@ def _add_argparse_args(cls, parser):
  'job submission, the files will be staged in the staging area '
  '(--staging_location option) and the workers will install them in '
  'same order they were specified on the command line.'))
-
+parser.add_argument(
+'--dataflow_worker_jar',
+dest='dataflow_worker_jar',
+type=str,
+help='Dataflow worker jar.'
+)
 
 Review comment:
   Thanks! The issue is addressed, but I lost the status in this PR due to my sub-optimal git operations.
   
   Opening another PR.
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 154021)
Time Spent: 1h 40m  (was: 1.5h)

> Python support for custom dataflow worker jar
> -
>
> Key: BEAM-5637
> URL: https://issues.apache.org/jira/browse/BEAM-5637
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Henning Rohde
>Assignee: Ruoyun Huang
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> One of the slightly subtle aspects is that we would need to ignore one of the 
> staged jars for portable Python jobs. That requires a change to the Python 
> boot code: 
> https://github.com/apache/beam/blob/66d7c865b7267f388ee60752891a9141fad43774/sdks/python/container/boot.go#L104



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5621) Several tests fail on Python 3 with TypeError: unorderable types: str() < int()

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5621?focusedWorklogId=154019&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154019
 ]

ASF GitHub Bot logged work on BEAM-5621:


Author: ASF GitHub Bot
Created on: 12/Oct/18 20:42
Start Date: 12/Oct/18 20:42
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #6602: [BEAM-5621] Fix 
unorderable types in python 3
URL: https://github.com/apache/beam/pull/6602#issuecomment-429456279
 
 
   This is clearly a backward-incompatible change; however, I think it is the right behavior. In that sense, I do not think it is a regression. However, we should clearly highlight this in our release notes, blog post, etc.
   
   @tvalentyn Could you create a JIRA, mark it for 2.8.0, explain the change in behaviour, and mark it as fixed?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 154019)
Time Spent: 2h  (was: 1h 50m)

> Several tests fail on Python 3 with TypeError: unorderable types: str() < 
> int()
> ---
>
> Key: BEAM-5621
> URL: https://issues.apache.org/jira/browse/BEAM-5621
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Juta Staes
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> ==
> ERROR: test_remove_duplicates 
> (apache_beam.transforms.ptransform_test.PTransformTest)
> --
> Traceback (most recent call last):
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py",
>  line 677, in process
> self.do_fn_invoker.invoke_process(windowed_value)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py",
>  line 414, in invoke_process
> windowed_value, self.process_method(windowed_value.value))
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/core.py",
>  line 1068, in 
> wrapper = lambda x: [fn(x)]
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/testing/util.py",
>  line 115, in _equal
> sorted_expected = sorted(expected)
> TypeError: unorderable types: str() < int()



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=154020&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154020
 ]

ASF GitHub Bot logged work on BEAM-5637:


Author: ASF GitHub Bot
Created on: 12/Oct/18 20:42
Start Date: 12/Oct/18 20:42
Worklog Time Spent: 10m 
  Work Description: HuangLED closed pull request #6667: [BEAM-5637] Python 
support for custom dataflow worker jar
URL: https://github.com/apache/beam/pull/6667
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/python/apache_beam/options/pipeline_options.py 
b/sdks/python/apache_beam/options/pipeline_options.py
index a172535b100..2c061e0ec52 100644
--- a/sdks/python/apache_beam/options/pipeline_options.py
+++ b/sdks/python/apache_beam/options/pipeline_options.py
@@ -674,7 +674,12 @@ def _add_argparse_args(cls, parser):
  'job submission, the files will be staged in the staging area '
  '(--staging_location option) and the workers will install them in '
  'same order they were specified on the command line.'))
-
+parser.add_argument(
+'--dataflow_worker_jar',
+dest='dataflow_worker_jar',
+type=str,
+help='Dataflow worker jar.'
+)
 
 class PortableOptions(PipelineOptions):
 
diff --git a/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py 
b/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
index 1acd3488524..5be60bd701b 100644
--- a/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
+++ b/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
@@ -381,6 +381,12 @@ def run_pipeline(self, pipeline):
 self.dataflow_client = apiclient.DataflowApplicationClient(
 pipeline._options)
 
+if setup_options.dataflow_worker_jar:
+  experiments = ["use_staged_dataflow_worker_jar"]
+  if debug_options.experiments is not None:
+experiments = list(set(experiments + debug_options.experiments))
+  debug_options.experiments = experiments
+
 # Create the job description and send a request to the service. The result
 # can be None if there is no need to send a request to the service (e.g.
 # template creation). If a request was sent and failed then the call will
diff --git a/sdks/python/apache_beam/runners/portability/stager.py 
b/sdks/python/apache_beam/runners/portability/stager.py
index ef7401ac6aa..e336fd3f9b9 100644
--- a/sdks/python/apache_beam/runners/portability/stager.py
+++ b/sdks/python/apache_beam/runners/portability/stager.py
@@ -123,8 +123,7 @@ def stage_job_resources(self,
 
 Returns:
   A list of file names (no paths) for the resources staged. All the
-  files
-  are assumed to be staged at staging_location.
+  files are assumed to be staged at staging_location.
 
 Raises:
   RuntimeError: If files specified are not found or error encountered
@@ -256,6 +255,13 @@ def stage_job_resources(self,
 'The file "%s" cannot be found. Its location was specified by '
 'the --sdk_location command-line option.' % sdk_path)
 
+if hasattr(setup_options, 'dataflow_worker_jar') and \
+setup_options.dataflow_worker_jar:
+  jar_staged_filename = 'dataflow-worker.jar'
+  staged_path = FileSystems.join(staging_location, jar_staged_filename)
+  self.stage_artifact(setup_options.dataflow_worker_jar, staged_path)
+  resources.append(jar_staged_filename)
+
 # Delete all temp files created while staging job resources.
 shutil.rmtree(temp_dir)
 retrieval_token = self.commit_manifest()
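
For context, a hypothetical way the new option could be passed from user code (the option name and its home in SetupOptions come from the diff above; the project, bucket, and jar path are placeholders):

```
# Hypothetical usage sketch; placeholder values throughout.
from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions

options = PipelineOptions([
    '--runner=DataflowRunner',
    '--project=my-project',                            # placeholder
    '--temp_location=gs://my-bucket/tmp',              # placeholder
    '--dataflow_worker_jar=/tmp/dataflow-worker.jar',  # placeholder path
])
# The jar is later staged as 'dataflow-worker.jar' by the stager change above.
print(options.view_as(SetupOptions).dataflow_worker_jar)
```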


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 154020)
Time Spent: 1.5h  (was: 1h 20m)

> Python support for custom dataflow worker jar
> -
>
> Key: BEAM-5637
> URL: https://issues.apache.org/jira/browse/BEAM-5637
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Henning Rohde
>Assignee: Ruoyun Huang
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> One of the slightly subtle aspects is that we would need to ignore one of the 
> staged jars for portable Python jobs. That requires a change to the Python 
> boot code: 
> https:

[jira] [Work logged] (BEAM-5653) Dataflow FnApi Worker overrides some of Coders due to coder ID generation collision.

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5653?focusedWorklogId=154016&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154016
 ]

ASF GitHub Bot logged work on BEAM-5653:


Author: ASF GitHub Bot
Created on: 12/Oct/18 20:41
Start Date: 12/Oct/18 20:41
Worklog Time Spent: 10m 
  Work Description: Ardagan commented on issue #6649: [BEAM-5653] Fix 
overriding coders due to duplicate coderId generation
URL: https://github.com/apache/beam/pull/6649#issuecomment-429456093
 
 
   run java precommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 154016)
Time Spent: 3h 20m  (was: 3h 10m)
Remaining Estimate: 68h 40m  (was: 68h 50m)

> Dataflow FnApi Worker overrides some of Coders due to coder ID generation 
> collision.
> 
>
> Key: BEAM-5653
> URL: https://issues.apache.org/jira/browse/BEAM-5653
> Project: Beam
>  Issue Type: Test
>  Components: java-fn-execution
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Blocker
> Fix For: 2.8.0
>
>   Original Estimate: 72h
>  Time Spent: 3h 20m
>  Remaining Estimate: 68h 40m
>
> Due to one of latest refactorings, we got a bug in Java FnApi Worker that it 
> overrides Coders in ProcessBundleDescriptor sent to SDK Harness that causes 
> jobs to fail.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5653) Dataflow FnApi Worker overrides some of Coders due to coder ID generation collision.

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5653?focusedWorklogId=154018&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154018
 ]

ASF GitHub Bot logged work on BEAM-5653:


Author: ASF GitHub Bot
Created on: 12/Oct/18 20:41
Start Date: 12/Oct/18 20:41
Worklog Time Spent: 10m 
  Work Description: Ardagan removed a comment on issue #6649: [BEAM-5653] 
Fix overriding coders due to duplicate coderId generation
URL: https://github.com/apache/beam/pull/6649#issuecomment-429396509
 
 
   run java precommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 154018)
Time Spent: 3h 40m  (was: 3.5h)
Remaining Estimate: 68h 20m  (was: 68.5h)

> Dataflow FnApi Worker overrides some of Coders due to coder ID generation 
> collision.
> 
>
> Key: BEAM-5653
> URL: https://issues.apache.org/jira/browse/BEAM-5653
> Project: Beam
>  Issue Type: Test
>  Components: java-fn-execution
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Blocker
> Fix For: 2.8.0
>
>   Original Estimate: 72h
>  Time Spent: 3h 40m
>  Remaining Estimate: 68h 20m
>
> Due to one of latest refactorings, we got a bug in Java FnApi Worker that it 
> overrides Coders in ProcessBundleDescriptor sent to SDK Harness that causes 
> jobs to fail.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5653) Dataflow FnApi Worker overrides some of Coders due to coder ID generation collision.

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5653?focusedWorklogId=154017&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154017
 ]

ASF GitHub Bot logged work on BEAM-5653:


Author: ASF GitHub Bot
Created on: 12/Oct/18 20:41
Start Date: 12/Oct/18 20:41
Worklog Time Spent: 10m 
  Work Description: Ardagan removed a comment on issue #6649: [BEAM-5653] 
Fix overriding coders due to duplicate coderId generation
URL: https://github.com/apache/beam/pull/6649#issuecomment-429456093
 
 
   run java precommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 154017)
Time Spent: 3.5h  (was: 3h 20m)
Remaining Estimate: 68.5h  (was: 68h 40m)

> Dataflow FnApi Worker overrides some of Coders due to coder ID generation 
> collision.
> 
>
> Key: BEAM-5653
> URL: https://issues.apache.org/jira/browse/BEAM-5653
> Project: Beam
>  Issue Type: Test
>  Components: java-fn-execution
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Blocker
> Fix For: 2.8.0
>
>   Original Estimate: 72h
>  Time Spent: 3.5h
>  Remaining Estimate: 68.5h
>
> Due to one of latest refactorings, we got a bug in Java FnApi Worker that it 
> overrides Coders in ProcessBundleDescriptor sent to SDK Harness that causes 
> jobs to fail.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-1251) Python 3 Support

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-1251?focusedWorklogId=154015&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154015
 ]

ASF GitHub Bot logged work on BEAM-1251:


Author: ASF GitHub Bot
Created on: 12/Oct/18 20:37
Start Date: 12/Oct/18 20:37
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #6679: [BEAM-1251] Add a 
link to Python 3 Conversion Quick Start Guide to the list of ongoing efforts on 
Beam site.
URL: https://github.com/apache/beam/pull/6679#issuecomment-429455025
 
 
   Hey @swegner, am I using the correct way to change the Beam site? Thanks.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 154015)
Time Spent: 22h 40m  (was: 22.5h)

> Python 3 Support
> 
>
> Key: BEAM-1251
> URL: https://issues.apache.org/jira/browse/BEAM-1251
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Eyad Sibai
>Assignee: Robbe
>Priority: Major
>  Time Spent: 22h 40m
>  Remaining Estimate: 0h
>
> I have been trying to use google datalab with python3. As I see there are 
> several packages that does not support python3 yet which google datalab 
> depends on. This is one of them.
> https://github.com/GoogleCloudPlatform/DataflowPythonSDK/issues/6



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-1251) Python 3 Support

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-1251?focusedWorklogId=154014&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154014
 ]

ASF GitHub Bot logged work on BEAM-1251:


Author: ASF GitHub Bot
Created on: 12/Oct/18 20:34
Start Date: 12/Oct/18 20:34
Worklog Time Spent: 10m 
  Work Description: tvalentyn opened a new pull request #6679: [BEAM-1251] 
Add a link to Python 3 Conversion Quick Start Guide to the list of ongoing 
efforts on Beam site.
URL: https://github.com/apache/beam/pull/6679
 
 
   
   
   
   Follow this checklist to help us incorporate your contribution quickly and 
easily:
   
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   It will help us expedite review of your Pull Request if you tag someone 
(e.g. `@username`) to look at it.
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/) | --- | --- | --- | --- | --- | ---
   Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/)
   Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/) | --- | --- | ---
   
   
   
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 154014)
Time Spent: 22.5h  (was: 22h 20m)

> Python 3 Support
> 
>
> Key: BEAM-1251
> URL: https://issues.apache.org/jira/browse/BEAM-1251
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Eyad Sibai
>Assignee: Robbe
>Priority: Major
>  Time Spe

[jira] [Work logged] (BEAM-1251) Python 3 Support

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-1251?focusedWorklogId=154013&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154013
 ]

ASF GitHub Bot logged work on BEAM-1251:


Author: ASF GitHub Bot
Created on: 12/Oct/18 20:32
Start Date: 12/Oct/18 20:32
Worklog Time Spent: 10m 
  Work Description: tvalentyn opened a new pull request #6678: [BEAM-1251] 
Add a link to Python 3 Conversion Quick Start Guide to the list of ongoing 
efforts on Beam site.
URL: https://github.com/apache/beam/pull/6678
 
 
   
   
   
   Follow this checklist to help us incorporate your contribution quickly and 
easily:
   
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   It will help us expedite review of your Pull Request if you tag someone 
(e.g. `@username`) to look at it.
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/) | --- | --- | --- | --- | --- | ---
   Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/)
   Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/) | --- | --- | ---
   
   
   
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 154013)
Time Spent: 22h 20m  (was: 22h 10m)

> Python 3 Support
> 
>
> Key: BEAM-1251
> URL: https://issues.apache.org/jira/browse/BEAM-1251
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Eyad Sibai
>Assignee: Robbe
>Priority: Major
>  Time S

[jira] [Work logged] (BEAM-5621) Several tests fail on Python 3 with TypeError: unorderable types: str() < int()

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5621?focusedWorklogId=154011&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154011
 ]

ASF GitHub Bot logged work on BEAM-5621:


Author: ASF GitHub Bot
Created on: 12/Oct/18 20:26
Start Date: 12/Oct/18 20:26
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on issue #6602: [BEAM-5621] 
Fix unorderable types in python 3
URL: https://github.com/apache/beam/pull/6602#issuecomment-429452493
 
 
   CC: @Juta @robertwb @aaltay 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 154011)
Time Spent: 1h 50m  (was: 1h 40m)

> Several tests fail on Python 3 with TypeError: unorderable types: str() < 
> int()
> ---
>
> Key: BEAM-5621
> URL: https://issues.apache.org/jira/browse/BEAM-5621
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Juta Staes
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> ==
> ERROR: test_remove_duplicates 
> (apache_beam.transforms.ptransform_test.PTransformTest)
> --
> Traceback (most recent call last):
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py",
>  line 677, in process
> self.do_fn_invoker.invoke_process(windowed_value)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py",
>  line 414, in invoke_process
> windowed_value, self.process_method(windowed_value.value))
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/core.py",
>  line 1068, in 
> wrapper = lambda x: [fn(x)]
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/testing/util.py",
>  line 115, in _equal
> sorted_expected = sorted(expected)
> TypeError: unorderable types: str() < int()



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5621) Several tests fail on Python 3 with TypeError: unorderable types: str() < int()

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5621?focusedWorklogId=154010&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154010
 ]

ASF GitHub Bot logged work on BEAM-5621:


Author: ASF GitHub Bot
Created on: 12/Oct/18 20:26
Start Date: 12/Oct/18 20:26
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on issue #6602: [BEAM-5621] 
Fix unorderable types in python 3
URL: https://github.com/apache/beam/pull/6602#issuecomment-429452434
 
 
   Because we now sort by type, we may encounter different behavior when using different string types. For example, previously `assert_that(equal_to(['a', u'b', b'c'], ['a', 'b', 'c']))` worked, but now it may not, because the sorting order now depends on the exact type (i.e. the sorting may produce `[u'b', 'a', b'c']`) even for orderable types. Should we consider this a regression?
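   
   A minimal sketch of the kind of type-aware ordering being discussed (the key used here is an assumption; the actual helper in apache_beam.testing.util may differ):
   
   ```
   # Group values by type name before comparing, so unorderable types are
   # never compared directly; this is what makes the result order depend on
   # the exact type of each element.
   def _sort_key(value):
       return (type(value).__name__, repr(value))

   mixed = ['a', u'b', b'c', 1]
   print(sorted(mixed, key=_sort_key))
   # Runs fine on Python 3, but bytes, ints and text strings come out grouped
   # by type rather than in the order plain sorted() would give on Python 2.
   ```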


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 154010)
Time Spent: 1h 40m  (was: 1.5h)

> Several tests fail on Python 3 with TypeError: unorderable types: str() < 
> int()
> ---
>
> Key: BEAM-5621
> URL: https://issues.apache.org/jira/browse/BEAM-5621
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Juta Staes
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> ==
> ERROR: test_remove_duplicates 
> (apache_beam.transforms.ptransform_test.PTransformTest)
> --
> Traceback (most recent call last):
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py",
>  line 677, in process
> self.do_fn_invoker.invoke_process(windowed_value)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py",
>  line 414, in invoke_process
> windowed_value, self.process_method(windowed_value.value))
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/core.py",
>  line 1068, in 
> wrapper = lambda x: [fn(x)]
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/testing/util.py",
>  line 115, in _equal
> sorted_expected = sorted(expected)
> TypeError: unorderable types: str() < int()



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-1081) annotations should support custom messages and classes

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-1081?focusedWorklogId=154009&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154009
 ]

ASF GitHub Bot logged work on BEAM-1081:


Author: ASF GitHub Bot
Created on: 12/Oct/18 20:17
Start Date: 12/Oct/18 20:17
Worklog Time Spent: 10m 
  Work Description: jglezt commented on issue #6670: [BEAM-1081] 
Annotations custom message support and classes tests.
URL: https://github.com/apache/beam/pull/6670#issuecomment-429450311
 
 
   Hi Pablo! 
   I resolved the merge problems.
   Waiting for the Jenkins build.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 154009)
Time Spent: 40m  (was: 0.5h)

> annotations should support custom messages and classes
> --
>
> Key: BEAM-1081
> URL: https://issues.apache.org/jira/browse/BEAM-1081
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Priority: Minor
>  Labels: newbie, starter
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Update 
> https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/utils/annotations.py
>  to add 2 new features:
> 1. ability to customize message
> 2. ability to tag classes (not only functions)
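
A rough sketch of what those two features might look like (hypothetical decorator name and signature; the real apache_beam/utils/annotations.py API may differ):

```
# Hypothetical sketch only; names and arguments are assumptions.
import functools
import warnings


def deprecated(since, custom_message=None):
  """Marks a function or a class as deprecated, with an optional message."""
  def decorator(obj):
    message = custom_message or '%s is deprecated since %s.' % (
        obj.__name__, since)
    if isinstance(obj, type):
      # Feature 2: tagging a class -- warn when it is instantiated.
      original_init = obj.__init__

      def new_init(self, *args, **kwargs):
        warnings.warn(message, DeprecationWarning, stacklevel=2)
        original_init(self, *args, **kwargs)

      obj.__init__ = new_init
      return obj

    # Tagging a function: warn when it is called.
    @functools.wraps(obj)
    def wrapper(*args, **kwargs):
      # Feature 1: the warning text can be customized via custom_message.
      warnings.warn(message, DeprecationWarning, stacklevel=2)
      return obj(*args, **kwargs)
    return wrapper
  return decorator
```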



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5709) Tests in BeamFnControlServiceTest are flaky.

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5709?focusedWorklogId=154008&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154008
 ]

ASF GitHub Bot logged work on BEAM-5709:


Author: ASF GitHub Bot
Created on: 12/Oct/18 20:15
Start Date: 12/Oct/18 20:15
Worklog Time Spent: 10m 
  Work Description: kennknowles closed pull request #6639: [BEAM-5709] Fix 
flaky tests in BeamFnControlServiceTest.
URL: https://github.com/apache/beam/pull/6639
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/BeamFnControlServiceTest.java
 
b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/BeamFnControlServiceTest.java
index 7ba41fd7cc2..7f08a562ce5 100644
--- 
a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/BeamFnControlServiceTest.java
+++ 
b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/BeamFnControlServiceTest.java
@@ -88,6 +88,8 @@ public void testClientConnecting() throws Exception {
 server.shutdown();
 server.awaitTermination(1, TimeUnit.SECONDS);
 server.shutdownNow();
+Thread.sleep(1000); // Wait for stub to close stream.
+
 verify(requestObserver).onCompleted();
 verifyNoMoreInteractions(requestObserver);
   }
@@ -126,6 +128,8 @@ public void testMultipleClientsConnecting() throws 
Exception {
 server.shutdown();
 server.awaitTermination(1, TimeUnit.SECONDS);
 server.shutdownNow();
+Thread.sleep(1000); // Wait for stub to close stream.
+
 verify(requestObserver).onCompleted();
 verifyNoMoreInteractions(requestObserver);
 verify(anotherRequestObserver).onCompleted();


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 154008)
Time Spent: 1.5h  (was: 1h 20m)

> Tests in BeamFnControlServiceTest are flaky.
> 
>
> Key: BEAM-5709
> URL: https://issues.apache.org/jira/browse/BEAM-5709
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Daniel Oliveira
>Assignee: Daniel Oliveira
>Priority: Minor
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/BeamFnControlServiceTest.java
> Tests for BeamFnControlService are currently flaky. The test attempts to 
> verify that onCompleted was called on the mock streams, but that function 
> gets called on a separate thread, so occasionally the function will not have 
> been called yet, despite the server being shut down.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4176) Java: Portable batch runner passes all ValidatesRunner tests that non-portable runner passes

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4176?focusedWorklogId=153992&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153992
 ]

ASF GitHub Bot logged work on BEAM-4176:


Author: ASF GitHub Bot
Created on: 12/Oct/18 18:57
Start Date: 12/Oct/18 18:57
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #6592: 
[BEAM-4176] Enable Post Commit JAVA PVR tests for Flink
URL: https://github.com/apache/beam/pull/6592#discussion_r224885162
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/ReferenceCountingFlinkExecutableStageContextFactory.java
 ##
 @@ -104,9 +108,21 @@ private void scheduleRelease(JobInfo jobInfo) {
 WrappedContext wrapper = getCache().get(jobInfo.jobId());
 Preconditions.checkState(
 wrapper != null, "Releasing context for unknown job: " + 
jobInfo.jobId());
-// Do not release this asynchronously, as the releasing could fail due to 
the classloader not being
-// available anymore after the tasks have been removed from the execution 
engine.
-release(wrapper);
+
+PipelineOptions pipelineOptions =
+PipelineOptionsTranslation.fromProto(jobInfo.pipelineOptions());
+int environmentCacheTTLMillis =
+
pipelineOptions.as(PortablePipelineOptions.class).getEnvironmentCacheMillis();
+if (environmentCacheTTLMillis > 0) {
+  // Schedule task to clean the container later.
+  // Ensure that this class is loaded in the parent Flink classloader.
+  getExecutor()
+  .schedule(() -> release(wrapper), environmentCacheTTLMillis, 
TimeUnit.MILLISECONDS);
 
 Review comment:
   You can base a PR on another PR's branch. Not sure how much sense that makes 
because then you would merge into your fork instead of the upstream repo. So I 
take that back :)


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 153992)
Time Spent: 33.5h  (was: 33h 20m)

> Java: Portable batch runner passes all ValidatesRunner tests that 
> non-portable runner passes
> 
>
> Key: BEAM-4176
> URL: https://issues.apache.org/jira/browse/BEAM-4176
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Ben Sidhom
>Assignee: Ankur Goenka
>Priority: Major
> Attachments: 81VxNWtFtke.png, Screen Shot 2018-08-14 at 4.18.31 
> PM.png, Screen Shot 2018-09-03 at 11.07.38 AM.png
>
>  Time Spent: 33.5h
>  Remaining Estimate: 0h
>
> We need this as a sanity check that runner execution is correct.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5683) [beam_PostCommit_Py_VR_Dataflow] [test_multiple_empty_outputs] Fails due to pip download flake

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5683?focusedWorklogId=153987&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153987
 ]

ASF GitHub Bot logged work on BEAM-5683:


Author: ASF GitHub Bot
Created on: 12/Oct/18 18:48
Start Date: 12/Oct/18 18:48
Worklog Time Spent: 10m 
  Work Description: pabloem closed pull request #6646: [BEAM-5683] Print 
command logs on failure
URL: https://github.com/apache/beam/pull/6646
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/python/apache_beam/runners/portability/stager.py 
b/sdks/python/apache_beam/runners/portability/stager.py
index ef7401ac6aa..885d69c7866 100644
--- a/sdks/python/apache_beam/runners/portability/stager.py
+++ b/sdks/python/apache_beam/runners/portability/stager.py
@@ -408,7 +408,7 @@ def _populate_requirements_cache(requirements_file, 
cache_dir):
 ':all:'
 ]
 logging.info('Executing command: %s', cmd_args)
-processes.check_call(cmd_args)
+processes.check_output(cmd_args)
 
   @staticmethod
   def _build_setup_package(setup_file, temp_dir, build_setup_args=None):
@@ -421,7 +421,7 @@ def _build_setup_package(setup_file, temp_dir, 
build_setup_args=None):
 os.path.basename(setup_file), 'sdist', '--dist-dir', temp_dir
 ]
   logging.info('Executing command: %s', build_setup_args)
-  processes.check_call(build_setup_args)
+  processes.check_output(build_setup_args)
   output_files = glob.glob(os.path.join(temp_dir, '*.tar.gz'))
   if not output_files:
 raise RuntimeError(
@@ -549,7 +549,7 @@ def _download_pypi_sdk_package(temp_dir,
 
 logging.info('Executing command: %s', cmd_args)
 try:
-  processes.check_call(cmd_args)
+  processes.check_output(cmd_args)
 except subprocess.CalledProcessError as e:
   raise RuntimeError(repr(e))
 
diff --git a/sdks/python/apache_beam/runners/portability/stager_test.py 
b/sdks/python/apache_beam/runners/portability/stager_test.py
index 9edc4eb4a2d..43b24967d82 100644
--- a/sdks/python/apache_beam/runners/portability/stager_test.py
+++ b/sdks/python/apache_beam/runners/portability/stager_test.py
@@ -65,7 +65,7 @@ def populate_requirements_cache(self, requirements_file, 
cache_dir):
 self.create_temp_file(os.path.join(cache_dir, 'def.txt'), 'nothing')
 
   def build_fake_pip_download_command_handler(self, has_wheels):
-"""A stub for apache_beam.utils.processes.check_call that imitates pip.
+"""A stub for apache_beam.utils.processes.check_output that imitates pip.
 
   Args:
 has_wheels: Whether pip fake should have a whl distribution of 
packages.
@@ -291,7 +291,7 @@ def test_sdk_location_default(self):
 options.view_as(SetupOptions).sdk_location = 'default'
 
 with mock.patch(
-'apache_beam.utils.processes.check_call',
+'apache_beam.utils.processes.check_output',
 self.build_fake_pip_download_command_handler(has_wheels=False)):
   _, staged_resources = self.stager.stage_job_resources(
   options, temp_dir=self.make_temp_dir(), staging_location=staging_dir)
@@ -309,7 +309,7 @@ def test_sdk_location_default_with_wheels(self):
 options.view_as(SetupOptions).sdk_location = 'default'
 
 with mock.patch(
-'apache_beam.utils.processes.check_call',
+'apache_beam.utils.processes.check_output',
 self.build_fake_pip_download_command_handler(has_wheels=True)):
   _, staged_resources = self.stager.stage_job_resources(
   options, temp_dir=self.make_temp_dir(), staging_location=staging_dir)
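
A short sketch of why the switch from check_call to check_output helps here (simplified; the real helpers live in apache_beam.utils.processes and wrap subprocess):

```
# Sketch: check_output captures the command's output, so on failure the
# CalledProcessError carries it and the pip logs can be surfaced.
import subprocess

def run_and_report(cmd_args):
    try:
        return subprocess.check_output(cmd_args, stderr=subprocess.STDOUT)
    except subprocess.CalledProcessError as e:
        # With check_call the output would already have gone to the console
        # (or been lost); here we can attach it to the raised error.
        raise RuntimeError(
            'Command %s failed with output:\n%s' % (cmd_args, e.output))
```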


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 153987)
Time Spent: 1h  (was: 50m)

> [beam_PostCommit_Py_VR_Dataflow] [test_multiple_empty_outputs] Fails due to 
> pip download flake
> --
>
> Key: BEAM-5683
> URL: https://issues.apache.org/jira/browse/BEAM-5683
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness, test-failures
>Reporter: Scott Wegner
>Assignee: Ankur Goenka
>Priority: Major
>  Labels: currently-failing
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> _Use this form to file an issue for test failure:_
> 

[jira] [Work logged] (BEAM-5683) [beam_PostCommit_Py_VR_Dataflow] [test_multiple_empty_outputs] Fails due to pip download flake

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5683?focusedWorklogId=153986&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153986
 ]

ASF GitHub Bot logged work on BEAM-5683:


Author: ASF GitHub Bot
Created on: 12/Oct/18 18:48
Start Date: 12/Oct/18 18:48
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #6646: [BEAM-5683] Print 
command logs on failure
URL: https://github.com/apache/beam/pull/6646#issuecomment-429423506
 
 
   Ah thanks Ankur! lgtm


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 153986)
Time Spent: 50m  (was: 40m)

> [beam_PostCommit_Py_VR_Dataflow] [test_multiple_empty_outputs] Fails due to 
> pip download flake
> --
>
> Key: BEAM-5683
> URL: https://issues.apache.org/jira/browse/BEAM-5683
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness, test-failures
>Reporter: Scott Wegner
>Assignee: Ankur Goenka
>Priority: Major
>  Labels: currently-failing
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> _Use this form to file an issue for test failure:_
>  * [Jenkins 
> Job|https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/1289/]
>  * [Gradle Build 
> Scan|https://scans.gradle.com/s/hjmzvh4ylhs6y/console-log?task=:beam-sdks-python:validatesRunnerBatchTests]
>  * [Test source 
> code|https://github.com/apache/beam/blob/303a4275eb0a323761e1a4dec6a22fde9863acf8/sdks/python/apache_beam/runners/portability/stager.py#L390]
> Initial investigation:
> Seems to be failing on pip download.
> ==
> ERROR: test_multiple_empty_outputs 
> (apache_beam.transforms.ptransform_test.PTransformTest)
> --
> Traceback (most recent call last):
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/transforms/ptransform_test.py",
>  line 277, in test_multiple_empty_outputs
> pipeline.run()
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/testing/test_pipeline.py",
>  line 104, in run
> result = super(TestPipeline, self).run(test_runner_api)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/pipeline.py",
>  line 403, in run
> self.to_runner_api(), self.runner, self._options).run(False)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/pipeline.py",
>  line 416, in run
> return self.runner.run_pipeline(self)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py",
>  line 50, in run_pipeline
> self.result = super(TestDataflowRunner, self).run_pipeline(pipeline)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py",
>  line 389, in run_pipeline
> self.dataflow_client.create_job(self.job), self)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/utils/retry.py",
>  line 184, in wrapper
> return fun(*args, **kwargs)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py",
>  line 490, in create_job
> self.create_job_description(job)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py",
>  line 519, in create_job_description
> resources = self._stage_resour
> ces(job.options)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py",
>  line 452, in _stage_resources
> staging_location=google_cloud_options.staging_location)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/portability/stager.py",
>  line 161, in stage_job_resources
> requirements_cache_path)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/portability/stager.py",
>  line 411, in _populate_re

[jira] [Work logged] (BEAM-5683) [beam_PostCommit_Py_VR_Dataflow] [test_multiple_empty_outputs] Fails due to pip download flake

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5683?focusedWorklogId=153983&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153983
 ]

ASF GitHub Bot logged work on BEAM-5683:


Author: ASF GitHub Bot
Created on: 12/Oct/18 18:42
Start Date: 12/Oct/18 18:42
Worklog Time Spent: 10m 
  Work Description: angoenka commented on issue #6646: [BEAM-5683] Print 
command logs on failure
URL: https://github.com/apache/beam/pull/6646#issuecomment-429421690
 
 
   Fixed the test case.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 153983)
Time Spent: 40m  (was: 0.5h)

> [beam_PostCommit_Py_VR_Dataflow] [test_multiple_empty_outputs] Fails due to 
> pip download flake
> --
>
> Key: BEAM-5683
> URL: https://issues.apache.org/jira/browse/BEAM-5683
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness, test-failures
>Reporter: Scott Wegner
>Assignee: Ankur Goenka
>Priority: Major
>  Labels: currently-failing
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> _Use this form to file an issue for test failure:_
>  * [Jenkins 
> Job|https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/1289/]
>  * [Gradle Build 
> Scan|https://scans.gradle.com/s/hjmzvh4ylhs6y/console-log?task=:beam-sdks-python:validatesRunnerBatchTests]
>  * [Test source 
> code|https://github.com/apache/beam/blob/303a4275eb0a323761e1a4dec6a22fde9863acf8/sdks/python/apache_beam/runners/portability/stager.py#L390]
> Initial investigation:
> Seems to be failing on pip download.
> ==
> ERROR: test_multiple_empty_outputs 
> (apache_beam.transforms.ptransform_test.PTransformTest)
> --
> Traceback (most recent call last):
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/transforms/ptransform_test.py",
>  line 277, in test_multiple_empty_outputs
> pipeline.run()
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/testing/test_pipeline.py",
>  line 104, in run
> result = super(TestPipeline, self).run(test_runner_api)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/pipeline.py",
>  line 403, in run
> self.to_runner_api(), self.runner, self._options).run(False)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/pipeline.py",
>  line 416, in run
> return self.runner.run_pipeline(self)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py",
>  line 50, in run_pipeline
> self.result = super(TestDataflowRunner, self).run_pipeline(pipeline)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py",
>  line 389, in run_pipeline
> self.dataflow_client.create_job(self.job), self)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/utils/retry.py",
>  line 184, in wrapper
> return fun(*args, **kwargs)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py",
>  line 490, in create_job
> self.create_job_description(job)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py",
>  line 519, in create_job_description
> resources = self._stage_resour
> ces(job.options)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py",
>  line 452, in _stage_resources
> staging_location=google_cloud_options.staging_location)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/portability/stager.py",
>  line 161, in stage_job_resources
> requirements_cache_path)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/portability/stager.py",
>  line 411, in _populate_r

[jira] [Work logged] (BEAM-5708) Support caching of SDKHarness environments in flink

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5708?focusedWorklogId=153978&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153978
 ]

ASF GitHub Bot logged work on BEAM-5708:


Author: ASF GitHub Bot
Created on: 12/Oct/18 18:29
Start Date: 12/Oct/18 18:29
Worklog Time Spent: 10m 
  Work Description: angoenka commented on a change in pull request #6638: 
[BEAM-5708] Cache environment in portable flink runner
URL: https://github.com/apache/beam/pull/6638#discussion_r224877854
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/ReferenceCountingFlinkExecutableStageContextFactory.java
 ##
 @@ -104,9 +108,21 @@ private void scheduleRelease(JobInfo jobInfo) {
 WrappedContext wrapper = getCache().get(jobInfo.jobId());
 Preconditions.checkState(
 wrapper != null, "Releasing context for unknown job: " + 
jobInfo.jobId());
-// Do not release this asynchronously, as the releasing could fail due to 
the classloader not being
-// available anymore after the tasks have been removed from the execution 
engine.
-release(wrapper);
+
+PipelineOptions pipelineOptions =
+PipelineOptionsTranslation.fromProto(jobInfo.pipelineOptions());
+int environmentCacheTTLMillis =
+
pipelineOptions.as(PortablePipelineOptions.class).getEnvironmentCacheMillis();
+if (environmentCacheTTLMillis > 0) {
+  // Schedule task to clean the container later.
+  // Ensure that this class is loaded in the parent Flink classloader.
+  getExecutor()
+  .schedule(() -> release(wrapper), environmentCacheTTLMillis, 
TimeUnit.MILLISECONDS);
 
 Review comment:
   @mxm The class is first loaded when we create the environment, so we can be assured that the class is loaded before release.
   Also, we require classes to be loaded in the parent classloader for async container destruction. This inherently means that once the class is loaded in the parent classloader, it is not going to be unloaded in this scenario. With the additional check mentioned, we will enforce this requirement.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 153978)
Time Spent: 1h 40m  (was: 1.5h)

> Support caching of SDKHarness environments in flink
> ---
>
> Key: BEAM-5708
> URL: https://issues.apache.org/jira/browse/BEAM-5708
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Cache and reuse environment to improve performance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5720) Default coder breaks with large ints on Python 3

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5720?focusedWorklogId=153980&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153980
 ]

ASF GitHub Bot logged work on BEAM-5720:


Author: ASF GitHub Bot
Created on: 12/Oct/18 18:29
Start Date: 12/Oct/18 18:29
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on a change in pull request #6659: 
[BEAM-5720] Fix encoding of large python ints in Python 3.
URL: https://github.com/apache/beam/pull/6659#discussion_r224876803
 
 

 ##
 File path: sdks/python/apache_beam/coders/coder_impl.py
 ##
 @@ -293,8 +299,21 @@ def encode_to_stream(self, value, stream, nested):
 if value is None:
   stream.write_byte(NONE_TYPE)
 elif t is int:
-  stream.write_byte(INT_TYPE)
-  stream.write_var_int64(value)
+  # In Python 3, an int may be larger than 64 bits.
+  # Note that an OverflowError on stream.write_var_int64 would happen
+  # *after* the marker byte is written, so we must check earlier.
+  try:
+# This may throw an overflow error when compiled.
+int_value = value
+# Otherwise, we must check ourselves.
+if not is_compiled:
+  if not fits_in_64_bits(value):
 
 Review comment:
   Thanks. Consider the following wording for comments: 

   ```
   # In Python 3, an int may be larger than 64 bits.
   # We need to check whether value fits into a 64 bit integer before
   # writing the marker byte.
   try:
     # In Cython-compiled code this will throw an overflow error
     # when value does not fit into int64.
     int_value = value
     # If Cython is not used, we must do a (slower) check ourselves.
     if not is_compiled:
       ...
   ```
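For reference, a minimal sketch of the range check that a helper like `fits_in_64_bits` would perform, assuming plain signed 64-bit bounds; this is an illustration, not the actual Beam coder code:

```
# Hypothetical helper mirroring the fits_in_64_bits name used in the review;
# it checks the signed 64-bit range before the type marker byte is written.
INT64_MIN = -(1 << 63)
INT64_MAX = (1 << 63) - 1

def fits_in_64_bits(value):
  return INT64_MIN <= value <= INT64_MAX
```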


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 153980)
Time Spent: 1h 10m  (was: 1h)

> Default coder breaks with large ints on Python 3
> 
>
> Key: BEAM-5720
> URL: https://issues.apache.org/jira/browse/BEAM-5720
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The test for `int` includes greater than 64-bit values, which causes an 
> overflow error later in the code. We need to only use that coding scheme for 
> machine-sized ints. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5708) Support caching of SDKHarness environments in flink

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5708?focusedWorklogId=153977&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153977
 ]

ASF GitHub Bot logged work on BEAM-5708:


Author: ASF GitHub Bot
Created on: 12/Oct/18 18:29
Start Date: 12/Oct/18 18:29
Worklog Time Spent: 10m 
  Work Description: angoenka commented on a change in pull request #6638: 
[BEAM-5708] Cache environment in portable flink runner
URL: https://github.com/apache/beam/pull/6638#discussion_r224875048
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/ReferenceCountingFlinkExecutableStageContextFactory.java
 ##
 @@ -104,9 +108,21 @@ private void scheduleRelease(JobInfo jobInfo) {
 WrappedContext wrapper = getCache().get(jobInfo.jobId());
 Preconditions.checkState(
 wrapper != null, "Releasing context for unknown job: " + 
jobInfo.jobId());
-// Do not release this asynchronously, as the releasing could fail due to 
the classloader not being
-// available anymore after the tasks have been removed from the execution 
engine.
-release(wrapper);
+
+PipelineOptions pipelineOptions =
+PipelineOptionsTranslation.fromProto(jobInfo.pipelineOptions());
+int environmentCacheTTLMillis =
+
pipelineOptions.as(PortablePipelineOptions.class).getEnvironmentCacheMillis();
+if (environmentCacheTTLMillis > 0) {
+  // Schedule task to clean the container later.
+  // Ensure that this class is loaded in the parent Flink classloader.
+  getExecutor()
+  .schedule(() -> release(wrapper), environmentCacheTTLMillis, 
TimeUnit.MILLISECONDS);
 
 Review comment:
   @tweise Makes sense. I will add that check. In addition to falling back to 
immediate release, I will also log a warning.
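
To make the intended behaviour concrete, here is a minimal sketch of that pattern; it is written in Python purely for illustration (the actual change is in the Java Flink runner), and the names release_fn, wrapper, and environment_cache_millis are hypothetical stand-ins for release(wrapper) and the PortablePipelineOptions TTL:

```
import logging
import threading

def schedule_release(release_fn, wrapper, environment_cache_millis):
  # If a cache TTL is configured, release the environment later; otherwise,
  # or if scheduling fails, fall back to an immediate release with a warning.
  if environment_cache_millis > 0:
    try:
      timer = threading.Timer(
          environment_cache_millis / 1000.0, release_fn, args=(wrapper,))
      timer.daemon = True
      timer.start()
      return
    except Exception:
      logging.warning(
          'Could not schedule delayed environment release; releasing immediately.')
  release_fn(wrapper)
```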


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 153977)
Time Spent: 1.5h  (was: 1h 20m)

> Support caching of SDKHarness environments in flink
> ---
>
> Key: BEAM-5708
> URL: https://issues.apache.org/jira/browse/BEAM-5708
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Cache and reuse environment to improve performance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5720) Default coder breaks with large ints on Python 3

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5720?focusedWorklogId=153979&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153979
 ]

ASF GitHub Bot logged work on BEAM-5720:


Author: ASF GitHub Bot
Created on: 12/Oct/18 18:29
Start Date: 12/Oct/18 18:29
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on a change in pull request #6659: 
[BEAM-5720] Fix encoding of large python ints in Python 3.
URL: https://github.com/apache/beam/pull/6659#discussion_r224870551
 
 

 ##
 File path: sdks/python/apache_beam/coders/coder_impl.pxd
 ##
 @@ -69,8 +69,9 @@ cdef class DeterministicFastPrimitivesCoderImpl(CoderImpl):
 
 
 cdef object NoneType
-cdef char UNKNOWN_TYPE, NONE_TYPE, INT_TYPE, FLOAT_TYPE, BOOL_TYPE
-cdef char BYTES_TYPE, UNICODE_TYPE, LIST_TYPE, TUPLE_TYPE, DICT_TYPE, SET_TYPE
+cdef unsigned char UNKNOWN_TYPE, NONE_TYPE, INT_TYPE, FLOAT_TYPE, BOOL_TYPE
 
 Review comment:
   Curious, what was the motivation for this?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 153979)
Time Spent: 1h  (was: 50m)

> Default coder breaks with large ints on Python 3
> 
>
> Key: BEAM-5720
> URL: https://issues.apache.org/jira/browse/BEAM-5720
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The test for `int` includes greater than 64-bit values, which causes an 
> overflow error later in the code. We need to only use that coding scheme for 
> machine-sized ints. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5326) SDK support for custom dataflow worker jar

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5326?focusedWorklogId=153976&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153976
 ]

ASF GitHub Bot logged work on BEAM-5326:


Author: ASF GitHub Bot
Created on: 12/Oct/18 18:28
Start Date: 12/Oct/18 18:28
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on issue #6665: [BEAM-5326] Java 
support for custom dataflow worker jar
URL: https://github.com/apache/beam/pull/6665#issuecomment-429417765
 
 
   Re: @pabloem Thanks! Working on minor changes now.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 153976)
Time Spent: 2h  (was: 1h 50m)

> SDK support for custom dataflow worker jar
> --
>
> Key: BEAM-5326
> URL: https://issues.apache.org/jira/browse/BEAM-5326
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Henning Rohde
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Doc: 
> https://docs.google.com/document/d/1-m-GzkYWIODKOEl1ZSUNXYbcGRvRr3QkasfHsJxbuoA/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5326) SDK support for custom dataflow worker jar

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5326?focusedWorklogId=153975&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153975
 ]

ASF GitHub Bot logged work on BEAM-5326:


Author: ASF GitHub Bot
Created on: 12/Oct/18 18:21
Start Date: 12/Oct/18 18:21
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #6665: [BEAM-5326] Java 
support for custom dataflow worker jar
URL: https://github.com/apache/beam/pull/6665#issuecomment-429415545
 
 
   Fwiw you can run `./gradlew :..project..:check` to run unit tests and lint 
checks without running dataflow jobs, which should be useful ; )


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 153975)
Time Spent: 1h 50m  (was: 1h 40m)

> SDK support for custom dataflow worker jar
> --
>
> Key: BEAM-5326
> URL: https://issues.apache.org/jira/browse/BEAM-5326
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Henning Rohde
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Doc: 
> https://docs.google.com/document/d/1-m-GzkYWIODKOEl1ZSUNXYbcGRvRr3QkasfHsJxbuoA/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=153974&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153974
 ]

ASF GitHub Bot logged work on BEAM-5637:


Author: ASF GitHub Bot
Created on: 12/Oct/18 18:20
Start Date: 12/Oct/18 18:20
Worklog Time Spent: 10m 
  Work Description: pabloem commented on a change in pull request #6667: 
[BEAM-5637] Python support for custom dataflow worker jar
URL: https://github.com/apache/beam/pull/6667#discussion_r224874588
 
 

 ##
 File path: sdks/python/apache_beam/options/pipeline_options.py
 ##
 @@ -674,7 +674,12 @@ def _add_argparse_args(cls, parser):
  'job submission, the files will be staged in the staging area '
  '(--staging_location option) and the workers will install them in '
  'same order they were specified on the command line.'))
-
+parser.add_argument(
+'--dataflow_worker_jar',
+dest='dataflow_worker_jar',
+type=str,
+help='Dataflow worker jar.'
+)
 
 Review comment:
   I'm thinking that the option would be better in `WorkerOptions`, or some 
other options class related to Dataflow+Portability.
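
A rough sketch of what that relocation might look like, assuming the usual _add_argparse_args hook on a PipelineOptions subclass; the class shown is illustrative only (the real SDK already has a WorkerOptions class), and the final placement is up to the PR author:

```
from apache_beam.options.pipeline_options import PipelineOptions

class WorkerOptions(PipelineOptions):
  # Illustrative placement only; the real WorkerOptions already defines other
  # Dataflow worker flags, and this one would be added alongside them.
  @classmethod
  def _add_argparse_args(cls, parser):
    parser.add_argument(
        '--dataflow_worker_jar',
        dest='dataflow_worker_jar',
        type=str,
        help='Dataflow worker jar.')
```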


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 153974)
Time Spent: 1h 20m  (was: 1h 10m)

> Python support for custom dataflow worker jar
> -
>
> Key: BEAM-5637
> URL: https://issues.apache.org/jira/browse/BEAM-5637
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Henning Rohde
>Assignee: Ruoyun Huang
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> One of the slightly subtle aspects is that we would need to ignore one of the 
> staged jars for portable Python jobs. That requires a change to the Python 
> boot code: 
> https://github.com/apache/beam/blob/66d7c865b7267f388ee60752891a9141fad43774/sdks/python/container/boot.go#L104



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4176) Java: Portable batch runner passes all ValidatesRunner tests that non-portable runner passes

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4176?focusedWorklogId=153972&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153972
 ]

ASF GitHub Bot logged work on BEAM-4176:


Author: ASF GitHub Bot
Created on: 12/Oct/18 18:13
Start Date: 12/Oct/18 18:13
Worklog Time Spent: 10m 
  Work Description: angoenka commented on a change in pull request #6592: 
[BEAM-4176] Enable Post Commit JAVA PVR tests for Flink
URL: https://github.com/apache/beam/pull/6592#discussion_r224873292
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/ReferenceCountingFlinkExecutableStageContextFactory.java
 ##
 @@ -104,9 +108,21 @@ private void scheduleRelease(JobInfo jobInfo) {
 WrappedContext wrapper = getCache().get(jobInfo.jobId());
 Preconditions.checkState(
 wrapper != null, "Releasing context for unknown job: " + 
jobInfo.jobId());
-// Do not release this asynchronously, as the releasing could fail due to 
the classloader not being
-// available anymore after the tasks have been removed from the execution 
engine.
-release(wrapper);
+
+PipelineOptions pipelineOptions =
+PipelineOptionsTranslation.fromProto(jobInfo.pipelineOptions());
+int environmentCacheTTLMillis =
+
pipelineOptions.as(PortablePipelineOptions.class).getEnvironmentCacheMillis();
+if (environmentCacheTTLMillis > 0) {
+  // Schedule task to clean the container later.
+  // Ensure that this class is loaded in the parent Flink classloader.
+  getExecutor()
+  .schedule(() -> release(wrapper), environmentCacheTTLMillis, 
TimeUnit.MILLISECONDS);
 
 Review comment:
   Maybe I am missing something.
   For this PR, I based my branch java_pvr_jenkins on 
java_pvr_cache_environments and then created the PR (merging 1 commit into 
apache:master from angoenka:java_pvr_jenkins).
   This picks up the commits from both branches, java_pvr_jenkins and 
java_pvr_cache_environments.
   Is there any way to avoid picking up commits from the base PR?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 153972)
Time Spent: 33h 20m  (was: 33h 10m)

> Java: Portable batch runner passes all ValidatesRunner tests that 
> non-portable runner passes
> 
>
> Key: BEAM-4176
> URL: https://issues.apache.org/jira/browse/BEAM-4176
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Ben Sidhom
>Assignee: Ankur Goenka
>Priority: Major
> Attachments: 81VxNWtFtke.png, Screen Shot 2018-08-14 at 4.18.31 
> PM.png, Screen Shot 2018-09-03 at 11.07.38 AM.png
>
>  Time Spent: 33h 20m
>  Remaining Estimate: 0h
>
> We need this as a sanity check that runner execution is correct.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4176) Java: Portable batch runner passes all ValidatesRunner tests that non-portable runner passes

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4176?focusedWorklogId=153971&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153971
 ]

ASF GitHub Bot logged work on BEAM-4176:


Author: ASF GitHub Bot
Created on: 12/Oct/18 18:12
Start Date: 12/Oct/18 18:12
Worklog Time Spent: 10m 
  Work Description: angoenka commented on a change in pull request #6592: 
[BEAM-4176] Enable Post Commit JAVA PVR tests for Flink
URL: https://github.com/apache/beam/pull/6592#discussion_r224873292
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/ReferenceCountingFlinkExecutableStageContextFactory.java
 ##
 @@ -104,9 +108,21 @@ private void scheduleRelease(JobInfo jobInfo) {
 WrappedContext wrapper = getCache().get(jobInfo.jobId());
 Preconditions.checkState(
 wrapper != null, "Releasing context for unknown job: " + 
jobInfo.jobId());
-// Do not release this asynchronously, as the releasing could fail due to 
the classloader not being
-// available anymore after the tasks have been removed from the execution 
engine.
-release(wrapper);
+
+PipelineOptions pipelineOptions =
+PipelineOptionsTranslation.fromProto(jobInfo.pipelineOptions());
+int environmentCacheTTLMillis =
+
pipelineOptions.as(PortablePipelineOptions.class).getEnvironmentCacheMillis();
+if (environmentCacheTTLMillis > 0) {
+  // Schedule task to clean the container later.
+  // Ensure that this class is loaded in the parent Flink classloader.
+  getExecutor()
+  .schedule(() -> release(wrapper), environmentCacheTTLMillis, 
TimeUnit.MILLISECONDS);
 
 Review comment:
   Maybe I am missing something.
   For this PR, I based my branch java_pvr_jenkins on 
java_pvr_cache_environments and then created the PR (merging 1 commit into 
apache:master from angoenka:java_pvr_jenkins).
   This picks up the commits from both branches, java_pvr_jenkins and 
java_pvr_cache_environments.
   Is there any way to avoid picking up commits from the base PR?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 153971)
Time Spent: 33h 10m  (was: 33h)

> Java: Portable batch runner passes all ValidatesRunner tests that 
> non-portable runner passes
> 
>
> Key: BEAM-4176
> URL: https://issues.apache.org/jira/browse/BEAM-4176
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Ben Sidhom
>Assignee: Ankur Goenka
>Priority: Major
> Attachments: 81VxNWtFtke.png, Screen Shot 2018-08-14 at 4.18.31 
> PM.png, Screen Shot 2018-09-03 at 11.07.38 AM.png
>
>  Time Spent: 33h 10m
>  Remaining Estimate: 0h
>
> We need this as a sanity check that runner execution is correct.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-1081) annotations should support custom messages and classes

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-1081?focusedWorklogId=153967&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153967
 ]

ASF GitHub Bot logged work on BEAM-1081:


Author: ASF GitHub Bot
Created on: 12/Oct/18 17:59
Start Date: 12/Oct/18 17:59
Worklog Time Spent: 10m 
  Work Description: pabloem commented on a change in pull request #6670: 
[BEAM-1081] Annotations custom message support and classes tests.
URL: https://github.com/apache/beam/pull/6670#discussion_r224868863
 
 

 ##
 File path: sdks/python/apache_beam/utils/annotations_test.py
 ##
 @@ -78,6 +131,29 @@ def fnc_test_deprecated_without_since_should_fail():
 fnc_test_deprecated_without_since_should_fail()
   assert not w
 
+<<< HEAD
+  def test_deprecated_without_since_custom_should_fail(self):
+with warnings.catch_warnings(record=True) as w:
+  with self.assertRaises(TypeError):
+@deprecated(custom_message='Test %since%')
+def fnc_test_deprecated_without_since_custom_should_fail():
+  return 'lol'
+fnc_test_deprecated_without_since_custom_should_fail()
+===
+  def test_deprecated_without_since_should_fail_class(self):
+with warnings.catch_warnings(record=True) as w:
+  with self.assertRaises(TypeError):
+
+@deprecated()
+class Class_test_deprecated_without_since_should_fail():
+  fooo = 'lol'
+  def foo(self):
+return 'lol'
+foo = Class_test_deprecated_without_since_should_fail()
+foo.foo()
+>>> ae3ee47cf9e131cf66644532022178d559d9237d
 
 Review comment:
   merge trouble


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 153967)
Time Spent: 20m  (was: 10m)

> annotations should support custom messages and classes
> --
>
> Key: BEAM-1081
> URL: https://issues.apache.org/jira/browse/BEAM-1081
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Priority: Minor
>  Labels: newbie, starter
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Update 
> https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/utils/annotations.py
>  to add 2 new features:
> 1. ability to customize message
> 2. ability to tag classes (not only functions)
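
As a rough illustration of the two requested features, a minimal deprecated decorator could look like the sketch below. The custom_message argument and the %since% placeholder follow the tests quoted earlier in this digest; the %name% token and the overall structure are assumptions, not the actual annotations.py implementation:

```
import warnings
from functools import wraps

def deprecated(since=None, custom_message=None):
  # A "since" value is required unless the custom message does not use it.
  if since is None and (custom_message is None or '%since%' in custom_message):
    raise TypeError('deprecated() requires a "since" value')
  message = custom_message or '%name% is deprecated since %since%'

  def decorator(target):  # target may be a function or a class
    text = message.replace('%name%', target.__name__).replace('%since%', str(since))
    if isinstance(target, type):
      # Tagging a class: emit the warning when the class is instantiated.
      original_init = target.__init__

      def new_init(self, *args, **kwargs):
        warnings.warn(text, DeprecationWarning, stacklevel=2)
        return original_init(self, *args, **kwargs)

      target.__init__ = new_init
      return target

    @wraps(target)
    def wrapper(*args, **kwargs):
      warnings.warn(text, DeprecationWarning, stacklevel=2)
      return target(*args, **kwargs)

    return wrapper

  return decorator
```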



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-1081) annotations should support custom messages and classes

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-1081?focusedWorklogId=153968&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153968
 ]

ASF GitHub Bot logged work on BEAM-1081:


Author: ASF GitHub Bot
Created on: 12/Oct/18 17:59
Start Date: 12/Oct/18 17:59
Worklog Time Spent: 10m 
  Work Description: pabloem commented on a change in pull request #6670: 
[BEAM-1081] Annotations custom message support and classes tests.
URL: https://github.com/apache/beam/pull/6670#discussion_r224868835
 
 

 ##
 File path: sdks/python/apache_beam/utils/annotations_test.py
 ##
 @@ -78,6 +131,29 @@ def fnc_test_deprecated_without_since_should_fail():
 fnc_test_deprecated_without_since_should_fail()
   assert not w
 
+<<< HEAD
 
 Review comment:
   it seems that there was some trouble merging here


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 153968)
Time Spent: 20m  (was: 10m)

> annotations should support custom messages and classes
> --
>
> Key: BEAM-1081
> URL: https://issues.apache.org/jira/browse/BEAM-1081
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Priority: Minor
>  Labels: newbie, starter
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Update 
> https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/utils/annotations.py
>  to add 2 new features:
> 1. ability to customize message
> 2. ability to tag classes (not only functions)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-1081) annotations should support custom messages and classes

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-1081?focusedWorklogId=153969&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153969
 ]

ASF GitHub Bot logged work on BEAM-1081:


Author: ASF GitHub Bot
Created on: 12/Oct/18 17:59
Start Date: 12/Oct/18 17:59
Worklog Time Spent: 10m 
  Work Description: pabloem commented on a change in pull request #6670: 
[BEAM-1081] Annotations custom message support and classes tests.
URL: https://github.com/apache/beam/pull/6670#discussion_r224868893
 
 

 ##
 File path: sdks/python/apache_beam/utils/annotations_test.py
 ##
 @@ -78,6 +131,29 @@ def fnc_test_deprecated_without_since_should_fail():
 fnc_test_deprecated_without_since_should_fail()
   assert not w
 
+<<< HEAD
+  def test_deprecated_without_since_custom_should_fail(self):
+with warnings.catch_warnings(record=True) as w:
+  with self.assertRaises(TypeError):
+@deprecated(custom_message='Test %since%')
+def fnc_test_deprecated_without_since_custom_should_fail():
+  return 'lol'
+fnc_test_deprecated_without_since_custom_should_fail()
+===
 
 Review comment:
   merge trouble


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 153969)
Time Spent: 0.5h  (was: 20m)

> annotations should support custom messages and classes
> --
>
> Key: BEAM-1081
> URL: https://issues.apache.org/jira/browse/BEAM-1081
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Priority: Minor
>  Labels: newbie, starter
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Update 
> https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/utils/annotations.py
>  to add 2 new features:
> 1. ability to customize message
> 2. ability to tag classes (not only functions)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5706) PubSub dependency upgrade causes internal issues for Dataflow

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5706?focusedWorklogId=153961&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153961
 ]

ASF GitHub Bot logged work on BEAM-5706:


Author: ASF GitHub Bot
Created on: 12/Oct/18 17:50
Start Date: 12/Oct/18 17:50
Worklog Time Spent: 10m 
  Work Description: charlesccychen closed pull request #6674: [BEAM-5706] 
Revert 324f0b3e3c (pull request #6564 from udim/pubsub-0-35-4)
URL: https://github.com/apache/beam/pull/6674
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/sdks/python/apache_beam/examples/complete/game/game_stats_it_test.py 
b/sdks/python/apache_beam/examples/complete/game/game_stats_it_test.py
index 2fc19daa1af..6dc60d0a807 100644
--- a/sdks/python/apache_beam/examples/complete/game/game_stats_it_test.py
+++ b/sdks/python/apache_beam/examples/complete/game/game_stats_it_test.py
@@ -72,15 +72,15 @@ def setUp(self):
 
 # Set up PubSub environment.
 from google.cloud import pubsub
-self.pub_client = pubsub.PublisherClient()
-self.input_topic = self.pub_client.create_topic(
-self.pub_client.topic_path(self.project, self.INPUT_TOPIC + 
_unique_id))
+self.pubsub_client = pubsub.Client(project=self.project)
+unique_topic_name = self.INPUT_TOPIC + _unique_id
+unique_subscrition_name = self.INPUT_SUB + _unique_id
+self.input_topic = self.pubsub_client.topic(unique_topic_name)
+self.input_sub = self.input_topic.subscription(unique_subscrition_name)
 
-self.sub_client = pubsub.SubscriberClient()
-self.input_sub = self.sub_client.create_subscription(
-self.sub_client.subscription_path(self.project,
-  self.INPUT_SUB + _unique_id),
-self.input_topic.name)
+self.input_topic.create()
+test_utils.wait_for_topics_created([self.input_topic])
+self.input_sub.create()
 
 # Set up BigQuery environment
 from google.cloud import bigquery
@@ -95,15 +95,14 @@ def _inject_pubsub_game_events(self, topic, message_count):
 """Inject game events as test data to PubSub."""
 
 logging.debug('Injecting %d game events to topic %s',
-  message_count, topic.name)
+  message_count, topic.full_name)
 
 for _ in range(message_count):
-  self.pub_client.publish(topic.name,
-  self.INPUT_EVENT % self._test_timestamp)
+  topic.publish(self.INPUT_EVENT % self._test_timestamp)
 
   def _cleanup_pubsub(self):
-test_utils.cleanup_subscriptions(self.sub_client, [self.input_sub])
-test_utils.cleanup_topics(self.pub_client, [self.input_topic])
+test_utils.cleanup_subscriptions([self.input_sub])
+test_utils.cleanup_topics([self.input_topic])
 
   def _cleanup_dataset(self):
 self.dataset.delete()
@@ -124,9 +123,9 @@ def test_game_stats_it(self):
 
 # TODO(mariagh): Add teams table verifier once game_stats.py is fixed.
 
-extra_opts = {'subscription': self.input_sub.name,
+extra_opts = {'subscription': self.input_sub.full_name,
   'dataset': self.dataset.name,
-  'topic': self.input_topic.name,
+  'topic': self.input_topic.full_name,
   'fixed_window_duration': 1,
   'user_activity_window_duration': 1,
   'wait_until_finish_duration':
@@ -144,6 +143,8 @@ def test_game_stats_it(self):
 self.dataset.name, self.OUTPUT_TABLE_TEAMS)
 
 # Generate input data and inject to PubSub.
+test_utils.wait_for_subscriptions_created([self.input_topic,
+   self.input_sub])
 self._inject_pubsub_game_events(self.input_topic, self.DEFAULT_INPUT_COUNT)
 
 # Get pipeline options from command argument: --test-pipeline-options,
diff --git 
a/sdks/python/apache_beam/examples/complete/game/leader_board_it_test.py 
b/sdks/python/apache_beam/examples/complete/game/leader_board_it_test.py
index e0e309b1265..ab109425eb6 100644
--- a/sdks/python/apache_beam/examples/complete/game/leader_board_it_test.py
+++ b/sdks/python/apache_beam/examples/complete/game/leader_board_it_test.py
@@ -73,16 +73,15 @@ def setUp(self):
 
 # Set up PubSub environment.
 from google.cloud import pubsub
+self.pubsub_client = pubsub.Client(project=self.project)
+unique_topic_name = self.INPUT_TOPIC + _unique_id
+unique_subscrition_name = self.INPUT_SUB + _unique_id
+self.input_topic = self.pubsub_client.topic(unique_topic_name)
+self.input_sub = self.input_topic.subscription(unique_subscrition_name)
 
-self.pub_client = pubsub.PublisherClient()
-self.input_

[jira] [Work logged] (BEAM-5615) Several tests fail on Python 3 with TypeError: 'cmp' is an invalid keyword argument for this function

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5615?focusedWorklogId=153952&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153952
 ]

ASF GitHub Bot logged work on BEAM-5615:


Author: ASF GitHub Bot
Created on: 12/Oct/18 17:22
Start Date: 12/Oct/18 17:22
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #6570: [BEAM-5615] fix cmp 
is an invalid keyword for sort function in python 3
URL: https://github.com/apache/beam/pull/6570#issuecomment-429399025
 
 
   Disallowing the compare parameter starting now in Python 3 sounds reasonable 
to me. Thanks.
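
For context, the usual way to keep supporting a comparator on Python 3 (where sort() no longer accepts cmp) is functools.cmp_to_key; the sketch below is illustrative only, since the change discussed here instead disallows the compare parameter on Python 3:

```
import functools

def sort_with_comparator(buffer, compare=None):
  # Adapt an old-style cmp function to a key function, or sort plainly.
  if compare is None:
    buffer.sort()
  else:
    buffer.sort(key=functools.cmp_to_key(compare))
```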


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 153952)
Time Spent: 3.5h  (was: 3h 20m)

> Several tests fail on Python 3 with TypeError: 'cmp' is an invalid keyword 
> argument for this function
> -
>
> Key: BEAM-5615
> URL: https://issues.apache.org/jira/browse/BEAM-5615
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-harness
>Reporter: Valentyn Tymofieiev
>Assignee: Juta Staes
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> ERROR: test_top (apache_beam.transforms.combiners_test.CombineTest)
> --
> Traceback (most recent call last):
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/combiners_test.py",
>  line 89, in test_top
> names)  # Note parameter passed to comparator.
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/pvalue.py",
>  line 111, in __or__
> return self.pipeline.apply(ptransform, self)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/pipeline.py",
>  line 467, in apply
> label or transform.label)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/pipeline.py",
>  line 477, in apply
> return self.apply(transform, pvalueish)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/pipeline.py",
>  line 513, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/runner.py",
>  line 193, in apply
> return m(transform, input)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/runner.py",
>  line 199, in apply_PTransform
> return transform.expand(input)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/ptransform.py",
>  line 759, in expand
> return self._fn(pcoll, *args, **kwargs)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/combiners.py",
>  line 185, in Of
> TopCombineFn(n, compare, key, reverse), *args, **kwargs)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/pvalue.py",
>  line 111, in __or__
> return self.pipeline.apply(ptransform, self)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/pipeline.py",
>  line 513, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/runner.py",
>  line 193, in apply
> return m(transform, input)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/runner.py",
>  line 199, in apply_PTransform
> return transform.expand(input)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/core.py",
>  line 1251, in expand
> default_value = combine_fn.apply([], *self.args, **self.kwargs)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/core.py",
>  line 623, in apply
> *args, **kwargs)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/combiners.py",
>  line 362, in extract_output
> self._sort_buffer(buffer, lt)
>   File 
> "/usr/local/google/home/valentyn

[jira] [Work logged] (BEAM-5706) PubSub dependency upgrade causes internal issues for Dataflow

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5706?focusedWorklogId=153951&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153951
 ]

ASF GitHub Bot logged work on BEAM-5706:


Author: ASF GitHub Bot
Created on: 12/Oct/18 17:18
Start Date: 12/Oct/18 17:18
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on issue #6674: [BEAM-5706] 
Revert 324f0b3e3c (pull request #6564 from udim/pubsub-0-35-4)
URL: https://github.com/apache/beam/pull/6674#issuecomment-429397904
 
 
   Thanks, this LGTM.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 153951)
Time Spent: 10m
Remaining Estimate: 0h

> PubSub dependency upgrade causes internal issues for Dataflow
> -
>
> Key: BEAM-5706
> URL: https://issues.apache.org/jira/browse/BEAM-5706
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.8.0
>Reporter: Charles Chen
>Assignee: Udi Meiri
>Priority: Blocker
> Fix For: 2.8.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The PubSub dependency upgrade in https://github.com/apache/beam/pull/6564 
> causes internal issues for Dataflow.  The Dataflow team needs to resolve this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5653) Dataflow FnApi Worker overrides some of Coders due to coder ID generation collision.

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5653?focusedWorklogId=153949&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153949
 ]

ASF GitHub Bot logged work on BEAM-5653:


Author: ASF GitHub Bot
Created on: 12/Oct/18 17:13
Start Date: 12/Oct/18 17:13
Worklog Time Spent: 10m 
  Work Description: Ardagan commented on issue #6649: [BEAM-5653] Fix 
overriding coders due to duplicate coderId generation
URL: https://github.com/apache/beam/pull/6649#issuecomment-429396509
 
 
   run java precommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 153949)
Time Spent: 3h 10m  (was: 3h)
Remaining Estimate: 68h 50m  (was: 69h)

> Dataflow FnApi Worker overrides some of Coders due to coder ID generation 
> collision.
> 
>
> Key: BEAM-5653
> URL: https://issues.apache.org/jira/browse/BEAM-5653
> Project: Beam
>  Issue Type: Test
>  Components: java-fn-execution
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Blocker
> Fix For: 2.8.0
>
>   Original Estimate: 72h
>  Time Spent: 3h 10m
>  Remaining Estimate: 68h 50m
>
> Due to one of the latest refactorings, the Java FnApi Worker has a bug: it 
> overrides Coders in the ProcessBundleDescriptor sent to the SDK Harness, which 
> causes jobs to fail.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5442) PortableRunner swallows custom options for Runner

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5442?focusedWorklogId=153948&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153948
 ]

ASF GitHub Bot logged work on BEAM-5442:


Author: ASF GitHub Bot
Created on: 12/Oct/18 17:09
Start Date: 12/Oct/18 17:09
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #6600: [BEAM-5442] Store 
duplicate unknown options in a list argument
URL: https://github.com/apache/beam/pull/6600#issuecomment-429395314
 
 
   @charlesccychen Fair points. Let's fix this more programmatically then. 
Builtin SDK options and user-defined options should be the only top-level 
options. "Unknown" options should only be available to the Runners via a 
separate option list which is transmitted through the Proto alongside the 
regular options.
   
   @aaltay +1 Would make sense to revert this on the release branch. 
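
A minimal sketch of that separation, assuming an argparse-based parser like the Python SDK's; the flag names are examples only, and the real change lives in the options and proto translation code:

```
import argparse

def split_pipeline_args(argv):
  parser = argparse.ArgumentParser()
  parser.add_argument('--runner')  # example of a known, built-in option
  known, unknown = parser.parse_known_args(argv)
  # Keep unknown (runner-specific) flags such as --parallelism=4 in a separate
  # list so they can be forwarded to the runner instead of being dropped.
  return known, list(unknown)
```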


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 153948)
Time Spent: 9h 40m  (was: 9.5h)

> PortableRunner swallows custom options for Runner
> -
>
> Key: BEAM-5442
> URL: https://issues.apache.org/jira/browse/BEAM-5442
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core, sdk-py-core
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>  Labels: portability, portability-flink
> Fix For: 2.8.0
>
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
> The PortableRunner doesn't pass custom PipelineOptions to the executing 
> Runner.
> Example: {{--parallelism=4}} won't be forwarded to the FlinkRunner.
> (The option is just removed during proto translation without any warning)
> We should allow some form of customization through the options, even for the 
> PortableRunner. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5442) PortableRunner swallows custom options for Runner

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5442?focusedWorklogId=153947&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153947
 ]

ASF GitHub Bot logged work on BEAM-5442:


Author: ASF GitHub Bot
Created on: 12/Oct/18 17:03
Start Date: 12/Oct/18 17:03
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #6600: [BEAM-5442] Store 
duplicate unknown options in a list argument
URL: https://github.com/apache/beam/pull/6600#issuecomment-429393801
 
 
   Should we revert this change (and the 2 related changes before) for the 
release branch? I think we should address @charlesccychen's concerns before we 
release with these changes. (Perhaps a mailing-list discussion would help.)


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 153947)
Time Spent: 9.5h  (was: 9h 20m)

> PortableRunner swallows custom options for Runner
> -
>
> Key: BEAM-5442
> URL: https://issues.apache.org/jira/browse/BEAM-5442
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core, sdk-py-core
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>  Labels: portability, portability-flink
> Fix For: 2.8.0
>
>  Time Spent: 9.5h
>  Remaining Estimate: 0h
>
> The PortableRunner doesn't pass custom PipelineOptions to the executing 
> Runner.
> Example: {{--parallelism=4}} won't be forwarded to the FlinkRunner.
> (The option is just removed during proto translation without any warning)
> We should allow some form of customization through the options, even for the 
> PortableRunner. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5653) Dataflow FnApi Worker overrides some of Coders due to coder ID generation collision.

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5653?focusedWorklogId=153944&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153944
 ]

ASF GitHub Bot logged work on BEAM-5653:


Author: ASF GitHub Bot
Created on: 12/Oct/18 16:59
Start Date: 12/Oct/18 16:59
Worklog Time Spent: 10m 
  Work Description: Ardagan commented on issue #6649: [BEAM-5653] Fix 
overriding coders due to duplicate coderId generation
URL: https://github.com/apache/beam/pull/6649#issuecomment-429392567
 
 
   run java precommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 153944)
Time Spent: 2h 50m  (was: 2h 40m)
Remaining Estimate: 69h 10m  (was: 69h 20m)

> Dataflow FnApi Worker overrides some of Coders due to coder ID generation 
> collision.
> 
>
> Key: BEAM-5653
> URL: https://issues.apache.org/jira/browse/BEAM-5653
> Project: Beam
>  Issue Type: Test
>  Components: java-fn-execution
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Blocker
> Fix For: 2.8.0
>
>   Original Estimate: 72h
>  Time Spent: 2h 50m
>  Remaining Estimate: 69h 10m
>
> Due to one of the latest refactorings, the Java FnApi Worker has a bug: it 
> overrides Coders in the ProcessBundleDescriptor sent to the SDK Harness, which 
> causes jobs to fail.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5653) Dataflow FnApi Worker overrides some of Coders due to coder ID generation collision.

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5653?focusedWorklogId=153945&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153945
 ]

ASF GitHub Bot logged work on BEAM-5653:


Author: ASF GitHub Bot
Created on: 12/Oct/18 16:59
Start Date: 12/Oct/18 16:59
Worklog Time Spent: 10m 
  Work Description: Ardagan removed a comment on issue #6649: [BEAM-5653] 
Fix overriding coders due to duplicate coderId generation
URL: https://github.com/apache/beam/pull/6649#issuecomment-429392567
 
 
   run java precommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 153945)
Time Spent: 3h  (was: 2h 50m)
Remaining Estimate: 69h  (was: 69h 10m)

> Dataflow FnApi Worker overrides some of Coders due to coder ID generation 
> collision.
> 
>
> Key: BEAM-5653
> URL: https://issues.apache.org/jira/browse/BEAM-5653
> Project: Beam
>  Issue Type: Test
>  Components: java-fn-execution
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Blocker
> Fix For: 2.8.0
>
>   Original Estimate: 72h
>  Time Spent: 3h
>  Remaining Estimate: 69h
>
> Due to one of the latest refactorings, the Java FnApi Worker has a bug: it 
> overrides Coders in the ProcessBundleDescriptor sent to the SDK Harness, which 
> causes jobs to fail.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5717) [beam_PreCommit_Website_Stage_GCS_Cron] [stageWebsite] 404 The destination bucket gs://apache-beam-website-pull-requests does not exist or the write to the destination

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5717?focusedWorklogId=153937&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153937
 ]

ASF GitHub Bot logged work on BEAM-5717:


Author: ASF GitHub Bot
Created on: 12/Oct/18 16:39
Start Date: 12/Oct/18 16:39
Worklog Time Spent: 10m 
  Work Description: swegner closed pull request #: [BEAM-5717] Use the 
PR context from the environment rather than Gradle properties
URL: https://github.com/apache/beam/pull/
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/.test-infra/jenkins/PrecommitJobBuilder.groovy 
b/.test-infra/jenkins/PrecommitJobBuilder.groovy
index af0ef837c53..34f6e6f45ec 100644
--- a/.test-infra/jenkins/PrecommitJobBuilder.groovy
+++ b/.test-infra/jenkins/PrecommitJobBuilder.groovy
@@ -103,9 +103,6 @@ class PrecommitJobBuilder {
   rootBuildScriptDir(commonJobProperties.checkoutDir)
   tasks(gradleTask)
   commonJobProperties.setGradleSwitches(delegate)
-  if (scope.binding.hasVariable('ghprbPullId')) {
-switches('-PgithubPullRequestId=${ghprbPullId}')
-  }
   if (nameBase == 'Java') {
 // BEAM-5035: Parallel builds are very flaky
 switches('--no-parallel')
diff --git a/website/build.gradle b/website/build.gradle
index db852dc7ce8..0a2dab65bb9 100644
--- a/website/build.gradle
+++ b/website/build.gradle
@@ -139,7 +139,7 @@ createBuildTask( name:'Gcs', useTestConfig: true, baseUrl: 
getBaseUrl(), dockerW
 createBuildTask( name:'Apache', dockerWorkDir: dockerWorkDir)
 
 def getBaseUrl() {
-  project.findProperty('githubPullRequestId')?.trim() ?: 'latest'
+  System.getenv('ghprbPullId')?.trim() ?: 'latest'
 }
 def buildContentDir(name) {
   "${project.rootDir}/build/website/generated-${name.toLowerCase()}-content"
@@ -255,7 +255,7 @@ publishWebsite.dependsOn commitWebsite
 /*
  * Stages a pull request on GCS
  * For example:
- *   ./gradlew :beam-website:stageWebsite -PgithubPullRequestId=${ghprbPullId} 
-PwebsiteBucket=foo
+ *   ./gradlew :beam-website:stageWebsite -PwebsiteBucket=foo
  */
 task stageWebsite << {
   def baseUrl = getBaseUrl()


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 153937)
Time Spent: 4h 10m  (was: 4h)

> [beam_PreCommit_Website_Stage_GCS_Cron] [stageWebsite] 404 The destination 
> bucket gs://apache-beam-website-pull-requests does not exist or the write to 
> the destination must be restarted
> -
>
> Key: BEAM-5717
> URL: https://issues.apache.org/jira/browse/BEAM-5717
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Scott Wegner
>Assignee: Scott Wegner
>Priority: Major
>  Labels: currently-failing
> Fix For: Not applicable
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> _Use this form to file an issue for test failure:_
>  * [Jenkins 
> Job|https://builds.apache.org/job/beam_PreCommit_Website_Stage_GCS_Cron/28/]
>  * [Gradle Build 
> Scan|https://scans.gradle.com/s/3a7ujky4ekojw/console-log?task=:beam-website:stageWebsite#L4610]
>  * [Test source 
> code|https://github.com/apache/beam/blob/280277e7788b4c28680dd8ca02d54a55195b24ba/website/build.gradle#L271]
> Initial investigation:
> I haven't seen this issue before; it's not clear if this is a persistent 
> failure or a flake. I will investigate further.
> 
> _After you've filled out the above details, please [assign the issue to an 
> individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist].
>  Assignee should [treat test failures as 
> high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test],
>  helping to fix the issue or find a more appropriate owner. See [Apache Beam 
> Post-Commit 
> Policies|https://beam.apache.org/contribute/postcommits-policies]._



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5717) [beam_PreCommit_Website_Stage_GCS_Cron] [stageWebsite] 404 The destination bucket gs://apache-beam-website-pull-requests does not exist or the write to the destination

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5717?focusedWorklogId=153935&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153935
 ]

ASF GitHub Bot logged work on BEAM-5717:


Author: ASF GitHub Bot
Created on: 12/Oct/18 16:38
Start Date: 12/Oct/18 16:38
Worklog Time Spent: 10m 
  Work Description: alanmyrvold commented on issue #: [BEAM-5717] Use 
the PR context from the environment rather than Gradle properties
URL: https://github.com/apache/beam/pull/#issuecomment-429386629
 
 
   LGTM


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 153935)
Time Spent: 4h  (was: 3h 50m)

> [beam_PreCommit_Website_Stage_GCS_Cron] [stageWebsite] 404 The destination 
> bucket gs://apache-beam-website-pull-requests does not exist or the write to 
> the destination must be restarted
> -
>
> Key: BEAM-5717
> URL: https://issues.apache.org/jira/browse/BEAM-5717
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Scott Wegner
>Assignee: Scott Wegner
>Priority: Major
>  Labels: currently-failing
> Fix For: Not applicable
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> _Use this form to file an issue for test failure:_
>  * [Jenkins 
> Job|https://builds.apache.org/job/beam_PreCommit_Website_Stage_GCS_Cron/28/]
>  * [Gradle Build 
> Scan|https://scans.gradle.com/s/3a7ujky4ekojw/console-log?task=:beam-website:stageWebsite#L4610]
>  * [Test source 
> code|https://github.com/apache/beam/blob/280277e7788b4c28680dd8ca02d54a55195b24ba/website/build.gradle#L271]
> Initial investigation:
> I haven't seen this issue before; it's not clear if this is a persistent 
> failure or a flake. I will investigate further.
> 
> _After you've filled out the above details, please [assign the issue to an 
> individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist].
>  Assignee should [treat test failures as 
> high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test],
>  helping to fix the issue or find a more appropriate owner. See [Apache Beam 
> Post-Commit 
> Policies|https://beam.apache.org/contribute/postcommits-policies]._



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5714) RedisIO emit error of EXEC without MULTI

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5714?focusedWorklogId=153936&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153936
 ]

ASF GitHub Bot logged work on BEAM-5714:


Author: ASF GitHub Bot
Created on: 12/Oct/18 16:38
Start Date: 12/Oct/18 16:38
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #6651: [BEAM-5714] Fix 
RedisIO EXEC without MULTI error
URL: https://github.com/apache/beam/pull/6651#issuecomment-429386703
 
 
   R: @jbonofre are you the expert on RedisIO?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 153936)
Time Spent: 0.5h  (was: 20m)

> RedisIO emit error of EXEC without MULTI
> 
>
> Key: BEAM-5714
> URL: https://issues.apache.org/jira/browse/BEAM-5714
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-redis
>Affects Versions: 2.7.0
>Reporter: K.K. POON
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> RedisIO emits an EXEC without MULTI error after SETting a batch of records.
>  
> Looking at the source code, I guess a `pipeline.multi();` call is missing 
> after exec() of the last batch.
> [https://github.com/apache/beam/blob/master/sdks/java/io/redis/src/main/java/org/apache/beam/sdk/io/redis/RedisIO.java#L555]
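
To illustrate the suggested fix, here is a sketch of the batching pattern using redis-py rather than the Jedis client that RedisIO uses, so it is an analogy and not the RedisIO code: after each exec() of a full batch, a new MULTI block has to be opened before further commands are queued.

```
import redis

def write_in_batches(client, records, batch_size=1000):
  pipe = client.pipeline(transaction=True)      # opens a MULTI block
  pending = 0
  for key, value in records:
    pipe.set(key, value)
    pending += 1
    if pending == batch_size:
      pipe.execute()                            # EXEC for this full batch
      pipe = client.pipeline(transaction=True)  # re-open MULTI, as suggested
      pending = 0
  if pending:
    pipe.execute()                              # flush the final partial batch
```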



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4796) SpannerIO waits for all input before writing

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4796?focusedWorklogId=153924&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153924
 ]

ASF GitHub Bot logged work on BEAM-4796:


Author: ASF GitHub Bot
Created on: 12/Oct/18 16:03
Start Date: 12/Oct/18 16:03
Worklog Time Spent: 10m 
  Work Description: nielm commented on issue #6409: [BEAM-4796] SpannerIO: 
Add option to wait for Schema to be ready.
URL: https://github.com/apache/beam/pull/6409#issuecomment-429376495
 
 
   > Can we close this PR given that #6478 includes this?
   
   yes, closed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 153924)
Time Spent: 40m  (was: 0.5h)

> SpannerIO waits for all input before writing
> 
>
> Key: BEAM-4796
> URL: https://issues.apache.org/jira/browse/BEAM-4796
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.5.0
>Reporter: Niel Markwick
>Assignee: Niel Markwick
>Priority: Major
> Fix For: 2.9.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> SpannerIO.Write waits for all input in the window to arrive before getting 
> the schema:
> [https://github.com/apache/beam/blame/release-2.5.0/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIO.java#L841]
>  
> In streaming mode, this is not an issue, but in batch mode, this causes the 
> pipeline to stall until all input is read, which could be a significant 
> amount of time (and temp data). 
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4796) SpannerIO waits for all input before writing

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4796?focusedWorklogId=153925&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153925
 ]

ASF GitHub Bot logged work on BEAM-4796:


Author: ASF GitHub Bot
Created on: 12/Oct/18 16:03
Start Date: 12/Oct/18 16:03
Worklog Time Spent: 10m 
  Work Description: nielm closed pull request #6409: [BEAM-4796] SpannerIO: 
Add option to wait for Schema to be ready.
URL: https://github.com/apache/beam/pull/6409
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIO.java
 
b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIO.java
index af2a3b00d1f..5cb2469bcfb 100644
--- 
a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIO.java
+++ 
b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIO.java
@@ -163,6 +163,11 @@
  * Write#withBatchSizeBytes(long)}. Setting batch size to a small value or 
zero practically disables
  * batching.
  *
+ * The write transform reads the database schema on pipeline start. If the 
schema is created as
+ * part of the same pipeline, this transform needs to wait until the schema has 
been created. Use
+ * {@link Write#withSchemaReadySignal(PCollection)} to pass a {@link 
PCollection} which will be used
+ * with {@link Wait#on(PCollection[])} to prevent the schema from being read 
until it is ready.
+ *
  * The transform does not provide same transactional guarantees as Cloud 
Spanner. In particular,
  *
  * 
@@ -682,6 +687,9 @@ public CreateTransaction withTimestampBound(TimestampBound 
timestampBound) {
 abstract PTransform>, 
PCollection>>>
 getSampler();
 
+@Nullable
+abstract PCollection getSchemaReadySignal();
+
 abstract Builder toBuilder();
 
 @AutoValue.Builder
@@ -701,6 +709,8 @@ abstract Builder setSampler(
   PTransform>, PCollection>>>
   sampler);
 
+  abstract Builder setSchemaReadySignal(PCollection schemaReadySignal);
+
   abstract Write build();
 }
 
@@ -786,6 +796,15 @@ public Write withMaxNumMutations(long maxNumMutations) {
   return toBuilder().setMaxNumMutations(maxNumMutations).build();
 }
 
+/**
+ * Specifies an input PCollection that can be used with a {@code 
Wait.on(signal)} to indicate
+ * when the database schema is ready. To be used when the schema creation 
is part of the
+ * pipeline to prevent the connector reading the schema too early.
+ */
+public Write withSchemaReadySignal(PCollection signal) {
+  return toBuilder().setSchemaReadySignal(signal).build();
+}
+
 @Override
 public SpannerWriteResult expand(PCollection input) {
   getSpannerConfig().validate();
@@ -835,13 +854,16 @@ public SpannerWriteResult 
expand(PCollection input) {
   if (sampler == null) {
 sampler = createDefaultSampler();
   }
+
   // First, read the Cloud Spanner schema.
+  PCollection schemaSeed =
+  input.getPipeline().apply("Create Seed", Create.of((Void) null));
+  if (spec.getSchemaReadySignal() != null) {
+// Wait for external signal before reading schema.
+schemaSeed = schemaSeed.apply("Wait for schema", 
Wait.on(spec.getSchemaReadySignal()));
+  }
   final PCollectionView schemaView =
-  input
-  .getPipeline()
-  .apply("Create seed", Create.of((Void) null))
-  // Wait for input mutations so it is possible to chain 
transforms.
-  .apply(Wait.on(input))
+  schemaSeed
   .apply(
   "Read information schema",
   ParDo.of(new ReadSpannerSchema(spec.getSpannerConfig(
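
For context, a minimal, hypothetical usage sketch of the `withSchemaReadySignal` option introduced by this diff; the `ApplyDdlFn` step, the instance/database IDs, and the empty mutations placeholder are assumptions made purely for illustration:

```
import com.google.cloud.spanner.Mutation;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.SerializableCoder;
import org.apache.beam.sdk.io.gcp.spanner.SpannerIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;

public class SchemaReadySignalExample {

  /** Hypothetical DDL step: applies the schema and emits a single element when done. */
  static class ApplyDdlFn extends DoFn<Void, String> {
    @ProcessElement
    public void processElement(ProcessContext c) {
      // ... issue the CREATE TABLE statements against Cloud Spanner here ...
      c.output("schema-ready");
    }
  }

  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Signal collection that only becomes available once the schema exists.
    PCollection<String> schemaReady =
        p.apply("Seed", Create.of((Void) null)).apply("Apply DDL", ParDo.of(new ApplyDdlFn()));

    // Mutations produced elsewhere in the pipeline; an empty placeholder keeps the sketch compilable.
    PCollection<Mutation> mutations =
        p.apply("Mutations", Create.empty(SerializableCoder.of(Mutation.class)));

    mutations.apply(
        "Write to Spanner",
        SpannerIO.write()
            .withInstanceId("my-instance")
            .withDatabaseId("my-database")
            .withSchemaReadySignal(schemaReady));

    p.run().waitUntilFinish();
  }
}
```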


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 153925)
Time Spent: 50m  (was: 40m)

> SpannerIO waits for all input before writing
> 
>
> Key: BEAM-4796
> URL: https://issues.apache.org/jira/browse/BEAM-4796
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.5.0
>Reporter: Niel Markwick
>Assignee: Niel Markwick
>Prio

[jira] [Work logged] (BEAM-5442) PortableRunner swallows custom options for Runner

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5442?focusedWorklogId=153918&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153918
 ]

ASF GitHub Bot logged work on BEAM-5442:


Author: ASF GitHub Bot
Created on: 12/Oct/18 15:36
Start Date: 12/Oct/18 15:36
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on issue #6600: [BEAM-5442] 
Store duplicate unknown options in a list argument
URL: https://github.com/apache/beam/pull/6600#issuecomment-429367754
 
 
   CC: @tweise 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 153918)
Time Spent: 9h 20m  (was: 9h 10m)

> PortableRunner swallows custom options for Runner
> -
>
> Key: BEAM-5442
> URL: https://issues.apache.org/jira/browse/BEAM-5442
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core, sdk-py-core
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>  Labels: portability, portability-flink
> Fix For: 2.8.0
>
>  Time Spent: 9h 20m
>  Remaining Estimate: 0h
>
> The PortableRunner doesn't pass custom PipelineOptions to the executing 
> Runner.
> Example: {{--parallelism=4}} won't be forwarded to the FlinkRunner.
> (The option is just removed during proto translation without any warning)
> We should allow some form of customization through the options, even for the 
> PortableRunner. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5442) PortableRunner swallows custom options for Runner

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5442?focusedWorklogId=153917&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153917
 ]

ASF GitHub Bot logged work on BEAM-5442:


Author: ASF GitHub Bot
Created on: 12/Oct/18 15:36
Start Date: 12/Oct/18 15:36
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on issue #6600: [BEAM-5442] 
Store duplicate unknown options in a list argument
URL: https://github.com/apache/beam/pull/6600#issuecomment-429367628
 
 
   To clarify again, I believe that with the current approach, a user no longer 
needs to explicitly define a pipeline option to use it--they can just pass it 
(as `--myparam abc`) and it will be "magically" available for use (as 
`options.myparam`).  This is not good for backwards compatibility, since the 
user should not rely on this implementation detail, and it will become 
problematic if we decide to change this after the user starts using it.  I 
would therefore prefer to isolate these options so that they are at least not 
user-visible.
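
As a point of comparison, a minimal Java SDK sketch of the explicit-definition approach described above; the `MyCustomOptions` interface and the `myParam` name are illustrative, mirroring the `--myparam` example from the thread:

```
import org.apache.beam.sdk.options.Default;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class ExplicitOptionsExample {

  /** Explicitly declared custom option, instead of relying on "magically" available unknown options. */
  public interface MyCustomOptions extends PipelineOptions {
    @Description("An example option that the pipeline author declares up front.")
    @Default.String("default-value")
    String getMyParam();

    void setMyParam(String value);
  }

  public static void main(String[] args) {
    // Registering the interface makes --myParam a known, validated option
    // rather than an unknown one the runner has to guess about.
    PipelineOptionsFactory.register(MyCustomOptions.class);
    MyCustomOptions options =
        PipelineOptionsFactory.fromArgs(args).withValidation().as(MyCustomOptions.class);
    System.out.println("myParam = " + options.getMyParam());
  }
}
```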


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 153917)
Time Spent: 9h 10m  (was: 9h)

> PortableRunner swallows custom options for Runner
> -
>
> Key: BEAM-5442
> URL: https://issues.apache.org/jira/browse/BEAM-5442
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core, sdk-py-core
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>  Labels: portability, portability-flink
> Fix For: 2.8.0
>
>  Time Spent: 9h 10m
>  Remaining Estimate: 0h
>
> The PortableRunner doesn't pass custom PipelineOptions to the executing 
> Runner.
> Example: {{--parallelism=4}} won't be forwarded to the FlinkRunner.
> (The option is just removed during proto translation without any warning)
> We should allow some form of customization through the options, even for the 
> PortableRunner. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5442) PortableRunner swallows custom options for Runner

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5442?focusedWorklogId=153916&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153916
 ]

ASF GitHub Bot logged work on BEAM-5442:


Author: ASF GitHub Bot
Created on: 12/Oct/18 15:33
Start Date: 12/Oct/18 15:33
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on issue #6600: [BEAM-5442] 
Store duplicate unknown options in a list argument
URL: https://github.com/apache/beam/pull/6600#issuecomment-429366775
 
 
   Thanks.  My concern wasn't about the runtime cost.  It introduces an 
inconsistency where options passed once and options passed multiple times are 
treated differently (which requires special-casing when using the value), and 
it also promotes the "magical" behavior over explicit definition of pipeline 
options.  We should not have users depend on this, since it would discourage 
explicit definition of pipeline options in the user pipeline.  I would 
therefore suggest passing "unused options" that can be parsed by the runner 
using a (potentially runner-specific) explicitly-defined parser.  What do you 
think?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 153916)
Time Spent: 9h  (was: 8h 50m)

> PortableRunner swallows custom options for Runner
> -
>
> Key: BEAM-5442
> URL: https://issues.apache.org/jira/browse/BEAM-5442
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core, sdk-py-core
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>  Labels: portability, portability-flink
> Fix For: 2.8.0
>
>  Time Spent: 9h
>  Remaining Estimate: 0h
>
> The PortableRunner doesn't pass custom PipelineOptions to the executing 
> Runner.
> Example: {{--parallelism=4}} won't be forwarded to the FlinkRunner.
> (The option is just removed during proto translation without any warning)
> We should allow some form of customization through the options, even for the 
> PortableRunner. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4130) Portable Flink runner JobService entry point in a Docker container

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4130?focusedWorklogId=153903&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153903
 ]

ASF GitHub Bot logged work on BEAM-4130:


Author: ASF GitHub Bot
Created on: 12/Oct/18 15:01
Start Date: 12/Oct/18 15:01
Worklog Time Spent: 10m 
  Work Description: mxm opened a new pull request #6672: [BEAM-4130] Add 
tests for FlinkJobServerDriver
URL: https://github.com/apache/beam/pull/6672
 
 
   This adds a few test cases for FlinkJobServerDriver. It also changes the 
   default host from an empty host to `localhost`.
   
   CC @angoenka @tweise 
   
   Post-Commit Tests Status (on master branch)
   
   [Table of Jenkins post-commit build-status badges for the Go, Java, and 
Python SDKs across the Apex, Dataflow, Flink, Gearpump, Samza, and Spark 
runners omitted.]
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 153903)
Time Spent: 11h 40m  (was: 11.5h)

> Portable Flink runner JobService entry point in a Docker container
> --
>
> Key: BEAM-4130
> URL: https://issues.apache.org/jira/browse/BEAM-4130
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Ben Sidhom
>Assignee: Maximilian Michels
>Priority: Minor
> Fix For: 2.7.0
>
>  Time Spent: 11h 40m
>  Remaining Estimate: 0h
>
> The portable Flink runner exists as a Job Service that runs somewhere. We 
> need a main entry point that itself spins up the job service (and artifact 
> staging service). The main program itself should be packaged into an uberjar 
> such that it can be run locally or submitted to a Flink deployment via `flink 
> run`.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Work logged] (BEAM-5708) Support caching of SDKHarness environments in flink

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5708?focusedWorklogId=153897&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153897
 ]

ASF GitHub Bot logged work on BEAM-5708:


Author: ASF GitHub Bot
Created on: 12/Oct/18 14:27
Start Date: 12/Oct/18 14:27
Worklog Time Spent: 10m 
  Work Description: tweise commented on a change in pull request #6638: 
[BEAM-5708] Cache environment in portable flink runner
URL: https://github.com/apache/beam/pull/6638#discussion_r224803034
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/ReferenceCountingFlinkExecutableStageContextFactory.java
 ##
 @@ -104,9 +108,21 @@ private void scheduleRelease(JobInfo jobInfo) {
 WrappedContext wrapper = getCache().get(jobInfo.jobId());
 Preconditions.checkState(
 wrapper != null, "Releasing context for unknown job: " + 
jobInfo.jobId());
-// Do not release this asynchronously, as the releasing could fail due to 
the classloader not being
-// available anymore after the tasks have been removed from the execution 
engine.
-release(wrapper);
+
+PipelineOptions pipelineOptions =
+PipelineOptionsTranslation.fromProto(jobInfo.pipelineOptions());
+int environmentCacheTTLMillis =
+
pipelineOptions.as(PortablePipelineOptions.class).getEnvironmentCacheMillis();
+if (environmentCacheTTLMillis > 0) {
+  // Schedule task to clean the container later.
+  // Ensure that this class is loaded in the parent Flink classloader.
+  getExecutor()
+  .schedule(() -> release(wrapper), environmentCacheTTLMillis, 
TimeUnit.MILLISECONDS);
 
 Review comment:
   Something like this (disclaimer: not tested):
   
   ```
   if (environmentCacheTTLMillis > 0 && this.getClass().getClassLoader() == 
ExecutionEnvironment.class.getClassLoader()) 
   ```
   For execution in the job server, the class loader will be the same (this 
applies to Jenkins). On a remote Flink cluster (by default), the user class 
loader will be different and will be removed.
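
A self-contained sketch of that guard, as an illustration of the idea rather than the merged Beam code; the class and method names are made up and the release body is a stand-in:

```
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.apache.flink.api.java.ExecutionEnvironment;

public class GuardedReleaseSketch {

  private static final ScheduledExecutorService EXECUTOR =
      Executors.newSingleThreadScheduledExecutor();

  /**
   * Defers the release only when this class was loaded by the same classloader as
   * Flink's ExecutionEnvironment (i.e. the parent classloader); otherwise releases
   * immediately so the user-code classloader can be torn down safely.
   */
  static void scheduleRelease(Runnable release, long environmentCacheTTLMillis) {
    boolean parentClassLoader =
        GuardedReleaseSketch.class.getClassLoader() == ExecutionEnvironment.class.getClassLoader();
    if (environmentCacheTTLMillis > 0 && parentClassLoader) {
      EXECUTOR.schedule(release, environmentCacheTTLMillis, TimeUnit.MILLISECONDS);
    } else {
      release.run();
    }
  }

  public static void main(String[] args) {
    scheduleRelease(() -> System.out.println("released"), 0L);
  }
}
```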
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 153897)
Time Spent: 1h 20m  (was: 1h 10m)

> Support caching of SDKHarness environments in flink
> ---
>
> Key: BEAM-5708
> URL: https://issues.apache.org/jira/browse/BEAM-5708
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Cache and reuse environment to improve performance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5713) Flink portable runner schedules all tasks of streaming job on same task manager

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5713?focusedWorklogId=153877&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153877
 ]

ASF GitHub Bot logged work on BEAM-5713:


Author: ASF GitHub Bot
Created on: 12/Oct/18 12:59
Start Date: 12/Oct/18 12:59
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #6654: [BEAM-5713] Make 
ImpulseSourceFunction execute in parallel
URL: https://github.com/apache/beam/pull/6654#issuecomment-429316980
 
 
   Run Java PreCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 153877)
Time Spent: 40m  (was: 0.5h)

> Flink portable runner schedules all tasks of streaming job on same task 
> manager
> ---
>
> Key: BEAM-5713
> URL: https://issues.apache.org/jira/browse/BEAM-5713
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 2.8.0
>Reporter: Thomas Weise
>Assignee: Maximilian Michels
>Priority: Major
>  Labels: portability, portability-flink
> Attachments: Different SlotSharingGroup.png, With 
> RichParallelSourceFunction and parallelism 5.png, 
> image-2018-10-11-11-43-50-333.png, image-2018-10-11-16-20-45-221.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The cluster has 9 task managers and 144 task slots total. A simple streaming 
> pipeline with parallelism of 8 will get all tasks scheduled on the same task 
> manager, causing the host to be fully booked and the remaining cluster idle.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5720) Default coder breaks with large ints on Python 3

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5720?focusedWorklogId=153847&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153847
 ]

ASF GitHub Bot logged work on BEAM-5720:


Author: ASF GitHub Bot
Created on: 12/Oct/18 09:40
Start Date: 12/Oct/18 09:40
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #6659: 
[BEAM-5720] Fix encoding of large python ints in Python 3.
URL: https://github.com/apache/beam/pull/6659#discussion_r224728870
 
 

 ##
 File path: sdks/python/apache_beam/coders/coder_impl.py
 ##
 @@ -293,8 +299,21 @@ def encode_to_stream(self, value, stream, nested):
 if value is None:
   stream.write_byte(NONE_TYPE)
 elif t is int:
-  stream.write_byte(INT_TYPE)
-  stream.write_var_int64(value)
+  # In Python 3, an int may be larger than 64 bits.
+  # Note that an OverflowError on stream.write_var_int64 would happen
+  # *after* the marker byte is written, so we must check earlier.
+  try:
+# This may throw an overflow error when compiled.
+int_value = value
+# Otherwise, we must check ourselves.
+if not is_compiled:
+  if not fits_in_64_bits(value):
 
 Review comment:
   Yep. Current code:
   ```
   small_int, FastPrimitivesCoder, 1000 element(s): per element median time 
cost: 1.1301e-07 sec, relative std: 19.00%
   ```
   Removing the is_compiled check
   ```
   small_int, FastPrimitivesCoder, 1000 element(s): per element median time 
cost: 1.88589e-07 sec, relative std: 18.08%
   ```
   
   That's over a 60% increase. I tried inlining it and using constants rather 
than computing the bounds each time, which helps somewhat, but the check is 
entirely redundant in compiled code. 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 153847)
Time Spent: 50m  (was: 40m)

> Default coder breaks with large ints on Python 3
> 
>
> Key: BEAM-5720
> URL: https://issues.apache.org/jira/browse/BEAM-5720
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Valentyn Tymofieiev
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The test for `int` includes greater than 64-bit values, which causes an 
> overflow error later in the code. We need to only use that coding scheme for 
> machine-sized ints. 
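
A toy Python illustration of the range check being discussed; the constants and names mirror the idea of `fits_in_64_bits`, the marker strings are placeholders, and the real coder writes marker bytes and varints to a stream:

```
MIN_INT64 = -(1 << 63)
MAX_INT64 = (1 << 63) - 1


def fits_in_64_bits(value):
  """True if value fits the signed 64-bit range handled by write_var_int64."""
  return MIN_INT64 <= value <= MAX_INT64


def choose_marker(value):
  # The decision has to happen *before* the marker byte is emitted: an
  # OverflowError raised inside the varint writer would otherwise leave a
  # half-written record in the stream.
  return 'INT_TYPE' if fits_in_64_bits(value) else 'FALLBACK_TYPE'


assert choose_marker(42) == 'INT_TYPE'
assert choose_marker(1 << 70) == 'FALLBACK_TYPE'
assert choose_marker(-(1 << 70)) == 'FALLBACK_TYPE'
```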



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5720) Default coder breaks with large ints on Python 3

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5720?focusedWorklogId=153844&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153844
 ]

ASF GitHub Bot logged work on BEAM-5720:


Author: ASF GitHub Bot
Created on: 12/Oct/18 09:35
Start Date: 12/Oct/18 09:35
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #6659: [BEAM-5720] Fix 
encoding of large python ints in Python 3.
URL: https://github.com/apache/beam/pull/6659#issuecomment-429266297
 
 
   Run Python PreCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 153844)
Time Spent: 40m  (was: 0.5h)

> Default coder breaks with large ints on Python 3
> 
>
> Key: BEAM-5720
> URL: https://issues.apache.org/jira/browse/BEAM-5720
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Valentyn Tymofieiev
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The test for `int` includes greater than 64-bit values, which causes an 
> overflow error later in the code. We need to only use that coding scheme for 
> machine-sized ints. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4176) Java: Portable batch runner passes all ValidatesRunner tests that non-portable runner passes

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4176?focusedWorklogId=153840&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153840
 ]

ASF GitHub Bot logged work on BEAM-4176:


Author: ASF GitHub Bot
Created on: 12/Oct/18 09:04
Start Date: 12/Oct/18 09:04
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #6592: 
[BEAM-4176] Enable Post Commit JAVA PVR tests for Flink
URL: https://github.com/apache/beam/pull/6592#discussion_r224718474
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/ReferenceCountingFlinkExecutableStageContextFactory.java
 ##
 @@ -104,9 +108,21 @@ private void scheduleRelease(JobInfo jobInfo) {
 WrappedContext wrapper = getCache().get(jobInfo.jobId());
 Preconditions.checkState(
 wrapper != null, "Releasing context for unknown job: " + 
jobInfo.jobId());
-// Do not release this asynchronously, as the releasing could fail due to 
the classloader not being
-// available anymore after the tasks have been removed from the execution 
engine.
-release(wrapper);
+
+PipelineOptions pipelineOptions =
+PipelineOptionsTranslation.fromProto(jobInfo.pipelineOptions());
+int environmentCacheTTLMillis =
+
pipelineOptions.as(PortablePipelineOptions.class).getEnvironmentCacheMillis();
+if (environmentCacheTTLMillis > 0) {
+  // Schedule task to clean the container later.
+  // Ensure that this class is loaded in the parent Flink classloader.
+  getExecutor()
+  .schedule(() -> release(wrapper), environmentCacheTTLMillis, 
TimeUnit.MILLISECONDS);
 
 Review comment:
   > There is no way to make a PR depend upon another PR in git, so I had to 
resort to adding 2 separate commits.
   
   That's not true: you can just specify the branch of the other PR as the base 
branch.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 153840)
Time Spent: 33h  (was: 32h 50m)

> Java: Portable batch runner passes all ValidatesRunner tests that 
> non-portable runner passes
> 
>
> Key: BEAM-4176
> URL: https://issues.apache.org/jira/browse/BEAM-4176
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Ben Sidhom
>Assignee: Ankur Goenka
>Priority: Major
> Attachments: 81VxNWtFtke.png, Screen Shot 2018-08-14 at 4.18.31 
> PM.png, Screen Shot 2018-09-03 at 11.07.38 AM.png
>
>  Time Spent: 33h
>  Remaining Estimate: 0h
>
> We need this as a sanity check that runner execution is correct.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5707) Add a portable Flink streaming synthetic source for testing

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5707?focusedWorklogId=153837&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153837
 ]

ASF GitHub Bot logged work on BEAM-5707:


Author: ASF GitHub Bot
Created on: 12/Oct/18 09:03
Start Date: 12/Oct/18 09:03
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #6637: 
[BEAM-5707] Add a periodic, streaming impulse source for Flink portable 
pipelines
URL: https://github.com/apache/beam/pull/6637#discussion_r224718069
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkStreamingPortablePipelineTranslator.java
 ##
 @@ -406,6 +417,56 @@ private void translateImpulse(
 
context.addDataStream(Iterables.getOnlyElement(pTransform.getOutputsMap().values()),
 source);
   }
 
+  @AutoService(NativeTransforms.IsNativeTransform.class)
+  public static class IsFlinkNativeTransform implements 
NativeTransforms.IsNativeTransform {
+@Override
+public boolean test(RunnerApi.PTransform pTransform) {
+  return 
STREAMING_IMPULSE_TRANSFORM_URL.equals(PTransformTranslation.urnForTransformOrNull(pTransform));
+}
+  }
+
+  private void translateStreamingImpulse(
+  String id, RunnerApi.Pipeline pipeline, StreamingTranslationContext 
context) {
+RunnerApi.PTransform pTransform = 
pipeline.getComponents().getTransformsOrThrow(id);
+
+ObjectMapper objectMapper = new ObjectMapper();
+
+int intervalMillis;
+int messageCount;
+try {
+  JsonNode config = 
objectMapper.readTree(pTransform.getSpec().getPayload().toByteArray());
+  intervalMillis = config.path("interval_ms").asInt(100);
+  messageCount = config.path("message_count").asInt(0);
+} catch (IOException e) {
+throw new RuntimeException("Failed to parse configuration for 
streaming impulse", e);
+}
+
+DataStreamSource> source =
+context
+.getExecutionEnvironment()
+.addSource(
+new RichParallelSourceFunction>() {
+  private AtomicBoolean cancelled = new AtomicBoolean(false);
+  private AtomicLong count = new AtomicLong();
+
+  @Override
+  public void run(SourceContext> ctx) 
throws Exception {
+while (!cancelled.get() && (messageCount == 0 || 
count.getAndIncrement() < messageCount)) {
+  ctx.collect(WindowedValue.valueInGlobalWindow(new byte[] 
{}));
+  Thread.sleep(intervalMillis);
 
 Review comment:
   You could also handle `InterruptedException` here since we typically want to 
continue processing the source, unless `cancel()` has been called.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 153837)
Time Spent: 1h 40m  (was: 1.5h)

> Add a portable Flink streaming synthetic source for testing
> ---
>
> Key: BEAM-5707
> URL: https://issues.apache.org/jira/browse/BEAM-5707
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Micah Wylde
>Assignee: Aljoscha Krettek
>Priority: Minor
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Currently there are no built-in streaming sources for portable pipelines. 
> This makes it hard to test streaming functionality in the Python SDK.
> It would be very useful to add a periodic impulse source that (with some 
> configurable frequency) outputs an empty byte array, which can then be 
> transformed as desired inside the python pipeline. More context in this 
> [mailing list 
> discussion|https://lists.apache.org/thread.html/b44a648ab1d0cb200d8bfe4b280e9dad6368209c4725609cbfbbe410@%3Cdev.beam.apache.org%3E].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5707) Add a portable Flink streaming synthetic source for testing

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5707?focusedWorklogId=153836&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153836
 ]

ASF GitHub Bot logged work on BEAM-5707:


Author: ASF GitHub Bot
Created on: 12/Oct/18 09:02
Start Date: 12/Oct/18 09:02
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #6637: 
[BEAM-5707] Add a periodic, streaming impulse source for Flink portable 
pipelines
URL: https://github.com/apache/beam/pull/6637#discussion_r224717924
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkStreamingPortablePipelineTranslator.java
 ##
 @@ -406,6 +417,56 @@ private void translateImpulse(
 
context.addDataStream(Iterables.getOnlyElement(pTransform.getOutputsMap().values()),
 source);
   }
 
+  @AutoService(NativeTransforms.IsNativeTransform.class)
+  public static class IsFlinkNativeTransform implements 
NativeTransforms.IsNativeTransform {
+@Override
+public boolean test(RunnerApi.PTransform pTransform) {
+  return 
STREAMING_IMPULSE_TRANSFORM_URL.equals(PTransformTranslation.urnForTransformOrNull(pTransform));
+}
+  }
+
+  private void translateStreamingImpulse(
+  String id, RunnerApi.Pipeline pipeline, StreamingTranslationContext 
context) {
+RunnerApi.PTransform pTransform = 
pipeline.getComponents().getTransformsOrThrow(id);
+
+ObjectMapper objectMapper = new ObjectMapper();
+
+int intervalMillis;
+int messageCount;
+try {
+  JsonNode config = 
objectMapper.readTree(pTransform.getSpec().getPayload().toByteArray());
+  intervalMillis = config.path("interval_ms").asInt(100);
+  messageCount = config.path("message_count").asInt(0);
+} catch (IOException e) {
+throw new RuntimeException("Failed to parse configuration for 
streaming impulse", e);
+}
+
+DataStreamSource> source =
+context
+.getExecutionEnvironment()
+.addSource(
+new RichParallelSourceFunction>() {
+  private AtomicBoolean cancelled = new AtomicBoolean(false);
+  private AtomicLong count = new AtomicLong();
+
+  @Override
+  public void run(SourceContext> ctx) 
throws Exception {
+while (!cancelled.get() && (messageCount == 0 || 
count.getAndIncrement() < messageCount)) {
+  ctx.collect(WindowedValue.valueInGlobalWindow(new byte[] 
{}));
+  Thread.sleep(intervalMillis);
 
 Review comment:
   Does the source have to be checkpointing? If so we should synchronize on the 
checkpoint lock in `ctx` and release it before sleeping. 
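
Putting these review suggestions together, a rough sketch of what such a source could look like: a top-level class configured through its constructor, a plain `long` counter, emission under the checkpoint lock, and `InterruptedException` handled inside the loop. This is an illustration under those assumptions, not the code in this PR:

```
import org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction;

public class PeriodicImpulseSourceSketch extends RichParallelSourceFunction<byte[]> {

  private final long intervalMillis;
  private final long messageCount; // 0 means unbounded
  private volatile boolean cancelled = false;

  public PeriodicImpulseSourceSketch(long intervalMillis, long messageCount) {
    this.intervalMillis = intervalMillis;
    this.messageCount = messageCount;
  }

  @Override
  public void run(SourceContext<byte[]> ctx) throws Exception {
    long count = 0; // plain long: run() is executed by a single thread
    while (!cancelled && (messageCount == 0 || count < messageCount)) {
      synchronized (ctx.getCheckpointLock()) {
        // Emit while holding the checkpoint lock so checkpoints observe a consistent state.
        ctx.collect(new byte[0]);
        count++;
      }
      try {
        Thread.sleep(intervalMillis); // sleep outside the lock
      } catch (InterruptedException e) {
        // Keep running unless cancel() was called; otherwise restore the interrupt and stop.
        if (cancelled) {
          Thread.currentThread().interrupt();
          return;
        }
      }
    }
  }

  @Override
  public void cancel() {
    cancelled = true;
  }
}
```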


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 153836)
Time Spent: 1.5h  (was: 1h 20m)

> Add a portable Flink streaming synthetic source for testing
> ---
>
> Key: BEAM-5707
> URL: https://issues.apache.org/jira/browse/BEAM-5707
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Micah Wylde
>Assignee: Aljoscha Krettek
>Priority: Minor
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Currently there are no built-in streaming sources for portable pipelines. 
> This makes it hard to test streaming functionality in the Python SDK.
> It would be very useful to add a periodic impulse source that (with some 
> configurable frequency) outputs an empty byte array, which can then be 
> transformed as desired inside the python pipeline. More context in this 
> [mailing list 
> discussion|https://lists.apache.org/thread.html/b44a648ab1d0cb200d8bfe4b280e9dad6368209c4725609cbfbbe410@%3Cdev.beam.apache.org%3E].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5707) Add a portable Flink streaming synthetic source for testing

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5707?focusedWorklogId=153832&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153832
 ]

ASF GitHub Bot logged work on BEAM-5707:


Author: ASF GitHub Bot
Created on: 12/Oct/18 09:01
Start Date: 12/Oct/18 09:01
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #6637: 
[BEAM-5707] Add a periodic, streaming impulse source for Flink portable 
pipelines
URL: https://github.com/apache/beam/pull/6637#discussion_r224716717
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkStreamingPortablePipelineTranslator.java
 ##
 @@ -406,6 +417,56 @@ private void translateImpulse(
 
context.addDataStream(Iterables.getOnlyElement(pTransform.getOutputsMap().values()),
 source);
   }
 
+  @AutoService(NativeTransforms.IsNativeTransform.class)
+  public static class IsFlinkNativeTransform implements 
NativeTransforms.IsNativeTransform {
+@Override
+public boolean test(RunnerApi.PTransform pTransform) {
+  return 
STREAMING_IMPULSE_TRANSFORM_URL.equals(PTransformTranslation.urnForTransformOrNull(pTransform));
+}
+  }
+
+  private void translateStreamingImpulse(
+  String id, RunnerApi.Pipeline pipeline, StreamingTranslationContext 
context) {
+RunnerApi.PTransform pTransform = 
pipeline.getComponents().getTransformsOrThrow(id);
+
+ObjectMapper objectMapper = new ObjectMapper();
+
+int intervalMillis;
+int messageCount;
+try {
+  JsonNode config = 
objectMapper.readTree(pTransform.getSpec().getPayload().toByteArray());
+  intervalMillis = config.path("interval_ms").asInt(100);
+  messageCount = config.path("message_count").asInt(0);
+} catch (IOException e) {
+throw new RuntimeException("Failed to parse configuration for 
streaming impulse", e);
+}
+
+DataStreamSource> source =
+context
+.getExecutionEnvironment()
+.addSource(
+new RichParallelSourceFunction>() {
+  private AtomicBoolean cancelled = new AtomicBoolean(false);
+  private AtomicLong count = new AtomicLong();
 
 Review comment:
   Why is this an `AtomicLong`? A normal `long` should suffice because it is 
not accessed concurrently.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 153832)
Time Spent: 1h 10m  (was: 1h)

> Add a portable Flink streaming synthetic source for testing
> ---
>
> Key: BEAM-5707
> URL: https://issues.apache.org/jira/browse/BEAM-5707
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Micah Wylde
>Assignee: Aljoscha Krettek
>Priority: Minor
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Currently there are no built-in streaming sources for portable pipelines. 
> This makes it hard to test streaming functionality in the Python SDK.
> It would be very useful to add a periodic impulse source that (with some 
> configurable frequency) outputs an empty byte array, which can then be 
> transformed as desired inside the python pipeline. More context in this 
> [mailing list 
> discussion|https://lists.apache.org/thread.html/b44a648ab1d0cb200d8bfe4b280e9dad6368209c4725609cbfbbe410@%3Cdev.beam.apache.org%3E].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5707) Add a portable Flink streaming synthetic source for testing

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5707?focusedWorklogId=153831&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153831
 ]

ASF GitHub Bot logged work on BEAM-5707:


Author: ASF GitHub Bot
Created on: 12/Oct/18 09:01
Start Date: 12/Oct/18 09:01
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #6637: 
[BEAM-5707] Add a periodic, streaming impulse source for Flink portable 
pipelines
URL: https://github.com/apache/beam/pull/6637#discussion_r224717078
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkStreamingPortablePipelineTranslator.java
 ##
 @@ -406,6 +417,56 @@ private void translateImpulse(
 
context.addDataStream(Iterables.getOnlyElement(pTransform.getOutputsMap().values()),
 source);
   }
 
+  @AutoService(NativeTransforms.IsNativeTransform.class)
+  public static class IsFlinkNativeTransform implements 
NativeTransforms.IsNativeTransform {
+@Override
+public boolean test(RunnerApi.PTransform pTransform) {
+  return 
STREAMING_IMPULSE_TRANSFORM_URL.equals(PTransformTranslation.urnForTransformOrNull(pTransform));
+}
+  }
+
+  private void translateStreamingImpulse(
+  String id, RunnerApi.Pipeline pipeline, StreamingTranslationContext 
context) {
+RunnerApi.PTransform pTransform = 
pipeline.getComponents().getTransformsOrThrow(id);
+
+ObjectMapper objectMapper = new ObjectMapper();
+
+int intervalMillis;
+int messageCount;
+try {
+  JsonNode config = 
objectMapper.readTree(pTransform.getSpec().getPayload().toByteArray());
+  intervalMillis = config.path("interval_ms").asInt(100);
+  messageCount = config.path("message_count").asInt(0);
+} catch (IOException e) {
+throw new RuntimeException("Failed to parse configuration for 
streaming impulse", e);
+}
+
+DataStreamSource> source =
+context
+.getExecutionEnvironment()
+.addSource(
+new RichParallelSourceFunction>() {
 
 Review comment:
   This should be a top-level class with the parameters passed via a 
constructor.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 153831)
Time Spent: 1h 10m  (was: 1h)

> Add a portable Flink streaming synthetic source for testing
> ---
>
> Key: BEAM-5707
> URL: https://issues.apache.org/jira/browse/BEAM-5707
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Micah Wylde
>Assignee: Aljoscha Krettek
>Priority: Minor
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Currently there are no built-in streaming sources for portable pipelines. 
> This makes it hard to test streaming functionality in the Python SDK.
> It would be very useful to add a periodic impulse source that (with some 
> configurable frequency) outputs an empty byte array, which can then be 
> transformed as desired inside the python pipeline. More context in this 
> [mailing list 
> discussion|https://lists.apache.org/thread.html/b44a648ab1d0cb200d8bfe4b280e9dad6368209c4725609cbfbbe410@%3Cdev.beam.apache.org%3E].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

