[jira] [Work logged] (BEAM-5745) Util test on annotations fails

2018-10-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5745?focusedWorklogId=155703&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155703
 ]

ASF GitHub Bot logged work on BEAM-5745:


Author: ASF GitHub Bot
Created on: 18/Oct/18 05:44
Start Date: 18/Oct/18 05:44
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #6687: [BEAM-5745] Fix 
annotation test for py3
URL: https://github.com/apache/beam/pull/6687#issuecomment-430882473
 
 
   The original Python 2 implementation has issues to begin with... will add an 
extended comment in a bit. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 155703)
Time Spent: 1h 10m  (was: 1h)

> Util test on annotations fails 
> ---
>
> Key: BEAM-5745
> URL: https://issues.apache.org/jira/browse/BEAM-5745
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Ruoyun Huang
>Assignee: Ruoyun Huang
>Priority: Minor
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Traceback (most recent call last):
>   File 
> "/usr/local/google/home/ruoyun/projects/beam/sdks/python/apache_beam/utils/annotations_test.py",
>  line 142, in test_frequency
>     label_check_list=[])
>   File 
> "/usr/local/google/home/ruoyun/projects/beam/sdks/python/apache_beam/utils/annotations_test.py",
>  line 149, in check_annotation
>     self.assertIn(fnc_name + ' is ' + annotation_type, 
> str(warning[-1].message))
> AssertionError: 'fnc2_test_annotate_frequency is experimental' not found in 
> 'fnc_test_annotate_frequency is experimental.'
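> For context, a minimal sketch of the warning-capture pattern this test relies
> on (illustrative only; the decorator below is hypothetical, not Beam's actual
> annotations helper). Python's default warning filter deduplicates repeated
> warnings per call site, so tests like this typically force
> simplefilter('always') to see every emission:
> {code}
> import warnings
>
> def experimental(fn):
>   # Hypothetical stand-in for the annotation decorator under test.
>   def wrapper(*args, **kwargs):
>     warnings.warn('%s is experimental.' % fn.__name__, FutureWarning)
>     return fn(*args, **kwargs)
>   return wrapper
>
> @experimental
> def fnc_test_annotate_frequency():
>   return 1
>
> with warnings.catch_warnings(record=True) as caught:
>   warnings.simplefilter('always')  # do not deduplicate repeated warnings
>   fnc_test_annotate_frequency()
>   assert 'fnc_test_annotate_frequency is experimental' in str(caught[-1].message)
> {code}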



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5780) fn-api-worker and legacy-worker should point to different dir

2018-10-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5780?focusedWorklogId=155688&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155688
 ]

ASF GitHub Bot logged work on BEAM-5780:


Author: ASF GitHub Bot
Created on: 18/Oct/18 04:17
Start Date: 18/Oct/18 04:17
Worklog Time Spent: 10m 
  Work Description: lukecwik closed pull request #6734: [BEAM-5780] Clarify 
the reasoning for the two directories for building the Dataflow worker.
URL: https://github.com/apache/beam/pull/6734
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 155688)
Time Spent: 1h  (was: 50m)

> fn-api-worker and legacy-worker should point to different dir
> -
>
> Key: BEAM-5780
> URL: https://issues.apache.org/jira/browse/BEAM-5780
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
> Fix For: 2.9.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5521) Cache execution trees in SDK worker

2018-10-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5521?focusedWorklogId=155685&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155685
 ]

ASF GitHub Bot logged work on BEAM-5521:


Author: ASF GitHub Bot
Created on: 18/Oct/18 03:28
Start Date: 18/Oct/18 03:28
Worklog Time Spent: 10m 
  Work Description: tweise commented on issue #6717: [BEAM-5521] Re-use 
bundle processors across bundles.
URL: https://github.com/apache/beam/pull/6717#issuecomment-430863234
 
 
   I see a 50% CPU reduction when running locally with parallelism 1. The 
throughput increases are much more visible when combined with 
https://github.com/apache/beam/pull/6723. Will take it to the cluster with both 
changes; the gains will be more clearly visible when running with high 
parallelism. @mwylde fyi


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 155685)
Time Spent: 0.5h  (was: 20m)

> Cache execution trees in SDK worker
> ---
>
> Key: BEAM-5521
> URL: https://issues.apache.org/jira/browse/BEAM-5521
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Labels: portability-flink
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently they are re-constructed from the protos for every bundle, which is 
> expensive (especially for 1-element bundles in streaming Flink). 
> Care should be taken to ensure the objects can be re-used. 
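> To illustrate the re-use idea (a minimal sketch only; the real harness code
> lives in apache_beam.runners.worker and the names below are hypothetical):
> {code}
> class BundleProcessorCache(object):
>   """Sketch: build a processor from the proto once, then re-use it."""
>
>   def __init__(self, build_processor):
>     self._build_processor = build_processor
>     self._idle = {}  # descriptor_id -> list of idle processors
>
>   def get(self, descriptor_id):
>     cached = self._idle.setdefault(descriptor_id, [])
>     return cached.pop() if cached else self._build_processor(descriptor_id)
>
>   def release(self, descriptor_id, processor):
>     # Hand the processor back for the next bundle once this one finished cleanly.
>     self._idle[descriptor_id].append(processor)
> {code}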



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5774) beam_Release_Gradle_NightlySnapshot timed out

2018-10-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5774?focusedWorklogId=155684&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155684
 ]

ASF GitHub Bot logged work on BEAM-5774:


Author: ASF GitHub Bot
Created on: 18/Oct/18 03:24
Start Date: 18/Oct/18 03:24
Worklog Time Spent: 10m 
  Work Description: kennknowles closed pull request #6727: [BEAM-5774] 
Increase timeout of beam_Release_Gradle_NightlySnapshot
URL: https://github.com/apache/beam/pull/6727
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/.test-infra/jenkins/job_Release_Gradle_NightlySnapshot.groovy 
b/.test-infra/jenkins/job_Release_Gradle_NightlySnapshot.groovy
index df1e29e16ca..313dcf9a621 100644
--- a/.test-infra/jenkins/job_Release_Gradle_NightlySnapshot.groovy
+++ b/.test-infra/jenkins/job_Release_Gradle_NightlySnapshot.groovy
@@ -26,8 +26,8 @@ job('beam_Release_Gradle_NightlySnapshot') {
   // Execute concurrent builds if necessary.
   concurrentBuild()
 
-  // Set common parameters.
-  commonJobProperties.setTopLevelMainJobProperties(delegate)
+  // Set common parameters. Timeout is longer, to avoid [BEAM-5774].
+  commonJobProperties.setTopLevelMainJobProperties(delegate, 'master', 140)
 
   // This is a post-commit job that runs once per day, not for every push.
   commonJobProperties.setAutoJob(


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 155684)
Time Spent: 20m  (was: 10m)

> beam_Release_Gradle_NightlySnapshot timed out
> -
>
> Key: BEAM-5774
> URL: https://issues.apache.org/jira/browse/BEAM-5774
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Critical
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> https://builds.apache.org/job/beam_Release_Gradle_NightlySnapshot/209/
> Looking at the trend, this is not surprising: 
> https://builds.apache.org/job/beam_Release_Gradle_NightlySnapshot/buildTimeTrend



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-5776) Using methods in map is broken on Python 3

2018-10-17 Thread Valentyn Tymofieiev (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev reassigned BEAM-5776:
-

Assignee: (was: Valentyn Tymofieiev)

> Using methods in map is broken on Python 3
> --
>
> Key: BEAM-5776
> URL: https://issues.apache.org/jira/browse/BEAM-5776
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Priority: Major
>
> E.g. 
> {code:java}
> pcoll | beam.Map(str.upper){code}
> no longer works.
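> A common workaround until this is fixed (standard Beam Python API; the values
> are illustrative) is to wrap the unbound method in a lambda:
> {code}
> import apache_beam as beam
>
> with beam.Pipeline() as p:
>   (p
>    | beam.Create(['a', 'b'])
>    # beam.Map(str.upper) fails on Python 3; a plain lambda works instead.
>    | beam.Map(lambda s: s.upper()))
> {code}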



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5744) Investigate negative numbers represented as 'long' in Python SDK + Direct runner

2018-10-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5744?focusedWorklogId=155668&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155668
 ]

ASF GitHub Bot logged work on BEAM-5744:


Author: ASF GitHub Bot
Created on: 18/Oct/18 01:10
Start Date: 18/Oct/18 01:10
Worklog Time Spent: 10m 
  Work Description: aaltay closed pull request #6688: [BEAM-5744] 
Cherrypick PR 6686  to 2.8.0. release branch: 
URL: https://github.com/apache/beam/pull/6688
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/python/apache_beam/testing/util.py 
b/sdks/python/apache_beam/testing/util.py
index a9b326cce36..23d58cb29d5 100644
--- a/sdks/python/apache_beam/testing/util.py
+++ b/sdks/python/apache_beam/testing/util.py
@@ -112,9 +112,8 @@ def equal_to(expected):
   expected = list(expected)
 
   def _equal(actual):
-    sorted_expected = sorted(expected,
-                             key=lambda x: (hash(type(x)), type(x), x))
-    sorted_actual = sorted(actual, key=lambda x: (hash(type(x)), type(x), x))
+    sorted_expected = sorted(expected)
+    sorted_actual = sorted(actual)
     if sorted_expected != sorted_actual:
       raise BeamAssertException(
           'Failed assert: %r == %r' % (sorted_expected, sorted_actual))
diff --git a/sdks/python/apache_beam/transforms/util.py 
b/sdks/python/apache_beam/transforms/util.py
index 38839f57842..a6963604384 100644
--- a/sdks/python/apache_beam/transforms/util.py
+++ b/sdks/python/apache_beam/transforms/util.py
@@ -223,7 +223,7 @@ def __init__(self,
     if target_batch_duration_secs and target_batch_duration_secs <= 0:
       raise ValueError("target_batch_duration_secs (%s) must be positive" % (
           target_batch_duration_secs))
-    if not (target_batch_overhead or target_batch_duration_secs):
+    if max(0, target_batch_overhead, target_batch_duration_secs) == 0:
       raise ValueError("At least one of target_batch_overhead or "
                        "target_batch_duration_secs must be positive.")
     self._min_batch_size = min_batch_size
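
For reference, a small sketch of how the equal_to matcher touched above is
typically exercised (standard apache_beam.testing.util usage; the values are
illustrative):

{code}
import apache_beam as beam
from apache_beam.testing.test_pipeline import TestPipeline
from apache_beam.testing.util import assert_that, equal_to

with TestPipeline() as p:
  pcoll = p | beam.Create([-1, -2, -3])
  # equal_to sorts both sides before comparing; the type-aware key removed
  # above could order Python 2 ints and longs differently.
  assert_that(pcoll, equal_to([-3, -2, -1]))
{code}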


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 155668)
Time Spent: 1h 40m  (was: 1.5h)

> Investigate negative numbers represented as 'long' in Python SDK + Direct 
> runner
> 
>
> Key: BEAM-5744
> URL: https://issues.apache.org/jira/browse/BEAM-5744
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Priority: Major
> Fix For: 2.8.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5627) Several IO tests fail in Python 3 when accessing a temporary file with TypeError: a bytes-like object is required, not 'str'

2018-10-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5627?focusedWorklogId=155667&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155667
 ]

ASF GitHub Bot logged work on BEAM-5627:


Author: ASF GitHub Bot
Created on: 18/Oct/18 01:09
Start Date: 18/Oct/18 01:09
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #6671: [BEAM-5627] Fix 
sources test for py3.
URL: https://github.com/apache/beam/pull/6671#issuecomment-430841000
 
 
   @HuangLED please ping when the PR is ready.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 155667)
Time Spent: 2h 50m  (was: 2h 40m)

> Several IO tests fail in Python 3  when accessing a temporary file with  
> TypeError: a bytes-like object is required, not 'str'
> --
>
> Key: BEAM-5627
> URL: https://issues.apache.org/jira/browse/BEAM-5627
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Rakesh Kumar
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> ERROR: test_split_at_fraction_exhaustive 
> (apache_beam.io.source_test_utils_test.SourceTestUtilsTest)
>  --
>  Traceback (most recent call last):
>File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/source_test_utils_test.py",
>  line 120, in test_split_at_fraction_exhaustive
>  source = self._create_source(data)
>File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/source_test_utils_test.py",
>  line 43, in _create_source
>  source = LineSource(self._create_file_with_data(data))
>File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/source_test_utils_test.py",
>  line 35, in _create_file_with_data
>  f.write(line + '\n')
>File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/tempfile.py",
>  line 622, in func_wrapper
>  return func(*args, **kwargs)
> TypeError: a bytes-like object is required, not 'str'
> Also similar:
> ==
>  ERROR: test_file_sink_writing 
> (apache_beam.io.filebasedsink_test.TestFileBasedSink)
> --
> Traceback (most recent call last):
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/filebasedsink_test.py",
>  line 121, in test_file_sink_writing
>     init_token, writer_results = self._common_init(sink)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/filebasedsink_test.py",
>  line 103, in _common_init
>     writer1 = sink.open_writer(init_token, '1')
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/options/value_provider.py",
>  line 133, in _f
>     return fnc(self, *args, **kwargs)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/filebasedsink.py",
>  line 185, in open_writer
>     return FileBasedSinkWriter(self, os.path.join(init_result, uid) + suffix)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/filebasedsink.py",
>  line 385, in __init__
>     self.temp_handle = self.sink.open(temp_shard_path)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/filebasedsink_test.py",
>  line 82, in open
>     file_handle.write('[start]')
> TypeError: a bytes-like object is required, not 'str'
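> The usual fix pattern for this class of failure (a hedged sketch, not the
> exact patch) is to either open the temporary file in text mode or encode the
> data before writing:
> {code}
> import tempfile
>
> line = 'some test data'
>
> # Binary-mode temp file (the default): write bytes, not str.
> with tempfile.NamedTemporaryFile(delete=False) as f:
>   f.write((line + '\n').encode('utf-8'))
>
> # Or open the file in text mode so writing str is valid on Python 2 and 3.
> with tempfile.NamedTemporaryFile(mode='w', delete=False) as f:
>   f.write(line + '\n')
> {code}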



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2953) Timeseries processing extensions using state API

2018-10-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2953?focusedWorklogId=155659&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155659
 ]

ASF GitHub Bot logged work on BEAM-2953:


Author: ASF GitHub Bot
Created on: 18/Oct/18 00:24
Start Date: 18/Oct/18 00:24
Worklog Time Spent: 10m 
  Work Description: akedin commented on a change in pull request #6540: 
[BEAM-2953] Advanced Timeseries examples.
URL: https://github.com/apache/beam/pull/6540#discussion_r226136139
 
 

 ##
 File path: 
sdks/java/extensions/timeseries/src/main/java/org/apache/beam/sdk/extensions/timeseries/transforms/GetValueFromKV.java
 ##
 @@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.sdk.extensions.timeseries.transforms;
+
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.values.KV;
+
+/**
+ * Extract the value from a KV.
+ *
+ * @param <T>
+ */
+@Experimental
+public class GetValueFromKV<K, T> extends DoFn<KV<K, T>, T> {
 
 Review comment:
   Wouldn't 
[this](https://github.com/apache/beam/blob/279a05604b83a54e8e5a79e13d8761f94841f326/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Values.java#L43)
 work?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 155659)
Time Spent: 3h 50m  (was: 3h 40m)

> Timeseries processing extensions using state API
> 
>
> Key: BEAM-2953
> URL: https://issues.apache.org/jira/browse/BEAM-2953
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-ideas
>Affects Versions: 2.7.0
>Reporter: Reza ardeshir rokni
>Assignee: Reuven Lax
>Priority: Minor
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> A general set of timeseries transforms that abstracts away some of the common 
> problems encountered when dealing with timeseries in Beam (in stream or batch 
> mode).
> Beam can be used to build out some very interesting pre-processing stages for 
> time series data. Some examples that will be useful:
>  - Downsampling time series based on simple MIN, MAX, COUNT, SUM, LAST, FIRST
>  - Creating a value for each downsampled window even if no value has been 
> emitted for the specific key. 
>  - Loading the value of a downsampled window with the previous value (as in FX, 
> where the previous close is brought into the current open value)
>  
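> To illustrate the downsampling idea, a sketch using the Python SDK's windowing
> primitives (the PR itself is a Java extension; keys, values and the window
> size here are arbitrary):
> {code}
> import apache_beam as beam
> from apache_beam.transforms.window import FixedWindows, TimestampedValue
>
> # Downsample timestamped (key, value) points to a per-minute MAX per key.
> with beam.Pipeline() as p:
>   (p
>    | beam.Create([('fx/eurusd', 1.161), ('fx/eurusd', 1.159), ('fx/eurusd', 1.163)])
>    | beam.Map(lambda kv: TimestampedValue(kv, 0))
>    | beam.WindowInto(FixedWindows(60))
>    | beam.CombinePerKey(max))
> {code}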



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-5776) Using methods in map is broken on Python 3

2018-10-17 Thread Valentyn Tymofieiev (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev reassigned BEAM-5776:
-

Assignee: Valentyn Tymofieiev

> Using methods in map is broken on Python 3
> --
>
> Key: BEAM-5776
> URL: https://issues.apache.org/jira/browse/BEAM-5776
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Valentyn Tymofieiev
>Priority: Major
>
> E.g. 
> {code:java}
> pcoll | beam.Map(str.upper){code}
> no longer works.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-5785) Validates runner tests fail with: Cannot convert bytes value to JSON value

2018-10-17 Thread Mark Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Liu updated BEAM-5785:
---
Description: 
I ran a random ValidatesRunner test on Python 3 with TestDataflowRunner. The 
test failed before submitting the job to the service.

More details about my env and test:

Python version: 3.5.3
Test: 
apache_beam.transforms.ptransform_test:PTransformTest.test_multiple_empty_outputs
Command:
{code}
python setup.py nosetests \
  --tests 
apache_beam.transforms.ptransform_test:PTransformTest.test_multiple_empty_outputs
  \
  --nocapture \
  --test-pipeline-options=" \ 
--runner=TestDataflowRunner \
--project= \
--staging_location= \
--temp_location= \
--output= \  

--sdk_location=.../beam/sdks/python/dist/apache-beam-2.9.0.dev0.tar.gz \
--num_workers=1"
{code}

Here is the stacktrace from my console:
{code}
==
ERROR: test_multiple_empty_outputs 
(apache_beam.transforms.ptransform_test.PTransformTest)
--
Traceback (most recent call last):
  File 
"/usr/local/google/home/markliu/beam/sdks/python/apache_beam/transforms/ptransform_test.py",
 line 284, in test_multiple_empty_outputs
pipeline.run()
  File 
"/usr/local/google/home/markliu/beam/sdks/python/apache_beam/testing/test_pipeline.py",
 line 107, in run
else test_runner_api))
  File 
"/usr/local/google/home/markliu/beam/sdks/python/apache_beam/pipeline.py", line 
403, in run
self.to_runner_api(), self.runner, self._options).run(False)
  File 
"/usr/local/google/home/markliu/beam/sdks/python/apache_beam/pipeline.py", line 
416, in run
return self.runner.run_pipeline(self)
  File 
"/usr/local/google/home/markliu/beam/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py",
 line 50, in run_pipeline
self.result = super(TestDataflowRunner, self).run_pipeline(pipeline)
  File 
"/usr/local/google/home/markliu/beam/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py",
 line 374, in run_pipeline
super(DataflowRunner, self).run_pipeline(pipeline)
  File 
"/usr/local/google/home/markliu/beam/sdks/python/apache_beam/runners/runner.py",
 line 176, in run_pipeline
pipeline.visit(RunVisitor(self))
  File 
"/usr/local/google/home/markliu/beam/sdks/python/apache_beam/pipeline.py", line 
444, in visit
self._root_transform().visit(visitor, self, visited)
  File 
"/usr/local/google/home/markliu/beam/sdks/python/apache_beam/pipeline.py", line 
780, in visit
part.visit(visitor, pipeline, visited)
  File 
"/usr/local/google/home/markliu/beam/sdks/python/apache_beam/pipeline.py", line 
780, in visit
part.visit(visitor, pipeline, visited)
  File 
"/usr/local/google/home/markliu/beam/sdks/python/apache_beam/pipeline.py", line 
783, in visit
visitor.visit_transform(self)
  File 
"/usr/local/google/home/markliu/beam/sdks/python/apache_beam/runners/runner.py",
 line 171, in visit_transform
self.runner.run_transform(transform_node)
  File 
"/usr/local/google/home/markliu/beam/sdks/python/apache_beam/runners/runner.py",
 line 214, in run_transform
return m(transform_node)
  File 
"/usr/local/google/home/markliu/beam/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py",
 line 846, in run_Read
source_dict)
  File 
"/usr/local/google/home/markliu/beam/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py",
 line 85, in add_property
key=name, value=to_json_value(value, with_type=with_type)))
  File 
"/usr/local/google/home/markliu/beam/sdks/python/apache_beam/internal/gcp/json_value.py",
 line 104, in to_json_value
key=k, value=to_json_value(v, with_type=with_type)))
  File 
"/usr/local/google/home/markliu/beam/sdks/python/apache_beam/internal/gcp/json_value.py",
 line 104, in to_json_value
key=k, value=to_json_value(v, with_type=with_type)))
  File 
"/usr/local/google/home/markliu/beam/sdks/python/apache_beam/internal/gcp/json_value.py",
 line 124, in to_json_value
raise TypeError('Cannot convert %s to a JSON value.' % repr(obj))
TypeError: Cannot convert 
b'eNpVjrEKwjAURRNjq6ZOfkVd8hOZxFGQLhJe0ycW2saXpA5CQf/ctro4Xc7hwr0vYeEO9oamRGhV9NCFq/NtUNYjRDTB9d6iNHrG05eI7d/EB1rkRcYYM9FFaEyon0jiKIrd5AL6GppRVeYBTY+BlhdKcs05pZoLWmme0BqLdCpbV6Gnzd+X2YVfyDP4Qxf1BJLkOJ8NtC37Un0AfYQ+kQ=='
 to a JSON value.
 >> begin captured logging << 
avro.schema: Level 5: Register new name for 'org.apache.avro.file.Header'
avro.schema: Level 5: Register new name for 'org.apache.avro.file.magic'
avro.schema: Level 5: Register new name for 'org.apache.avro.file.sync'
avro.schema: Level 5: Register new name for 'example.avro.User'
root: WARNING: snappy is not installed; some tests will be skipped.

[jira] [Assigned] (BEAM-5785) Validates runner tests fail with: Cannot convert bytes value to JSON value

2018-10-17 Thread Valentyn Tymofieiev (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev reassigned BEAM-5785:
-

Assignee: (was: Valentyn Tymofieiev)

> Validates runner tests fail with: Cannot convert bytes value to JSON value
> --
>
> Key: BEAM-5785
> URL: https://issues.apache.org/jira/browse/BEAM-5785
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Mark Liu
>Priority: Major
>
> I ran a random ValidatesRunner test on Python 3 with TestDataflowRunner. The 
> test failed before submitting the job to the service.
> More details about my env and test:
> Python version: 3.5.3
> Test: 
> apache_beam.transforms.ptransform_test:PTransformTest.test_multiple_empty_outputs
> Command:
> {code}
> python setup.py nosetests \
>   --tests 
> apache_beam.transforms.ptransform_test:PTransformTest.test_multiple_empty_outputs
>   \
>   --nocapture \
>   --test-pipeline-options=" \ 
> --runner=TestDataflowRunner \
> --project= \
> --staging_location= \
> --temp_location= \
> --output= \
>   
> 
> --sdk_location=.../beam/sdks/python/dist/apache-beam-2.9.0.dev0.tar.gz \
> --num_workers=1"
> {code}
> Here is the stacktrace from my console:
> {code}
> ==
> ERROR: test_multiple_empty_outputs 
> (apache_beam.transforms.ptransform_test.PTransformTest)
> --
> Traceback (most recent call last):
>   File 
> "/usr/local/google/home/markliu/beam/sdks/python/apache_beam/transforms/ptransform_test.py",
>  line 284, in test_multiple_empty_outputs
> pipeline.run()
>   File 
> "/usr/local/google/home/markliu/beam/sdks/python/apache_beam/testing/test_pipeline.py",
>  line 107, in run
> else test_runner_api))
>   File 
> "/usr/local/google/home/markliu/beam/sdks/python/apache_beam/pipeline.py", 
> line 403, in run
> self.to_runner_api(), self.runner, self._options).run(False)
>   File 
> "/usr/local/google/home/markliu/beam/sdks/python/apache_beam/pipeline.py", 
> line 416, in run
> return self.runner.run_pipeline(self)
>   File 
> "/usr/local/google/home/markliu/beam/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py",
>  line 50, in run_pipeline
> self.result = super(TestDataflowRunner, self).run_pipeline(pipeline)
>   File 
> "/usr/local/google/home/markliu/beam/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py",
>  line 374, in run_pipeline
> super(DataflowRunner, self).run_pipeline(pipeline)
>   File 
> "/usr/local/google/home/markliu/beam/sdks/python/apache_beam/runners/runner.py",
>  line 176, in run_pipeline
> pipeline.visit(RunVisitor(self))
>   File 
> "/usr/local/google/home/markliu/beam/sdks/python/apache_beam/pipeline.py", 
> line 444, in visit
> self._root_transform().visit(visitor, self, visited)
>   File 
> "/usr/local/google/home/markliu/beam/sdks/python/apache_beam/pipeline.py", 
> line 780, in visit
> part.visit(visitor, pipeline, visited)
>   File 
> "/usr/local/google/home/markliu/beam/sdks/python/apache_beam/pipeline.py", 
> line 780, in visit
> part.visit(visitor, pipeline, visited)
>   File 
> "/usr/local/google/home/markliu/beam/sdks/python/apache_beam/pipeline.py", 
> line 783, in visit
> visitor.visit_transform(self)
>   File 
> "/usr/local/google/home/markliu/beam/sdks/python/apache_beam/runners/runner.py",
>  line 171, in visit_transform
> self.runner.run_transform(transform_node)
>   File 
> "/usr/local/google/home/markliu/beam/sdks/python/apache_beam/runners/runner.py",
>  line 214, in run_transform
> return m(transform_node)
>   File 
> "/usr/local/google/home/markliu/beam/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py",
>  line 846, in run_Read
> source_dict)
>   File 
> "/usr/local/google/home/markliu/beam/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py",
>  line 85, in add_property
> key=name, value=to_json_value(value, with_type=with_type)))
>   File 
> "/usr/local/google/home/markliu/beam/sdks/python/apache_beam/internal/gcp/json_value.py",
>  line 104, in to_json_value
> key=k, value=to_json_value(v, with_type=with_type)))
>   File 
> "/usr/local/google/home/markliu/beam/sdks/python/apache_beam/internal/gcp/json_value.py",
>  line 104, in to_json_value
> key=k, value=to_json_value(v, with_type=with_type)))
>   File 
> "/usr/local/google/home/markliu/beam/sdks/python/apache_beam/internal/gcp/json_value.py",
>  line 124, in to_json_value
> raise TypeError('Cannot convert %s to a JSON value.' % repr(obj))
> TypeError: 

[jira] [Updated] (BEAM-5785) Validates runner tests fail with: Cannot convert bytes value to JSON value

2018-10-17 Thread Valentyn Tymofieiev (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev updated BEAM-5785:
--
Summary: Validates runner tests fail with: Cannot convert bytes value to 
JSON value  (was: A ValidatesRunner test failed on Python 3)

> Validates runner tests fail with: Cannot convert bytes value to JSON value
> --
>
> Key: BEAM-5785
> URL: https://issues.apache.org/jira/browse/BEAM-5785
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Mark Liu
>Assignee: Valentyn Tymofieiev
>Priority: Major
>
> I ran a random ValidatesRunner test on Python 3 with TestDataflowRunner. The 
> test failed before submitting the job to the service.
> More details about my env and test:
> Python version: 3.5.3
> Test: 
> apache_beam.transforms.ptransform_test:PTransformTest.test_multiple_empty_outputs
> Command:
> {code}
> python setup.py nosetests \
>   --tests 
> apache_beam.transforms.ptransform_test:PTransformTest.test_multiple_empty_outputs
>   \
>   --nocapture \
>   --test-pipeline-options=" \ 
> --runner=TestDataflowRunner \
> --project= \
> --staging_location= \
> --temp_location= \
> --output= \
>   
> 
> --sdk_location=.../beam/sdks/python/dist/apache-beam-2.9.0.dev0.tar.gz \
> --num_workers=1"
> {code}
> Here are the stacktrace from my console:
> {code}
> ==
> ERROR: test_multiple_empty_outputs 
> (apache_beam.transforms.ptransform_test.PTransformTest)
> --
> Traceback (most recent call last):
>   File 
> "/usr/local/google/home/markliu/beam/sdks/python/apache_beam/transforms/ptransform_test.py",
>  line 284, in test_multiple_empty_outputs
> pipeline.run()
>   File 
> "/usr/local/google/home/markliu/beam/sdks/python/apache_beam/testing/test_pipeline.py",
>  line 107, in run
> else test_runner_api))
>   File 
> "/usr/local/google/home/markliu/beam/sdks/python/apache_beam/pipeline.py", 
> line 403, in run
> self.to_runner_api(), self.runner, self._options).run(False)
>   File 
> "/usr/local/google/home/markliu/beam/sdks/python/apache_beam/pipeline.py", 
> line 416, in run
> return self.runner.run_pipeline(self)
>   File 
> "/usr/local/google/home/markliu/beam/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py",
>  line 50, in run_pipeline
> self.result = super(TestDataflowRunner, self).run_pipeline(pipeline)
>   File 
> "/usr/local/google/home/markliu/beam/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py",
>  line 374, in run_pipeline
> super(DataflowRunner, self).run_pipeline(pipeline)
>   File 
> "/usr/local/google/home/markliu/beam/sdks/python/apache_beam/runners/runner.py",
>  line 176, in run_pipeline
> pipeline.visit(RunVisitor(self))
>   File 
> "/usr/local/google/home/markliu/beam/sdks/python/apache_beam/pipeline.py", 
> line 444, in visit
> self._root_transform().visit(visitor, self, visited)
>   File 
> "/usr/local/google/home/markliu/beam/sdks/python/apache_beam/pipeline.py", 
> line 780, in visit
> part.visit(visitor, pipeline, visited)
>   File 
> "/usr/local/google/home/markliu/beam/sdks/python/apache_beam/pipeline.py", 
> line 780, in visit
> part.visit(visitor, pipeline, visited)
>   File 
> "/usr/local/google/home/markliu/beam/sdks/python/apache_beam/pipeline.py", 
> line 783, in visit
> visitor.visit_transform(self)
>   File 
> "/usr/local/google/home/markliu/beam/sdks/python/apache_beam/runners/runner.py",
>  line 171, in visit_transform
> self.runner.run_transform(transform_node)
>   File 
> "/usr/local/google/home/markliu/beam/sdks/python/apache_beam/runners/runner.py",
>  line 214, in run_transform
> return m(transform_node)
>   File 
> "/usr/local/google/home/markliu/beam/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py",
>  line 846, in run_Read
> source_dict)
>   File 
> "/usr/local/google/home/markliu/beam/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py",
>  line 85, in add_property
> key=name, value=to_json_value(value, with_type=with_type)))
>   File 
> "/usr/local/google/home/markliu/beam/sdks/python/apache_beam/internal/gcp/json_value.py",
>  line 104, in to_json_value
> key=k, value=to_json_value(v, with_type=with_type)))
>   File 
> "/usr/local/google/home/markliu/beam/sdks/python/apache_beam/internal/gcp/json_value.py",
>  line 104, in to_json_value
> key=k, value=to_json_value(v, with_type=with_type)))
>   File 
> 

[jira] [Work logged] (BEAM-5315) Finish Python 3 porting for io module

2018-10-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5315?focusedWorklogId=155642&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155642
 ]

ASF GitHub Bot logged work on BEAM-5315:


Author: ASF GitHub Bot
Created on: 17/Oct/18 23:18
Start Date: 17/Oct/18 23:18
Worklog Time Spent: 10m 
  Work Description: manuzhang commented on issue #6728: [BEAM-5315] 
Partially port IO; fixing well-documented errors
URL: https://github.com/apache/beam/pull/6728#issuecomment-430822181
 
 
   LGTM


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 155642)
Time Spent: 4h 20m  (was: 4h 10m)

> Finish Python 3 porting for io module
> -
>
> Key: BEAM-5315
> URL: https://issues.apache.org/jira/browse/BEAM-5315
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Simon
>Priority: Major
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5763) Re-organize IntelliJ docs into workflow tasks

2018-10-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5763?focusedWorklogId=155640&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155640
 ]

ASF GitHub Bot logged work on BEAM-5763:


Author: ASF GitHub Bot
Created on: 17/Oct/18 22:51
Start Date: 17/Oct/18 22:51
Worklog Time Spent: 10m 
  Work Description: swegner commented on issue #6732: [BEAM-5763] Fix wiki 
link to IntelliJ
URL: https://github.com/apache/beam/pull/6732#issuecomment-430816618
 
 
   Run Website PreCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 155640)
Time Spent: 40m  (was: 0.5h)

> Re-organize IntelliJ docs into workflow tasks
> -
>
> Key: BEAM-5763
> URL: https://issues.apache.org/jira/browse/BEAM-5763
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system, website
>Reporter: Scott Wegner
>Assignee: Scott Wegner
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The current documentation is not well organized. It mostly focuses on how to 
> get an initial setup working, but doesn't talk about common developer tasks 
> (building from scratch, testing a single module / unit test / integration 
> test, recovering from project corruption).
> I'd like to re-organize the documentation to make it very prescriptive to 
> follow and easy to validate that it works.
> Current set of proposed "workflows" listed in this doc: 
> https://docs.google.com/document/d/18eXrO9IYll4oOnFb53EBhOtIfx-JLOinTWZSIBFkLk4/edit?usp=sharing
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5784) Retire FindBugs static analysis in Beam

2018-10-17 Thread Scott Wegner (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16654314#comment-16654314
 ] 

Scott Wegner commented on BEAM-5784:


The first step would be to audit the set of checks we get from FindBugs and see 
whether there is a comparable check in ErrorProne or elsewhere.

We use the default set of FindBugs checks, with a set of exclusions defined in 
[findbugs-filter.xml|https://github.com/apache/beam/blob/cd2ad5ef56d70893cf477222fb76f604e064cbe9/sdks/java/build-tools/src/main/resources/beam/findbugs-filter.xml]

I'll do an initial audit over the checks we use.

> Retire FindBugs static analysis in Beam
> ---
>
> Key: BEAM-5784
> URL: https://issues.apache.org/jira/browse/BEAM-5784
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Scott Wegner
>Priority: Minor
>  Labels: findbugs
>
> This was [discussed previously on 
> dev@|https://lists.apache.org/thread.html/83afd25e55f53e38a7d0a75f3de7373e0ea325fbbaede2fe7c56e279@%3Cdev.beam.apache.org%3E]
>  but we never actually made progress on it.
> Now that we have ErrorProne enabled across the codebase, we may be able to 
> deprecate FindBugs. FindBugs has caused all sorts of build and licensing 
> complications; it would be nice to remove it from our build.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-5784) Retire FindBugs static analysis in Beam

2018-10-17 Thread Scott Wegner (JIRA)
Scott Wegner created BEAM-5784:
--

 Summary: Retire FindBugs static analysis in Beam
 Key: BEAM-5784
 URL: https://issues.apache.org/jira/browse/BEAM-5784
 Project: Beam
  Issue Type: Task
  Components: build-system
Reporter: Scott Wegner


This was [discussed previously on 
dev@|https://lists.apache.org/thread.html/83afd25e55f53e38a7d0a75f3de7373e0ea325fbbaede2fe7c56e279@%3Cdev.beam.apache.org%3E]
 but we never actually made progress on it.

Now that we have ErrorProne enabled across the codebase, we may be able to 
deprecate FindBugs. FindBugs has caused all sorts of build and licensing 
complications; it would be nice to remove it from our build.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5783) [beam_PreCommit_Website_Commit] [buildDockerImage] grpc: the connection is unavailable

2018-10-17 Thread Scott Wegner (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16654262#comment-16654262
 ] 

Scott Wegner commented on BEAM-5783:


This just failed again: 
https://builds.apache.org/job/beam_PreCommit_Website_Commit/342/

> [beam_PreCommit_Website_Commit] [buildDockerImage] grpc: the connection is 
> unavailable
> --
>
> Key: BEAM-5783
> URL: https://issues.apache.org/jira/browse/BEAM-5783
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures, website
>Reporter: Scott Wegner
>Assignee: Scott Wegner
>Priority: Major
>
> _Use this form to file an issue for test failure:_
>  * [Jenkins 
> Job|https://builds.apache.org/job/beam_PreCommit_Website_Commit/339/]
>  * [Gradle Build 
> Scan|https://scans.gradle.com/s/f66kbflbvgbhs/console-log?task=:beam-website:buildDockerImage#L11]
>  * [Test source 
> code|https://github.com/apache/beam/blob/45612dff7539ec8d026f13d80e233ae9086f46b6/website/Dockerfile#L25]
> Initial investigation:
> The failure occurred running {{gem install bundler}} during the Docker 
> container build:
> {{grpc: the connection is unavailable}}
> 
> _After you've filled out the above details, please [assign the issue to an 
> individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist].
>  Assignee should [treat test failures as 
> high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test],
>  helping to fix the issue or find a more appropriate owner. See [Apache Beam 
> Post-Commit 
> Policies|https://beam.apache.org/contribute/postcommits-policies]._



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-5783) [beam_PreCommit_Website_Commit] [buildDockerImage] grpc: the connection is unavailable

2018-10-17 Thread Scott Wegner (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Wegner updated BEAM-5783:
---
Labels:   (was: currently-failing)

> [beam_PreCommit_Website_Commit] [buildDockerImage] grpc: the connection is 
> unavailable
> --
>
> Key: BEAM-5783
> URL: https://issues.apache.org/jira/browse/BEAM-5783
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures, website
>Reporter: Scott Wegner
>Assignee: Scott Wegner
>Priority: Major
>
> _Use this form to file an issue for test failure:_
>  * [Jenkins 
> Job|https://builds.apache.org/job/beam_PreCommit_Website_Commit/339/]
>  * [Gradle Build 
> Scan|https://scans.gradle.com/s/f66kbflbvgbhs/console-log?task=:beam-website:buildDockerImage#L11]
>  * [Test source 
> code|https://github.com/apache/beam/blob/45612dff7539ec8d026f13d80e233ae9086f46b6/website/Dockerfile#L25]
> Initial investigation:
> The failure occurred running {{gem install bundler}} during the Docker 
> container build:
> {{grpc: the connection is unavailable}}
> 
> _After you've filled out the above details, please [assign the issue to an 
> individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist].
>  Assignee should [treat test failures as 
> high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test],
>  helping to fix the issue or find a more appropriate owner. See [Apache Beam 
> Post-Commit 
> Policies|https://beam.apache.org/contribute/postcommits-policies]._



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5760) Portable Flink support for maxBundleSize/maxBundleMillis

2018-10-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5760?focusedWorklogId=155599&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155599
 ]

ASF GitHub Bot logged work on BEAM-5760:


Author: ASF GitHub Bot
Created on: 17/Oct/18 20:46
Start Date: 17/Oct/18 20:46
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #6723: [WIP] 
[BEAM-5760] Add support for multi-element bundles to portable Flink runner.
URL: https://github.com/apache/beam/pull/6723#discussion_r226012199
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/ExecutableStageDoFnOperator.java
 ##
 @@ -159,49 +160,16 @@ private StateRequestHandler 
getStateRequestHandler(ExecutableStage executableSta
 }
   }
 
-  // TODO: currently assumes that every element is a separate bundle,
-  // but this can be changed by pushing some of this logic into the "DoFnRunner"
-  private void processElementWithSdkHarness(WindowedValue element) throws Exception {
-    checkState(
-        stageBundleFactory != null, "%s not yet prepared", StageBundleFactory.class.getName());
-    checkState(
-        stateRequestHandler != null, "%s not yet prepared", StateRequestHandler.class.getName());
-
-    OutputReceiverFactory receiverFactory =
-        new OutputReceiverFactory() {
-          @Override
-          public FnDataReceiver create(String pCollectionId) {
-            return (receivedElement) -> {
-              // handover to queue, do not block the grpc thread
-              outputQueue.put(KV.of(pCollectionId, receivedElement));
-            };
-          }
-        };
-
-    try (RemoteBundle bundle =
-        stageBundleFactory.getBundle(receiverFactory, stateRequestHandler, progressHandler)) {
-      LOG.debug(String.format("Sending value: %s", element));
-      // TODO(BEAM-4681): Add support to Flink to support portable timers.
-      Iterables.getOnlyElement(bundle.getInputReceivers().values()).accept(element);
-      // TODO: it would be nice to emit results as they arrive, can thread wait non-blocking?
-    }
-
-    // RemoteBundle close blocks until all results are received
-    KV result;
-    while ((result = outputQueue.poll()) != null) {
-      outputManager.output(outputMap.get(result.getKey()), (WindowedValue) result.getValue());
-    }
-  }
-
   @Override
-  public void close() throws Exception {
+  public void dispose() throws Exception {
 
 Review comment:
   Why do we use `dispose` instead of `close`?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 155599)
Time Spent: 40m  (was: 0.5h)

> Portable Flink support for maxBundleSize/maxBundleMillis
> 
>
> Key: BEAM-5760
> URL: https://issues.apache.org/jira/browse/BEAM-5760
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Affects Versions: 2.8.0
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Major
>  Labels: portability-flink
> Fix For: 2.9.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The portable runner needs to support larger bundles in streaming mode. 
> Currently every element is a separate bundle, which is very inefficient due 
> to the per-bundle SDK worker overhead. The old Java SDK runner already 
> supports these parameters.
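> A minimal sketch of the bundling strategy being described, flushing when
> either a size cap or a time cap is reached (names and defaults are
> illustrative, not the Flink runner's actual options):
> {code}
> import time
>
> class Bundler(object):
>   """Sketch: group elements into bundles bounded by count and age."""
>
>   def __init__(self, process_bundle, max_bundle_size=1000, max_bundle_millis=1000):
>     self._process_bundle = process_bundle
>     self._max_size = max_bundle_size
>     self._max_millis = max_bundle_millis
>     self._buffer = []
>     self._started = None
>
>   def add(self, element):
>     if not self._buffer:
>       self._started = time.time()
>     self._buffer.append(element)
>     too_big = len(self._buffer) >= self._max_size
>     too_old = (time.time() - self._started) * 1000 >= self._max_millis
>     if too_big or too_old:
>       self.flush()
>
>   def flush(self):
>     if self._buffer:
>       self._process_bundle(self._buffer)  # one SDK-harness round trip per bundle
>       self._buffer = []
> {code}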



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5240) Create post-commit tests dashboard

2018-10-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5240?focusedWorklogId=155595&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155595
 ]

ASF GitHub Bot logged work on BEAM-5240:


Author: ASF GitHub Bot
Created on: 17/Oct/18 20:36
Start Date: 17/Oct/18 20:36
Worklog Time Spent: 10m 
  Work Description: swegner closed pull request #6711: [BEAM-5240] Add Jira 
data to Beam post-commits dashboard
URL: https://github.com/apache/beam/pull/6711
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/.test-infra/metrics/README.md b/.test-infra/metrics/README.md
index dca8d17d02c..d18c52c4c35 100644
--- a/.test-infra/metrics/README.md
+++ b/.test-infra/metrics/README.md
@@ -102,13 +102,22 @@ docker push gcr.io/${PROJECT_ID}/beammetricssyncjenkins:v1
 ## Kubernetes update
 https://kubernetes.io/docs/concepts/workloads/controllers/deployment/
 
-1. Build and publish sync containers
 ```sh
+# Build and publish sync containers
 cd sync/jenkins
 docker build -t gcr.io/${PROJECT_ID}/beammetricssyncjenkins:v1 .
 docker push -t gcr.io/${PROJECT_ID}/beammetricssyncjenkins:v1
+
+# If needed check current pod status
+kubectl get pods
+kubectl describe pod 
+
+# Update container image via one of the following.
+## update image for container 
+kubectl set image deployment/beamgrafana container=
+## or update deployemnt from yaml file 
+kubectl replace -f beamgrafana-deploy.yaml
 ```
-1. Update image for container `kubectl set image deployment/beamgrafana 
container=`
 
 
 ## Useful Kubernetes commands and hints
diff --git a/.test-infra/metrics/beamgrafana-deploy.yaml 
b/.test-infra/metrics/beamgrafana-deploy.yaml
index 62379510371..34941284172 100644
--- a/.test-infra/metrics/beamgrafana-deploy.yaml
+++ b/.test-infra/metrics/beamgrafana-deploy.yaml
@@ -33,36 +33,6 @@ spec:
   securityContext:
 fsGroup: 1000
   containers:
-  - name: beammetricssyncjenkins
-image: gcr.io/apache-beam-testing/beammetricssyncjenkins:v15
-env:
-  - name: JENSYNC_HOST
-value: 127.0.0.1
-  - name: JENSYNC_PORT
-value: "5432"
-  - name: JENSYNC_DBNAME
-value: beammetrics
-  - name: JENSYNC_DBUSERNAME
-valueFrom:
-  secretKeyRef:
-name: beammetrics-psql-db-credentials
-key: username
-  - name: JENSYNC_DBPWD
-valueFrom:
-  secretKeyRef:
-name: beammetrics-psql-db-credentials
-key: password
-  - name: cloudsql-proxy
-image: gcr.io/cloudsql-docker/gce-proxy:1.11
-command: ["/cloud_sql_proxy",
-  
"-instances=apache-beam-testing:us-west2:beammetrics=tcp:5432"]
-env:
-  - name: GOOGLE_APPLICATION_CREDENTIALS
-value: /secrets/cloudsql/config.json
-volumeMounts:
-  - name: beammetrics-psql-credentials
-mountPath: /secrets/cloudsql
-readOnly: true
   - name: beamgrafana
 image: grafana/grafana
 securityContext:
@@ -99,6 +69,55 @@ spec:
   name: beam-grafana-etcdata
 - mountPath: /var/log/grafana
   name: beam-grafana-logdata
+  - name: cloudsql-proxy
+image: gcr.io/cloudsql-docker/gce-proxy:1.11
+command: ["/cloud_sql_proxy",
+  
"-instances=apache-beam-testing:us-west2:beammetrics=tcp:5432"]
+env:
+  - name: GOOGLE_APPLICATION_CREDENTIALS
+value: /secrets/cloudsql/config.json
+volumeMounts:
+  - name: beammetrics-psql-credentials
+mountPath: /secrets/cloudsql
+readOnly: true
+  - name: beammetricssyncjenkins
+image: gcr.io/apache-beam-testing/beammetricssyncjenkins:v20181016
+env:
+  - name: JENSYNC_HOST
+value: 127.0.0.1
+  - name: JENSYNC_PORT
+value: "5432"
+  - name: JENSYNC_DBNAME
+value: beammetrics
+  - name: JENSYNC_DBUSERNAME
+valueFrom:
+  secretKeyRef:
+name: beammetrics-psql-db-credentials
+key: username
+  - name: JENSYNC_DBPWD
+valueFrom:
+  secretKeyRef:
+name: beammetrics-psql-db-credentials
+key: password
+  - name: beammetricssyncjira
+image: gcr.io/apache-beam-testing/beammetricssyncjira:v20181016
+env:
+  - name: DB_HOST
+value: 127.0.0.1
+  - name: DB_PORT
+value: "5432"
+  - name: DB_DBNAME
+value: beammetrics
+  - name: DB_DBUSERNAME

[jira] [Commented] (BEAM-5057) beam_Release_Gradle_NightlySnapshot failing due to a Javadoc error

2018-10-17 Thread Chamikara Jayalath (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16654107#comment-16654107
 ] 

Chamikara Jayalath commented on BEAM-5057:
--

Yes.

> beam_Release_Gradle_NightlySnapshot failing due to a Javadoc error
> --
>
> Key: BEAM-5057
> URL: https://issues.apache.org/jira/browse/BEAM-5057
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Chamikara Jayalath
>Assignee: Chamikara Jayalath
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> [https://builds.apache.org/job/beam_Release_Gradle_NightlySnapshot/127/console]
> [https://builds.apache.org/job/beam_Release_Gradle_NightlySnapshot/125/console]
>  
> * What went wrong:
> Execution failed for task ':beam-sdks-java-core:javadoc'.
> > Javadoc generation failed. Generated Javadoc options file (useful for 
> > troubleshooting): 
> > '/home/jenkins/jenkins-slave/workspace/beam_Release_Gradle_NightlySnapshot/src/sdks/java/core/build/tmp/javadoc/javadoc.options'
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5730) Migrate Java test to use a staged worker jar

2018-10-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5730?focusedWorklogId=155574&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155574
 ]

ASF GitHub Bot logged work on BEAM-5730:


Author: ASF GitHub Bot
Created on: 17/Oct/18 19:31
Start Date: 17/Oct/18 19:31
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on issue #6694: [BEAM-5730] Migrate 
ITs using DataflowRunner to use custom worker
URL: https://github.com/apache/beam/pull/6694#issuecomment-430758890
 
 
   R: @lukecwik , latest commits address comments above.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 155574)
Time Spent: 2.5h  (was: 2h 20m)

> Migrate Java test to use a staged worker jar
> 
>
> Key: BEAM-5730
> URL: https://issues.apache.org/jira/browse/BEAM-5730
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5730) Migrate Java test to use a staged worker jar

2018-10-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5730?focusedWorklogId=155573&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155573
 ]

ASF GitHub Bot logged work on BEAM-5730:


Author: ASF GitHub Bot
Created on: 17/Oct/18 19:29
Start Date: 17/Oct/18 19:29
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on issue #6694: [BEAM-5730] Migrate 
ITs using DataflowRunner to use custom worker
URL: https://github.com/apache/beam/pull/6694#issuecomment-430758047
 
 
   Run Java PostCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 155573)
Time Spent: 2h 20m  (was: 2h 10m)

> Migrate Java test to use a staged worker jar
> 
>
> Key: BEAM-5730
> URL: https://issues.apache.org/jira/browse/BEAM-5730
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5782) BigQuery TableRows not cloneable when using Dataflow

2018-10-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5782?focusedWorklogId=155567&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155567
 ]

ASF GitHub Bot logged work on BEAM-5782:


Author: ASF GitHub Bot
Created on: 17/Oct/18 18:54
Start Date: 17/Oct/18 18:54
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #6729: [BEAM-5782] Make 
TableRows cloneable when read from Avro files.
URL: https://github.com/apache/beam/pull/6729#issuecomment-430746918
 
 
   R: @chamikaramj 
   R: @reuvenlax 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 155567)
Time Spent: 20m  (was: 10m)

> BigQuery TableRows not cloneable when using Dataflow
> 
>
> Key: BEAM-5782
> URL: https://issues.apache.org/jira/browse/BEAM-5782
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> TableRows are expected to be cloneable.
> BigQueryAvroUtils converts repeated records using an ImmutableList which is 
> not cloneable.
>  
> Reproduction steps:
> 1. Clone the code 
> [https://github.com/nahuellofeudo/row-clone-poc.git|https://www.google.com/url?q=https://github.com/nahuellofeudo/row-clone-poc.git=D=AFQjCNGkT0bYzhAoozGTQ4vsizxtphxj-g]
> 2. Run on Dataflow:
> mvn clean compile exec:java 
> -Dexec.mainClass=org.apache.beam.examples.RowClonePoc \
>       -Dexec.args="--runner=DataflowRunner 
> --gcpTempLocation=gs:// \
>       --tempLocation=gs:// --project=" 
> -Pdataflow-runner
> 3. Run locally:
> mvn clean compile exec:java 
> -Dexec.mainClass=org.apache.beam.examples.RowClonePoc \
>       -Dexec.args="--tempLocation=gs:// 
> --project="
> 4. Job execution on step 2 will fail [1].
> 5. Job execution on step 3 will succeed [2].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5782) BigQuery TableRows not cloneable when using Dataflow

2018-10-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5782?focusedWorklogId=155566=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155566
 ]

ASF GitHub Bot logged work on BEAM-5782:


Author: ASF GitHub Bot
Created on: 17/Oct/18 18:53
Start Date: 17/Oct/18 18:53
Worklog Time Spent: 10m 
  Work Description: lukecwik opened a new pull request #6729: [BEAM-5782] 
Make TableRows cloneable when read from Avro files.
URL: https://github.com/apache/beam/pull/6729
 
 
   This allows TableRows to be 'cloned', which users expect to be supported 
based upon the TableRow javadoc.
   
   
   
   Follow this checklist to help us incorporate your contribution quickly and 
easily:
   
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   It will help us expedite review of your Pull Request if you tag someone 
(e.g. `@username`) to look at it.
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/)
 | --- | --- | --- | --- | --- | ---
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)
 | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)
  [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/)
 | --- | --- | ---
   
   
   
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 155566)
Time Spent: 10m
Remaining Estimate: 0h

> BigQuery TableRows not cloneable when using Dataflow
> 
>
> Key: BEAM-5782
> URL: https://issues.apache.org/jira/browse/BEAM-5782
> Project: Beam
>  

[jira] [Work logged] (BEAM-5745) Util test on annotations fails

2018-10-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5745?focusedWorklogId=10=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-10
 ]

ASF GitHub Bot logged work on BEAM-5745:


Author: ASF GitHub Bot
Created on: 17/Oct/18 18:38
Start Date: 17/Oct/18 18:38
Worklog Time Spent: 10m 
  Work Description: HuangLED edited a comment on issue #6687: [BEAM-5745] 
Fix annotation test for py3
URL: https://github.com/apache/beam/pull/6687#issuecomment-430734886
 
 
   > It seems odd to set this globally once a function is called (and not in a 
context).
   
   An alternative way to fix this failing test is to explicitly call 
warnings.simplefilter() in each test case. This fix (Option#2) works as well. 
The downside, though, is that the discrepancy between py2 and py3 would still 
exist. 
   
   My hypothesis on the root cause is that python2 and python3 treat an 
unscoped global function call differently. (The funny thing is that even though 
warnings.simplefilter is called in py3, something else under the hood changes 
the behavior in a way I don't yet understand.) I didn't find definitive 
documentation on this, so please chime in if you are aware of good discussions 
on it. (+python experts @pabloem @tvalentyn)
   
   Weighing these two options, I slightly prefer Option#1 in this PR because 
with this fix the module behaves exactly the same regardless of whether it runs 
in a py2 or py3 environment, and I don't think there is a performance concern. 
The downside is that the code does not look elegant enough. 
   
   To wrap it up: 
   Option#1:  Pro: the annotations.py module behaves the same in py2 and py3. 
Cons: the code does not look elegant. 
   Option#2:  Pro: only the test file needs to change.  Cons: we would need to 
handle the py2/py3 discrepancy in all related test cases, and in any future 
non-test module that depends on annotations.py. 
   
   If you can recommend another alternative, I'd be happy to try it out.   
   
   Or, if our consensus is to go with Option#2, I shall update this PR 
accordingly. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 10)
Time Spent: 1h  (was: 50m)

> Util test on annotations fails 
> ---
>
> Key: BEAM-5745
> URL: https://issues.apache.org/jira/browse/BEAM-5745
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Ruoyun Huang
>Assignee: Ruoyun Huang
>Priority: Minor
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Traceback (most recent call last):
>   File 
> "/usr/local/google/home/ruoyun/projects/beam/sdks/python/apache_beam/utils/annotations_test.py",
>  line 142, in test_frequency
>     label_check_list=[])
>   File 
> "/usr/local/google/home/ruoyun/projects/beam/sdks/python/apache_beam/utils/annotations_test.py",
>  line 149, in check_annotation
>     self.assertIn(fnc_name + ' is ' + annotation_type, 
> str(warning[-1].message))
> AssertionError: 'fnc2_test_annotate_frequency is experimental' not found in 
> 'fnc_test_annotate_frequency is experimental.'



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5730) Migrate Java test to use a staged worker jar

2018-10-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5730?focusedWorklogId=155544=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155544
 ]

ASF GitHub Bot logged work on BEAM-5730:


Author: ASF GitHub Bot
Created on: 17/Oct/18 18:26
Start Date: 17/Oct/18 18:26
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on a change in pull request #6694: 
[BEAM-5730] Migrate ITs using DataflowRunner to use custom worker
URL: https://github.com/apache/beam/pull/6694#discussion_r226045475
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryToTableIT.java
 ##
 @@ -291,8 +292,10 @@ public void testStandardQueryWithoutCustom() throws 
Exception {
   @Test
   public void testNewTypesQueryWithoutReshuffleWithCustom() throws Exception {
 this.setupNewTypesQueryTest();
-this.options.setExperiments(
-ImmutableList.of("enable_custom_bigquery_sink", 
"enable_custom_bigquery_source"));
+List<String> experiments = new ArrayList<>();
+experiments.add("enable_custom_bigquery_sink");
 
 Review comment:
   Agree, let me fix it. Thanks for pointing out!


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 155544)
Time Spent: 2h 10m  (was: 2h)

> Migrate Java test to use a staged worker jar
> 
>
> Key: BEAM-5730
> URL: https://issues.apache.org/jira/browse/BEAM-5730
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5730) Migrate Java test to use a staged worker jar

2018-10-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5730?focusedWorklogId=155543=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155543
 ]

ASF GitHub Bot logged work on BEAM-5730:


Author: ASF GitHub Bot
Created on: 17/Oct/18 18:20
Start Date: 17/Oct/18 18:20
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on issue #6694: [BEAM-5730] Migrate 
ITs using DataflowRunner to use custom worker
URL: https://github.com/apache/beam/pull/6694#issuecomment-430735086
 
 
   This PR is ready to merge if there are no more concerns. @lukecwik 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 155543)
Time Spent: 2h  (was: 1h 50m)

> Migrate Java test to use a staged worker jar
> 
>
> Key: BEAM-5730
> URL: https://issues.apache.org/jira/browse/BEAM-5730
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5745) Util test on annotations fails

2018-10-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5745?focusedWorklogId=155542=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155542
 ]

ASF GitHub Bot logged work on BEAM-5745:


Author: ASF GitHub Bot
Created on: 17/Oct/18 18:20
Start Date: 17/Oct/18 18:20
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on issue #6687: [BEAM-5745] Fix 
annotation test for py3
URL: https://github.com/apache/beam/pull/6687#issuecomment-430734886
 
 
   > It seems odd to set this globally once a function is called (and not in a 
context).
   
   An alternative way to fix this failing test is to explicitly call 
warnings.simplefilter() in each test case. This fix (Option#2) works as well. 
The downside, though, is that the discrepancy between py2 and py3 would still 
exist. 
   
   My hypothesis on the root cause is that python2 and python3 treat an 
unscoped global function call differently. (The funny thing is that even though 
warnings.simplefilter is called in py3, something else under the hood changes 
the behavior in a way I don't yet understand.) I didn't find definitive 
documentation on this, so please chime in if you are aware of good discussions 
on it. (+python experts @pabloem @tvalentyn)
   
   Weighing these two options, I prefer Option#1 in this PR because with this 
fix the module behaves exactly the same regardless of whether it runs in a py2 
or py3 environment, and I don't think there is a performance concern. The 
downside is that the code does not look elegant enough. 
   
   To wrap it up: 
   Option#1:  Pro: the annotations.py module behaves the same in py2 and py3. 
Cons: the code does not look elegant. 
   Option#2:  Pro: only the test file needs to change.  Cons: we would need to 
handle the py2/py3 discrepancy in all related test cases, and in any future 
non-test module that depends on annotations.py. 
   
   Please make recommendations if you know an Option#3.  Thanks.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 155542)
Time Spent: 50m  (was: 40m)

> Util test on annotations fails 
> ---
>
> Key: BEAM-5745
> URL: https://issues.apache.org/jira/browse/BEAM-5745
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Ruoyun Huang
>Assignee: Ruoyun Huang
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Traceback (most recent call last):
>   File 
> "/usr/local/google/home/ruoyun/projects/beam/sdks/python/apache_beam/utils/annotations_test.py",
>  line 142, in test_frequency
>     label_check_list=[])
>   File 
> "/usr/local/google/home/ruoyun/projects/beam/sdks/python/apache_beam/utils/annotations_test.py",
>  line 149, in check_annotation
>     self.assertIn(fnc_name + ' is ' + annotation_type, 
> str(warning[-1].message))
> AssertionError: 'fnc2_test_annotate_frequency is experimental' not found in 
> 'fnc_test_annotate_frequency is experimental.'



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5730) Migrate Java test to use a staged worker jar

2018-10-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5730?focusedWorklogId=155540=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155540
 ]

ASF GitHub Bot logged work on BEAM-5730:


Author: ASF GitHub Bot
Created on: 17/Oct/18 18:19
Start Date: 17/Oct/18 18:19
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #6694: 
[BEAM-5730] Migrate ITs using DataflowRunner to use custom worker
URL: https://github.com/apache/beam/pull/6694#discussion_r226042841
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryToTableIT.java
 ##
 @@ -291,8 +292,10 @@ public void testStandardQueryWithoutCustom() throws 
Exception {
   @Test
   public void testNewTypesQueryWithoutReshuffleWithCustom() throws Exception {
 this.setupNewTypesQueryTest();
-this.options.setExperiments(
-ImmutableList.of("enable_custom_bigquery_sink", 
"enable_custom_bigquery_source"));
+List<String> experiments = new ArrayList<>();
+experiments.add("enable_custom_bigquery_sink");
 
 Review comment:
   Wouldn't it make sense to fix the underlying bug to correctly concatenate 
two lists together instead of assuming the underlying list is mutable?
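
A minimal sketch of the kind of fix being suggested: copy the incoming experiments into a fresh mutable list before appending, so the caller may keep passing an ImmutableList. The class and method names here are illustrative, not the actual DataflowRunner code:

{code:java}
import java.util.ArrayList;
import java.util.List;

class ExperimentsConcatExample {
  // Appends an extra experiment without assuming the caller's list is mutable.
  static List<String> withExperiment(List<String> experiments, String extra) {
    // Defensive copy: works whether the input is an ImmutableList or an ArrayList.
    List<String> merged = new ArrayList<>(experiments);
    merged.add(extra);
    return merged;
  }
}
{code}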


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 155540)
Time Spent: 1h 40m  (was: 1.5h)

> Migrate Java test to use a staged worker jar
> 
>
> Key: BEAM-5730
> URL: https://issues.apache.org/jira/browse/BEAM-5730
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5730) Migrate Java test to use a staged worker jar

2018-10-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5730?focusedWorklogId=155541=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155541
 ]

ASF GitHub Bot logged work on BEAM-5730:


Author: ASF GitHub Bot
Created on: 17/Oct/18 18:19
Start Date: 17/Oct/18 18:19
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #6694: 
[BEAM-5730] Migrate ITs using DataflowRunner to use custom worker
URL: https://github.com/apache/beam/pull/6694#discussion_r226043014
 
 

 ##
 File path: runners/google-cloud-dataflow-java/build.gradle
 ##
 @@ -104,12 +105,18 @@ test {
 
 task validatesRunnerTest(type: Test) {
   group = "Verification"
+  dependsOn ":beam-runners-google-cloud-dataflow-java-legacy-worker:shadowJar"
   def dataflowProject = project.findProperty('dataflowProject') ?: 
'apache-beam-testing'
   def dataflowTempRoot = project.findProperty('dataflowTempRoot') ?: 
'gs://temp-storage-for-validates-runner-tests/'
+  def dataflowWorkerJar = project.findProperty('dataflowWorkerJar') ?: 
project(":beam-runners-google-cloud-dataflow-java-legacy-worker").shadowJar.archivePath
+
   systemProperty "beamTestPipelineOptions", JsonOutput.toJson([
   "--runner=TestDataflowRunner",
   "--project=${dataflowProject}",
   "--tempRoot=${dataflowTempRoot}",
+  "--dataflowWorkerJar=${dataflowWorkerJar}",
+  "--workerHarnessContainerImage=",
 
 Review comment:
   Seems worthwhile to add a comment, since all the other properties are 
self-explanatory because we set explicit values for them.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 155541)
Time Spent: 1h 50m  (was: 1h 40m)

> Migrate Java test to use a staged worker jar
> 
>
> Key: BEAM-5730
> URL: https://issues.apache.org/jira/browse/BEAM-5730
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5780) fn-api-worker and legacy-worker should point to different dir

2018-10-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5780?focusedWorklogId=155539=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155539
 ]

ASF GitHub Bot logged work on BEAM-5780:


Author: ASF GitHub Bot
Created on: 17/Oct/18 18:17
Start Date: 17/Oct/18 18:17
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #6715: [BEAM-5780] Make 
fn-api-worker and legacy-worker point to the different dir
URL: https://github.com/apache/beam/pull/6715#issuecomment-430733949
 
 
   Please double check my changes. Also please add a comment to the top of both 
worker build files talking about why we have two different directories for the 
worker.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 155539)
Time Spent: 10m
Remaining Estimate: 0h

> fn-api-worker and legacy-worker should point to different dir
> -
>
> Key: BEAM-5780
> URL: https://issues.apache.org/jira/browse/BEAM-5780
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5776) Using methods in map is broken on Python 3

2018-10-17 Thread Valentyn Tymofieiev (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653984#comment-16653984
 ] 

Valentyn Tymofieiev commented on BEAM-5776:
---

Thanks for reporting this.

> Using methods in map is broken on Python 3
> --
>
> Key: BEAM-5776
> URL: https://issues.apache.org/jira/browse/BEAM-5776
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Priority: Major
>
> E.g. 
> {code:java}
> pcoll | beam.Map(str.upper){code}
> no longer works.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-5776) Using methods in map is broken on Python 3

2018-10-17 Thread Valentyn Tymofieiev (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev reassigned BEAM-5776:
-

Assignee: Valentyn Tymofieiev  (was: Ahmet Altay)

> Using methods in map is broken on Python 3
> --
>
> Key: BEAM-5776
> URL: https://issues.apache.org/jira/browse/BEAM-5776
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Valentyn Tymofieiev
>Priority: Major
>
> E.g. 
> {code:java}
> pcoll | beam.Map(str.upper){code}
> no longer works.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5772) GCP IO tests slow down general Beam PostCommits

2018-10-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5772?focusedWorklogId=155535=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155535
 ]

ASF GitHub Bot logged work on BEAM-5772:


Author: ASF GitHub Bot
Created on: 17/Oct/18 18:12
Start Date: 17/Oct/18 18:12
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #6712: [BEAM-5772] Moving 
GCP IO tests to a new post commit suite
URL: https://github.com/apache/beam/pull/6712#issuecomment-430732206
 
 
   I'm going on vacation, so I'll get back to this later on.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 155535)
Time Spent: 50m  (was: 40m)

> GCP IO tests slow down general Beam PostCommits
> ---
>
> Key: BEAM-5772
> URL: https://issues.apache.org/jira/browse/BEAM-5772
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp, testing
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5730) Migrate Java test to use a staged worker jar

2018-10-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5730?focusedWorklogId=155525=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155525
 ]

ASF GitHub Bot logged work on BEAM-5730:


Author: ASF GitHub Bot
Created on: 17/Oct/18 17:28
Start Date: 17/Oct/18 17:28
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on a change in pull request #6694: 
[BEAM-5730] Migrate ITs using DataflowRunner to use custom worker
URL: https://github.com/apache/beam/pull/6694#discussion_r226025275
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryToTableIT.java
 ##
 @@ -291,8 +292,10 @@ public void testStandardQueryWithoutCustom() throws 
Exception {
   @Test
   public void testNewTypesQueryWithoutReshuffleWithCustom() throws Exception {
 this.setupNewTypesQueryTest();
-this.options.setExperiments(
-ImmutableList.of("enable_custom_bigquery_sink", 
"enable_custom_bigquery_source"));
+List<String> experiments = new ArrayList<>();
+experiments.add("enable_custom_bigquery_sink");
 
 Review comment:
   If dataflowWorkerJar is set, we append the experiment flag to the 
experiments list: 
https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java#L744.
 In that case, if the pipeline's experiments list is an immutable collection, 
an exception will be thrown when the worker_jar experiment is appended.
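
For reference, a minimal sketch of that failure mode; the second experiment name is a placeholder, since the real flag is appended inside DataflowRunner:

{code:java}
import com.google.common.collect.ImmutableList;
import java.util.List;

public class ImmutableExperimentsFailure {
  public static void main(String[] args) {
    // Experiments supplied by the test as an immutable collection.
    List<String> experiments = ImmutableList.of("enable_custom_bigquery_sink");
    // Appending another experiment (as the runner does for the staged worker
    // jar) throws UnsupportedOperationException on an ImmutableList.
    experiments.add("some_worker_jar_experiment");
  }
}
{code}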


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 155525)
Time Spent: 1.5h  (was: 1h 20m)

> Migrate Java test to use a staged worker jar
> 
>
> Key: BEAM-5730
> URL: https://issues.apache.org/jira/browse/BEAM-5730
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5772) GCP IO tests slow down general Beam PostCommits

2018-10-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5772?focusedWorklogId=155524=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155524
 ]

ASF GitHub Bot logged work on BEAM-5772:


Author: ASF GitHub Bot
Created on: 17/Oct/18 17:22
Start Date: 17/Oct/18 17:22
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #6712: [BEAM-5772] Moving 
GCP IO tests to a new post commit suite
URL: https://github.com/apache/beam/pull/6712#issuecomment-430715206
 
 
   hmmm this is too fast, hah. I need to look into why that is.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 155524)
Time Spent: 40m  (was: 0.5h)

> GCP IO tests slow down general Beam PostCommits
> ---
>
> Key: BEAM-5772
> URL: https://issues.apache.org/jira/browse/BEAM-5772
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp, testing
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-5779) Failure in beam_PostCommit_Python_Verify PubSubIntegrationTest on Dataflow

2018-10-17 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-5779:
--
Description: 
https://builds.apache.org/job/beam_PostCommit_Python_Verify/6296/

Path to failure:
* (Jenkins) beam_PostCommit_Python_Verify
* (gradle) :beam-sdks-python:postCommitITTests
* (py module) test_streaming_data_only
* (test class) apache_beam.io.gcp.pubsub_integration_test.PubSubIntegrationTest

{code}
AssertionError: 
Expected: (Test pipeline expected terminated in state: RUNNING and Expected 2 
messages.)
 but: Expected 2 messages. Got 0 messages. Diffs (item, count):
  Expected but not in actual: [('data002-seen', 1), ('data001-seen', 1)]
  Unexpected: []
{code}

Looks like a legitimate failure. It is flaky, as it went green later.

  was:
https://builds.apache.org/job/beam_PostCommit_Python_Verify/6296/

Path to failure:
* beam_PostCommit_Python_Verify
* test_streaming_data_only
* apache_beam.io.gcp.pubsub_integration_test.PubSubIntegrationTest

{code}
AssertionError: 
Expected: (Test pipeline expected terminated in state: RUNNING and Expected 2 
messages.)
 but: Expected 2 messages. Got 0 messages. Diffs (item, count):
  Expected but not in actual: [('data002-seen', 1), ('data001-seen', 1)]
  Unexpected: []
{code}

Looks like a legitimate failure. It is flaky, as it went green later.


> Failure in beam_PostCommit_Python_Verify PubSubIntegrationTest on Dataflow
> --
>
> Key: BEAM-5779
> URL: https://issues.apache.org/jira/browse/BEAM-5779
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Kenneth Knowles
>Assignee: Ahmet Altay
>Priority: Major
>  Labels: flake
>
> https://builds.apache.org/job/beam_PostCommit_Python_Verify/6296/
> Path to failure:
> * (Jenkins) beam_PostCommit_Python_Verify
> * (gradle) :beam-sdks-python:postCommitITTests
> * (py module) test_streaming_data_only
> * (test class) 
> apache_beam.io.gcp.pubsub_integration_test.PubSubIntegrationTest
> {code}
> AssertionError: 
> Expected: (Test pipeline expected terminated in state: RUNNING and Expected 2 
> messages.)
>  but: Expected 2 messages. Got 0 messages. Diffs (item, count):
>   Expected but not in actual: [('data002-seen', 1), ('data001-seen', 1)]
>   Unexpected: []
> {code}
> Looks like a legitimate failure. It is flaky, as it went green later.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5730) Migrate Java test to use a staged worker jar

2018-10-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5730?focusedWorklogId=155520=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155520
 ]

ASF GitHub Bot logged work on BEAM-5730:


Author: ASF GitHub Bot
Created on: 17/Oct/18 17:17
Start Date: 17/Oct/18 17:17
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on a change in pull request #6694: 
[BEAM-5730] Migrate ITs using DataflowRunner to use custom worker
URL: https://github.com/apache/beam/pull/6694#discussion_r226021110
 
 

 ##
 File path: runners/google-cloud-dataflow-java/build.gradle
 ##
 @@ -104,12 +105,18 @@ test {
 
 task validatesRunnerTest(type: Test) {
   group = "Verification"
+  dependsOn ":beam-runners-google-cloud-dataflow-java-legacy-worker:shadowJar"
   def dataflowProject = project.findProperty('dataflowProject') ?: 
'apache-beam-testing'
   def dataflowTempRoot = project.findProperty('dataflowTempRoot') ?: 
'gs://temp-storage-for-validates-runner-tests/'
+  def dataflowWorkerJar = project.findProperty('dataflowWorkerJar') ?: 
project(":beam-runners-google-cloud-dataflow-java-legacy-worker").shadowJar.archivePath
+
   systemProperty "beamTestPipelineOptions", JsonOutput.toJson([
   "--runner=TestDataflowRunner",
   "--project=${dataflowProject}",
   "--tempRoot=${dataflowTempRoot}",
+  "--dataflowWorkerJar=${dataflowWorkerJar}",
+  "--workerHarnessContainerImage=",
 
 Review comment:
   This is only for the legacy worker and is related to the Dataflow service. 
If workerHarnessContainerImage isn't explicitly set to empty, the default 
image will be used, which contains the pre-built worker jar.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 155520)
Time Spent: 1h 20m  (was: 1h 10m)

> Migrate Java test to use a staged worker jar
> 
>
> Key: BEAM-5730
> URL: https://issues.apache.org/jira/browse/BEAM-5730
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-5779) Failure in beam_PostCommit_Python_Verify PubSubIntegrationTest on Dataflow

2018-10-17 Thread Kenneth Knowles (JIRA)
Kenneth Knowles created BEAM-5779:
-

 Summary: Failure in beam_PostCommit_Python_Verify 
PubSubIntegrationTest on Dataflow
 Key: BEAM-5779
 URL: https://issues.apache.org/jira/browse/BEAM-5779
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core
Reporter: Kenneth Knowles
Assignee: Ahmet Altay


https://builds.apache.org/job/beam_PostCommit_Python_Verify/6296/

Path to failure:
* beam_PostCommit_Python_Verify
* test_streaming_data_only
* apache_beam.io.gcp.pubsub_integration_test.PubSubIntegrationTest

{code}
AssertionError: 
Expected: (Test pipeline expected terminated in state: RUNNING and Expected 2 
messages.)
 but: Expected 2 messages. Got 0 messages. Diffs (item, count):
  Expected but not in actual: [('data002-seen', 1), ('data001-seen', 1)]
  Unexpected: []
{code}

Looks like a legitimate failure. It is flaky, as it went green later.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5774) beam_Release_Gradle_NightlySnapshot timed out

2018-10-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5774?focusedWorklogId=155517=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155517
 ]

ASF GitHub Bot logged work on BEAM-5774:


Author: ASF GitHub Bot
Created on: 17/Oct/18 17:11
Start Date: 17/Oct/18 17:11
Worklog Time Spent: 10m 
  Work Description: alanmyrvold opened a new pull request #6727: 
[BEAM-5774] Increase timeout of beam_Release_Gradle_NightlySnapshot
URL: https://github.com/apache/beam/pull/6727
 
 
   beam_Release_Gradle_NightlySnapshot has been timing out with the default 
timeout of 100 minutes.
   
   +R: @kennknowles PTAL
   
   
   
   Follow this checklist to help us incorporate your contribution quickly and 
easily:
   
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   It will help us expedite review of your Pull Request if you tag someone 
(e.g. `@username`) to look at it.
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/)
 | --- | --- | --- | --- | --- | ---
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)
 | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)
  [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/)
 | --- | --- | ---
   
   
   
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 155517)
Time Spent: 10m
Remaining Estimate: 0h

> beam_Release_Gradle_NightlySnapshot timed out
> -
>
> Key: BEAM-5774
> URL: https://issues.apache.org/jira/browse/BEAM-5774
> Project: Beam
>  

[jira] [Work logged] (BEAM-5309) Add streaming support for HadoopOutputFormatIO

2018-10-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5309?focusedWorklogId=155513=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155513
 ]

ASF GitHub Bot logged work on BEAM-5309:


Author: ASF GitHub Bot
Created on: 17/Oct/18 17:08
Start Date: 17/Oct/18 17:08
Worklog Time Spent: 10m 
  Work Description: dmvk commented on a change in pull request #6691: 
WIP:[BEAM-5309] Add streaming support for HadoopFormatIO
URL: https://github.com/apache/beam/pull/6691#discussion_r226018167
 
 

 ##
 File path: 
sdks/java/io/hadoop-format/src/main/java/org/apache/beam/sdk/io/hadoop/format/IterableCombinerFn.java
 ##
 @@ -0,0 +1,137 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more 
contributor license
+ * agreements. See the NOTICE file distributed with this work for additional 
information regarding
+ * copyright ownership. The ASF licenses this file to you under the Apache 
License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the 
License. You may obtain a
+ * copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software 
distributed under the License
+ * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY 
KIND, either express
+ * or implied. See the License for the specific language governing permissions 
and limitations under
+ * the License.
+ */
+package org.apache.beam.sdk.io.hadoop.format;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Objects;
+import org.apache.beam.sdk.coders.AtomicCoder;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.coders.CoderRegistry;
+import org.apache.beam.sdk.coders.IterableCoder;
+import org.apache.beam.sdk.coders.ListCoder;
+import org.apache.beam.sdk.transforms.Combine;
+import org.apache.beam.sdk.values.TypeDescriptor;
+import org.apache.beam.sdk.values.TypeDescriptors;
+
+/**
+ * Collects all items of defined type into one {@link Iterable} container.
+ *
+ * @param <T> Type of the elements to collect
+ */
+class IterableCombinerFn<T>
+extends Combine.AccumulatingCombineFn<
+T, IterableCombinerFn.CollectionAccumulator<T>, Iterable<T>> {
+
+  /**
+   * Accumulator for collecting one "shard" of types.
+   *
+   * @param <T> Type of the elements to collect
+   */
+  public static class CollectionAccumulator<T>
+  implements Combine.AccumulatingCombineFn.Accumulator<
+  T, CollectionAccumulator<T>, Iterable<T>> {
+
+private final List<T> collection;
+
+private CollectionAccumulator() {
+  this(new ArrayList<>());
+}
+
+private CollectionAccumulator(List<T> collection) {
+  Objects.requireNonNull(collection, "Collection can't be null");
+  this.collection = collection;
+}
+
+@Override
+public void addInput(T input) {
+  collection.add(input);
 
 Review comment:
   I think not; this is used to combine all completed task ids. 1M tasks would 
be ~4MB, and it is very unlikely that we will ever hit a larger number of 
tasks than that.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 155513)
Time Spent: 3h 20m  (was: 3h 10m)

> Add streaming support for HadoopOutputFormatIO
> --
>
> Key: BEAM-5309
> URL: https://issues.apache.org/jira/browse/BEAM-5309
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-hadoop
>Reporter: Alexey Romanenko
>Assignee: David Hrbacek
>Priority: Minor
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> design doc: https://s.apache.org/beam-streaming-hofio



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5240) Create post-commit tests dashboard

2018-10-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5240?focusedWorklogId=155506=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155506
 ]

ASF GitHub Bot logged work on BEAM-5240:


Author: ASF GitHub Bot
Created on: 17/Oct/18 17:00
Start Date: 17/Oct/18 17:00
Worklog Time Spent: 10m 
  Work Description: Ardagan commented on issue #6711: [BEAM-5240] Add Jira 
data to Beam post-commits dashboard
URL: https://github.com/apache/beam/pull/6711#issuecomment-430708034
 
 
   That's a valid approach. I believe those steps have been followed only 
twice: once by me when writing them, and once by Huygaa while working on more 
metrics. Feel free to add comments.
   This branch allows committers to patch the PR; feel free to do that as well.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 155506)
Time Spent: 5h 40m  (was: 5.5h)

> Create post-commit tests dashboard
> --
>
> Key: BEAM-5240
> URL: https://issues.apache.org/jira/browse/BEAM-5240
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5772) GCP IO tests slow down general Beam PostCommits

2018-10-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5772?focusedWorklogId=155505=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155505
 ]

ASF GitHub Bot logged work on BEAM-5772:


Author: ASF GitHub Bot
Created on: 17/Oct/18 16:58
Start Date: 17/Oct/18 16:58
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #6712: [BEAM-5772] Moving 
GCP IO tests to a new post commit suite
URL: https://github.com/apache/beam/pull/6712#issuecomment-430707448
 
 
   Run Seed Job


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 155505)
Time Spent: 20m  (was: 10m)

> GCP IO tests slow down general Beam PostCommits
> ---
>
> Key: BEAM-5772
> URL: https://issues.apache.org/jira/browse/BEAM-5772
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp, testing
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2918) Flink support for portable user state

2018-10-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2918?focusedWorklogId=155493=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155493
 ]

ASF GitHub Bot logged work on BEAM-2918:


Author: ASF GitHub Bot
Created on: 17/Oct/18 16:42
Start Date: 17/Oct/18 16:42
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #6726: [BEAM-2918] Add state 
support for streaming in portable FlinkRunner
URL: https://github.com/apache/beam/pull/6726#issuecomment-430701756
 
 
   Run Python Flink ValidatesRunner


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 155493)
Time Spent: 40m  (was: 0.5h)

> Flink support for portable user state
> -
>
> Key: BEAM-2918
> URL: https://issues.apache.org/jira/browse/BEAM-2918
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-flink
>Reporter: Henning Rohde
>Assignee: Maximilian Michels
>Priority: Minor
>  Labels: portability
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2918) Flink support for portable user state

2018-10-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2918?focusedWorklogId=155494=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155494
 ]

ASF GitHub Bot logged work on BEAM-2918:


Author: ASF GitHub Bot
Created on: 17/Oct/18 16:42
Start Date: 17/Oct/18 16:42
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #6726: [BEAM-2918] Add state 
support for streaming in portable FlinkRunner
URL: https://github.com/apache/beam/pull/6726#issuecomment-430701836
 
 
   Run Java Flink ValidatesRunner


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 155494)
Time Spent: 50m  (was: 40m)

> Flink support for portable user state
> -
>
> Key: BEAM-2918
> URL: https://issues.apache.org/jira/browse/BEAM-2918
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-flink
>Reporter: Henning Rohde
>Assignee: Maximilian Michels
>Priority: Minor
>  Labels: portability
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2918) Flink support for portable user state

2018-10-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2918?focusedWorklogId=155489=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155489
 ]

ASF GitHub Bot logged work on BEAM-2918:


Author: ASF GitHub Bot
Created on: 17/Oct/18 16:40
Start Date: 17/Oct/18 16:40
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #6726: [BEAM-2918] Add state 
support for streaming in portable FlinkRunner
URL: https://github.com/apache/beam/pull/6726#issuecomment-430701133
 
 
   Run Python Flink PortableValidatesRunner


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 155489)
Time Spent: 20m  (was: 10m)

> Flink support for portable user state
> -
>
> Key: BEAM-2918
> URL: https://issues.apache.org/jira/browse/BEAM-2918
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-flink
>Reporter: Henning Rohde
>Assignee: Maximilian Michels
>Priority: Minor
>  Labels: portability
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2918) Flink support for portable user state

2018-10-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2918?focusedWorklogId=155488=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155488
 ]

ASF GitHub Bot logged work on BEAM-2918:


Author: ASF GitHub Bot
Created on: 17/Oct/18 16:38
Start Date: 17/Oct/18 16:38
Worklog Time Spent: 10m 
  Work Description: mxm opened a new pull request #6726: [BEAM-2918] Add 
state support for streaming in portable FlinkRunner
URL: https://github.com/apache/beam/pull/6726
 
 
   This adds portable state support for the streaming mode.
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/)
 | --- | --- | --- | --- | --- | ---
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)
 | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)
  [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/)
 | --- | --- | ---
   
   
   
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 155488)
Time Spent: 10m
Remaining Estimate: 0h

> Flink support for portable user state
> -
>
> Key: BEAM-2918
> URL: https://issues.apache.org/jira/browse/BEAM-2918
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-flink
>Reporter: Henning Rohde
>Assignee: Maximilian Michels
>Priority: Minor
>  Labels: portability
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5741) Move "Contact Us" to a top-level link

2018-10-17 Thread Melissa Pashniak (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653805#comment-16653805
 ] 

Melissa Pashniak commented on BEAM-5741:


Another thought: we could also add the same "Finding help" section at the very 
bottom of the guide, for folks who worked their way down to the bottom and 
forgot about the help section at the top. We could rename the bottom section 
slightly if repetition is a concern, e.g. "Have questions?" or something 
similar.

> Move "Contact Us" to a top-level link
> -
>
> Key: BEAM-5741
> URL: https://issues.apache.org/jira/browse/BEAM-5741
> Project: Beam
>  Issue Type: Sub-task
>  Components: website
>Reporter: Scott Wegner
>Priority: Major
>
> It should be very easy to figure out how to get in touch with community. 
> "Contact Us" should be a top-level link on the page.
> The page can also be improved with:
> * Some basic text on how to use subscribe / unsubscribe links
> * Recommendations on how to use various communications channels (Slack for 
> quick questions, dev@ for longer conversations. And all decisions should make 
> it back to dev@)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5240) Create post-commit tests dashboard

2018-10-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5240?focusedWorklogId=155480=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155480
 ]

ASF GitHub Bot logged work on BEAM-5240:


Author: ASF GitHub Bot
Created on: 17/Oct/18 16:30
Start Date: 17/Oct/18 16:30
Worklog Time Spent: 10m 
  Work Description: swegner commented on issue #6711: [BEAM-5240] Add Jira 
data to Beam post-commits dashboard
URL: https://github.com/apache/beam/pull/6711#issuecomment-430697664
 
 
   Taking a look now. I'm not very familiar with the BeamMetrics implementation 
yet, so my plan for reviewing it will be to run through the README.md steps.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 155480)
Time Spent: 5.5h  (was: 5h 20m)

> Create post-commit tests dashboard
> --
>
> Key: BEAM-5240
> URL: https://issues.apache.org/jira/browse/BEAM-5240
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5496) MqttIO fails to deserialize checkpoint

2018-10-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5496?focusedWorklogId=155477=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155477
 ]

ASF GitHub Bot logged work on BEAM-5496:


Author: ASF GitHub Bot
Created on: 17/Oct/18 16:21
Start Date: 17/Oct/18 16:21
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #6701: 
[BEAM-5496] Fixes bug of MqttIO fails to deserialize checkpoint
URL: https://github.com/apache/beam/pull/6701#discussion_r226002149
 
 

 ##
 File path: 
sdks/java/io/mqtt/src/main/java/org/apache/beam/sdk/io/mqtt/MqttIO.java
 ##
 @@ -335,8 +339,26 @@ public void finalizeCheckpoint() {
 // set an empty list to messages when deserialize
 private void readObject(java.io.ObjectInputStream stream)
 throws IOException, ClassNotFoundException {
+  stream.defaultReadObject();
   messages = new ArrayList<>();
 }
+
+@Override
+public boolean equals(Object other) {
+  if (other instanceof MqttCheckpointMark){
+MqttCheckpointMark that = (MqttCheckpointMark)other;
+return this.clientId.equals(that.clientId) && 
this.oldestMessageTimestamp.equals(that.oldestMessageTimestamp);
 
 Review comment:
   Use Objects.equal since it is a null-safe check, and then you never need to 
worry about whether `clientId` or `oldestMessageTimestamp` are null.
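   
   For illustration, a minimal sketch of that null-safe comparison, reusing the field 
and class names from the diff above; `java.util.Objects` is shown here as the 
standard-library equivalent of Guava's `Objects.equal`, and the matching `hashCode` 
is an assumption added for completeness, not code from this PR:
   ```
   import java.util.Objects;
   
   @Override
   public boolean equals(Object other) {
     if (!(other instanceof MqttCheckpointMark)) {
       return false;
     }
     MqttCheckpointMark that = (MqttCheckpointMark) other;
     // Objects.equals(a, b) is true when both are null and false when only one is,
     // so no explicit null checks on clientId or oldestMessageTimestamp are needed.
     return Objects.equals(this.clientId, that.clientId)
         && Objects.equals(this.oldestMessageTimestamp, that.oldestMessageTimestamp);
   }
   
   @Override
   public int hashCode() {
     return Objects.hash(clientId, oldestMessageTimestamp);
   }
   ```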


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 155477)
Time Spent: 50m  (was: 40m)

> MqttIO fails to deserialize checkpoint
> --
>
> Key: BEAM-5496
> URL: https://issues.apache.org/jira/browse/BEAM-5496
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-mqtt
>Reporter: Luke Cwik
>Assignee: Island Chen
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Source of bug report: 
> [https://lists.apache.org/thread.html/3de5a946bcb539dea9f18a31f712d6af5b66f9fbb6b01eed452c5afb@%3Cdev.beam.apache.org%3E]
>  
> There is a bug in the built-in MqttIO, please check the 
> ,
>  this readObject() method forgets to invoke the "stream.defaultReadObject()" 
> method.
>  
> {code:java}
> // set an empty list to messages when deserialize
> private void readObject(java.io.ObjectInputStream stream)
> throws IOException, ClassNotFoundException {
>   messages = new ArrayList<>();
> }{code}
>  
> So there is an exception while the runner tried to deserialize the checkpoint 
> object.
> {code:java}
> java.lang.RuntimeException: org.apache.beam.sdk.coders.CoderException: 95 
> unexpected extra bytes after decoding 
> org.apache.beam.sdk.io.mqtt.MqttIO$MqttCheckpointMark@6764e219
>  at 
> org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:340)
> ...{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5496) MqttIO fails to deserialize checkpoint

2018-10-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5496?focusedWorklogId=155476=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155476
 ]

ASF GitHub Bot logged work on BEAM-5496:


Author: ASF GitHub Bot
Created on: 17/Oct/18 16:21
Start Date: 17/Oct/18 16:21
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #6701: 
[BEAM-5496] Fixes bug of MqttIO fails to deserialize checkpoint
URL: https://github.com/apache/beam/pull/6701#discussion_r226000998
 
 

 ##
 File path: 
sdks/java/io/mqtt/src/main/java/org/apache/beam/sdk/io/mqtt/MqttIO.java
 ##
 @@ -335,8 +339,26 @@ public void finalizeCheckpoint() {
 // set an empty list to messages when deserialize
 private void readObject(java.io.ObjectInputStream stream)
 throws IOException, ClassNotFoundException {
+  stream.defaultReadObject();
   messages = new ArrayList<>();
 
 Review comment:
   Since we override `messages` when we read the object, we should make 
`messages` a `transient` field.
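   
   A minimal sketch of that suggestion, reusing the field and method shown in the 
diff above (an illustration, not the final code of this PR):
   ```
   // `messages` is rebuilt on deserialization, so it does not need to be serialized.
   private transient List<Message> messages = new ArrayList<>();
   
   // set an empty list to messages when deserializing
   private void readObject(java.io.ObjectInputStream stream)
       throws IOException, ClassNotFoundException {
     stream.defaultReadObject();   // restores the remaining, non-transient fields
     messages = new ArrayList<>(); // transient fields are not restored, so reset here
   }
   ```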


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 155476)
Time Spent: 50m  (was: 40m)

> MqttIO fails to deserialize checkpoint
> --
>
> Key: BEAM-5496
> URL: https://issues.apache.org/jira/browse/BEAM-5496
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-mqtt
>Reporter: Luke Cwik
>Assignee: Island Chen
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Source of bug report: 
> [https://lists.apache.org/thread.html/3de5a946bcb539dea9f18a31f712d6af5b66f9fbb6b01eed452c5afb@%3Cdev.beam.apache.org%3E]
>  
> There is a bug in the built-in MqttIO, please check the 
> ,
>  this readObject() method forgets to invoke the "stream.defaultReadObject()" 
> method.
>  
> {code:java}
> // set an empty list to messages when deserialize
> private void readObject(java.io.ObjectInputStream stream)
> throws IOException, ClassNotFoundException {
>   messages = new ArrayList<>();
> }{code}
>  
> So there is an exception while the runner tried to deserialize the checkpoint 
> object.
> {code:java}
> java.lang.RuntimeException: org.apache.beam.sdk.coders.CoderException: 95 
> unexpected extra bytes after decoding 
> org.apache.beam.sdk.io.mqtt.MqttIO$MqttCheckpointMark@6764e219
>  at 
> org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:340)
> ...{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5496) MqttIO fails to deserialize checkpoint

2018-10-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5496?focusedWorklogId=155475=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155475
 ]

ASF GitHub Bot logged work on BEAM-5496:


Author: ASF GitHub Bot
Created on: 17/Oct/18 16:21
Start Date: 17/Oct/18 16:21
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #6701: 
[BEAM-5496] Fixes bug of MqttIO fails to deserialize checkpoint
URL: https://github.com/apache/beam/pull/6701#discussion_r226001316
 
 

 ##
 File path: 
sdks/java/io/mqtt/src/main/java/org/apache/beam/sdk/io/mqtt/MqttIO.java
 ##
 @@ -335,8 +339,26 @@ public void finalizeCheckpoint() {
 // set an empty list to messages when deserialize
 private void readObject(java.io.ObjectInputStream stream)
 throws IOException, ClassNotFoundException {
+  stream.defaultReadObject();
   messages = new ArrayList<>();
 }
+
+@Override
+public boolean equals(Object other) {
+  if (other instanceof MqttCheckpointMark){
+MqttCheckpointMark that = (MqttCheckpointMark)other;
+return this.clientId.equals(that.clientId) && 
this.oldestMessageTimestamp.equals(that.oldestMessageTimestamp);
 
 Review comment:
   Should two checkpoints be equal even if they have different lists of 
`messages`?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 155475)
Time Spent: 40m  (was: 0.5h)

> MqttIO fails to deserialize checkpoint
> --
>
> Key: BEAM-5496
> URL: https://issues.apache.org/jira/browse/BEAM-5496
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-mqtt
>Reporter: Luke Cwik
>Assignee: Island Chen
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Source of bug report: 
> [https://lists.apache.org/thread.html/3de5a946bcb539dea9f18a31f712d6af5b66f9fbb6b01eed452c5afb@%3Cdev.beam.apache.org%3E]
>  
> There is a bug in the built-in MqttIO, please check the 
> ,
>  this readObject() method forgets to invoke the "stream.defaultReadObject()" 
> method.
>  
> {code:java}
> // set an empty list to messages when deserialize
> private void readObject(java.io.ObjectInputStream stream)
> throws IOException, ClassNotFoundException {
>   messages = new ArrayList<>();
> }{code}
>  
> So there is an exception while the runner tried to deserialize the checkpoint 
> object.
> {code:java}
> java.lang.RuntimeException: org.apache.beam.sdk.coders.CoderException: 95 
> unexpected extra bytes after decoding 
> org.apache.beam.sdk.io.mqtt.MqttIO$MqttCheckpointMark@6764e219
>  at 
> org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:340)
> ...{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5496) MqttIO fails to deserialize checkpoint

2018-10-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5496?focusedWorklogId=155478=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155478
 ]

ASF GitHub Bot logged work on BEAM-5496:


Author: ASF GitHub Bot
Created on: 17/Oct/18 16:21
Start Date: 17/Oct/18 16:21
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #6701: 
[BEAM-5496] Fixes bug of MqttIO fails to deserialize checkpoint
URL: https://github.com/apache/beam/pull/6701#discussion_r226001900
 
 

 ##
 File path: 
sdks/java/io/mqtt/src/main/java/org/apache/beam/sdk/io/mqtt/MqttIO.java
 ##
 @@ -335,8 +339,26 @@ public void finalizeCheckpoint() {
 // set an empty list to messages when deserialize
 private void readObject(java.io.ObjectInputStream stream)
 throws IOException, ClassNotFoundException {
+  stream.defaultReadObject();
   messages = new ArrayList<>();
 }
+
+@Override
+public boolean equals(Object other) {
+  if (other instanceof MqttCheckpointMark){
 
 Review comment:
   nit: use guard style statements (simplifies code readability):
   ```
   if (!(other instanceof MqttCheckpointMark)) {
     return false;
   }
   
   MqttCheckpointMark that = (MqttCheckpointMark) other;
   return this.clientId.equals(that.clientId)
       && this.oldestMessageTimestamp.equals(that.oldestMessageTimestamp);
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 155478)

> MqttIO fails to deserialize checkpoint
> --
>
> Key: BEAM-5496
> URL: https://issues.apache.org/jira/browse/BEAM-5496
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-mqtt
>Reporter: Luke Cwik
>Assignee: Island Chen
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Source of bug report: 
> [https://lists.apache.org/thread.html/3de5a946bcb539dea9f18a31f712d6af5b66f9fbb6b01eed452c5afb@%3Cdev.beam.apache.org%3E]
>  
> There is a bug in the built-in MqttIO, please check the 
> ,
>  this readObject() method forgets to invoke the "stream.defaultReadObject()" 
> method.
>  
> {code:java}
> // set an empty list to messages when deserialize
> private void readObject(java.io.ObjectInputStream stream)
> throws IOException, ClassNotFoundException {
>   messages = new ArrayList<>();
> }{code}
>  
> So there is an exception while the runner tried to deserialize the checkpoint 
> object.
> {code:java}
> java.lang.RuntimeException: org.apache.beam.sdk.coders.CoderException: 95 
> unexpected extra bytes after decoding 
> org.apache.beam.sdk.io.mqtt.MqttIO$MqttCheckpointMark@6764e219
>  at 
> org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:340)
> ...{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5496) MqttIO fails to deserialize checkpoint

2018-10-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5496?focusedWorklogId=155473=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155473
 ]

ASF GitHub Bot logged work on BEAM-5496:


Author: ASF GitHub Bot
Created on: 17/Oct/18 16:13
Start Date: 17/Oct/18 16:13
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #6701: [BEAM-5496] Fixes 
bug of MqttIO fails to deserialize checkpoint
URL: https://github.com/apache/beam/pull/6701#issuecomment-430691942
 
 
   R: @lukecwik 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 155473)
Time Spent: 0.5h  (was: 20m)

> MqttIO fails to deserialize checkpoint
> --
>
> Key: BEAM-5496
> URL: https://issues.apache.org/jira/browse/BEAM-5496
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-mqtt
>Reporter: Luke Cwik
>Assignee: Island Chen
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Source of bug report: 
> [https://lists.apache.org/thread.html/3de5a946bcb539dea9f18a31f712d6af5b66f9fbb6b01eed452c5afb@%3Cdev.beam.apache.org%3E]
>  
> There is a bug in the built-in MqttIO, please check the 
> ,
>  this readObject() method forgets to invoke the "stream.defaultReadObject()" 
> method.
>  
> {code:java}
> // set an empty list to messages when deserialize
> private void readObject(java.io.ObjectInputStream stream)
> throws IOException, ClassNotFoundException {
>   messages = new ArrayList<>();
> }{code}
>  
> So there is an exception while the runner tried to deserialize the checkpoint 
> object.
> {code:java}
> java.lang.RuntimeException: org.apache.beam.sdk.coders.CoderException: 95 
> unexpected extra bytes after decoding 
> org.apache.beam.sdk.io.mqtt.MqttIO$MqttCheckpointMark@6764e219
>  at 
> org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:340)
> ...{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5496) MqttIO fails to deserialize checkpoint

2018-10-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5496?focusedWorklogId=155472=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155472
 ]

ASF GitHub Bot logged work on BEAM-5496:


Author: ASF GitHub Bot
Created on: 17/Oct/18 16:13
Start Date: 17/Oct/18 16:13
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #6701: [BEAM-5496] Fixes 
bug of MqttIO fails to deserialize checkpoint
URL: https://github.com/apache/beam/pull/6701#issuecomment-430691883
 
 
   The Jenkins run failed because you have some style issues with the code.
   Run 'gradlew spotlessApply' to fix these violations and update your PR.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 155472)
Time Spent: 20m  (was: 10m)

> MqttIO fails to deserialize checkpoint
> --
>
> Key: BEAM-5496
> URL: https://issues.apache.org/jira/browse/BEAM-5496
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-mqtt
>Reporter: Luke Cwik
>Assignee: Island Chen
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Source of bug report: 
> [https://lists.apache.org/thread.html/3de5a946bcb539dea9f18a31f712d6af5b66f9fbb6b01eed452c5afb@%3Cdev.beam.apache.org%3E]
>  
> There is a bug in the built-in MqttIO, please check the 
> ,
>  this readObject() method forgets to invoke the "stream.defaultReadObject()" 
> method.
>  
> {code:java}
> // set an empty list to messages when deserialize
> private void readObject(java.io.ObjectInputStream stream)
> throws IOException, ClassNotFoundException {
>   messages = new ArrayList<>();
> }{code}
>  
> So there is an exception while the runner tried to deserialize the checkpoint 
> object.
> {code:java}
> java.lang.RuntimeException: org.apache.beam.sdk.coders.CoderException: 95 
> unexpected extra bytes after decoding 
> org.apache.beam.sdk.io.mqtt.MqttIO$MqttCheckpointMark@6764e219
>  at 
> org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:340)
> ...{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5716) Move testing utilities to a common place

2018-10-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5716?focusedWorklogId=155470=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155470
 ]

ASF GitHub Bot logged work on BEAM-5716:


Author: ASF GitHub Bot
Created on: 17/Oct/18 16:07
Start Date: 17/Oct/18 16:07
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on a change in pull request #6725: 
[BEAM-5716] Reorganize testing modules
URL: https://github.com/apache/beam/pull/6725#discussion_r225996971
 
 

 ##
 File path: 
sdks/java/testing/test-utils/src/main/java/org/apache/beam/sdk/testutils/metrics/MetricsReader.java
 ##
 @@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.testutils.metrics;
+
+import java.util.NoSuchElementException;
+import org.apache.beam.sdk.PipelineResult;
+import org.apache.beam.sdk.metrics.DistributionResult;
+import org.apache.beam.sdk.metrics.MetricNameFilter;
+import org.apache.beam.sdk.metrics.MetricQueryResults;
+import org.apache.beam.sdk.metrics.MetricResult;
+import org.apache.beam.sdk.metrics.MetricsFilter;
+import org.joda.time.Duration;
+import org.slf4j.LoggerFactory;
+
+/** Provides methods for querying metrics from {@link PipelineResult} per 
namespace. */
+public class MetricsReader {
+
+  private static final org.slf4j.Logger LOG = 
LoggerFactory.getLogger(MetricsReader.class);
+
+  private enum DistributionType {
+MIN,
+MAX
+  }
+
+  private final PipelineResult result;
+
+  private final String namespace;
+
+  public MetricsReader(PipelineResult result, String namespace) {
+this.result = result;
+this.namespace = namespace;
+  }
+
+  /**
+   * Return the current value for a long counter, or a default value if can't 
be retrieved. Note
+   * this uses only attempted metrics because some runners don't support 
committed metrics.
+   */
+  public long getCounterMetric(String name, long defaultValue) {
+MetricQueryResults metrics =
+result
+.metrics()
+.queryMetrics(
+MetricsFilter.builder()
+.addNameFilter(MetricNameFilter.named(namespace, name))
+.build());
+Iterable<MetricResult<Long>> counters = metrics.getCounters();
+try {
+  MetricResult<Long> metricResult = counters.iterator().next();
+  return metricResult.getAttempted();
+} catch (NoSuchElementException e) {
+  LOG.error("Failed to get metric {}, from namespace {}", name, namespace);
+}
+return defaultValue;
+  }
+
+  /**
+   * Return start time metric by counting the difference between "now" and min 
value from a
+   * distribution metric.
+   */
+  public long getStartTimeMetric(long now, String name) {
+return this.getTimestampMetric(now, this.getDistributionMetric(name, 
DistributionType.MIN, -1));
+  }
+
+  /**
+   * Return end time metric by counting the difference between "now" and MAX 
value from a
+   * distribution metric.
+   */
+  public long getEndTimeMetric(long now, String name) {
+return this.getTimestampMetric(now, this.getDistributionMetric(name, 
DistributionType.MAX, -1));
+  }
+
+  /**
+   * Return the current value for a long counter, or a default value if can't 
be retrieved. Note
+   * this uses only attempted metrics because some runners don't support 
committed metrics.
+   */
+  private long getDistributionMetric(String name, DistributionType distType, 
long defaultValue) {
 
 Review comment:
   This default value is always -1. Maybe I should remove this parameter and 
always set -1 in the method body?
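   
   For context, a minimal usage sketch of the MetricsReader class introduced in this 
diff; the pipeline, namespace and metric names below are placeholders for 
illustration, not values taken from the PR:
   ```
   import org.apache.beam.sdk.Pipeline;
   import org.apache.beam.sdk.PipelineResult;
   import org.apache.beam.sdk.testutils.metrics.MetricsReader;
   
   public class MetricsReaderUsage {
     public static void main(String[] args) {
       Pipeline pipeline = Pipeline.create();
       // ... build the pipeline under test here ...
       PipelineResult result = pipeline.run();
       result.waitUntilFinish();
   
       // Query an attempted counter in the "loadtest" namespace; the second
       // argument (-1) is returned when the metric cannot be found.
       MetricsReader reader = new MetricsReader(result, "loadtest");
       long totalRecords = reader.getCounterMetric("totalRecords", -1);
       System.out.println("totalRecords = " + totalRecords);
     }
   }
   ```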


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 155470)
Time Spent: 20m  (was: 10m)

> Move testing utilities to a common place
> 
>
> Key: BEAM-5716
> URL: 

[jira] [Work logged] (BEAM-5716) Move testing utilities to a common place

2018-10-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5716?focusedWorklogId=155471=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155471
 ]

ASF GitHub Bot logged work on BEAM-5716:


Author: ASF GitHub Bot
Created on: 17/Oct/18 16:07
Start Date: 17/Oct/18 16:07
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on a change in pull request #6725: 
[BEAM-5716] Reorganize testing modules
URL: https://github.com/apache/beam/pull/6725#discussion_r225996710
 
 

 ##
 File path: 
sdks/java/testing/test-utils/src/main/java/org/apache/beam/sdk/testutils/metrics/MetricsReader.java
 ##
 @@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.testutils.metrics;
+
+import java.util.NoSuchElementException;
+import org.apache.beam.sdk.PipelineResult;
+import org.apache.beam.sdk.metrics.DistributionResult;
+import org.apache.beam.sdk.metrics.MetricNameFilter;
+import org.apache.beam.sdk.metrics.MetricQueryResults;
+import org.apache.beam.sdk.metrics.MetricResult;
+import org.apache.beam.sdk.metrics.MetricsFilter;
+import org.joda.time.Duration;
+import org.slf4j.LoggerFactory;
+
+/** Provides methods for querying metrics from {@link PipelineResult} per 
namespace. */
+public class MetricsReader {
+
+  private static final org.slf4j.Logger LOG = 
LoggerFactory.getLogger(MetricsReader.class);
+
+  private enum DistributionType {
+MIN,
+MAX
+  }
+
+  private final PipelineResult result;
+
+  private final String namespace;
+
+  public MetricsReader(PipelineResult result, String namespace) {
+this.result = result;
+this.namespace = namespace;
+  }
+
+  /**
+   * Return the current value for a long counter, or a default value if can't 
be retrieved. Note
+   * this uses only attempted metrics because some runners don't support 
committed metrics.
+   */
+  public long getCounterMetric(String name, long defaultValue) {
+MetricQueryResults metrics =
+result
+.metrics()
+.queryMetrics(
+MetricsFilter.builder()
+.addNameFilter(MetricNameFilter.named(namespace, name))
+.build());
+Iterable<MetricResult<Long>> counters = metrics.getCounters();
+try {
+  MetricResult<Long> metricResult = counters.iterator().next();
+  return metricResult.getAttempted();
+} catch (NoSuchElementException e) {
+  LOG.error("Failed to get metric {}, from namespace {}", name, namespace);
+}
+return defaultValue;
+  }
+
+  /**
+   * Return start time metric by counting the difference between "now" and min 
value from a
+   * distribution metric.
+   */
+  public long getStartTimeMetric(long now, String name) {
+return this.getTimestampMetric(now, this.getDistributionMetric(name, 
DistributionType.MIN, -1));
+  }
+
+  /**
+   * Return end time metric by counting the difference between "now" and MAX 
value from a
+   * distribution metric.
+   */
+  public long getEndTimeMetric(long now, String name) {
+return this.getTimestampMetric(now, this.getDistributionMetric(name, 
DistributionType.MAX, -1));
+  }
+
+  /**
+   * Return the current value for a long counter, or a default value if can't 
be retrieved. Note
+   * this uses only attempted metrics because some runners don't support 
committed metrics.
+   */
+  private long getDistributionMetric(String name, DistributionType distType, 
long defaultValue) {
+MetricQueryResults metrics =
+result
+.metrics()
+.queryMetrics(
+MetricsFilter.builder()
+.addNameFilter(MetricNameFilter.named(namespace, name))
+.build());
+Iterable<MetricResult<DistributionResult>> distributions = 
+metrics.getDistributions();
+try {
+  MetricResult<DistributionResult> distributionResult = 
+distributions.iterator().next();
+  switch (distType) {
+case MIN:
+  return distributionResult.getAttempted().getMin();
+case MAX:
+  return distributionResult.getAttempted().getMax();
+default:
+  return defaultValue;
+  }
+} catch (NoSuchElementException e) {
+  

[jira] [Work logged] (BEAM-5716) Move testing utilities to a common place

2018-10-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5716?focusedWorklogId=155469=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155469
 ]

ASF GitHub Bot logged work on BEAM-5716:


Author: ASF GitHub Bot
Created on: 17/Oct/18 16:01
Start Date: 17/Oct/18 16:01
Worklog Time Spent: 10m 
  Work Description: lgajowy opened a new pull request #6725: [BEAM-5716] 
Reorganize testing modules
URL: https://github.com/apache/beam/pull/6725
 
 
   In this PR the load-tests and Nexmark suites were gathered together in a 
"testing" directory. A separate "test-utils" module was created to store common 
code for those two (and possibly other tests, such as IOITs).
   
   For now, only the MetricsReader class was extracted to the common module, to 
keep the reviewer/contributor feedback loop short. In the future, most of the 
metrics-related and metrics-publishing code (e.g. to BigQuery) could be moved to 
the common module. Nexmark is the most advanced in those areas, so other tests 
should benefit from that.
   
   I added an extra commit that uses the common metrics class in load tests. Let 
me know if this should be moved out of this PR.
   
   Friendly hint: it might be easier to review commit by commit.
   
   
   
   Follow this checklist to help us incorporate your contribution quickly and 
easily:
   
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   It will help us expedite review of your Pull Request if you tag someone 
(e.g. `@username`) to look at it.
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/)
 | --- | --- | --- | --- | --- | ---
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)
 | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)
  [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/)
 | --- | --- | ---
   
   
   
   
   


This is an automated 

[jira] [Assigned] (BEAM-5615) Several tests fail on Python 3 with TypeError: 'cmp' is an invalid keyword argument for this function

2018-10-17 Thread Juta Staes (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juta Staes reassigned BEAM-5615:


Assignee: (was: Juta Staes)

> Several tests fail on Python 3 with TypeError: 'cmp' is an invalid keyword 
> argument for this function
> -
>
> Key: BEAM-5615
> URL: https://issues.apache.org/jira/browse/BEAM-5615
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-harness
>Reporter: Valentyn Tymofieiev
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> ERROR: test_top (apache_beam.transforms.combiners_test.CombineTest)
> --
> Traceback (most recent call last):
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/combiners_test.py",
>  line 89, in test_top
> names)  # Note parameter passed to comparator.
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/pvalue.py",
>  line 111, in __or__
> return self.pipeline.apply(ptransform, self)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/pipeline.py",
>  line 467, in apply
> label or transform.label)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/pipeline.py",
>  line 477, in apply
> return self.apply(transform, pvalueish)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/pipeline.py",
>  line 513, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/runner.py",
>  line 193, in apply
> return m(transform, input)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/runner.py",
>  line 199, in apply_PTransform
> return transform.expand(input)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/ptransform.py",
>  line 759, in expand
> return self._fn(pcoll, *args, **kwargs)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/combiners.py",
>  line 185, in Of
> TopCombineFn(n, compare, key, reverse), *args, **kwargs)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/pvalue.py",
>  line 111, in __or__
> return self.pipeline.apply(ptransform, self)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/pipeline.py",
>  line 513, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/runner.py",
>  line 193, in apply
> return m(transform, input)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/runner.py",
>  line 199, in apply_PTransform
> return transform.expand(input)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/core.py",
>  line 1251, in expand
> default_value = combine_fn.apply([], *self.args, **self.kwargs)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/core.py",
>  line 623, in apply
> *args, **kwargs)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/combiners.py",
>  line 362, in extract_output
> self._sort_buffer(buffer, lt)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/combiners.py",
>  line 295, in _sort_buffer
> key=self._key_fn)
> TypeError: 'cmp' is an invalid keyword argument for this function



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-5621) Several tests fail on Python 3 with TypeError: unorderable types: str() < int()

2018-10-17 Thread Juta Staes (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juta Staes reassigned BEAM-5621:


Assignee: Juta Staes

> Several tests fail on Python 3 with TypeError: unorderable types: str() < 
> int()
> ---
>
> Key: BEAM-5621
> URL: https://issues.apache.org/jira/browse/BEAM-5621
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Juta Staes
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> ==
> ERROR: test_remove_duplicates 
> (apache_beam.transforms.ptransform_test.PTransformTest)
> --
> Traceback (most recent call last):
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py",
>  line 677, in process
> self.do_fn_invoker.invoke_process(windowed_value)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py",
>  line 414, in invoke_process
> windowed_value, self.process_method(windowed_value.value))
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/core.py",
>  line 1068, in <lambda>
> wrapper = lambda x: [fn(x)]
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/testing/util.py",
>  line 115, in _equal
> sorted_expected = sorted(expected)
> TypeError: unorderable types: str() < int()



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-5621) Several tests fail on Python 3 with TypeError: unorderable types: str() < int()

2018-10-17 Thread Juta Staes (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juta Staes reassigned BEAM-5621:


Assignee: (was: Juta Staes)

> Several tests fail on Python 3 with TypeError: unorderable types: str() < 
> int()
> ---
>
> Key: BEAM-5621
> URL: https://issues.apache.org/jira/browse/BEAM-5621
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> ==
> ERROR: test_remove_duplicates 
> (apache_beam.transforms.ptransform_test.PTransformTest)
> --
> Traceback (most recent call last):
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py",
>  line 677, in process
> self.do_fn_invoker.invoke_process(windowed_value)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py",
>  line 414, in invoke_process
> windowed_value, self.process_method(windowed_value.value))
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/core.py",
>  line 1068, in <lambda>
> wrapper = lambda x: [fn(x)]
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/testing/util.py",
>  line 115, in _equal
> sorted_expected = sorted(expected)
> TypeError: unorderable types: str() < int()



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-5618) Several tests fail on Python 3 with: unsupported operand type(s) for +: 'int' and 'EmptySideInput'

2018-10-17 Thread Juta Staes (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juta Staes reassigned BEAM-5618:


Assignee: (was: Matthias Feys)

> Several tests fail on Python 3 with: unsupported operand type(s) for +: 'int' 
> and 'EmptySideInput'
> --
>
> Key: BEAM-5618
> URL: https://issues.apache.org/jira/browse/BEAM-5618
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Priority: Major
>
> ERROR: test_do_with_side_input_as_arg 
> (apache_beam.transforms.ptransform_test.PTransformTest)
> --
> Traceback (most recent call last):
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py",
>  line 677, in process
> self.do_fn_invoker.invoke_process(windowed_value)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py",
>  line 529, in invoke_process
> windowed_value, additional_args, additional_kwargs, output_processor)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py",
>  line 598, in _invoke_per_window
> windowed_value, self.process_method(*args_for_process))
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/ptransform_test.py",
>  line 135, in <lambda>
> lambda x, addon: [x + addon], pvalue.AsSingleton(side))
> TypeError: unsupported operand type(s) for +: 'int' and 'EmptySideInput'



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-5744) Investigate negative numbers represented as 'long' in Python SDK + Direct runner

2018-10-17 Thread Juta Staes (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juta Staes reassigned BEAM-5744:


Assignee: (was: Juta Staes)

> Investigate negative numbers represented as 'long' in Python SDK + Direct 
> runner
> 
>
> Key: BEAM-5744
> URL: https://issues.apache.org/jira/browse/BEAM-5744
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Priority: Major
> Fix For: 2.8.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5744) Investigate negative numbers represented as 'long' in Python SDK + Direct runner

2018-10-17 Thread Juta Staes (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653761#comment-16653761
 ] 

Juta Staes commented on BEAM-5744:
--

[~tvalentyn] I wanted to take a look at this issue, but I am unable to 
reproduce it, so I am afraid I can't help.

> Investigate negative numbers represented as 'long' in Python SDK + Direct 
> runner
> 
>
> Key: BEAM-5744
> URL: https://issues.apache.org/jira/browse/BEAM-5744
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Juta Staes
>Priority: Major
> Fix For: 2.8.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5760) Portable Flink support for maxBundleSize/maxBundleMillis

2018-10-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5760?focusedWorklogId=155458=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155458
 ]

ASF GitHub Bot logged work on BEAM-5760:


Author: ASF GitHub Bot
Created on: 17/Oct/18 15:25
Start Date: 17/Oct/18 15:25
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #6723: [WIP] [BEAM-5760] 
Add support for multi-element bundles to portable Flink runner.
URL: https://github.com/apache/beam/pull/6723#issuecomment-430673927
 
 
   Thanks, I'll take a look at this. It should help a lot (as getting the
   per-bundle overhead comparable to the time spent plumbing a single element
   through would be difficult; just constructing and parsing the various
   messages and channels would take time).
   
   Is there a good, brief summary as to why we had the one-element-per-bundle
   setup in the first place? I have in the back of my head it was something to
   do with how checkpoints and/or watermarks are controlled in Flink, but
   could be wrong.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 155458)
Time Spent: 20m  (was: 10m)

> Portable Flink support for maxBundleSize/maxBundleMillis
> 
>
> Key: BEAM-5760
> URL: https://issues.apache.org/jira/browse/BEAM-5760
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Affects Versions: 2.8.0
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Major
>  Labels: portability-flink
> Fix For: 2.9.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The portable runner needs to support larger bundles in streaming mode. 
> Currently every element is a separate bundle, which is very inefficient due 
> to the per bundle SDK worker overhead. The old Java SDK runner already 
> supports these parameters.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-4042) Get rid of deprecated gradle API

2018-10-17 Thread Scott Wegner (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-4042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653724#comment-16653724
 ] 

Scott Wegner commented on BEAM-4042:


From the error message, it appears this deprecated usage comes from the 
[com.github.johnrengelman.shadow plugin|https://github.com/johnrengelman/shadow] 
used for shading.

When this bug was opened we were on version {{2.0.1}}, and have [since 
upgraded|https://github.com/apache/beam/pull/5401] to {{2.0.4}}, and the 
[release notes|https://github.com/johnrengelman/shadow/releases] mention fixing 
many deprecation warnings.

[~romain.manni-bucau], can you confirm if you're still seeing this error? There 
seems to be a newer {{4.0.1}} release we could try, but if {{2.0.4}} has 
already fixed it we can close this.

> Get rid of deprecated gradle API
> 
>
> Key: BEAM-4042
> URL: https://issues.apache.org/jira/browse/BEAM-4042
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Romain Manni-Bucau
>Priority: Minor
>
> {code}
> > Task :beam-model-pipeline:shadowJar
> The SimpleWorkResult type has been deprecated and is scheduled to be removed 
> in Gradle 5.0. Please use WorkResults.didWork() instead.
> at 
> org.gradle.api.internal.tasks.SimpleWorkResult.(SimpleWorkResult.java:34)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at 
> org.codehaus.groovy.reflection.CachedConstructor.invoke(CachedConstructor.java:83)
> at 
> org.codehaus.groovy.runtime.callsite.ConstructorSite$ConstructorSiteNoUnwrapNoCoerce.callConstructor(ConstructorSite.java:105)
> at 
> org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallConstructor(CallSiteArray.java:60)
> at 
> org.codehaus.groovy.runtime.callsite.AbstractCallSite.callConstructor(AbstractCallSite.java:235)
> at 
> org.codehaus.groovy.runtime.callsite.AbstractCallSite.callConstructor(AbstractCallSite.java:247)
> at 
> com.github.jengelman.gradle.plugins.shadow.tasks.ShadowCopyAction.execute(ShadowCopyAction.groovy:99)
> {code}
> to ensure the build output is as expected as possible (no exception in the 
> build process when "green") this kind of stack should be fixed



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-5778) Add integrations of Metrics API to Big Query for SyntheticSources load tests in Python SDK

2018-10-17 Thread Kasia Kucharczyk (JIRA)
Kasia Kucharczyk created BEAM-5778:
--

 Summary: Add integrations of Metrics API to Big Query for 
SyntheticSources load tests in Python SDK
 Key: BEAM-5778
 URL: https://issues.apache.org/jira/browse/BEAM-5778
 Project: Beam
  Issue Type: Improvement
  Components: testing
Reporter: Kasia Kucharczyk
Assignee: Kasia Kucharczyk


Right now the Metrics API collects basic metrics for SyntheticSources load tests 
(Python SDK). These metrics should be stored in BigQuery so they can be presented 
on the performance dashboards.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5741) Move "Contact Us" to a top-level link

2018-10-17 Thread Scott Wegner (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653706#comment-16653706
 ] 

Scott Wegner commented on BEAM-5741:


Thanks for the context [~melap], I'm glad to hear so much previous thought has 
gone into this. I trust your expertise and the design decisions that have 
already been made.

This bug jumps to a solution, but here's the problem we were trying to solve: 
[~rohdesam] was going through the Contributor Guide as a new contributor, and 
there were many places where he had questions. One piece of feedback was that 
he couldn't find any details on how to ask for help.

There are scattered mentions in the Contributor Guide about the dev@ list and 
Slack, but it's easy to miss if you're down deep in the page. The problem is 
solved if new contributors understand the convention that "Community" includes 
how to reach out, but in this case we didn't.

So maybe an alternative solution would be a dedicated heading at the top of the 
Contributor Guide for how to get help. Something like:

h2. Finding Help

If you find any issues with this guide or have questions that aren't answered, 
please check the  or .

> Move "Contact Us" to a top-level link
> -
>
> Key: BEAM-5741
> URL: https://issues.apache.org/jira/browse/BEAM-5741
> Project: Beam
>  Issue Type: Sub-task
>  Components: website
>Reporter: Scott Wegner
>Priority: Major
>
> It should be very easy to figure out how to get in touch with community. 
> "Contact Us" should be a top-level link on the page.
> The page can also be improved with:
> * Some basic text on how to use subscribe / unsubscribe links
> * Recommendations on how to use various communications channels (Slack for 
> quick questions, dev@ for longer conversations. And all decisions should make 
> it back to dev@)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5760) Portable Flink support for maxBundleSize/maxBundleMillis

2018-10-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5760?focusedWorklogId=155447=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155447
 ]

ASF GitHub Bot logged work on BEAM-5760:


Author: ASF GitHub Bot
Created on: 17/Oct/18 15:02
Start Date: 17/Oct/18 15:02
Worklog Time Spent: 10m 
  Work Description: tweise opened a new pull request #6723: [WIP] 
[BEAM-5760] Add support for multi-element bundles to portable Flink runner.
URL: https://github.com/apache/beam/pull/6723
 
 
   This makes the operator compatible with the bundle control of the underlying 
`DoFnOperator`. Specifically, bundle size can be controlled via the 
maxBundleSize/maxBundleMillis pipeline options.
   
   
   
   Follow this checklist to help us incorporate your contribution quickly and 
easily:
   
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   It will help us expedite review of your Pull Request if you tag someone 
(e.g. `@username`) to look at it.
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/)
 | --- | --- | --- | --- | --- | ---
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)
 | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)
  [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/)
 | --- | --- | ---
   
   
   
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 155447)
Time Spent: 10m
Remaining Estimate: 0h

> Portable Flink support for maxBundleSize/maxBundleMillis
> 
>
> Key: BEAM-5760
>

[jira] [Work logged] (BEAM-5638) Add exception handling to single message transforms in Java SDK

2018-10-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5638?focusedWorklogId=155446=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155446
 ]

ASF GitHub Bot logged work on BEAM-5638:


Author: ASF GitHub Bot
Created on: 17/Oct/18 15:01
Start Date: 17/Oct/18 15:01
Worklog Time Spent: 10m 
  Work Description: jklukas commented on issue #6586: [BEAM-5638] Exception 
handling for Java single message transforms
URL: https://github.com/apache/beam/pull/6586#issuecomment-430664387
 
 
   Rebased and this is passing tests, now. Can you take another look, 
@reuvenlax?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 155446)
Time Spent: 3h 50m  (was: 3h 40m)
Remaining Estimate: 164h 10m  (was: 164h 20m)

> Add exception handling to single message transforms in Java SDK
> ---
>
> Key: BEAM-5638
> URL: https://issues.apache.org/jira/browse/BEAM-5638
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Jeff Klukas
>Assignee: Jeff Klukas
>Priority: Minor
>   Original Estimate: 168h
>  Time Spent: 3h 50m
>  Remaining Estimate: 164h 10m
>
> Add methods to MapElements, FlatMapElements, and Filter that allow users to 
> specify expected exceptions and tuple tags to associate with the with 
> collections of the successfully and unsuccessfully processed elements.
> See discussion on dev list:
> https://lists.apache.org/thread.html/936ed2a5f2c01be066fd903abf70130625e0b8cf4028c11b89b8b23f@%3Cdev.beam.apache.org%3E



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5771) PreCommits should support a flag to not run GCP tests

2018-10-17 Thread Colm O hEigeartaigh (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653628#comment-16653628
 ] 

Colm O hEigeartaigh commented on BEAM-5771:
---

+1 to have a task which would build and run all Java tests without running the 
GCP specific tests.

> PreCommits should support a flag to not run GCP tests
> -
>
> Key: BEAM-5771
> URL: https://issues.apache.org/jira/browse/BEAM-5771
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Affects Versions: 2.8.0
>Reporter: Alan Myrvold
>Assignee: Alan Myrvold
>Priority: Major
>
> From [~swegner]
> There was some discussion recently about ensuring anyone can easily run and 
> reproduce precommit test results locally. The precommits run Dataflow jobs, 
> which will fail if you don't have access to a Google Cloud project. One idea 
> would be to add a flag to disable Google Cloud tests, i.e. ./gradlew 
> :javaPreCommit -PdisableGcpTests
> @sweg



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5628) Several VcfIO tests fail in Python 3 with TypeError: cannot use a string pattern on a bytes-like object

2018-10-17 Thread Asha Rostamianfar (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653625#comment-16653625
 ] 

Asha Rostamianfar commented on BEAM-5628:
-

+[~chamikara] to comment on whether it's ok to remove vcfio.py for now and add 
it back later with Nucleus support.

> Several VcfIO tests fail in Python 3 with  TypeError: cannot use a string 
> pattern on a bytes-like object
> 
>
> Key: BEAM-5628
> URL: https://issues.apache.org/jira/browse/BEAM-5628
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Simon
>Priority: Major
>
> ERROR: test_read_after_splitting (apache_beam.io.vcfio_test.VcfSourceTest)
> --
> Traceback (most recent call last):
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/vcfio_test.py",
>  line 336, in test_read_after_splitting
>     split_records.extend(source_test_utils.read_from_source(*source_info))
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/source_test_utils.py",
>  line 101, in read_from_source
>     for value in reader:
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/vcfio.py",
>  line 264, in read_records
>     for line in record_iterator:
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/vcfio.py",
>  line 330, in __next__
>     record = next(self._vcf_reader)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/site-packages/vcf/parser.py",
>  line 543, in __next__
>     row = self._row_pattern.split(line.rstrip())
> TypeError: cannot use a string pattern on a bytes-like object



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5759) ConcurrentModificationException on JmsIO checkpoint finalization

2018-10-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5759?focusedWorklogId=155400=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155400
 ]

ASF GitHub Bot logged work on BEAM-5759:


Author: ASF GitHub Bot
Created on: 17/Oct/18 13:51
Start Date: 17/Oct/18 13:51
Worklog Time Spent: 10m 
  Work Description: jbonofre closed pull request #6702: [BEAM-5759] 
Ensuring JmsIO checkpoint state is accessed and modified safely
URL: https://github.com/apache/beam/pull/6702
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/sdks/java/io/jms/src/main/java/org/apache/beam/sdk/io/jms/JmsCheckpointMark.java
 
b/sdks/java/io/jms/src/main/java/org/apache/beam/sdk/io/jms/JmsCheckpointMark.java
index 3f106ff3442..f33abd8b555 100644
--- 
a/sdks/java/io/jms/src/main/java/org/apache/beam/sdk/io/jms/JmsCheckpointMark.java
+++ 
b/sdks/java/io/jms/src/main/java/org/apache/beam/sdk/io/jms/JmsCheckpointMark.java
@@ -19,12 +19,17 @@
 
 import java.util.ArrayList;
 import java.util.List;
+import java.util.concurrent.locks.ReentrantReadWriteLock;
+import java.util.function.BiFunction;
+import java.util.function.Supplier;
 import javax.jms.Message;
 import org.apache.beam.sdk.coders.AvroCoder;
 import org.apache.beam.sdk.coders.DefaultCoder;
 import org.apache.beam.sdk.io.UnboundedSource;
 import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
 import org.joda.time.Instant;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 
 /**
  * Checkpoint for an unbounded JmsIO.Read. Consists of JMS destination name, 
and the latest message
@@ -33,25 +38,27 @@
 @DefaultCoder(AvroCoder.class)
 public class JmsCheckpointMark implements UnboundedSource.CheckpointMark {
 
-  private final List<Message> messages = new ArrayList<>();
-  private Instant oldestPendingTimestamp = BoundedWindow.TIMESTAMP_MIN_VALUE;
+  private static final Logger LOG = 
LoggerFactory.getLogger(JmsCheckpointMark.class);
+
+  private final State state = new State();
 
   public JmsCheckpointMark() {}
 
   protected List<Message> getMessages() {
-return this.messages;
+return state.getMessages();
   }
 
   protected void addMessage(Message message) throws Exception {
 Instant currentMessageTimestamp = new Instant(message.getJMSTimestamp());
-if (currentMessageTimestamp.isBefore(oldestPendingTimestamp)) {
-  oldestPendingTimestamp = currentMessageTimestamp;
-}
-messages.add(message);
+state.atomicWrite(
+() -> {
+  state.updateOldestPendingTimestampIf(currentMessageTimestamp, 
Instant::isBefore);
+  state.addMessage(message);
+});
   }
 
   protected Instant getOldestPendingTimestamp() {
-return oldestPendingTimestamp;
+return state.getOldestPendingTimestamp();
   }
 
   /**
@@ -61,17 +68,117 @@ protected Instant getOldestPendingTimestamp() {
*/
   @Override
   public void finalizeCheckpoint() {
-for (Message message : messages) {
+State snapshot = state.snapshot();
+for (Message message : snapshot.messages) {
   try {
 message.acknowledge();
 Instant currentMessageTimestamp = new 
Instant(message.getJMSTimestamp());
-if (currentMessageTimestamp.isAfter(oldestPendingTimestamp)) {
-  oldestPendingTimestamp = currentMessageTimestamp;
-}
+snapshot.updateOldestPendingTimestampIf(currentMessageTimestamp, 
Instant::isAfter);
   } catch (Exception e) {
-// nothing to do
+LOG.error("Exception while finalizing message: {}", e);
+  }
+}
+state.atomicWrite(
+() -> {
+  state.removeMessages(snapshot.messages);
+  
state.updateOldestPendingTimestampIf(snapshot.oldestPendingTimestamp, 
Instant::isAfter);
+});
+  }
+
+  /**
+   * Encapsulates the state of a checkpoint mark; the list of messages pending 
finalisation and the
+   * oldest pending timestamp. Read/write-exclusive access is provided 
throughout, and constructs
+   * allowing multiple operations to be performed atomically -- i.e. performed 
within the context of
+   * a single lock operation -- are made available.
+   */
+  private class State {
+private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
+
+private final List<Message> messages;
+private Instant oldestPendingTimestamp;
+
+public State() {
+  this(new ArrayList<>(), BoundedWindow.TIMESTAMP_MIN_VALUE);
+}
+
+private State(List<Message> messages, Instant oldestPendingTimestamp) {
+  this.messages = messages;
+  this.oldestPendingTimestamp = oldestPendingTimestamp;
+}
+
+/**
+ * Create and return a copy of the current state.
+ *
+ * @return A new {@code State} instance 

[jira] [Resolved] (BEAM-5759) ConcurrentModificationException on JmsIO checkpoint finalization

2018-10-17 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-5759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Baptiste Onofré resolved BEAM-5759.

Resolution: Fixed

> ConcurrentModificationException on JmsIO checkpoint finalization
> 
>
> Key: BEAM-5759
> URL: https://issues.apache.org/jira/browse/BEAM-5759
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-jms
>Affects Versions: 2.8.0
>Reporter: Andrew Fulton
>Assignee: Andrew Fulton
> Fix For: 2.9.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When reading from a JmsIO source, a ConcurrentModificationException can be 
> thrown when checkpoint finalization occurs under heavy load.
> For example:
> {{jsonPayload: {}}
>  {{  exception: "java.util.ConcurrentModificationException}}
>  {{    at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:903)}}
>  {{    at java.util.ArrayList$Itr.next(ArrayList.java:853)}}
>  {{    at 
> org.apache.beam.sdk.io.jms.JmsCheckpointMark.finalizeCheckpoint(JmsCheckpointMark.java:65)}}
>  {{    at 
> com.google.cloud.dataflow.worker.StreamingModeExecutionContext$1.run(StreamingModeExecutionContext.java:379)}}
>  {{    at 
> com.google.cloud.dataflow.worker.StreamingDataflowWorker$8.run(StreamingDataflowWorker.java:846)}}
>  {{    at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)}}
>  {{    at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)}}
>  {{    at java.lang.Thread.run(Thread.java:745)}}
>  {{"}}
>  {{  job: "2018-09-27_08_55_18-6454085774348718625"   }}
>  {{  logger: "com.google.cloud.dataflow.worker.StreamingDataflowWorker"   }}
>  {{  message: "Source checkpoint finalization failed:"   }}
>  {{  thread: "309"   }}
>  {{  work: ""   }}
>  {{  worker: "test-andrew-092715504-09270855-tkfp-harness-dnmb"   }}
>  
> Looking at the JmsCheckpointMark code, it appears that access to the pending 
> message list is unprotected - thus if a thread calls finalizeCheckpoint while 
> a separate processing thread adds more messages to the checkpoint mark list 
> then an exception will be thrown.
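
For context, here is a minimal sketch of the snapshot-under-lock pattern that the fix in 
PR #6702 applies (see the diff in the worklog above). It is illustrative only; the class 
and method names are invented and this is not the actual JmsCheckpointMark code.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class PendingMessages<T> {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final List<T> pending = new ArrayList<>();

  void add(T message) {
    lock.writeLock().lock();
    try {
      pending.add(message);
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Finalization works on a private snapshot, never on the live list, so a reader
  // thread appending messages concurrently cannot invalidate the iteration.
  List<T> snapshotAndClear() {
    lock.writeLock().lock();
    try {
      List<T> snapshot = new ArrayList<>(pending);
      pending.clear();
      return snapshot;
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}

Acknowledging messages from such a snapshot avoids exactly the 
ConcurrentModificationException reported here.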



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5777) Running ParDo in loop with DirectRunners raises RuntimeException

2018-10-17 Thread Kasia Kucharczyk (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653471#comment-16653471
 ] 

Kasia Kucharczyk commented on BEAM-5777:


Hi [~chamikara], [~tvalentyn], maybe you can help with this? Or have you had a 
similar issue?

> Running ParDo in loop with DirectRunners raises RuntimeException
> 
>
> Key: BEAM-5777
> URL: https://issues.apache.org/jira/browse/BEAM-5777
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Kasia Kucharczyk
>Assignee: Jason Kuster
>Priority: Major
> Attachments: all_output.txt
>
>
> The Python [load test of ParDo operation for 
> SyntheticSources|https://github.com/apache/beam/blob/faff82860c66e4050f0cfa5e874ffe6035ed0c1c/sdks/python/apache_beam/testing/load_tests/par_do_test.py#L133]
>  that I created contains parametrized loop of ParDo with no operation inside 
> besides metrics (this issue). With setting the number of iterations to >~200 
> and running the test on DirectRunner I was encountering test failures. The 
> test outputs whole (really long) pipeline logs. Some test runs raised the 
> following exception:
>  
> {code:java}
> Traceback (most recent call last):
>   File 
> "/Users/kasia/Repos/beam/sdks/python/apache_beam/testing/load_tests/par_do_test.py",
>  line 144, in testParDo
>     result = p.run()
>   File 
> "/Users/kasia/Repos/beam/sdks/python/apache_beam/testing/test_pipeline.py", 
> line 104, in run
>     result = super(TestPipeline, self).run(test_runner_api)
>   File "/Users/kasia/Repos/beam/sdks/python/apache_beam/pipeline.py", line 
> 403, in run
>     self.to_runner_api(), self.runner, self._options).run(False)
>   File "/Users/kasia/Repos/beam/sdks/python/apache_beam/pipeline.py", line 
> 416, in run
>     return self.runner.run_pipeline(self)
>   File 
> "/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/direct/direct_runner.py",
>  line 139, in run_pipeline
>     return runner.run_pipeline(pipeline)
>   File 
> "/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 229, in run_pipeline
>     return self.run_via_runner_api(pipeline.to_runner_api())
>   File 
> "/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 232, in run_via_runner_api
>     return self.run_stages(*self.create_stages(pipeline_proto))
>   File 
> "/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 1015, in run_stages
>     pcoll_buffers, safe_coders).process_bundle.metrics
>   File 
> "/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 1132, in run_stage
>     self._progress_frequency).process_bundle(data_input, data_output)
>   File 
> "/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 1388, in process_bundle
>     result_future = self._controller.control_handler.push(process_bundle)
>   File 
> "/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 1260, in push
>     response = self.worker.do_instruction(request)
>   File 
> "/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py",
>  line 212, in do_instruction
>     request.instruction_id)
>   File 
> "/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py",
>  line 231, in process_bundle
>     self.data_channel_factory)
>   File 
> "/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/worker/bundle_processor.py",
>  line 343, in __init__
>     self.ops = self.create_execution_tree(self.process_bundle_descriptor)
>   File 
> "/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/worker/bundle_processor.py",
>  line 385, in create_execution_tree
>     descriptor.transforms, key=topological_height, reverse=True)])
>   File 
> "/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/worker/bundle_processor.py",
>  line 320, in wrapper
>     result = cache[args] = func(*args)
>   File 
> "/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/worker/bundle_processor.py",
>  line 368, in get_operation
>     in descriptor.transforms[transform_id].outputs.items()
>   File 
> "/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/worker/bundle_processor.py",
>  line 367, in 
>     for tag, pcoll_id
> ... (3 last lines repeated for long period)
>  
> RuntimeError: maximum recursion depth exceeded
> {code}
>  
>  
> From my observation, I can say the problem appeared with various iteration 
> number depending on computer resources. On my weaker computer started failing 
> on ~150 iterations. The test succeeds on DataFlow with 1000 iterations (I 
> didn't check higher number).
> I provide whole test output in Attachements.



--
This message 

[jira] [Updated] (BEAM-5777) Running ParDo in loop with DirectRunners raises RuntimeException

2018-10-17 Thread Kasia Kucharczyk (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kasia Kucharczyk updated BEAM-5777:
---
Description: 
The Python [load test of ParDo operation for 
SyntheticSources|https://github.com/apache/beam/blob/faff82860c66e4050f0cfa5e874ffe6035ed0c1c/sdks/python/apache_beam/testing/load_tests/par_do_test.py#L133]
 that I created contains parametrized loop of ParDo with no operation inside 
besides metrics (this issue). With setting the number of iterations to >~200 
and running the test on DirectRunner I was encountering test failures. The test 
outputs whole (really long) pipeline logs. Some test runs raised the following 
exception:

 
{code:java}
Traceback (most recent call last):

  File 
"/Users/kasia/Repos/beam/sdks/python/apache_beam/testing/load_tests/par_do_test.py",
 line 144, in testParDo

    result = p.run()

  File 
"/Users/kasia/Repos/beam/sdks/python/apache_beam/testing/test_pipeline.py", 
line 104, in run

    result = super(TestPipeline, self).run(test_runner_api)

  File "/Users/kasia/Repos/beam/sdks/python/apache_beam/pipeline.py", line 403, 
in run

    self.to_runner_api(), self.runner, self._options).run(False)

  File "/Users/kasia/Repos/beam/sdks/python/apache_beam/pipeline.py", line 416, 
in run

    return self.runner.run_pipeline(self)

  File 
"/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/direct/direct_runner.py",
 line 139, in run_pipeline

    return runner.run_pipeline(pipeline)

  File 
"/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
 line 229, in run_pipeline

    return self.run_via_runner_api(pipeline.to_runner_api())

  File 
"/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
 line 232, in run_via_runner_api

    return self.run_stages(*self.create_stages(pipeline_proto))

  File 
"/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
 line 1015, in run_stages

    pcoll_buffers, safe_coders).process_bundle.metrics

  File 
"/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
 line 1132, in run_stage

    self._progress_frequency).process_bundle(data_input, data_output)

  File 
"/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
 line 1388, in process_bundle

    result_future = self._controller.control_handler.push(process_bundle)

  File 
"/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
 line 1260, in push

    response = self.worker.do_instruction(request)

  File 
"/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py", 
line 212, in do_instruction

    request.instruction_id)

  File 
"/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py", 
line 231, in process_bundle

    self.data_channel_factory)

  File 
"/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/worker/bundle_processor.py",
 line 343, in __init__

    self.ops = self.create_execution_tree(self.process_bundle_descriptor)

  File 
"/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/worker/bundle_processor.py",
 line 385, in create_execution_tree

    descriptor.transforms, key=topological_height, reverse=True)])

  File 
"/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/worker/bundle_processor.py",
 line 320, in wrapper

    result = cache[args] = func(*args)

  File 
"/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/worker/bundle_processor.py",
 line 368, in get_operation

    in descriptor.transforms[transform_id].outputs.items()

  File 
"/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/worker/bundle_processor.py",
 line 367, in 

    for tag, pcoll_id

... (3 last lines repeated for long period)

 
RuntimeError: maximum recursion depth exceeded
{code}
 

 

From my observation, I can say the problem appeared with various iteration 
number depending on computer resources. On my weaker computer started failing 
on ~150 iterations. The test succeeds on DataFlow with 1000 iterations (I 
didn't check higher number).

I provide whole test output in Attachements.

  was:
n The Python [load test of ParDo operation for 
SyntheticSources|https://github.com/apache/beam/blob/faff82860c66e4050f0cfa5e874ffe6035ed0c1c/sdks/python/apache_beam/testing/load_tests/par_do_test.py#L133]
 that I created contains parametrized loop of ParDo with no operation inside 
besides metrics (this issue). With setting the number of iterations to >~200 
and running the test on DirectRunner I was encountering test failures. The test 
outputs whole (really long) pipeline logs. Some test runs raised the following 
exception:

 
{code:java}
Traceback (most recent call last):

  File 
"/Users/kasia/Repos/beam/sdks/python/apache_beam/testing/load_tests/par_do_test.py",
 line 144, in testParDo

    result = p.run()

  File 

[jira] [Updated] (BEAM-5777) Running ParDo in loop with DirectRunners raises RuntimeException

2018-10-17 Thread Kasia Kucharczyk (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kasia Kucharczyk updated BEAM-5777:
---
Attachment: all_output.txt

> Running ParDo in loop with DirectRunners raises RuntimeException
> 
>
> Key: BEAM-5777
> URL: https://issues.apache.org/jira/browse/BEAM-5777
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Kasia Kucharczyk
>Assignee: Jason Kuster
>Priority: Major
> Attachments: all_output.txt
>
>
> The Python [load test of ParDo operation for 
> SyntheticSources|https://github.com/apache/beam/blob/faff82860c66e4050f0cfa5e874ffe6035ed0c1c/sdks/python/apache_beam/testing/load_tests/par_do_test.py#L133]
>  that I created contains parametrized loop of ParDo with no operation inside 
> besides metrics (this issue). With setting the number of iterations to >~200 
> and running the test on DirectRunner I was encountering test failures. The 
> test outputs whole (really long) pipeline logs. Some test runs raised the 
> following exception:
>  
> {code:java}
> Traceback (most recent call last):
>   File 
> "/Users/kasia/Repos/beam/sdks/python/apache_beam/testing/load_tests/par_do_test.py",
>  line 144, in testParDo
>     result = p.run()
>   File 
> "/Users/kasia/Repos/beam/sdks/python/apache_beam/testing/test_pipeline.py", 
> line 104, in run
>     result = super(TestPipeline, self).run(test_runner_api)
>   File "/Users/kasia/Repos/beam/sdks/python/apache_beam/pipeline.py", line 
> 403, in run
>     self.to_runner_api(), self.runner, self._options).run(False)
>   File "/Users/kasia/Repos/beam/sdks/python/apache_beam/pipeline.py", line 
> 416, in run
>     return self.runner.run_pipeline(self)
>   File 
> "/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/direct/direct_runner.py",
>  line 139, in run_pipeline
>     return runner.run_pipeline(pipeline)
>   File 
> "/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 229, in run_pipeline
>     return self.run_via_runner_api(pipeline.to_runner_api())
>   File 
> "/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 232, in run_via_runner_api
>     return self.run_stages(*self.create_stages(pipeline_proto))
>   File 
> "/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 1015, in run_stages
>     pcoll_buffers, safe_coders).process_bundle.metrics
>   File 
> "/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 1132, in run_stage
>     self._progress_frequency).process_bundle(data_input, data_output)
>   File 
> "/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 1388, in process_bundle
>     result_future = self._controller.control_handler.push(process_bundle)
>   File 
> "/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 1260, in push
>     response = self.worker.do_instruction(request)
>   File 
> "/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py",
>  line 212, in do_instruction
>     request.instruction_id)
>   File 
> "/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py",
>  line 231, in process_bundle
>     self.data_channel_factory)
>   File 
> "/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/worker/bundle_processor.py",
>  line 343, in __init__
>     self.ops = self.create_execution_tree(self.process_bundle_descriptor)
>   File 
> "/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/worker/bundle_processor.py",
>  line 385, in create_execution_tree
>     descriptor.transforms, key=topological_height, reverse=True)])
>   File 
> "/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/worker/bundle_processor.py",
>  line 320, in wrapper
>     result = cache[args] = func(*args)
>   File 
> "/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/worker/bundle_processor.py",
>  line 368, in get_operation
>     in descriptor.transforms[transform_id].outputs.items()
>   File 
> "/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/worker/bundle_processor.py",
>  line 367, in 
>     for tag, pcoll_id
> ... (3 last lines repeated for long period)
>  
> RuntimeError: maximum recursion depth exceeded
> {code}
>  
>  
> From my observation, I can say the problem appeared with various iteration 
> number depending on computer resources. On my weaker computer started failing 
> on ~150 iterations. The test succeeds on DataFlow with 1000 iterations (I 
> didn't check higher number).
> I can provide whole test output but it's ~1,3Mb.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-5777) Running ParDo in loop with DirectRunners raises RuntimeException

2018-10-17 Thread Kasia Kucharczyk (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kasia Kucharczyk updated BEAM-5777:
---
Description: 
n The Python [load test of ParDo operation for 
SyntheticSources|https://github.com/apache/beam/blob/faff82860c66e4050f0cfa5e874ffe6035ed0c1c/sdks/python/apache_beam/testing/load_tests/par_do_test.py#L133]
 that I created contains parametrized loop of ParDo with no operation inside 
besides metrics (this issue). With setting the number of iterations to >~200 
and running the test on DirectRunner I was encountering test failures. The test 
outputs whole (really long) pipeline logs. Some test runs raised the following 
exception:

 
{code:java}
Traceback (most recent call last):

  File 
"/Users/kasia/Repos/beam/sdks/python/apache_beam/testing/load_tests/par_do_test.py",
 line 144, in testParDo

    result = p.run()

  File 
"/Users/kasia/Repos/beam/sdks/python/apache_beam/testing/test_pipeline.py", 
line 104, in run

    result = super(TestPipeline, self).run(test_runner_api)

  File "/Users/kasia/Repos/beam/sdks/python/apache_beam/pipeline.py", line 403, 
in run

    self.to_runner_api(), self.runner, self._options).run(False)

  File "/Users/kasia/Repos/beam/sdks/python/apache_beam/pipeline.py", line 416, 
in run

    return self.runner.run_pipeline(self)

  File 
"/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/direct/direct_runner.py",
 line 139, in run_pipeline

    return runner.run_pipeline(pipeline)

  File 
"/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
 line 229, in run_pipeline

    return self.run_via_runner_api(pipeline.to_runner_api())

  File 
"/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
 line 232, in run_via_runner_api

    return self.run_stages(*self.create_stages(pipeline_proto))

  File 
"/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
 line 1015, in run_stages

    pcoll_buffers, safe_coders).process_bundle.metrics

  File 
"/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
 line 1132, in run_stage

    self._progress_frequency).process_bundle(data_input, data_output)

  File 
"/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
 line 1388, in process_bundle

    result_future = self._controller.control_handler.push(process_bundle)

  File 
"/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
 line 1260, in push

    response = self.worker.do_instruction(request)

  File 
"/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py", 
line 212, in do_instruction

    request.instruction_id)

  File 
"/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py", 
line 231, in process_bundle

    self.data_channel_factory)

  File 
"/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/worker/bundle_processor.py",
 line 343, in __init__

    self.ops = self.create_execution_tree(self.process_bundle_descriptor)

  File 
"/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/worker/bundle_processor.py",
 line 385, in create_execution_tree

    descriptor.transforms, key=topological_height, reverse=True)])

  File 
"/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/worker/bundle_processor.py",
 line 320, in wrapper

    result = cache[args] = func(*args)

  File 
"/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/worker/bundle_processor.py",
 line 368, in get_operation

    in descriptor.transforms[transform_id].outputs.items()

  File 
"/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/worker/bundle_processor.py",
 line 367, in 

    for tag, pcoll_id

... (3 last lines repeated for long period)

 
RuntimeError: maximum recursion depth exceeded
{code}
 

 

From my observation, I can say the problem appeared with various iteration 
number depending on computer resources. On my weaker computer started failing 
on ~150 iterations. The test succeeds on DataFlow with 1000 iterations (I 
didn't check higher number).

I provide whole test output in Attachements.

  was:
The Python [load test of ParDo operation for 
SyntheticSources|https://github.com/apache/beam/blob/faff82860c66e4050f0cfa5e874ffe6035ed0c1c/sdks/python/apache_beam/testing/load_tests/par_do_test.py#L133]
 that I created contains parametrized loop of ParDo with no operation inside 
besides metrics (this issue). With setting the number of iterations to >~200 
and running the test on DirectRunner I was encountering test failures. The test 
outputs whole (really long) pipeline logs. Some test runs raised the following 
exception:

 
{code:java}
Traceback (most recent call last):

  File 
"/Users/kasia/Repos/beam/sdks/python/apache_beam/testing/load_tests/par_do_test.py",
 line 144, in testParDo

    result = p.run()

  File 

[jira] [Commented] (BEAM-5758) Load tests for SyntheticSources in Python

2018-10-17 Thread Kasia Kucharczyk (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653437#comment-16653437
 ] 

Kasia Kucharczyk commented on BEAM-5758:


While [testing 
ParDo|https://github.com/apache/beam/blob/faff82860c66e4050f0cfa5e874ffe6035ed0c1c/sdks/python/apache_beam/testing/load_tests/par_do_test.py#L133]
 I encountered some problems with DirectRunner. The test contains a loop of 
no-op ParDos where the number of iterations is parametrized in pipeline 
options. With ~200 iterations the test fails and dumps the whole INFO log of 
the pipeline steps. The only exception that sometimes appears is:  
{code:java}
RuntimeError: maximum recursion depth exceeded{code}
I created another issue with further details about this problem. 

> Load tests for SyntheticSources in Python
> -
>
> Key: BEAM-5758
> URL: https://issues.apache.org/jira/browse/BEAM-5758
> Project: Beam
>  Issue Type: Test
>  Components: testing
>Reporter: Kasia Kucharczyk
>Assignee: Kasia Kucharczyk
>Priority: Major
>
> For purpose of load testing the SyntheticSources there should be created 
> tests with following transformations:
>  * ParDo
>  * Combine
>  * GroupByKey
>  * CoGroupByKey.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-5777) Running ParDo in loop with DirectRunners raises RuntimeException

2018-10-17 Thread Kasia Kucharczyk (JIRA)
Kasia Kucharczyk created BEAM-5777:
--

 Summary: Running ParDo in loop with DirectRunners raises 
RuntimeException
 Key: BEAM-5777
 URL: https://issues.apache.org/jira/browse/BEAM-5777
 Project: Beam
  Issue Type: Bug
  Components: testing
Reporter: Kasia Kucharczyk
Assignee: Jason Kuster


The Python [load test of ParDo operation for 
SyntheticSources|https://github.com/apache/beam/blob/faff82860c66e4050f0cfa5e874ffe6035ed0c1c/sdks/python/apache_beam/testing/load_tests/par_do_test.py#L133]
 that I created contains parametrized loop of ParDo with no operation inside 
besides metrics (this issue). With setting the number of iterations to >~200 
and running the test on DirectRunner I was encountering test failures. The test 
outputs whole (really long) pipeline logs. Some test runs raised the following 
exception:

 
{code:java}
Traceback (most recent call last):

  File 
"/Users/kasia/Repos/beam/sdks/python/apache_beam/testing/load_tests/par_do_test.py",
 line 144, in testParDo

    result = p.run()

  File 
"/Users/kasia/Repos/beam/sdks/python/apache_beam/testing/test_pipeline.py", 
line 104, in run

    result = super(TestPipeline, self).run(test_runner_api)

  File "/Users/kasia/Repos/beam/sdks/python/apache_beam/pipeline.py", line 403, 
in run

    self.to_runner_api(), self.runner, self._options).run(False)

  File "/Users/kasia/Repos/beam/sdks/python/apache_beam/pipeline.py", line 416, 
in run

    return self.runner.run_pipeline(self)

  File 
"/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/direct/direct_runner.py",
 line 139, in run_pipeline

    return runner.run_pipeline(pipeline)

  File 
"/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
 line 229, in run_pipeline

    return self.run_via_runner_api(pipeline.to_runner_api())

  File 
"/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
 line 232, in run_via_runner_api

    return self.run_stages(*self.create_stages(pipeline_proto))

  File 
"/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
 line 1015, in run_stages

    pcoll_buffers, safe_coders).process_bundle.metrics

  File 
"/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
 line 1132, in run_stage

    self._progress_frequency).process_bundle(data_input, data_output)

  File 
"/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
 line 1388, in process_bundle

    result_future = self._controller.control_handler.push(process_bundle)

  File 
"/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
 line 1260, in push

    response = self.worker.do_instruction(request)

  File 
"/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py", 
line 212, in do_instruction

    request.instruction_id)

  File 
"/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py", 
line 231, in process_bundle

    self.data_channel_factory)

  File 
"/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/worker/bundle_processor.py",
 line 343, in __init__

    self.ops = self.create_execution_tree(self.process_bundle_descriptor)

  File 
"/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/worker/bundle_processor.py",
 line 385, in create_execution_tree

    descriptor.transforms, key=topological_height, reverse=True)])

  File 
"/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/worker/bundle_processor.py",
 line 320, in wrapper

    result = cache[args] = func(*args)

  File 
"/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/worker/bundle_processor.py",
 line 368, in get_operation

    in descriptor.transforms[transform_id].outputs.items()

  File 
"/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/worker/bundle_processor.py",
 line 367, in 

    for tag, pcoll_id

... (3 last lines repeated for long period)

 
RuntimeError: maximum recursion depth exceeded
{code}
 

 

From my observation, I can say the problem appeared with various iteration 
number depending on computer resources. On my weaker computer started failing 
on ~150 iterations. The test succeeds on DataFlow with 1000 iterations (I 
didn't check higher number).

I can provide whole test output but it's ~1,3Mb.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5521) Cache execution trees in SDK worker

2018-10-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5521?focusedWorklogId=155338=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155338
 ]

ASF GitHub Bot logged work on BEAM-5521:


Author: ASF GitHub Bot
Created on: 17/Oct/18 10:30
Start Date: 17/Oct/18 10:30
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #6717: [BEAM-5521] Re-use 
bundle processors across bundles.
URL: https://github.com/apache/beam/pull/6717#issuecomment-430576644
 
 
   R: @tweise 
   
   This should improve throughput substantially. There is other per-bundle 
overhead (e.g. final counters and progress reports) that may need to be looked 
at as well. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 155338)
Time Spent: 20m  (was: 10m)

> Cache execution trees in SDK worker
> ---
>
> Key: BEAM-5521
> URL: https://issues.apache.org/jira/browse/BEAM-5521
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Labels: portability-flink
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently they are re-constructed from the protos for every bundle, which is 
> expensive (especially for 1-element bundles in streaming Flink). 
> Care should be taken to ensure the objects can be re-used. 
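
The change itself lands in the Python SDK harness (sdk_worker / bundle_processor), but the 
caching idea is easy to illustrate. The Java sketch below is purely illustrative (none of 
these names exist in Beam): keep idle processors keyed by their ProcessBundleDescriptor id 
and hand them back out instead of rebuilding the execution tree for every bundle.

{code:java}
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Function;

final class ProcessorPool<P> {
  // Idle, ready-to-reuse processors, grouped by descriptor id.
  private final Map<String, Queue<P>> idle = new ConcurrentHashMap<>();
  private final Function<String, P> factory;

  ProcessorPool(Function<String, P> factory) {
    this.factory = factory;
  }

  /** Reuse an idle processor built for this descriptor, or build a new one. */
  P acquire(String descriptorId) {
    P cached = idle.computeIfAbsent(descriptorId, id -> new ConcurrentLinkedQueue<>()).poll();
    return cached != null ? cached : factory.apply(descriptorId);
  }

  /** Return a processor to the pool once its bundle has finished cleanly. */
  void release(String descriptorId, P processor) {
    idle.computeIfAbsent(descriptorId, id -> new ConcurrentLinkedQueue<>()).offer(processor);
  }
}
{code}

As the PR comment notes, construction cost is only one part of the per-bundle overhead; 
final counters and progress reports remain per bundle.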



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5521) Cache execution trees in SDK worker

2018-10-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5521?focusedWorklogId=155331=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155331
 ]

ASF GitHub Bot logged work on BEAM-5521:


Author: ASF GitHub Bot
Created on: 17/Oct/18 10:14
Start Date: 17/Oct/18 10:14
Worklog Time Spent: 10m 
  Work Description: robertwb opened a new pull request #6717: [BEAM-5521] 
Re-use bundle processors across bundles.
URL: https://github.com/apache/beam/pull/6717
 
 
   
   
   
   Follow this checklist to help us incorporate your contribution quickly and 
easily:
   
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   It will help us expedite review of your Pull Request if you tag someone 
(e.g. `@username`) to look at it.
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/)
 | --- | --- | --- | --- | --- | ---
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)
 | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)
  [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/)
 | --- | --- | ---
   
   
   
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 155331)
Time Spent: 10m
Remaining Estimate: 0h

> Cache execution trees in SDK worker
> ---
>
> Key: BEAM-5521
> URL: https://issues.apache.org/jira/browse/BEAM-5521
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major

[jira] [Resolved] (BEAM-5720) Default coder breaks with large ints on Python 3

2018-10-17 Thread Robert Bradshaw (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Bradshaw resolved BEAM-5720.
---
   Resolution: Fixed
Fix Version/s: 2.9.0

> Default coder breaks with large ints on Python 3
> 
>
> Key: BEAM-5720
> URL: https://issues.apache.org/jira/browse/BEAM-5720
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Minor
> Fix For: 2.9.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The test for `int` includes greater than 64-bit values, which causes an 
> overflow error later in the code. We need to only use that coding scheme for 
> machine-sized ints. 
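
The affected code is the Python fast coder, but the guard described above is simple to 
state. As an illustration only (in Java, with invented names), the compact machine-int 
encoding should be chosen only when a value actually fits in a signed 64-bit range:

{code:java}
import java.math.BigInteger;

final class IntCoderChooser {
  private static final BigInteger MIN_LONG = BigInteger.valueOf(Long.MIN_VALUE);
  private static final BigInteger MAX_LONG = BigInteger.valueOf(Long.MAX_VALUE);

  /** True only for values the fixed 64-bit encoding can represent without overflow. */
  static boolean fitsInMachineInt(BigInteger value) {
    return value.compareTo(MIN_LONG) >= 0 && value.compareTo(MAX_LONG) <= 0;
  }
}
{code}

Values outside that range would fall back to a general-purpose encoding rather than 
overflowing.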



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5646) Equality is broken for Rows with BYTES field

2018-10-17 Thread Gleb Kanterov (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653312#comment-16653312
 ] 

Gleb Kanterov commented on BEAM-5646:
-

[~kedin] do you have any thoughts, or perhaps you can mention somebody else?

> Equality is broken for Rows with BYTES field
> 
>
> Key: BEAM-5646
> URL: https://issues.apache.org/jira/browse/BEAM-5646
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Affects Versions: 2.7.0
>Reporter: Gleb Kanterov
>Assignee: Xu Mingmin
>Priority: Major
>
> The problem is with `org.apache.beam.sdk.values.Row#equals` and `hashCode`. 
> Java arrays use reference equality instead of comparing contents. Row stores 
> fields of type BYTES as byte[].
> These failing tests illustrate the problem:
> {code:java}
> @Test
> public void testByteArrayEquality() {
>   byte[] a0 = new byte[16];
>   byte[] b0 = new byte[16];
>   Schema schema = Schema.of(Schema.Field.of("bytes", Schema.FieldType.BYTES));
>   Row a = Row.withSchema(schema).addValue(a0).build();
>   Row b = Row.withSchema(schema).addValue(b0).build();
>   Assert.assertEquals(a, b);
> }
> @Test
> public void testByteBufferEquality() {
>   byte[] a0 = new byte[16];
>   byte[] b0 = new byte[16];
>   Schema schema = Schema.of(Schema.Field.of("bytes", Schema.FieldType.BYTES));
>   Row a = Row.withSchema(schema).addValue(ByteBuffer.wrap(a0)).build();
>   Row b = Row.withSchema(schema).addValue(ByteBuffer.wrap(b0)).build();
>   Assert.assertEquals(a, b);
> }
> {code}
>  
> Option 1. Fix by storing `byte[]` as `ByteBuffer`, or something simpler 
> that doesn't have offsets. `Row#getValue` will return this type, and for 
> consistency, it would be preferable to change `Row#getBytes` in an 
> incompatible way to be consistent with `Row#getValue` because that's how it 
> behaves for the rest of the methods.
>  
> Option 2. Do the same as Spark does, add `if (x instanceof byte[])` to 
> `equals`. The problem in Spark is that `hashCode` implementation isn't 
> consistent with `equals`, see SPARK-25122.
>  
> Option 3. Consider it as intended behavior, and fix 
> `RowCoder#consistentWithEquals` implementation.
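
As an illustration of Option 2 with the SPARK-25122 pitfall avoided, here is a small 
sketch of content-based field comparison. It is not Row's implementation, only the shape 
such a fix could take; the class and method names are invented.

{code:java}
import java.util.Arrays;
import java.util.Objects;

final class FieldEquality {
  /** Compare field values by content, special-casing byte[] (Option 2 above). */
  static boolean fieldEquals(Object a, Object b) {
    if (a instanceof byte[] && b instanceof byte[]) {
      return Arrays.equals((byte[]) a, (byte[]) b);
    }
    return Objects.equals(a, b);
  }

  /** Keep hashCode consistent with the equals above, the inconsistency noted for Spark. */
  static int fieldHashCode(Object a) {
    return (a instanceof byte[]) ? Arrays.hashCode((byte[]) a) : Objects.hashCode(a);
  }
}
{code}

Option 1 would instead get content equality from ByteBuffer.equals, at the cost of 
changing what Row#getBytes returns.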



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-1240) Create RabbitMqIO

2018-10-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-1240?focusedWorklogId=155328=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155328
 ]

ASF GitHub Bot logged work on BEAM-1240:


Author: ASF GitHub Bot
Created on: 17/Oct/18 09:56
Start Date: 17/Oct/18 09:56
Worklog Time Spent: 10m 
  Work Description: echauchot commented on a change in pull request #1729: 
[BEAM-1240] Create RabbitMqIO
URL: https://github.com/apache/beam/pull/1729#discussion_r225842769
 
 

 ##
 File path: 
sdks/java/io/rabbitmq/src/test/java/org/apache/beam/sdk/io/rabbitmq/RabbitMqIOTest.java
 ##
 @@ -0,0 +1,315 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.rabbitmq;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;
+
+import com.rabbitmq.client.AMQP;
+import com.rabbitmq.client.Channel;
+import com.rabbitmq.client.Connection;
+import com.rabbitmq.client.ConnectionFactory;
+import com.rabbitmq.client.Consumer;
+import com.rabbitmq.client.DefaultConsumer;
+import com.rabbitmq.client.Envelope;
+import java.io.IOException;
+import java.io.Serializable;
+import java.net.ServerSocket;
+import java.nio.charset.StandardCharsets;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.stream.Collectors;
+import java.util.stream.IntStream;
+import org.apache.beam.sdk.testing.PAssert;
+import org.apache.beam.sdk.testing.TestPipeline;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.transforms.MapElements;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.TypeDescriptors;
+import org.apache.qpid.server.Broker;
+import org.apache.qpid.server.BrokerOptions;
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+import org.junit.ClassRule;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/** Test of {@link RabbitMqIO}. */
+@RunWith(JUnit4.class)
+public class RabbitMqIOTest implements Serializable {
+  private static final Logger LOG = 
LoggerFactory.getLogger(RabbitMqIOTest.class);
+
+  private static int port;
+  @ClassRule public static TemporaryFolder temporaryFolder = new 
TemporaryFolder();
+
+  @Rule public transient TestPipeline p = TestPipeline.create();
+
+  private static transient Broker broker;
+
+  @BeforeClass
+  public static void startBroker() throws Exception {
+try (ServerSocket serverSocket = new ServerSocket(0)) {
+  port = serverSocket.getLocalPort();
+}
+
+System.setProperty("derby.stream.error.field", "MyApp.DEV_NULL");
+broker = new Broker();
+BrokerOptions options = new BrokerOptions();
+options.setConfigProperty(BrokerOptions.QPID_AMQP_PORT, 
String.valueOf(port));
+options.setConfigProperty(BrokerOptions.QPID_WORK_DIR, 
temporaryFolder.newFolder().toString());
+options.setConfigProperty(BrokerOptions.QPID_HOME_DIR, "src/test/qpid");
+broker.startup(options);
+  }
+
+  @AfterClass
+  public static void stopBroker() {
+broker.shutdown();
+  }
+
+  @Test
+  public void testReadQueue() throws Exception {
+final int maxNumRecords = 10;
+PCollection<RabbitMqMessage> raw =
+p.apply(
+RabbitMqIO.read()
+.withUri("amqp://guest:guest@localhost:" + port)
+.withQueue("READ")
+.withMaxNumRecords(maxNumRecords));
+PCollection<String> output =
+raw.apply(
+MapElements.into(TypeDescriptors.strings())
+.via(
+(RabbitMqMessage message) ->
+new String(message.getBody(), 
StandardCharsets.UTF_8)));
+
+List<String> records =
+generateRecords(maxNumRecords)
+.stream()
+.map(record -> new String(record, StandardCharsets.UTF_8))
+.collect(Collectors.toList());
+PAssert.that(output).containsInAnyOrder(records);
+
+ConnectionFactory connectionFactory = new ConnectionFactory();
+

[jira] [Work logged] (BEAM-1240) Create RabbitMqIO

2018-10-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-1240?focusedWorklogId=155327=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155327
 ]

ASF GitHub Bot logged work on BEAM-1240:


Author: ASF GitHub Bot
Created on: 17/Oct/18 09:56
Start Date: 17/Oct/18 09:56
Worklog Time Spent: 10m 
  Work Description: echauchot commented on a change in pull request #1729: 
[BEAM-1240] Create RabbitMqIO
URL: https://github.com/apache/beam/pull/1729#discussion_r225833559
 
 

 ##
 File path: 
sdks/java/io/rabbitmq/src/test/java/org/apache/beam/sdk/io/rabbitmq/RabbitMqIOTest.java
 ##
 @@ -0,0 +1,315 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.rabbitmq;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;
+
+import com.rabbitmq.client.AMQP;
+import com.rabbitmq.client.Channel;
+import com.rabbitmq.client.Connection;
+import com.rabbitmq.client.ConnectionFactory;
+import com.rabbitmq.client.Consumer;
+import com.rabbitmq.client.DefaultConsumer;
+import com.rabbitmq.client.Envelope;
+import java.io.IOException;
+import java.io.Serializable;
+import java.net.ServerSocket;
+import java.nio.charset.StandardCharsets;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.stream.Collectors;
+import java.util.stream.IntStream;
+import org.apache.beam.sdk.testing.PAssert;
+import org.apache.beam.sdk.testing.TestPipeline;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.transforms.MapElements;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.TypeDescriptors;
+import org.apache.qpid.server.Broker;
+import org.apache.qpid.server.BrokerOptions;
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+import org.junit.ClassRule;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/** Test of {@link RabbitMqIO}. */
+@RunWith(JUnit4.class)
+public class RabbitMqIOTest implements Serializable {
+  private static final Logger LOG = 
LoggerFactory.getLogger(RabbitMqIOTest.class);
+
+  private static int port;
+  @ClassRule public static TemporaryFolder temporaryFolder = new 
TemporaryFolder();
+
+  @Rule public transient TestPipeline p = TestPipeline.create();
+
+  private static transient Broker broker;
+
+  @BeforeClass
+  public static void startBroker() throws Exception {
+try (ServerSocket serverSocket = new ServerSocket(0)) {
+  port = serverSocket.getLocalPort();
+}
+
+System.setProperty("derby.stream.error.field", "MyApp.DEV_NULL");
+broker = new Broker();
+BrokerOptions options = new BrokerOptions();
+options.setConfigProperty(BrokerOptions.QPID_AMQP_PORT, 
String.valueOf(port));
+options.setConfigProperty(BrokerOptions.QPID_WORK_DIR, 
temporaryFolder.newFolder().toString());
+options.setConfigProperty(BrokerOptions.QPID_HOME_DIR, "src/test/qpid");
+broker.startup(options);
+  }
+
+  @AfterClass
+  public static void stopBroker() {
+broker.shutdown();
+  }
+
+  @Test
+  public void testReadQueue() throws Exception {
+final int maxNumRecords = 10;
+PCollection<RabbitMqMessage> raw =
+p.apply(
+RabbitMqIO.read()
+.withUri("amqp://guest:guest@localhost:" + port)
+.withQueue("READ")
+.withMaxNumRecords(maxNumRecords));
+PCollection<String> output =
+raw.apply(
+MapElements.into(TypeDescriptors.strings())
+.via(
+(RabbitMqMessage message) ->
+new String(message.getBody(), 
StandardCharsets.UTF_8)));
+
+List<String> records =
+generateRecords(maxNumRecords)
+.stream()
+.map(record -> new String(record, StandardCharsets.UTF_8))
+.collect(Collectors.toList());
+PAssert.that(output).containsInAnyOrder(records);
+
+ConnectionFactory connectionFactory = new ConnectionFactory();
+

[jira] [Work logged] (BEAM-1240) Create RabbitMqIO

2018-10-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-1240?focusedWorklogId=155329=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155329
 ]

ASF GitHub Bot logged work on BEAM-1240:


Author: ASF GitHub Bot
Created on: 17/Oct/18 09:56
Start Date: 17/Oct/18 09:56
Worklog Time Spent: 10m 
  Work Description: echauchot commented on a change in pull request #1729: 
[BEAM-1240] Create RabbitMqIO
URL: https://github.com/apache/beam/pull/1729#discussion_r225856265
 
 

 ##
 File path: 
sdks/java/io/rabbitmq/src/test/java/org/apache/beam/sdk/io/rabbitmq/RabbitMqIOTest.java
 ##
 @@ -0,0 +1,315 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.rabbitmq;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;
+
+import com.rabbitmq.client.AMQP;
+import com.rabbitmq.client.Channel;
+import com.rabbitmq.client.Connection;
+import com.rabbitmq.client.ConnectionFactory;
+import com.rabbitmq.client.Consumer;
+import com.rabbitmq.client.DefaultConsumer;
+import com.rabbitmq.client.Envelope;
+import java.io.IOException;
+import java.io.Serializable;
+import java.net.ServerSocket;
+import java.nio.charset.StandardCharsets;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.stream.Collectors;
+import java.util.stream.IntStream;
+import org.apache.beam.sdk.testing.PAssert;
+import org.apache.beam.sdk.testing.TestPipeline;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.transforms.MapElements;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.TypeDescriptors;
+import org.apache.qpid.server.Broker;
+import org.apache.qpid.server.BrokerOptions;
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+import org.junit.ClassRule;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/** Test of {@link RabbitMqIO}. */
+@RunWith(JUnit4.class)
+public class RabbitMqIOTest implements Serializable {
+  private static final Logger LOG = 
LoggerFactory.getLogger(RabbitMqIOTest.class);
+
+  private static int port;
+  @ClassRule public static TemporaryFolder temporaryFolder = new 
TemporaryFolder();
+
+  @Rule public transient TestPipeline p = TestPipeline.create();
+
+  private static transient Broker broker;
+
+  @BeforeClass
+  public static void startBroker() throws Exception {
+try (ServerSocket serverSocket = new ServerSocket(0)) {
+  port = serverSocket.getLocalPort();
+}
+
+System.setProperty("derby.stream.error.field", "MyApp.DEV_NULL");
+broker = new Broker();
+BrokerOptions options = new BrokerOptions();
+options.setConfigProperty(BrokerOptions.QPID_AMQP_PORT, 
String.valueOf(port));
+options.setConfigProperty(BrokerOptions.QPID_WORK_DIR, 
temporaryFolder.newFolder().toString());
+options.setConfigProperty(BrokerOptions.QPID_HOME_DIR, "src/test/qpid");
+broker.startup(options);
+  }
+
+  @AfterClass
+  public static void stopBroker() {
+broker.shutdown();
+  }
+
+  @Test
+  public void testReadQueue() throws Exception {
+final int maxNumRecords = 10;
+PCollection<RabbitMqMessage> raw =
+p.apply(
+RabbitMqIO.read()
+.withUri("amqp://guest:guest@localhost:" + port)
+.withQueue("READ")
+.withMaxNumRecords(maxNumRecords));
+PCollection<String> output =
+raw.apply(
+MapElements.into(TypeDescriptors.strings())
+.via(
+(RabbitMqMessage message) ->
+new String(message.getBody(), 
StandardCharsets.UTF_8)));
+
+List<String> records =
+generateRecords(maxNumRecords)
+.stream()
+.map(record -> new String(record, StandardCharsets.UTF_8))
+.collect(Collectors.toList());
+PAssert.that(output).containsInAnyOrder(records);
+
+ConnectionFactory connectionFactory = new ConnectionFactory();
+

[jira] [Updated] (BEAM-5776) Using methods in map is broken on Python 3

2018-10-17 Thread Robert Bradshaw (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Bradshaw updated BEAM-5776:
--
Issue Type: Sub-task  (was: Bug)
Parent: BEAM-1251

> Using methods in map is broken on Python 3
> --
>
> Key: BEAM-5776
> URL: https://issues.apache.org/jira/browse/BEAM-5776
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Ahmet Altay
>Priority: Major
>
> E.g. 
> {code:java}
> pcoll | beam.Map(str.upper){code}
> no longer works.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)