[jira] [Updated] (BEAM-5107) Support ES 6.x for ElasticsearchIO

2018-09-19 Thread Etienne Chauchot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot updated BEAM-5107:
---
Fix Version/s: (was: 2.8.0)
   2.7.0

> Support ES 6.x for ElasticsearchIO
> --
>
> Key: BEAM-5107
> URL: https://issues.apache.org/jira/browse/BEAM-5107
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Dat Tran
>Assignee: Etienne Chauchot
>Priority: Major
> Fix For: 2.7.0
>
>  Time Spent: 11h 50m
>  Remaining Estimate: 0h
>
> Elasticsearch has released 6.3.2, but ElasticsearchIO only supports 2.x-5.x.
> We should support ES 6.x for ElasticsearchIO.
> https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
> https://github.com/apache/beam/blob/master/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-5428) Implement cross-bundle state caching.

2018-09-19 Thread Robert Bradshaw (JIRA)
Robert Bradshaw created BEAM-5428:
-

 Summary: Implement cross-bundle state caching.
 Key: BEAM-5428
 URL: https://issues.apache.org/jira/browse/BEAM-5428
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py-harness
Reporter: Robert Bradshaw
Assignee: Robert Bradshaw






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3089) Issue with setting the parallelism at client level using Flink runner

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3089?focusedWorklogId=145589&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145589
 ]

ASF GitHub Bot logged work on BEAM-3089:


Author: ASF GitHub Bot
Created on: 19/Sep/18 08:10
Start Date: 19/Sep/18 08:10
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #6426: [BEAM-3089] Fix default 
values in FlinkPipelineOptions / Add tests
URL: https://github.com/apache/beam/pull/6426#issuecomment-422701950
 
 
   Retest this please


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 145589)
Time Spent: 2h 10m  (was: 2h)

> Issue with setting the parallelism at client level using Flink runner
> -
>
> Key: BEAM-3089
> URL: https://issues.apache.org/jira/browse/BEAM-3089
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 2.0.0
> Environment: I am using Flink 1.2.1 running on Docker, with Task 
> Managers distributed across different VMs as part of a Docker Swarm.
>Reporter: Thalita Vergilio
>Assignee: Grzegorz Kołakowski
>Priority: Major
>  Labels: docker, flink, parallel-deployment
> Fix For: 2.8.0
>
> Attachments: flink-ui-parallelism.png
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> When uploading an Apache Beam application using the Flink Web UI, the 
> parallelism set at job submission doesn't get picked up. The same happens 
> when submitting a job using the Flink CLI.
> In both cases, the parallelism ends up defaulting to 1.
> When I set the parallelism programmatically within the Apache Beam code, it 
> works: {{flinkPipelineOptions.setParallelism(4);}}
> I suspect the root of the problem may be in the 
> org.apache.beam.runners.flink.DefaultParallelismFactory class, as it checks 
> for Flink's GlobalConfiguration, which may not pick up runtime values passed 
> to Flink, then defaults to 1 if it doesn't find anything.
> Any ideas on how this could be fixed or worked around? I need to be able to 
> change the parallelism dynamically, so the programmatic approach won't really 
> work for me, nor will setting the Flink configuration at system level.
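
The suspected failure mode can be illustrated with a plain-Python sketch (a
mock of the idea, not the actual Java DefaultParallelismFactory): if the
factory only consults the static GlobalConfiguration, a parallelism supplied
at submission time is invisible to it, and the default of 1 wins.

```python
def effective_parallelism(global_configuration, submission_parallelism=None):
    """Sketch (assumption) of the suspected defaulting behavior.

    Only the static configuration (flink-conf.yaml) is consulted;
    submission_parallelism, the value passed via the Web UI or CLI,
    is never looked at -- which is the bug being described.
    """
    return global_configuration.get('parallelism.default', 1)
```

A job submitted with `-p 4` against an empty static configuration therefore
still runs with parallelism 1, while setting it programmatically on the
pipeline options works because it bypasses the factory entirely.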



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2687) Python SDK support for Stateful Processing

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2687?focusedWorklogId=145590&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145590
 ]

ASF GitHub Bot logged work on BEAM-2687:


Author: ASF GitHub Bot
Created on: 19/Sep/18 08:22
Start Date: 19/Sep/18 08:22
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #6349: [BEAM-2687] 
Implement State over the Fn API
URL: https://github.com/apache/beam/pull/6349#issuecomment-422705767
 
 
   R: @charlesccychen This has been rebased on master with PR #6304 merged, 
please take a look. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 145590)
Time Spent: 0.5h  (was: 20m)

> Python SDK support for Stateful Processing
> --
>
> Key: BEAM-2687
> URL: https://issues.apache.org/jira/browse/BEAM-2687
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Assignee: Charles Chen
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Python SDK should support stateful processing 
> (https://beam.apache.org/blog/2017/02/13/stateful-processing.html)
> In the meantime, runner capability matrix should be updated to show the lack 
> of this feature 
> (https://beam.apache.org/documentation/runners/capability-matrix/)
> Use this as an umbrella issue for all related issues.
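
The per-key state model described in the linked blog post can be sketched in
plain Python (this mimics the semantics only; it is not the Beam API): each
key gets its own persistent state cell, here used to assign a running index
to each element of that key, as in the blog post's running example.

```python
def assign_indices(keyed_elements):
    """Sketch of per-key state semantics: a state cell per key that
    persists across elements, used as a per-key counter."""
    state = {}  # key -> next index; stands in for a per-key state cell
    out = []
    for key, value in keyed_elements:
        index = state.get(key, 0)          # read state for this key
        out.append((key, value, index))
        state[key] = index + 1             # write updated state back
    return out
```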



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2687) Python SDK support for Stateful Processing

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2687?focusedWorklogId=145591&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145591
 ]

ASF GitHub Bot logged work on BEAM-2687:


Author: ASF GitHub Bot
Created on: 19/Sep/18 08:27
Start Date: 19/Sep/18 08:27
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #6349: [BEAM-2687] 
Implement State over the Fn API
URL: https://github.com/apache/beam/pull/6349#issuecomment-422707504
 
 
   Alternatively, @mxm or @tweise, if either of you are interested in reviewing 
this, go for it. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 145591)
Time Spent: 40m  (was: 0.5h)

> Python SDK support for Stateful Processing
> --
>
> Key: BEAM-2687
> URL: https://issues.apache.org/jira/browse/BEAM-2687
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Assignee: Charles Chen
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Python SDK should support stateful processing 
> (https://beam.apache.org/blog/2017/02/13/stateful-processing.html)
> In the meantime, runner capability matrix should be updated to show the lack 
> of this feature 
> (https://beam.apache.org/documentation/runners/capability-matrix/)
> Use this as an umbrella issue for all related issues.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2687) Python SDK support for Stateful Processing

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2687?focusedWorklogId=145595&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145595
 ]

ASF GitHub Bot logged work on BEAM-2687:


Author: ASF GitHub Bot
Created on: 19/Sep/18 08:53
Start Date: 19/Sep/18 08:53
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#6349: [BEAM-2687] Implement State over the Fn API
URL: https://github.com/apache/beam/pull/6349#discussion_r218716857
 
 

 ##
 File path: sdks/python/apache_beam/runners/direct/direct_runner.py
 ##
 @@ -106,7 +106,7 @@ def visit_transform(self, applied_ptransform):
   if DoFnSignature(dofn).is_splittable_dofn():
 self.supported_by_fnapi_runner = False
   # The FnApiRunner does not support execution of Stateful DoFns.
 
 Review comment:
   Please update comment.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 145595)
Time Spent: 50m  (was: 40m)

> Python SDK support for Stateful Processing
> --
>
> Key: BEAM-2687
> URL: https://issues.apache.org/jira/browse/BEAM-2687
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Assignee: Charles Chen
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Python SDK should support stateful processing 
> (https://beam.apache.org/blog/2017/02/13/stateful-processing.html)
> In the meantime, runner capability matrix should be updated to show the lack 
> of this feature 
> (https://beam.apache.org/documentation/runners/capability-matrix/)
> Use this as an umbrella issue for all related issues.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2687) Python SDK support for Stateful Processing

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2687?focusedWorklogId=145596&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145596
 ]

ASF GitHub Bot logged work on BEAM-2687:


Author: ASF GitHub Bot
Created on: 19/Sep/18 08:53
Start Date: 19/Sep/18 08:53
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#6349: [BEAM-2687] Implement State over the Fn API
URL: https://github.com/apache/beam/pull/6349#discussion_r218717188
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py
 ##
 @@ -228,6 +229,29 @@ def cross_product(elem, sides):
   pcoll | beam.FlatMap(cross_product, beam.pvalue.AsList(derived)),
   equal_to([('a', 'a'), ('a', 'b'), ('b', 'a'), ('b', 'b')]))
 
+  def test_pardo_state_only(self):
 
 Review comment:
   Is there a way we can unify this with the existing state / timer tests in 
userstate_test.py?  This isn't FnApiRunner specific.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 145596)
Time Spent: 1h  (was: 50m)

> Python SDK support for Stateful Processing
> --
>
> Key: BEAM-2687
> URL: https://issues.apache.org/jira/browse/BEAM-2687
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Assignee: Charles Chen
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Python SDK should support stateful processing 
> (https://beam.apache.org/blog/2017/02/13/stateful-processing.html)
> In the meantime, runner capability matrix should be updated to show the lack 
> of this feature 
> (https://beam.apache.org/documentation/runners/capability-matrix/)
> Use this as an umbrella issue for all related issues.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2687) Python SDK support for Stateful Processing

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2687?focusedWorklogId=145598&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145598
 ]

ASF GitHub Bot logged work on BEAM-2687:


Author: ASF GitHub Bot
Created on: 19/Sep/18 08:53
Start Date: 19/Sep/18 08:53
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#6349: [BEAM-2687] Implement State over the Fn API
URL: https://github.com/apache/beam/pull/6349#discussion_r218718655
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/bundle_processor.py
 ##
 @@ -198,6 +203,85 @@ def is_globally_windowed(self):
 == sideinputs._global_window_mapping_fn)
 
 
+class CombiningValueRuntimeState(userstate.RuntimeState):
+  def __init__(self, underlying_bag_state, combinefn):
+    self._combinefn = combinefn
+    self._underlying_bag_state = underlying_bag_state
+
+  def _read_accumulator(self, rewrite=True):
+    merged_accumulator = self._combinefn.merge_accumulators(
+        self._underlying_bag_state.read())
+    if rewrite:
+      self._underlying_bag_state.clear()
+      self._underlying_bag_state.add(merged_accumulator)
+    return merged_accumulator
+
+  def read(self):
+    return self._combinefn.extract_output(self._read_accumulator())
+
+  def add(self, value):
+    # Prefer blind writes, but don't let them grow unboundedly.
+    if random.random() < 0.5:
+      accumulator = self._read_accumulator(False)
+      self._underlying_bag_state.clear()
+    else:
+      accumulator = self._combinefn.create_accumulator()
+    self._underlying_bag_state.add(
+        self._combinefn.add_input(accumulator, value))
+
+  def clear(self):
+    self._underlying_bag_state.clear()
+
+
+# TODO(BEAM-5428): Implement cross-bundle state caching.
+class SynchronousBagRuntimeState(userstate.RuntimeState):
+  def __init__(self, state_handler, state_key, value_coder):
+    self._state_handler = state_handler
+    self._state_key = state_key
+    self._value_coder = value_coder
+
+  def read(self):
+    return IterableState(
+        self._state_handler, self._state_key, self._value_coder)
+
+  def add(self, value):
+    self._state_handler.blocking_append(
+        self._state_key, self._value_coder.encode(value))
+
+  def clear(self):
+    self._state_handler.blocking_clear(self._state_key)
+
+
+class UserStateContext(userstate.UserStateContext):
 
 Review comment:
   Should we name this something else (maybe `FnApiUserStateContext`?) to 
differentiate it from the base?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 145598)
Time Spent: 1h 10m  (was: 1h)

> Python SDK support for Stateful Processing
> --
>
> Key: BEAM-2687
> URL: https://issues.apache.org/jira/browse/BEAM-2687
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Assignee: Charles Chen
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Python SDK should support stateful processing 
> (https://beam.apache.org/blog/2017/02/13/stateful-processing.html)
> In the meantime, runner capability matrix should be updated to show the lack 
> of this feature 
> (https://beam.apache.org/documentation/runners/capability-matrix/)
> Use this as an umbrella issue for all related issues.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2687) Python SDK support for Stateful Processing

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2687?focusedWorklogId=145597&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145597
 ]

ASF GitHub Bot logged work on BEAM-2687:


Author: ASF GitHub Bot
Created on: 19/Sep/18 08:53
Start Date: 19/Sep/18 08:53
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#6349: [BEAM-2687] Implement State over the Fn API
URL: https://github.com/apache/beam/pull/6349#discussion_r218719163
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/bundle_processor.py
 ##
 @@ -198,6 +203,85 @@ def is_globally_windowed(self):
 == sideinputs._global_window_mapping_fn)
 
 
+class CombiningValueRuntimeState(userstate.RuntimeState):
+  def __init__(self, underlying_bag_state, combinefn):
+    self._combinefn = combinefn
+    self._underlying_bag_state = underlying_bag_state
+
+  def _read_accumulator(self, rewrite=True):
+    merged_accumulator = self._combinefn.merge_accumulators(
+        self._underlying_bag_state.read())
+    if rewrite:
+      self._underlying_bag_state.clear()
+      self._underlying_bag_state.add(merged_accumulator)
+    return merged_accumulator
+
+  def read(self):
+    return self._combinefn.extract_output(self._read_accumulator())
+
+  def add(self, value):
+    # Prefer blind writes, but don't let them grow unboundedly.
+    if random.random() < 0.5:
 
 Review comment:
   Can you add a rationale for this threshold or state that it is arbitrary for 
now?
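
For context, the effect of that coin flip can be simulated in plain Python:
each compaction collapses the bag to a single merged accumulator, so the
bag's length stays small in probability instead of growing with every add.
The simulation below is illustrative only, and the 0.5 value itself appears
arbitrary.

```python
import random

def simulate_bag_size(num_adds, p_compact=0.5, seed=42):
    """Simulate bag length under the add() strategy in the diff above.

    Each add() either compacts the bag down to one merged accumulator
    (probability p_compact) or blindly appends a fresh accumulator.
    Returns the maximum bag length observed.
    """
    rng = random.Random(seed)
    bag_len = 0
    max_len = 0
    for _ in range(num_adds):
        if rng.random() < p_compact:
            bag_len = 1   # clear, then add the merged accumulator
        else:
            bag_len += 1  # blind append
        max_len = max(max_len, bag_len)
    return max_len
```

With p_compact=0.5 the maximum length over many adds grows only
logarithmically (the longest run of consecutive blind writes), while with
no compaction at all it grows linearly.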


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 145597)
Time Spent: 1h 10m  (was: 1h)

> Python SDK support for Stateful Processing
> --
>
> Key: BEAM-2687
> URL: https://issues.apache.org/jira/browse/BEAM-2687
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Assignee: Charles Chen
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Python SDK should support stateful processing 
> (https://beam.apache.org/blog/2017/02/13/stateful-processing.html)
> In the meantime, runner capability matrix should be updated to show the lack 
> of this feature 
> (https://beam.apache.org/documentation/runners/capability-matrix/)
> Use this as an umbrella issue for all related issues.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2687) Python SDK support for Stateful Processing

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2687?focusedWorklogId=145599&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145599
 ]

ASF GitHub Bot logged work on BEAM-2687:


Author: ASF GitHub Bot
Created on: 19/Sep/18 08:53
Start Date: 19/Sep/18 08:53
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#6349: [BEAM-2687] Implement State over the Fn API
URL: https://github.com/apache/beam/pull/6349#discussion_r218718313
 
 

 ##
 File path: sdks/python/apache_beam/transforms/core.py
 ##
 @@ -950,6 +957,10 @@ def to_runner_api_parameter(self, context):
 spec=beam_runner_api_pb2.FunctionSpec(
 urn=python_urns.PICKLED_DOFN_INFO,
 payload=picked_pardo_fn_data)),
+        state_specs={spec.name: spec.to_runner_api(context)
+                     for spec in state_specs},
+        timer_specs={spec.name: spec.to_runner_api(context)
+                     for spec in timer_specs},
 
 Review comment:
   Will we eventually use the "self-loop" PCollections as in 
https://github.com/apache/beam/pull/5883?
   
   (CC: @lukecwik)


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 145599)
Time Spent: 1h 10m  (was: 1h)

> Python SDK support for Stateful Processing
> --
>
> Key: BEAM-2687
> URL: https://issues.apache.org/jira/browse/BEAM-2687
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Assignee: Charles Chen
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Python SDK should support stateful processing 
> (https://beam.apache.org/blog/2017/02/13/stateful-processing.html)
> In the meantime, runner capability matrix should be updated to show the lack 
> of this feature 
> (https://beam.apache.org/documentation/runners/capability-matrix/)
> Use this as an umbrella issue for all related issues.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2687) Python SDK support for Stateful Processing

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2687?focusedWorklogId=145600&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145600
 ]

ASF GitHub Bot logged work on BEAM-2687:


Author: ASF GitHub Bot
Created on: 19/Sep/18 08:53
Start Date: 19/Sep/18 08:53
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#6349: [BEAM-2687] Implement State over the Fn API
URL: https://github.com/apache/beam/pull/6349#discussion_r218719936
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/bundle_processor.py
 ##
 @@ -121,6 +123,23 @@ def process_encoded(self, encoded_windowed_values):
   self.output(decoded_value)
 
 
+class IterableState(object):
 
 Review comment:
   The naming is a little weird that both `IterableState` and `*RuntimeState` 
(defined at the bottom of this file) are both used at runtime.  I don't have 
any suggestions, but maybe there is better naming?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 145600)
Time Spent: 1h 20m  (was: 1h 10m)

> Python SDK support for Stateful Processing
> --
>
> Key: BEAM-2687
> URL: https://issues.apache.org/jira/browse/BEAM-2687
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Assignee: Charles Chen
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Python SDK should support stateful processing 
> (https://beam.apache.org/blog/2017/02/13/stateful-processing.html)
> In the meantime, runner capability matrix should be updated to show the lack 
> of this feature 
> (https://beam.apache.org/documentation/runners/capability-matrix/)
> Use this as an umbrella issue for all related issues.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3371) Add ability to stage directories with compiled classes to Spark

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3371?focusedWorklogId=145601&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145601
 ]

ASF GitHub Bot logged work on BEAM-3371:


Author: ASF GitHub Bot
Created on: 19/Sep/18 08:59
Start Date: 19/Sep/18 08:59
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on a change in pull request 
#6244: [BEAM-3371] Enable running integration tests on Spark
URL: https://github.com/apache/beam/pull/6244#discussion_r218722767
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineExecutionEnvironment.java
 ##
 @@ -95,73 +83,33 @@ public void translate(Pipeline pipeline) {
 pipeline.replaceAll(
 FlinkTransformOverrides.getDefaultOverrides(translationMode == 
TranslationMode.STREAMING));
 
-// Local flink configurations work in the same JVM and have no problems 
with improperly
-// formatted files on classpath (eg. directories with .class files or 
empty directories).
-// prepareFilesToStage() only when using remote flink cluster.
-    List<String> filesToStage;
-    if (!options.getFlinkMaster().matches("\\[.*\\]")) {
-      filesToStage = prepareFilesToStage();
-    } else {
-      filesToStage = options.getFilesToStage();
-    }
+prepareFilesToStageForRemoteClusterExecution();
 
 FlinkPipelineTranslator translator;
 if (translationMode == TranslationMode.STREAMING) {
   this.flinkStreamEnv =
-  FlinkExecutionEnvironments.createStreamExecutionEnvironment(options, 
filesToStage);
+  FlinkExecutionEnvironments.createStreamExecutionEnvironment(
+  options, options.getFilesToStage());
   translator = new FlinkStreamingPipelineTranslator(flinkStreamEnv, 
options);
 } else {
   this.flinkBatchEnv =
-  FlinkExecutionEnvironments.createBatchExecutionEnvironment(options, 
filesToStage);
+  FlinkExecutionEnvironments.createBatchExecutionEnvironment(
+  options, options.getFilesToStage());
   translator = new FlinkBatchPipelineTranslator(flinkBatchEnv, options);
 }
 
 translator.translate(pipeline);
   }
 
-  private List<String> prepareFilesToStage() {
-    return options
-        .getFilesToStage()
-        .stream()
-        .map(File::new)
-        .filter(File::exists)
-        .map(file -> file.isDirectory() ? packageDirectoriesToStage(file) : file.getAbsolutePath())
-        .collect(Collectors.toList());
-  }
-
-  private String packageDirectoriesToStage(File directoryToStage) {
-String hash = calculateDirectoryContentHash(directoryToStage);
-String pathForJar = getUniqueJarPath(hash);
-zipDirectory(directoryToStage, pathForJar);
-return pathForJar;
-  }
-
-  private String calculateDirectoryContentHash(File directoryToStage) {
-Hasher hasher = Hashing.md5().newHasher();
-try (OutputStream hashStream = Funnels.asOutputStream(hasher)) {
-  ZipFiles.zipDirectory(directoryToStage, hashStream);
-  return Base64Variants.MODIFIED_FOR_URL.encode(hasher.hash().asBytes());
-} catch (IOException e) {
-  throw new RuntimeException(e);
-}
-  }
-
-  private String getUniqueJarPath(String contentHash) {
-String tempLocation = options.getTempLocation();
-
-checkArgument(
-!Strings.isNullOrEmpty(tempLocation),
-"Please provide \"tempLocation\" pipeline option. Flink runner needs 
it to store jars "
-+ "made of directories that were on classpath.");
-
-return String.format("%s%s.jar", tempLocation, contentHash);
-  }
-
-  private void zipDirectory(File directoryToStage, String uniqueDirectoryPath) 
{
-try {
-  ZipFiles.zipDirectory(directoryToStage, new 
FileOutputStream(uniqueDirectoryPath));
-} catch (IOException e) {
-  throw new RuntimeException(e);
+  /**
+   * Local configurations work in the same JVM and have no problems with 
improperly formatted files
+   * on classpath (eg. directories with .class files or empty directories). 
Prepare files for
+   * staging only when using remote cluster.
+   */
+  private void prepareFilesToStageForRemoteClusterExecution() {
+if (!options.getFlinkMaster().matches("\\[.*\\]")) {
 
 Review comment:
   I'm ok with that but I'd double check if this doesn't break something. @mxm, 
wdyt?
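
For reference, the removed helpers implement a simple scheme: zip each
classpath directory and name the resulting jar after a hash of its contents,
so the staged jar path is content-addressed and deterministic. A plain-Python
sketch of that scheme (names are illustrative; the real implementation is the
Java in the diff above):

```python
import base64
import hashlib
import io
import os
import zipfile

def package_directory_to_stage(directory, temp_location):
    """Sketch of content-addressed directory staging: zip the directory,
    hash the zip bytes, and derive a unique jar path under temp_location.
    (Only computes the path; writing the jar is omitted here.)"""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, 'w') as zf:
        for root, _, files in os.walk(directory):
            for name in sorted(files):
                path = os.path.join(root, name)
                zf.write(path, os.path.relpath(path, directory))
    digest = hashlib.md5(buf.getvalue()).digest()
    # URL-safe base64, analogous to Base64Variants.MODIFIED_FOR_URL
    content_hash = base64.urlsafe_b64encode(digest).decode().rstrip('=')
    return '%s%s.jar' % (temp_location, content_hash)
```

Because the name depends only on the directory contents, re-staging an
unchanged directory yields the same jar path.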


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 145601)
Time Spent: 3h  (was: 2h 50m)

> Add ability to stage directories with compiled classes to Spark
> ---
>
> 

[jira] [Work logged] (BEAM-2687) Python SDK support for Stateful Processing

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2687?focusedWorklogId=145603&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145603
 ]

ASF GitHub Bot logged work on BEAM-2687:


Author: ASF GitHub Bot
Created on: 19/Sep/18 09:19
Start Date: 19/Sep/18 09:19
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #6433: [BEAM-2687] 
Implement Timers over the Fn API.
URL: https://github.com/apache/beam/pull/6433#issuecomment-422723671
 
 
   Might be of interest to @mxm and @tweise 
   
   This depends on #6376 and #6349.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 145603)
Time Spent: 1.5h  (was: 1h 20m)

> Python SDK support for Stateful Processing
> --
>
> Key: BEAM-2687
> URL: https://issues.apache.org/jira/browse/BEAM-2687
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Assignee: Charles Chen
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Python SDK should support stateful processing 
> (https://beam.apache.org/blog/2017/02/13/stateful-processing.html)
> In the meantime, runner capability matrix should be updated to show the lack 
> of this feature 
> (https://beam.apache.org/documentation/runners/capability-matrix/)
> Use this as an umbrella issue for all related issues.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2687) Python SDK support for Stateful Processing

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2687?focusedWorklogId=145610&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145610
 ]

ASF GitHub Bot logged work on BEAM-2687:


Author: ASF GitHub Bot
Created on: 19/Sep/18 09:33
Start Date: 19/Sep/18 09:33
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #6349: 
[BEAM-2687] Implement State over the Fn API
URL: https://github.com/apache/beam/pull/6349#discussion_r218732852
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/bundle_processor.py
 ##
 @@ -121,6 +123,23 @@ def process_encoded(self, encoded_windowed_values):
   self.output(decoded_value)
 
 
+class IterableState(object):
 
 Review comment:
   Yeah, renamed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 145610)
Time Spent: 2h 20m  (was: 2h 10m)

> Python SDK support for Stateful Processing
> --
>
> Key: BEAM-2687
> URL: https://issues.apache.org/jira/browse/BEAM-2687
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Assignee: Charles Chen
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Python SDK should support stateful processing 
> (https://beam.apache.org/blog/2017/02/13/stateful-processing.html)
> In the meantime, runner capability matrix should be updated to show the lack 
> of this feature 
> (https://beam.apache.org/documentation/runners/capability-matrix/)
> Use this as an umbrella issue for all related issues.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2687) Python SDK support for Stateful Processing

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2687?focusedWorklogId=145608&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145608
 ]

ASF GitHub Bot logged work on BEAM-2687:


Author: ASF GitHub Bot
Created on: 19/Sep/18 09:33
Start Date: 19/Sep/18 09:33
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #6349: 
[BEAM-2687] Implement State over the Fn API
URL: https://github.com/apache/beam/pull/6349#discussion_r218731991
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py
 ##
 @@ -228,6 +229,29 @@ def cross_product(elem, sides):
   pcoll | beam.FlatMap(cross_product, beam.pvalue.AsList(derived)),
   equal_to([('a', 'a'), ('a', 'b'), ('b', 'a'), ('b', 'b')]))
 
+  def test_pardo_state_only(self):
 
 Review comment:
   I put these here because this suite runs them in various modes (directly, 
over gRPC with and without multiple threads, over subprocesses). I view the 
ones in userstate_test.py as more user-API-facing. (Note that those can't be 
used here: they use the streaming test stream, avoid assert_that because it 
is broken in streaming mode on the other direct runner, and instead use a 
DoFn that stores things into a list, which of course won't work 
cross-process.)
   
   So, yes, it'd be nice to unify, but not right now.
   
   (What I'd like to do eventually is expand this suite of tests into a basic 
Runner Compatibility Matrix test...)


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 145608)
Time Spent: 2h  (was: 1h 50m)

> Python SDK support for Stateful Processing
> --
>
> Key: BEAM-2687
> URL: https://issues.apache.org/jira/browse/BEAM-2687
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Assignee: Charles Chen
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Python SDK should support stateful processing 
> (https://beam.apache.org/blog/2017/02/13/stateful-processing.html)
> In the meantime, runner capability matrix should be updated to show the lack 
> of this feature 
> (https://beam.apache.org/documentation/runners/capability-matrix/)
> Use this as an umbrella issue for all related issues.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


***UNCHECKED*** [jira] [Work logged] (BEAM-2687) Python SDK support for Stateful Processing

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2687?focusedWorklogId=145607&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145607
 ]

ASF GitHub Bot logged work on BEAM-2687:


Author: ASF GitHub Bot
Created on: 19/Sep/18 09:33
Start Date: 19/Sep/18 09:33
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #6349: 
[BEAM-2687] Implement State over the Fn API
URL: https://github.com/apache/beam/pull/6349#discussion_r218733202
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/bundle_processor.py
 ##
 @@ -198,6 +203,85 @@ def is_globally_windowed(self):
 == sideinputs._global_window_mapping_fn)
 
 
+class CombiningValueRuntimeState(userstate.RuntimeState):
+  def __init__(self, underlying_bag_state, combinefn):
+self._combinefn = combinefn
+self._underlying_bag_state = underlying_bag_state
+
+  def _read_accumulator(self, rewrite=True):
+merged_accumulator = self._combinefn.merge_accumulators(
+self._underlying_bag_state.read())
+if rewrite:
+  self._underlying_bag_state.clear()
+  self._underlying_bag_state.add(merged_accumulator)
+return merged_accumulator
+
+  def read(self):
+return self._combinefn.extract_output(self._read_accumulator())
+
+  def add(self, value):
+# Prefer blind writes, but don't let them grow unboundedly.
+if random.random() < 0.5:
 
 Review comment:
   Done.
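The change under discussion backs a combining-value state by a bag of accumulators: writes usually append blindly (cheap), but with some probability the bag is first compacted so it cannot grow without bound. A minimal standalone sketch of that pattern, using toy classes rather than the actual Beam API:

```python
import random

class SumCombineFn:
    """Toy CombineFn: accumulators are partial sums."""
    def create_accumulator(self):
        return 0
    def add_input(self, acc, value):
        return acc + value
    def merge_accumulators(self, accs):
        return sum(accs)
    def extract_output(self, acc):
        return acc

class CombiningValueState:
    """Combining state layered over a bag of accumulators.

    add() appends blindly most of the time, but with probability 0.5 it
    first merges the bag down to a single accumulator, bounding its size.
    """
    def __init__(self, combinefn):
        self._combinefn = combinefn
        self._bag = []  # stands in for the underlying bag state

    def _compact(self):
        merged = self._combinefn.merge_accumulators(self._bag)
        self._bag = [merged]
        return merged

    def add(self, value):
        if random.random() < 0.5:
            self._compact()
        self._bag.append(self._combinefn.add_input(
            self._combinefn.create_accumulator(), value))

    def read(self):
        return self._combinefn.extract_output(self._compact())

state = CombiningValueState(SumCombineFn())
for v in [1, 2, 3, 4]:
    state.add(v)
print(state.read())  # 10, regardless of when compactions happened
```

The sum is invariant under compaction, so the random flushes never change the observable value, only the size of the underlying bag.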


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 145607)
Time Spent: 1h 50m  (was: 1h 40m)

> Python SDK support for Stateful Processing
> --
>
> Key: BEAM-2687
> URL: https://issues.apache.org/jira/browse/BEAM-2687
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Assignee: Charles Chen
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Python SDK should support stateful processing 
> (https://beam.apache.org/blog/2017/02/13/stateful-processing.html)
> In the meantime, runner capability matrix should be updated to show the lack 
> of this feature 
> (https://beam.apache.org/documentation/runners/capability-matrix/)
> Use this as an umbrella issue for all related issues.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2687) Python SDK support for Stateful Processing

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2687?focusedWorklogId=145606&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145606
 ]

ASF GitHub Bot logged work on BEAM-2687:


Author: ASF GitHub Bot
Created on: 19/Sep/18 09:33
Start Date: 19/Sep/18 09:33
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #6349: 
[BEAM-2687] Implement State over the Fn API
URL: https://github.com/apache/beam/pull/6349#discussion_r218730581
 
 

 ##
 File path: sdks/python/apache_beam/runners/direct/direct_runner.py
 ##
 @@ -106,7 +106,7 @@ def visit_transform(self, applied_ptransform):
   if DoFnSignature(dofn).is_splittable_dofn():
 self.supported_by_fnapi_runner = False
   # The FnApiRunner does not support execution of Stateful DoFns.
 
 Review comment:
   Done.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 145606)
Time Spent: 1h 40m  (was: 1.5h)

> Python SDK support for Stateful Processing
> --
>
> Key: BEAM-2687
> URL: https://issues.apache.org/jira/browse/BEAM-2687
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Assignee: Charles Chen
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Python SDK should support stateful processing 
> (https://beam.apache.org/blog/2017/02/13/stateful-processing.html)
> In the meantime, runner capability matrix should be updated to show the lack 
> of this feature 
> (https://beam.apache.org/documentation/runners/capability-matrix/)
> Use this as an umbrella issue for all related issues.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2687) Python SDK support for Stateful Processing

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2687?focusedWorklogId=145609&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145609
 ]

ASF GitHub Bot logged work on BEAM-2687:


Author: ASF GitHub Bot
Created on: 19/Sep/18 09:33
Start Date: 19/Sep/18 09:33
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #6349: 
[BEAM-2687] Implement State over the Fn API
URL: https://github.com/apache/beam/pull/6349#discussion_r218733618
 
 

 ##
 File path: sdks/python/apache_beam/transforms/core.py
 ##
 @@ -950,6 +957,10 @@ def to_runner_api_parameter(self, context):
 spec=beam_runner_api_pb2.FunctionSpec(
 urn=python_urns.PICKLED_DOFN_INFO,
 payload=picked_pardo_fn_data)),
+state_specs={spec.name: spec.to_runner_api(context)
+ for spec in state_specs},
+timer_specs={spec.name: spec.to_runner_api(context)
+ for spec in timer_specs},
 
 Review comment:
   Eventually. I've had some more thoughts on that which I've been meaning to 
write up.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 145609)
Time Spent: 2h 10m  (was: 2h)

> Python SDK support for Stateful Processing
> --
>
> Key: BEAM-2687
> URL: https://issues.apache.org/jira/browse/BEAM-2687
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Assignee: Charles Chen
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Python SDK should support stateful processing 
> (https://beam.apache.org/blog/2017/02/13/stateful-processing.html)
> In the meantime, runner capability matrix should be updated to show the lack 
> of this feature 
> (https://beam.apache.org/documentation/runners/capability-matrix/)
> Use this as an umbrella issue for all related issues.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Jenkins build is back to normal : beam_PostRelease_NightlySnapshot #374

2018-09-19 Thread Apache Jenkins Server
See 




[jira] [Work logged] (BEAM-3089) Issue with setting the parallelism at client level using Flink runner

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3089?focusedWorklogId=145631&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145631
 ]

ASF GitHub Bot logged work on BEAM-3089:


Author: ASF GitHub Bot
Created on: 19/Sep/18 11:47
Start Date: 19/Sep/18 11:47
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #6426: 
[BEAM-3089] Fix default values in FlinkPipelineOptions / Add tests
URL: https://github.com/apache/beam/pull/6426#discussion_r218771491
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkExecutionEnvironments.java
 ##
 @@ -156,9 +181,11 @@ public static StreamExecutionEnvironment 
createStreamExecutionEnvironment(
 throw new IllegalArgumentException("The checkpoint interval must be 
positive");
   }
   flinkStreamEnv.enableCheckpointing(checkpointInterval, 
options.getCheckpointingMode());
-  flinkStreamEnv
-  .getCheckpointConfig()
-  .setCheckpointTimeout(options.getCheckpointTimeoutMillis());
+  if (options.getCheckpointTimeoutMillis() != -1) {
 
 Review comment:
   -1 in Flink means disabled. Any other value should be applied as-is, even 
other negative numbers, which will cause Flink to raise an exception.
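The convention described, -1 as a "disabled" sentinel with every other value passed through (invalid values left for Flink itself to reject), can be sketched like this; the class and function names are illustrative stand-ins, not Beam's or Flink's actual API:

```python
class FakeCheckpointConfig:
    """Stand-in for Flink's CheckpointConfig; records what was applied."""
    def __init__(self):
        self.timeout_millis = None  # None means: keep Flink's own default

    def set_checkpoint_timeout(self, millis):
        # Flink itself rejects non-positive timeouts.
        if millis <= 0:
            raise ValueError("checkpoint timeout must be positive")
        self.timeout_millis = millis

def apply_checkpoint_timeout(config, option_value):
    """Apply the pipeline option unless it is the -1 'disabled' sentinel."""
    if option_value != -1:
        config.set_checkpoint_timeout(option_value)

cfg = FakeCheckpointConfig()
apply_checkpoint_timeout(cfg, -1)     # sentinel: Flink default kept
apply_checkpoint_timeout(cfg, 60000)  # explicit value: applied
print(cfg.timeout_millis)  # 60000
```

Passing any other negative value (say, -5) reaches `set_checkpoint_timeout` and raises, mirroring the behaviour discussed in the review.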


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 145631)
Time Spent: 2.5h  (was: 2h 20m)

> Issue with setting the parallelism at client level using Flink runner
> -
>
> Key: BEAM-3089
> URL: https://issues.apache.org/jira/browse/BEAM-3089
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 2.0.0
> Environment: I am using Flink 1.2.1 running on Docker, with Task 
> Managers distributed across different VMs as part of a Docker Swarm.
>Reporter: Thalita Vergilio
>Assignee: Grzegorz Kołakowski
>Priority: Major
>  Labels: docker, flink, parallel-deployment
> Fix For: 2.8.0
>
> Attachments: flink-ui-parallelism.png
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> When uploading an Apache Beam application using the Flink Web UI, the 
> parallelism set at job submission doesn't get picked up. The same happens 
> when submitting a job using the Flink CLI.
> In both cases, the parallelism ends up defaulting to 1.
> When I set the parallelism programmatically within the Apache Beam code, it 
> works: {{flinkPipelineOptions.setParallelism(4);}}
> I suspect the root of the problem may be in the 
> org.apache.beam.runners.flink.DefaultParallelismFactory class, as it checks 
> for Flink's GlobalConfiguration, which may not pick up runtime values passed 
> to Flink, then defaults to 1 if it doesn't find anything.
> Any ideas on how this could be fixed or worked around? I need to be able to 
> change the parallelism dynamically, so the programmatic approach won't really 
> work for me, nor will setting the Flink configuration at system level.
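The behaviour the reporter is asking for, where a parallelism explicitly supplied at submission wins over defaults instead of silently falling back to 1, amounts to a first-positive-wins resolution. A purely illustrative helper (not the actual Beam fix):

```python
def resolve_parallelism(pipeline_option=None, client_setting=None, default=1):
    """Return the first explicitly set, positive parallelism.

    Precedence (illustrative): a value set programmatically on the
    pipeline options, then the value submitted with the job (CLI or
    web UI), then a final default, instead of consulting only a static
    GlobalConfiguration.
    """
    for candidate in (pipeline_option, client_setting):
        if candidate is not None and candidate > 0:
            return candidate
    return default

print(resolve_parallelism(client_setting=4))  # 4: submission value honoured
```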



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3089) Issue with setting the parallelism at client level using Flink runner

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3089?focusedWorklogId=145630&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145630
 ]

ASF GitHub Bot logged work on BEAM-3089:


Author: ASF GitHub Bot
Created on: 19/Sep/18 11:47
Start Date: 19/Sep/18 11:47
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #6426: 
[BEAM-3089] Fix default values in FlinkPipelineOptions / Add tests
URL: https://github.com/apache/beam/pull/6426#discussion_r218771485
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java
 ##
 @@ -56,12 +56,13 @@
   "Address of the Flink Master where the Pipeline should be executed. Can"
   + " either be of the form \"host:port\" or one of the special values 
[local], "
   + "[collection] or [auto].")
+  @Default.String("[auto]")
   String getFlinkMaster();
 
   void setFlinkMaster(String value);
 
   @Description("The degree of parallelism to be used when distributing 
operations onto workers.")
-  @Default.InstanceFactory(DefaultParallelismFactory.class)
+  @Default.Integer(-1)
 
 Review comment:
   Updated.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 145630)
Time Spent: 2h 20m  (was: 2h 10m)

> Issue with setting the parallelism at client level using Flink runner
> -
>
> Key: BEAM-3089
> URL: https://issues.apache.org/jira/browse/BEAM-3089
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 2.0.0
> Environment: I am using Flink 1.2.1 running on Docker, with Task 
> Managers distributed across different VMs as part of a Docker Swarm.
>Reporter: Thalita Vergilio
>Assignee: Grzegorz Kołakowski
>Priority: Major
>  Labels: docker, flink, parallel-deployment
> Fix For: 2.8.0
>
> Attachments: flink-ui-parallelism.png
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> When uploading an Apache Beam application using the Flink Web UI, the 
> parallelism set at job submission doesn't get picked up. The same happens 
> when submitting a job using the Flink CLI.
> In both cases, the parallelism ends up defaulting to 1.
> When I set the parallelism programmatically within the Apache Beam code, it 
> works: {{flinkPipelineOptions.setParallelism(4);}}
> I suspect the root of the problem may be in the 
> org.apache.beam.runners.flink.DefaultParallelismFactory class, as it checks 
> for Flink's GlobalConfiguration, which may not pick up runtime values passed 
> to Flink, then defaults to 1 if it doesn't find anything.
> Any ideas on how this could be fixed or worked around? I need to be able to 
> change the parallelism dynamically, so the programmatic approach won't really 
> work for me, nor will setting the Flink configuration at system level.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: beam_PreCommit_Website_Cron #78

2018-09-19 Thread Apache Jenkins Server
See 


--
[...truncated 7.78 KB...]

> Task :buildSrc:assemble
Skipping task ':buildSrc:assemble' as it has no actions.
:assemble (Thread[Task worker for ':buildSrc',5,main]) completed. Took 0.0 secs.
:spotlessGroovy (Thread[Task worker for ':buildSrc',5,main]) started.

> Task :buildSrc:spotlessGroovy
file or directory 
'
 not found
file or directory 
'
 not found
file or directory 
'
 not found
Caching disabled for task ':buildSrc:spotlessGroovy': Caching has not been 
enabled for the task
Task ':buildSrc:spotlessGroovy' is not up-to-date because:
  No history is available.
All input files are considered out-of-date for incremental task 
':buildSrc:spotlessGroovy'.
file or directory 
'
 not found
:spotlessGroovy (Thread[Task worker for ':buildSrc',5,main]) completed. Took 
1.554 secs.
:spotlessGroovyCheck (Thread[Task worker for ':buildSrc' Thread 6,5,main]) 
started.

> Task :buildSrc:spotlessGroovyCheck
Skipping task ':buildSrc:spotlessGroovyCheck' as it has no actions.
:spotlessGroovyCheck (Thread[Task worker for ':buildSrc' Thread 6,5,main]) 
completed. Took 0.002 secs.
:spotlessGroovyGradle (Thread[Task worker for ':buildSrc' Thread 6,5,main]) 
started.

> Task :buildSrc:spotlessGroovyGradle
Caching disabled for task ':buildSrc:spotlessGroovyGradle': Caching has not 
been enabled for the task
Task ':buildSrc:spotlessGroovyGradle' is not up-to-date because:
  No history is available.
All input files are considered out-of-date for incremental task 
':buildSrc:spotlessGroovyGradle'.
:spotlessGroovyGradle (Thread[Task worker for ':buildSrc' Thread 6,5,main]) 
completed. Took 0.047 secs.
:spotlessGroovyGradleCheck (Thread[Task worker for ':buildSrc' Thread 
5,5,main]) started.

> Task :buildSrc:spotlessGroovyGradleCheck
Skipping task ':buildSrc:spotlessGroovyGradleCheck' as it has no actions.
:spotlessGroovyGradleCheck (Thread[Task worker for ':buildSrc' Thread 
5,5,main]) completed. Took 0.0 secs.
:spotlessCheck (Thread[Task worker for ':buildSrc' Thread 5,5,main]) started.

> Task :buildSrc:spotlessCheck
Skipping task ':buildSrc:spotlessCheck' as it has no actions.
:spotlessCheck (Thread[Task worker for ':buildSrc' Thread 5,5,main]) completed. 
Took 0.0 secs.
:compileTestJava (Thread[Task worker for ':buildSrc' Thread 5,5,main]) started.

> Task :buildSrc:compileTestJava NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:compileTestJava' as it has no source files and no 
previous output files.
:compileTestJava (Thread[Task worker for ':buildSrc' Thread 5,5,main]) 
completed. Took 0.004 secs.
:compileTestGroovy (Thread[Task worker for ':buildSrc' Thread 5,5,main]) 
started.

> Task :buildSrc:compileTestGroovy NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:compileTestGroovy' as it has no source files and no 
previous output files.
:compileTestGroovy (Thread[Task worker for ':buildSrc' Thread 5,5,main]) 
completed. Took 0.003 secs.
:processTestResources (Thread[Task worker for ':buildSrc' Thread 5,5,main]) 
started.

> Task :buildSrc:processTestResources NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:processTestResources' as it has no source files and no 
previous output files.
:processTestResources (Thread[Task worker for ':buildSrc' Thread 5,5,main]) 
completed. Took 0.002 secs.
:testClasses (Thread[Task worker for ':buildSrc' Thread 5,5,main]) started.

> Task :buildSrc:testClasses UP-TO-DATE
Skipping task ':buildSrc:testClasses' as it has no actions.
:testClasses (Thread[Task worker for ':buildSrc' Thread 5,5,main]) completed. 
Took 0.0 secs.
:test (Thread[Task worker for ':buildSrc' Thread 5,5,main]) started.

> Task :buildSrc:test NO-SOURCE
Skipping task ':buildSrc:test' as it has no source files and no previous output 
files.
:test (Thread[Task worker for ':buildSrc' Thread 5,5,main]) completed. Took 
0.006 secs.
:check (Thread[Task worker for ':buildSrc' Thread 5,5,main]) started.

> Task :buildSrc:check
Skipping task ':buildSrc:check' as it has no actions.
:check (Thread[Task worker for ':buildSrc' Thread 5,5,main]) completed. Took 
0.0 secs.
:build (Thread[Task worker for ':buildSrc' Thread 5,5,main]) started.

> Task :buildSrc:build
Skipping task ':buildSrc:build' as it has no action

[jira] [Work logged] (BEAM-5299) Define max global window as a shared value in protos like URN enums.

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5299?focusedWorklogId=145635&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145635
 ]

ASF GitHub Bot logged work on BEAM-5299:


Author: ASF GitHub Bot
Created on: 19/Sep/18 12:05
Start Date: 19/Sep/18 12:05
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #6381: 
[BEAM-5299] Define max timestamp for global window in proto
URL: https://github.com/apache/beam/pull/6381#discussion_r218776145
 
 

 ##
 File path: sdks/java/core/build.gradle
 ##
 @@ -51,6 +51,8 @@ test {
 }
 
 dependencies {
+  // Required to load constants from the model, e.g. the max timestamp for the 
global window
+  shadow project(path: ":beam-model-pipeline", configuration: "shadow")
 
 Review comment:
   The goal is to have the max global window timestamp defined _once_, globally 
across all SDKs. `beam-model-pipeline` is the proto definition that all SDKs 
depend on.
   
   In Python the module structure is much simpler and the proto is always 
available. In Java, because the runners are written in Java, we have a separate 
`beam-sdks-java-core` module which so far doesn't depend on the pipeline module.
   
   We have two options: 1) make java-core depend on the pipeline definition, or 
2) move GlobalWindow out of java-core (which I found to be pretty tricky).
   
   Are you proposing to factor the timestamp out into a new module and let all 
SDKs depend on that? 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 145635)
Time Spent: 1h 50m  (was: 1h 40m)

> Define max global window as a shared value in protos like URN enums.
> 
>
> Key: BEAM-5299
> URL: https://issues.apache.org/jira/browse/BEAM-5299
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-go, sdk-java-core, sdk-py-core
>Reporter: Luke Cwik
>Assignee: Maximilian Michels
>Priority: Minor
>  Labels: portability
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Instead of having each language define a max timestamp themselves, define the 
> max timestamps within proto to be shared across different languages.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: beam_PostCommit_Python_PVR_Flink_Gradle #63

2018-09-19 Thread Apache Jenkins Server
See 


--
[...truncated 555.20 KB...]
raise self
_Rendezvous: <_Rendezvous of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "Socket closed"
debug_error_string = 
"{"created":"@1537358792.333833082","description":"Error received from 
peer","file":"src/core/lib/surface/call.cc","file_line":1099,"grpc_message":"Socket
 closed","grpc_status":14}"
>


==
ERROR: test_combine_per_key (__main__.FlinkRunnerTest)
--
Traceback (most recent call last):
  File "apache_beam/runners/portability/fn_api_runner_test.py", line 251, in 
test_combine_per_key
assert_that(res, equal_to([('a', 1.5), ('b', 3.0)]))
  File "apache_beam/pipeline.py", line 414, in __exit__
self.run().wait_until_finish()
  File "apache_beam/runners/portability/portable_runner.py", line 209, in 
wait_until_finish
'Pipeline %s failed in state %s.' % (self._job_id, self._state))
RuntimeError: Pipeline 
test_combine_per_key_1537358784.16_0f59797f-c305-4d90-8de3-5341080a60ac failed 
in state FAILED.

==
ERROR: test_create (__main__.FlinkRunnerTest)
--
Traceback (most recent call last):
  File "apache_beam/runners/portability/fn_api_runner_test.py", line 62, in 
test_create
assert_that(p | beam.Create(['a', 'b']), equal_to(['a', 'b']))
  File "apache_beam/pipeline.py", line 414, in __exit__
self.run().wait_until_finish()
  File "apache_beam/runners/portability/portable_runner.py", line 209, in 
wait_until_finish
'Pipeline %s failed in state %s.' % (self._job_id, self._state))
RuntimeError: Pipeline 
test_create_1537358784.53_a6614e3f-8208-458a-960e-45cef2d25d7d failed in state 
FAILED.

==
ERROR: test_flatten (__main__.FlinkRunnerTest)
--
Traceback (most recent call last):
  File "apache_beam/runners/portability/fn_api_runner_test.py", line 244, in 
test_flatten
assert_that(res, equal_to(['a', 'b', 'c', 'd']))
  File "apache_beam/pipeline.py", line 414, in __exit__
self.run().wait_until_finish()
  File "apache_beam/runners/portability/portable_runner.py", line 209, in 
wait_until_finish
'Pipeline %s failed in state %s.' % (self._job_id, self._state))
RuntimeError: Pipeline 
test_flatten_1537358785.01_5659404a-6484-4eba-a6cf-972156829b9a failed in state 
FAILED.

==
ERROR: test_flattened_side_input (__main__.FlinkRunnerTest)
--
Traceback (most recent call last):
  File "apache_beam/runners/portability/fn_api_runner_test.py", line 190, in 
test_flattened_side_input
equal_to([(None, {'a': 1, 'b': 2})]))
  File "apache_beam/pipeline.py", line 414, in __exit__
self.run().wait_until_finish()
  File "apache_beam/runners/portability/portable_runner.py", line 209, in 
wait_until_finish
'Pipeline %s failed in state %s.' % (self._job_id, self._state))
RuntimeError: Pipeline 
test_flattened_side_input_1537358785.49_23e11d74-d090-46b9-a883-a8dffcfcc211 
failed in state FAILED.

==
ERROR: test_gbk_side_input (__main__.FlinkRunnerTest)
--
Traceback (most recent call last):
  File "apache_beam/runners/portability/fn_api_runner_test.py", line 198, in 
test_gbk_side_input
equal_to([(None, {'a': [1]})]))
  File "apache_beam/pipeline.py", line 414, in __exit__
self.run().wait_until_finish()
  File "apache_beam/runners/portability/portable_runner.py", line 209, in 
wait_until_finish
'Pipeline %s failed in state %s.' % (self._job_id, self._state))
RuntimeError: Pipeline 
test_gbk_side_input_1537358786.03_f4786b34-d074-473e-9ba6-e9487baced92 failed 
in state FAILED.

==
ERROR: test_group_by_key (__main__.FlinkRunnerTest)
--
Traceback (most recent call last):
  File "apache_beam/runners/portability/fn_api_runner_test.py", line 237, in 
test_group_by_key
assert_that(res, equal_to([('a', [1, 2]), ('b', [3])]))
  File "apache_beam/pipeline.py", line 414, in __exit__
self.run().wait_until_finish()
  File "apache_beam/runners/portability/portable_runner.py", line 209, in 
wait_until_finish
'Pipeline %s failed in state %s.' % (self._job_id, self._state))
RuntimeError: Pipeline 
test_group_by_key_1537358786.53_05874e58-273b-4dea-ab3c-75a5e

[jira] [Work logged] (BEAM-3446) RedisIO non-prefix read operations

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3446?focusedWorklogId=145638&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145638
 ]

ASF GitHub Bot logged work on BEAM-3446:


Author: ASF GitHub Bot
Created on: 19/Sep/18 12:11
Start Date: 19/Sep/18 12:11
Worklog Time Spent: 10m 
  Work Description: iemejia commented on a change in pull request #5841: 
[BEAM-3446] Fixes RedisIO non-prefix read operations
URL: https://github.com/apache/beam/pull/5841#discussion_r218777618
 
 

 ##
 File path: 
sdks/java/io/redis/src/main/java/org/apache/beam/sdk/io/redis/RedisIO.java
 ##
 @@ -280,28 +316,77 @@ public void processElement(ProcessContext 
processContext) throws Exception {
   while (!finished) {
ScanResult<String> scanResult = jedis.scan(cursor, scanParams);
List<String> keys = scanResult.getResult();
-
-Pipeline pipeline = jedis.pipelined();
-if (keys != null) {
-  for (String key : keys) {
-pipeline.get(key);
-  }
-  List<Object> values = pipeline.syncAndReturnAll();
-  for (int i = 0; i < values.size(); i++) {
-processContext.output(KV.of(keys.get(i), (String) values.get(i)));
-  }
+for (String k : keys) {
+  processContext.output(k);
 }
-
 cursor = scanResult.getStringCursor();
 if ("0".equals(cursor)) {
   finished = true;
 }
   }
 }
+  }
+  /** A {@link DoFn} requesting Redis server to get key/value pairs. */
+  private static class ReadFn extends BaseReadFn<KV<String, String>> {
+    @Nullable transient Multimap<BoundedWindow, String> bundles = null;
+    @Nullable AtomicInteger batchCount = null;
+    private final int batchSize;
+
+@StartBundle
+public void startBundle(StartBundleContext context) {
+  bundles = ArrayListMultimap.create();
+  batchCount = new AtomicInteger();
+}
 
-@Teardown
-public void teardown() {
-  jedis.close();
+ReadFn(RedisConnectionConfiguration connectionConfiguration, int 
batchSize) {
+  super(connectionConfiguration);
+  this.batchSize = batchSize;
+}
+
+private int getBatchSize() {
+  return batchSize;
+}
+
+@ProcessElement
+public void processElement(ProcessContext processContext, BoundedWindow 
window)
+throws Exception {
+  String key = processContext.element();
+  bundles.put(window, key);
+  if (batchCount.incrementAndGet() > getBatchSize()) {
+Multimap<BoundedWindow, KV<String, String>> kvs = fetchAndFlush();
+for (BoundedWindow w : kvs.keySet()) {
+  for (KV<String, String> kv : kvs.get(w)) {
+processContext.output(kv);
+  }
+}
+  }
+}
+
+private Multimap<BoundedWindow, KV<String, String>> fetchAndFlush() {
+  Multimap<BoundedWindow, KV<String, String>> kvs = ArrayListMultimap.create();
+  for (BoundedWindow w : bundles.keySet()) {
+String[] keys = new String[bundles.get(w).size()];
+bundles.get(w).toArray(keys);
+List<String> results = jedis.mget(keys);
+for (int i = 0; i < results.size(); i++) {
+  if (results.get(i) != null) {
+kvs.put(w, KV.of(keys[i], results.get(i)));
+  }
+}
+  }
+  bundles = ArrayListMultimap.create();
+  batchCount.set(0);
+  return kvs;
+}
+
+@FinishBundle
+public void finishBundle(FinishBundleContext context) throws Exception {
+  Multimap<BoundedWindow, KV<String, String>> kvs = fetchAndFlush();
 
 Review comment:
   Thanks for answering; I had somehow misread `startBundle` as a setup-only 
method. I see how everything fits now.
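The DoFn reviewed above buffers keys per window and flushes them with a single mget-style call, both when the batch fills and again at finish-bundle. A small Python sketch of that batching pattern, with a fake client standing in for jedis:

```python
from collections import defaultdict

class BatchedReader:
    """Buffer keys per window, then fetch values in one mget-style call.

    `client.mget(keys)` mirrors jedis.mget: one value (or None for a
    missing key) per requested key, in order.
    """
    def __init__(self, client, batch_size):
        self._client = client
        self._batch_size = batch_size
        self._bundles = defaultdict(list)  # window -> buffered keys
        self._count = 0

    def process(self, window, key):
        """Buffer one key; flush when the batch is full. Yields (key, value)."""
        self._bundles[window].append(key)
        self._count += 1
        if self._count > self._batch_size:
            yield from self.flush()

    def flush(self):
        """One mget per window; keys with no value are dropped, as in the DoFn."""
        for window, keys in self._bundles.items():
            values = self._client.mget(keys)
            for k, v in zip(keys, values):
                if v is not None:
                    yield (k, v)
        self._bundles = defaultdict(list)
        self._count = 0

class FakeRedis:
    def __init__(self, data):
        self._data = data
    def mget(self, keys):
        return [self._data.get(k) for k in keys]

client = FakeRedis({"a": "1", "b": "2"})
reader = BatchedReader(client, batch_size=10)
out = []
for key in ["a", "b", "missing"]:
    out.extend(reader.process("w0", key))
out.extend(reader.flush())   # the finish-bundle flush
print(sorted(out))  # [('a', '1'), ('b', '2')]
```

Keeping the buffer keyed by window is what lets one physical mget serve elements from several windows while still emitting each pair in its own window.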


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 145638)
Time Spent: 5h 50m  (was: 5h 40m)

> RedisIO non-prefix read operations
> --
>
> Key: BEAM-3446
> URL: https://issues.apache.org/jira/browse/BEAM-3446
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-redis
>Reporter: Vinay varma
>Assignee: Vinay varma
>Priority: Major
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> The read operation in RedisIO is for prefix-based lookups. While this can be 
> used for exact key matches as well, the number of operations limits the 
> throughput of the function.
> I suggest exposing the current readAll operation as readByPrefix and using 
> simpler operations for the readAll functionality.
> ex:
> {code:java}
> String output = jedis.get(element);
> if (output != null) {
> processContext.output(KV.of(element, output));
> }
> {code}
> instead of:
> https://github.com/apache/beam/blob/7d240c0bb171af6

Build failed in Jenkins: beam_PerformanceTests_Python #1456

2018-09-19 Thread Apache Jenkins Server
See 


--
Started by timer
[EnvInject] - Loading node environment variables.
Building remotely on beam15 (beam) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/apache/beam.git # timeout=10
Fetching upstream changes from https://github.com/apache/beam.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/beam.git 
 > +refs/heads/*:refs/remotes/origin/* 
 > +refs/pull/${ghprbPullId}/*:refs/remotes/origin/pr/${ghprbPullId}/*
 > git rev-parse origin/master^{commit} # timeout=10
Checking out Revision c49a97ecbf815b320926285dcddba993590e3073 (origin/master)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f c49a97ecbf815b320926285dcddba993590e3073
Commit message: "[BEAM-4176] enumerate primitive transforms in portable 
construction"
 > git rev-list --no-walk c49a97ecbf815b320926285dcddba993590e3073 # timeout=10
Cleaning workspace
 > git rev-parse --verify HEAD # timeout=10
Resetting working tree
 > git reset --hard # timeout=10
 > git clean -fdx # timeout=10
[EnvInject] - Executing scripts and injecting environment variables after the 
SCM step.
[EnvInject] - Injecting as environment variables the properties content 
SPARK_LOCAL_IP=127.0.0.1

[EnvInject] - Variables injected successfully.
[beam_PerformanceTests_Python] $ /bin/bash -xe 
/tmp/jenkins5533870220785853713.sh
+ rm -rf 

[beam_PerformanceTests_Python] $ /bin/bash -xe 
/tmp/jenkins4968193746340324354.sh
+ rm -rf 

[beam_PerformanceTests_Python] $ /bin/bash -xe 
/tmp/jenkins4453688895837627178.sh
+ virtualenv 

New python executable in 

Also creating executable in 

Installing setuptools, pkg_resources, pip, wheel...done.
Running virtualenv with interpreter /usr/bin/python2
[beam_PerformanceTests_Python] $ /bin/bash -xe 
/tmp/jenkins4883181239145343168.sh
+ 

 install --upgrade setuptools pip
Requirement already up-to-date: setuptools in 
./env/.perfkit_env/lib/python2.7/site-packages (40.4.1)
Requirement already up-to-date: pip in 
./env/.perfkit_env/lib/python2.7/site-packages (18.0)
[beam_PerformanceTests_Python] $ /bin/bash -xe 
/tmp/jenkins4092285843890972873.sh
+ git clone https://github.com/GoogleCloudPlatform/PerfKitBenchmarker.git 

Cloning into 
'
[beam_PerformanceTests_Python] $ /bin/bash -xe 
/tmp/jenkins6438401075173903514.sh
+ 

 install -r 

Collecting absl-py (from -r 

 (line 14))
Collecting jinja2>=2.7 (from -r 

 (line 15))
  Using cached 
https://files.pythonhosted.org/packages/7f/ff/ae64bacdfc95f27a016a7bed8e8686763ba4d277a78ca76f32659220a731/Jinja2-2.10-py2.py3-none-any.whl
Requirement already satisfied: setuptools in 
./env/.perfkit_env/lib/python2.7/site-packages (from -r 

 (line 16)) (40.4.1)
Collecting colorlog[windows]==2.6.0 (from -r 

 (line 17))
  Using cached 
https://files.pythonhosted.org/packages/59/1a/46a1bf2044ad8b30b52fed0f389338c85747e093fe7f51a567f4cb525892/colorlog-2.6.0-py2.py3-none-any.whl
Collecting blinker>=1.3 (from -r 

 (line 18))
Collecting futures>=3.0.3 (from -r 

 (line 19))
  Using cached 
https://files.pythonhosted.org/packages/2d/99/b2c4e9d5a30f6471e410a146232b4118e697fa3ffc06d6a65efde84debd0/futures-3.2.0-py2-none-any.whl
Collecting PyYAML==3.12 (from -r 


[jira] [Work logged] (BEAM-4861) Hadoop Filesystem silently fails

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4861?focusedWorklogId=145639&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145639
 ]

ASF GitHub Bot logged work on BEAM-4861:


Author: ASF GitHub Bot
Created on: 19/Sep/18 12:18
Start Date: 19/Sep/18 12:18
Worklog Time Spent: 10m 
  Work Description: timrobertson100 commented on issue #6285: [BEAM-4861] 
Autocreate directories when doing an HDFS rename
URL: https://github.com/apache/beam/pull/6285#issuecomment-422780639
 
 
   Run Java PreCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 145639)
Time Spent: 50m  (was: 40m)

> Hadoop Filesystem silently fails
> 
>
> Key: BEAM-4861
> URL: https://issues.apache.org/jira/browse/BEAM-4861
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-hadoop
>Reporter: Jozef Vilcek
>Assignee: Tim Robertson
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Hi,
> Beam FileSystem operations copy, rename and delete are void in the SDK. The 
> native Hadoop filesystem operations are not void; they return a boolean. The 
> current implementation in Beam ignores that result and passes as long as no 
> exception is thrown.
> I got burned by this when using 'rename' to do a 'move' operation on HDFS. If 
> the target directory does not exist, the operation returns false and does not 
> touch the file.
> [https://github.com/apache/beam/blob/master/sdks/java/io/hadoop-file-system/src/main/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystem.java#L148]
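The silent-failure mode described above can be illustrated with a small self-contained Python simulation; `hdfs_style_rename`, `rename_checked`, and the in-memory `fs` dict are hypothetical stand-ins for Hadoop's `FileSystem.rename()` behaviour, not Beam or Hadoop APIs:

```python
import os

def hdfs_style_rename(src, dst, fs):
    """Simulates Hadoop FileSystem.rename(): returns False (rather than
    raising) when the parent directory of the target does not exist."""
    if os.path.dirname(dst) not in fs["dirs"]:
        return False  # silent failure: the source file is left untouched
    fs["files"][dst] = fs["files"].pop(src)
    return True

def rename_checked(src, dst, fs):
    """Wrapper that surfaces the otherwise-dropped boolean as an error."""
    if not hdfs_style_rename(src, dst, fs):
        raise IOError("Unable to rename %s to %s" % (src, dst))

fs = {"dirs": {"/out"}, "files": {"/tmp/part-0": b"data"}}
rename_checked("/tmp/part-0", "/out/part-0", fs)  # succeeds: /out exists
```

Dropping the boolean, as the current Beam code does, corresponds to calling `hdfs_style_rename` directly and never noticing the `False`.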



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-5144) [beam_PostCommit_Java_GradleBuild][org.apache.beam.sdk.io.jms.JmsIOTest.testCheckpointMark][Flake] Expected messages count assert fails

2018-09-19 Thread Alexey Romanenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko reassigned BEAM-5144:
--

Assignee: Andrew Fulton  (was: Jean-Baptiste Onofré)

> [beam_PostCommit_Java_GradleBuild][org.apache.beam.sdk.io.jms.JmsIOTest.testCheckpointMark][Flake]
>  Expected messages count assert fails
> ---
>
> Key: BEAM-5144
> URL: https://issues.apache.org/jira/browse/BEAM-5144
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Mikhail Gryzykhin
>Assignee: Andrew Fulton
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Failing job url:
> [https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/1196/testReport/]
>  Job history url:
> [https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/1245/testReport/junit/org.apache.beam.sdk.io.jms/JmsIOTest/testCheckpointMark/history/?start=50]
>  Relevant log:
> java.lang.AssertionError: expected:<0> but was:<5> at 
> org.junit.Assert.fail(Assert.java:88) at 
> org.junit.Assert.failNotEquals(Assert.java:834) at 
> org.junit.Assert.assertEquals(Assert.java:645) at 
> org.junit.Assert.assertEquals(Assert.java:631) at 
> org.apache.beam.sdk.io.jms.JmsIOTest.testCheckpointMark(JmsIOTest.java:324) at
>  
> Multiple security exceptions found:
> java.lang.SecurityException: User name [test_user] or password is invalid.
> Aug 07, 2018 6:10:07 PM org.apache.activemq.broker.TransportConnection 
> processAddConnection WARNING: Failed to add Connection 
> ID:apache-beam-jenkins-slave-group-rq04-41395-1533665392741-59:9 due to {} 
> java.lang.SecurityException: User name [null] or password is invalid.
>  
>  
> Additionally, broker service stopped errors:
> org.apache.activemq.broker.BrokerStoppedException: Broker 
> BrokerService[localhost] is being stopped
>  





***UNCHECKED*** [jira] [Work logged] (BEAM-5036) Optimize FileBasedSink's WriteOperation.moveToOutput()

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5036?focusedWorklogId=145644&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145644
 ]

ASF GitHub Bot logged work on BEAM-5036:


Author: ASF GitHub Bot
Created on: 19/Sep/18 12:39
Start Date: 19/Sep/18 12:39
Worklog Time Spent: 10m 
  Work Description: timrobertson100 commented on issue #6289: [BEAM-5036] 
Optimize the FileBasedSink WriteOperation.moveToOutput()
URL: https://github.com/apache/beam/pull/6289#issuecomment-422786830
 
 
   @reuvenlax @iemejia @jbonofre - I am a bit unsure what to do here and would 
appreciate your thoughts. Note this is all about big performance improvements 
for IOs that write to HDFS.
   
   The Beam `FileSystems.rename()` is under-documented and performs differently 
depending on the underlying filesystem. For example, HDFS will fail if the file 
exists, while `LocalFileSystem` uses `StandardCopyOption.REPLACE_EXISTING` and 
always overwrites.
   
   In this PR I opted to include the addition of a 
`StandardMoveOptions.OVERWRITE_EXISTING_FILES` and if the underlying FS threw a 
`FileAlreadyExistsException` then it would only be overwritten if that flag was 
enabled. This would work nicely for HDFS. However, this logic is at the mercy 
of the underlying FS as many won't surface an error and will simply overwrite. 
Thus if a user does not include the 
`StandardMoveOptions.OVERWRITE_EXISTING_FILES` they may see surprising results.
   
   I think we can do one of the following:
   1. Be explicit and make all FS respect a control flag to enable overwriting
   2. Silently overwrite always 
   3. Let people provide the flag, knowing some FS will ignore it
   
   Note that `FileSystem.rename()` was _never_ used in the Beam codebase before 
this PR, but it is a public method, so we might affect others.
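Option 1 above (a control flag that every filesystem respects) might look like the following Python sketch; `move` is a hypothetical helper, with `os.replace` standing in for the underlying filesystem rename:

```python
import os
import tempfile

def move(src, dst, overwrite=False):
    """Move that honours an explicit overwrite flag uniformly, instead of
    inheriting whatever the underlying filesystem happens to do."""
    if os.path.exists(dst) and not overwrite:
        raise FileExistsError(dst)  # mirror HDFS's refusal to clobber
    os.replace(src, dst)            # os.replace always overwrites

d = tempfile.mkdtemp()
a, b = os.path.join(d, "a"), os.path.join(d, "b")
with open(a, "w") as f:
    f.write("new")
with open(b, "w") as f:
    f.write("old")
try:
    move(a, b)                  # refused: b already exists
except FileExistsError:
    move(a, b, overwrite=True)  # explicit opt-in, b is replaced
```

Without the explicit check, filesystems that overwrite silently and filesystems that refuse would keep diverging, which is exactly the surprise described above.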




Issue Time Tracking
---

Worklog Id: (was: 145644)
Time Spent: 2h 10m  (was: 2h)

> Optimize FileBasedSink's WriteOperation.moveToOutput()
> --
>
> Key: BEAM-5036
> URL: https://issues.apache.org/jira/browse/BEAM-5036
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-files
>Affects Versions: 2.5.0
>Reporter: Jozef Vilcek
>Assignee: Tim Robertson
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> The moveToOutput() method in FileBasedSink.WriteOperation implements move as 
> copy+delete. It would be better to use rename(), which can be much more 
> efficient on some filesystems.
> The filesystem must support cross-directory rename. BEAM-4861 is related to 
> this for the case of the HDFS filesystem.
> Feature was discussed here:
> http://mail-archives.apache.org/mod_mbox/beam-dev/201807.mbox/%3CCAF9t7_4Mp54pQ+vRrJrBh9Vx0=uaknupzd_qdh_qdm9vxll...@mail.gmail.com%3E
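The difference between the two strategies can be sketched in Python; `os.replace` plays the role of the metadata-only rename, while `move_by_copy_delete` mirrors the copy+delete approach (the function names are illustrative, not Beam APIs):

```python
import os
import shutil
import tempfile

def move_by_copy_delete(src, dst):
    """Copy+delete: every byte of the file is read and rewritten."""
    shutil.copyfile(src, dst)
    os.remove(src)

def move_by_rename(src, dst):
    """Single rename: a metadata-only operation on the same filesystem."""
    os.replace(src, dst)

d = tempfile.mkdtemp()
src = os.path.join(d, "part-0")
with open(src, "w") as f:
    f.write("payload")
move_by_rename(src, os.path.join(d, "out-0"))
```

Both end in the same state, but for large outputs the copy+delete variant scales with file size while the rename scales with the number of files.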





[jira] [Work logged] (BEAM-2687) Python SDK support for Stateful Processing

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2687?focusedWorklogId=145645&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145645
 ]

ASF GitHub Bot logged work on BEAM-2687:


Author: ASF GitHub Bot
Created on: 19/Sep/18 12:44
Start Date: 19/Sep/18 12:44
Worklog Time Spent: 10m 
  Work Description: tweise commented on issue #6433: [BEAM-2687] Implement 
Timers over the Fn API.
URL: https://github.com/apache/beam/pull/6433#issuecomment-422788463
 
 
   @robertwb thanks for the ping. I did not pay attention to `fn_api_runner` so 
far - this gives a good idea for the wiring work we need to do for user state 
and timers in the Flink runner. Just curious, why do we have a direct runner 
and an fn_api_runner for Python, why not just have the latter? Or perhaps that 
is the (undocumented) goal?




Issue Time Tracking
---

Worklog Id: (was: 145645)
Time Spent: 2.5h  (was: 2h 20m)

> Python SDK support for Stateful Processing
> --
>
> Key: BEAM-2687
> URL: https://issues.apache.org/jira/browse/BEAM-2687
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Assignee: Charles Chen
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Python SDK should support stateful processing 
> (https://beam.apache.org/blog/2017/02/13/stateful-processing.html)
> In the meantime, runner capability matrix should be updated to show the lack 
> of this feature 
> (https://beam.apache.org/documentation/runners/capability-matrix/)
> Use this as an umbrella issue for all related issues.





[jira] [Work logged] (BEAM-2687) Python SDK support for Stateful Processing

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2687?focusedWorklogId=145648&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145648
 ]

ASF GitHub Bot logged work on BEAM-2687:


Author: ASF GitHub Bot
Created on: 19/Sep/18 12:59
Start Date: 19/Sep/18 12:59
Worklog Time Spent: 10m 
  Work Description: tweise commented on issue #6349: [BEAM-2687] Implement 
State over the Fn API
URL: https://github.com/apache/beam/pull/6349#issuecomment-422793226
 
 
   Nice to see this coming together! Is this PR still required, since the 
changes are now also in #6433?




Issue Time Tracking
---

Worklog Id: (was: 145648)
Time Spent: 2h 40m  (was: 2.5h)

> Python SDK support for Stateful Processing
> --
>
> Key: BEAM-2687
> URL: https://issues.apache.org/jira/browse/BEAM-2687
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Assignee: Charles Chen
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Python SDK should support stateful processing 
> (https://beam.apache.org/blog/2017/02/13/stateful-processing.html)
> In the meantime, runner capability matrix should be updated to show the lack 
> of this feature 
> (https://beam.apache.org/documentation/runners/capability-matrix/)
> Use this as an umbrella issue for all related issues.





[jira] [Work logged] (BEAM-3371) Add ability to stage directories with compiled classes to Spark

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3371?focusedWorklogId=145652&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145652
 ]

ASF GitHub Bot logged work on BEAM-3371:


Author: ASF GitHub Bot
Created on: 19/Sep/18 13:13
Start Date: 19/Sep/18 13:13
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #6244: 
[BEAM-3371] Enable running integration tests on Spark
URL: https://github.com/apache/beam/pull/6244#discussion_r218785917
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineExecutionEnvironment.java
 ##
 @@ -95,73 +83,33 @@ public void translate(Pipeline pipeline) {
 pipeline.replaceAll(
 FlinkTransformOverrides.getDefaultOverrides(translationMode == 
TranslationMode.STREAMING));
 
-// Local flink configurations work in the same JVM and have no problems 
with improperly
-// formatted files on classpath (eg. directories with .class files or 
empty directories).
-// prepareFilesToStage() only when using remote flink cluster.
-List<String> filesToStage;
-if (!options.getFlinkMaster().matches("\\[.*\\]")) {
-  filesToStage = prepareFilesToStage();
-} else {
-  filesToStage = options.getFilesToStage();
-}
+prepareFilesToStageForRemoteClusterExecution();
 
 FlinkPipelineTranslator translator;
 if (translationMode == TranslationMode.STREAMING) {
   this.flinkStreamEnv =
-  FlinkExecutionEnvironments.createStreamExecutionEnvironment(options, 
filesToStage);
+  FlinkExecutionEnvironments.createStreamExecutionEnvironment(
+  options, options.getFilesToStage());
   translator = new FlinkStreamingPipelineTranslator(flinkStreamEnv, 
options);
 } else {
   this.flinkBatchEnv =
-  FlinkExecutionEnvironments.createBatchExecutionEnvironment(options, 
filesToStage);
+  FlinkExecutionEnvironments.createBatchExecutionEnvironment(
+  options, options.getFilesToStage());
   translator = new FlinkBatchPipelineTranslator(flinkBatchEnv, options);
 }
 
 translator.translate(pipeline);
   }
 
-  private List<String> prepareFilesToStage() {
-return options
-.getFilesToStage()
-.stream()
-.map(File::new)
-.filter(File::exists)
-.map(file -> file.isDirectory() ? packageDirectoriesToStage(file) : 
file.getAbsolutePath())
-.collect(Collectors.toList());
-  }
-
-  private String packageDirectoriesToStage(File directoryToStage) {
-String hash = calculateDirectoryContentHash(directoryToStage);
-String pathForJar = getUniqueJarPath(hash);
-zipDirectory(directoryToStage, pathForJar);
-return pathForJar;
-  }
-
-  private String calculateDirectoryContentHash(File directoryToStage) {
-Hasher hasher = Hashing.md5().newHasher();
-try (OutputStream hashStream = Funnels.asOutputStream(hasher)) {
-  ZipFiles.zipDirectory(directoryToStage, hashStream);
-  return Base64Variants.MODIFIED_FOR_URL.encode(hasher.hash().asBytes());
-} catch (IOException e) {
-  throw new RuntimeException(e);
-}
-  }
-
-  private String getUniqueJarPath(String contentHash) {
-String tempLocation = options.getTempLocation();
-
-checkArgument(
-!Strings.isNullOrEmpty(tempLocation),
-"Please provide \"tempLocation\" pipeline option. Flink runner needs 
it to store jars "
-+ "made of directories that were on classpath.");
-
-return String.format("%s%s.jar", tempLocation, contentHash);
-  }
-
-  private void zipDirectory(File directoryToStage, String uniqueDirectoryPath) 
{
-try {
-  ZipFiles.zipDirectory(directoryToStage, new 
FileOutputStream(uniqueDirectoryPath));
-} catch (IOException e) {
-  throw new RuntimeException(e);
+  /**
+   * Local configurations work in the same JVM and have no problems with 
improperly formatted files
+   * on classpath (eg. directories with .class files or empty directories). 
Prepare files for
+   * staging only when using remote cluster.
+   */
+  private void prepareFilesToStageForRemoteClusterExecution() {
+if (!options.getFlinkMaster().matches("\\[.*\\]")) {
 
 Review comment:
   flinkMaster is set to `[auto]` before *test* pipeline execution, if unset. 
With #6426 (not merged) it always defaults to `[auto]`.
   
   `[auto]` can be used when a) testing locally b) submitting using the flink 
CLI, where the job jar file will already be staged and no additional files need 
to be staged. So this shouldn't break anything.
   
   Whenever we change code like this we should ideally add tests (which you did 
for the new class). Additional tests for the staging in 
`FlinkPipelineExecutionEnvironmentTest` would be nice, but we can also do it in 
a follow-up.



[jira] [Work logged] (BEAM-3371) Add ability to stage directories with compiled classes to Spark

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3371?focusedWorklogId=145651&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145651
 ]

ASF GitHub Bot logged work on BEAM-3371:


Author: ASF GitHub Bot
Created on: 19/Sep/18 13:13
Start Date: 19/Sep/18 13:13
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #6244: 
[BEAM-3371] Enable running integration tests on Spark
URL: https://github.com/apache/beam/pull/6244#discussion_r218792264
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineExecutionEnvironment.java
 ##
 @@ -95,73 +83,33 @@ public void translate(Pipeline pipeline) {
 pipeline.replaceAll(
 FlinkTransformOverrides.getDefaultOverrides(translationMode == 
TranslationMode.STREAMING));
 
-// Local flink configurations work in the same JVM and have no problems 
with improperly
-// formatted files on classpath (eg. directories with .class files or 
empty directories).
-// prepareFilesToStage() only when using remote flink cluster.
-List<String> filesToStage;
-if (!options.getFlinkMaster().matches("\\[.*\\]")) {
-  filesToStage = prepareFilesToStage();
-} else {
-  filesToStage = options.getFilesToStage();
-}
+prepareFilesToStageForRemoteClusterExecution();
 
 FlinkPipelineTranslator translator;
 if (translationMode == TranslationMode.STREAMING) {
   this.flinkStreamEnv =
-  FlinkExecutionEnvironments.createStreamExecutionEnvironment(options, 
filesToStage);
+  FlinkExecutionEnvironments.createStreamExecutionEnvironment(
+  options, options.getFilesToStage());
   translator = new FlinkStreamingPipelineTranslator(flinkStreamEnv, 
options);
 } else {
   this.flinkBatchEnv =
-  FlinkExecutionEnvironments.createBatchExecutionEnvironment(options, 
filesToStage);
+  FlinkExecutionEnvironments.createBatchExecutionEnvironment(
+  options, options.getFilesToStage());
   translator = new FlinkBatchPipelineTranslator(flinkBatchEnv, options);
 }
 
 translator.translate(pipeline);
   }
 
-  private List<String> prepareFilesToStage() {
-return options
-.getFilesToStage()
-.stream()
-.map(File::new)
-.filter(File::exists)
-.map(file -> file.isDirectory() ? packageDirectoriesToStage(file) : 
file.getAbsolutePath())
-.collect(Collectors.toList());
-  }
-
-  private String packageDirectoriesToStage(File directoryToStage) {
-String hash = calculateDirectoryContentHash(directoryToStage);
-String pathForJar = getUniqueJarPath(hash);
-zipDirectory(directoryToStage, pathForJar);
-return pathForJar;
-  }
-
-  private String calculateDirectoryContentHash(File directoryToStage) {
-Hasher hasher = Hashing.md5().newHasher();
-try (OutputStream hashStream = Funnels.asOutputStream(hasher)) {
-  ZipFiles.zipDirectory(directoryToStage, hashStream);
-  return Base64Variants.MODIFIED_FOR_URL.encode(hasher.hash().asBytes());
-} catch (IOException e) {
-  throw new RuntimeException(e);
-}
-  }
-
-  private String getUniqueJarPath(String contentHash) {
-String tempLocation = options.getTempLocation();
-
-checkArgument(
-!Strings.isNullOrEmpty(tempLocation),
-"Please provide \"tempLocation\" pipeline option. Flink runner needs 
it to store jars "
-+ "made of directories that were on classpath.");
-
-return String.format("%s%s.jar", tempLocation, contentHash);
-  }
-
-  private void zipDirectory(File directoryToStage, String uniqueDirectoryPath) 
{
-try {
-  ZipFiles.zipDirectory(directoryToStage, new 
FileOutputStream(uniqueDirectoryPath));
-} catch (IOException e) {
-  throw new RuntimeException(e);
+  /**
+   * Local configurations work in the same JVM and have no problems with 
improperly formatted files
+   * on classpath (eg. directories with .class files or empty directories). 
Prepare files for
+   * staging only when using remote cluster.
+   */
+  private void prepareFilesToStageForRemoteClusterExecution() {
 
 Review comment:
    I would make this static and pass the options explicitly.




Issue Time Tracking
---

Worklog Id: (was: 145651)
Time Spent: 3h 10m  (was: 3h)

> Add ability to stage directories with compiled classes to Spark
> ---
>
> Key: BEAM-3371
> URL: https://issues.apache.org/jira/browse/BEAM-3371
>   

[jira] [Work logged] (BEAM-3371) Add ability to stage directories with compiled classes to Spark

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3371?focusedWorklogId=145650&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145650
 ]

ASF GitHub Bot logged work on BEAM-3371:


Author: ASF GitHub Bot
Created on: 19/Sep/18 13:13
Start Date: 19/Sep/18 13:13
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #6244: 
[BEAM-3371] Enable running integration tests on Spark
URL: https://github.com/apache/beam/pull/6244#discussion_r218785020
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineExecutionEnvironment.java
 ##
 @@ -95,73 +83,33 @@ public void translate(Pipeline pipeline) {
 pipeline.replaceAll(
 FlinkTransformOverrides.getDefaultOverrides(translationMode == 
TranslationMode.STREAMING));
 
-// Local flink configurations work in the same JVM and have no problems 
with improperly
-// formatted files on classpath (eg. directories with .class files or 
empty directories).
-// prepareFilesToStage() only when using remote flink cluster.
-List<String> filesToStage;
-if (!options.getFlinkMaster().matches("\\[.*\\]")) {
-  filesToStage = prepareFilesToStage();
-} else {
-  filesToStage = options.getFilesToStage();
-}
+prepareFilesToStageForRemoteClusterExecution();
 
 FlinkPipelineTranslator translator;
 if (translationMode == TranslationMode.STREAMING) {
   this.flinkStreamEnv =
-  FlinkExecutionEnvironments.createStreamExecutionEnvironment(options, 
filesToStage);
+  FlinkExecutionEnvironments.createStreamExecutionEnvironment(
+  options, options.getFilesToStage());
   translator = new FlinkStreamingPipelineTranslator(flinkStreamEnv, 
options);
 } else {
   this.flinkBatchEnv =
-  FlinkExecutionEnvironments.createBatchExecutionEnvironment(options, 
filesToStage);
+  FlinkExecutionEnvironments.createBatchExecutionEnvironment(
+  options, options.getFilesToStage());
   translator = new FlinkBatchPipelineTranslator(flinkBatchEnv, options);
 }
 
 translator.translate(pipeline);
   }
 
-  private List<String> prepareFilesToStage() {
-return options
-.getFilesToStage()
-.stream()
-.map(File::new)
-.filter(File::exists)
-.map(file -> file.isDirectory() ? packageDirectoriesToStage(file) : 
file.getAbsolutePath())
-.collect(Collectors.toList());
-  }
-
-  private String packageDirectoriesToStage(File directoryToStage) {
-String hash = calculateDirectoryContentHash(directoryToStage);
-String pathForJar = getUniqueJarPath(hash);
-zipDirectory(directoryToStage, pathForJar);
-return pathForJar;
-  }
-
-  private String calculateDirectoryContentHash(File directoryToStage) {
-Hasher hasher = Hashing.md5().newHasher();
-try (OutputStream hashStream = Funnels.asOutputStream(hasher)) {
-  ZipFiles.zipDirectory(directoryToStage, hashStream);
-  return Base64Variants.MODIFIED_FOR_URL.encode(hasher.hash().asBytes());
-} catch (IOException e) {
-  throw new RuntimeException(e);
-}
-  }
-
-  private String getUniqueJarPath(String contentHash) {
-String tempLocation = options.getTempLocation();
-
-checkArgument(
-!Strings.isNullOrEmpty(tempLocation),
-"Please provide \"tempLocation\" pipeline option. Flink runner needs 
it to store jars "
-+ "made of directories that were on classpath.");
-
-return String.format("%s%s.jar", tempLocation, contentHash);
-  }
-
-  private void zipDirectory(File directoryToStage, String uniqueDirectoryPath) 
{
-try {
-  ZipFiles.zipDirectory(directoryToStage, new 
FileOutputStream(uniqueDirectoryPath));
-} catch (IOException e) {
-  throw new RuntimeException(e);
+  /**
+   * Local configurations work in the same JVM and have no problems with 
improperly formatted files
+   * on classpath (eg. directories with .class files or empty directories). 
Prepare files for
+   * staging only when using remote cluster.
+   */
+  private void prepareFilesToStageForRemoteClusterExecution() {
+if (!options.getFlinkMaster().matches("\\[.*\\]")) {
+  options.setFilesToStage(
+  prepareFilesForStaging(options.getFilesToStage(), 
options.getTempLocation()));
 
 Review comment:
   Could you remove the static import for this? 
`PipelineResources.prepareFilesForStaging(..)` is more descriptive and implies 
it is a method from a different class.




Issue Time Tracking
---

Worklog Id: (was

[jira] [Created] (BEAM-5429) Optimise GCSFilesystem rename implementation

2018-09-19 Thread Tim Robertson (JIRA)
Tim Robertson created BEAM-5429:
---

 Summary: Optimise GCSFilesystem rename implementation
 Key: BEAM-5429
 URL: https://issues.apache.org/jira/browse/BEAM-5429
 Project: Beam
  Issue Type: Improvement
  Components: io-java-gcp
Affects Versions: 2.6.0
Reporter: Tim Robertson
Assignee: Chamikara Jayalath


{{GCSFileSystem}} implements a {{rename()}} with a {{copy}} and {{delete}} 
operation.

However, GCS has an [Objects: 
rewrite|https://cloud.google.com/storage/docs/json_api/v1/objects/rewrite] 
which looks like it would be a metadata operation only and therefore has the 
potential to be much quicker.

Once BEAM-5036 is fixed, IOs that write files will make use of a {{rename}} 
operation.
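A rewrite-based copy is not necessarily a single call: the JSON API may return a rewriteToken that has to be fed back into subsequent calls until the operation reports completion. A Python sketch of that loop against a stubbed client (the `client.rewrite` interface here is hypothetical, loosely mirroring the request/response shape, not a real GCS client library):

```python
def rewrite_object(client, src, dst):
    """Drive the (hypothetical) rewrite call to completion, threading the
    rewriteToken through until the service reports done."""
    token = None
    while True:
        resp = client.rewrite(src, dst, rewrite_token=token)
        if resp["done"]:
            return resp["resource"]
        token = resp["rewriteToken"]

class FakeClient:
    """Stub standing in for an HTTP client; completes on the second call."""
    def __init__(self):
        self.calls = 0
    def rewrite(self, src, dst, rewrite_token=None):
        self.calls += 1
        if self.calls < 2:
            return {"done": False, "rewriteToken": "token-%d" % self.calls}
        return {"done": True, "resource": {"name": dst}}

client = FakeClient()
result = rewrite_object(client, "bucket/src", "bucket/dst")
```

A rename built on this would be the rewrite loop followed by a delete of the source object.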





[jira] [Comment Edited] (BEAM-5036) Optimize FileBasedSink's WriteOperation.moveToOutput()

2018-09-19 Thread Tim Robertson (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16620565#comment-16620565
 ] 

Tim Robertson edited comment on BEAM-5036 at 9/19/18 1:19 PM:
--

BEAM-5429 is created for the GCS implementation ( CC [~sinisa_lyh] ) and I'll 
aim to complete this one in time for 2.8.0


was (Author: timrobertson100):
BEAM-5429 is created for the GCS implementation ( CC [~sinisa_lyh] )

> Optimize FileBasedSink's WriteOperation.moveToOutput()
> --
>
> Key: BEAM-5036
> URL: https://issues.apache.org/jira/browse/BEAM-5036
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-files
>Affects Versions: 2.5.0
>Reporter: Jozef Vilcek
>Assignee: Tim Robertson
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> The moveToOutput() method in FileBasedSink.WriteOperation implements move as 
> copy+delete. It would be better to use rename(), which can be much more 
> efficient on some filesystems.
> The filesystem must support cross-directory rename. BEAM-4861 is related to 
> this for the case of the HDFS filesystem.
> Feature was discussed here:
> http://mail-archives.apache.org/mod_mbox/beam-dev/201807.mbox/%3CCAF9t7_4Mp54pQ+vRrJrBh9Vx0=uaknupzd_qdh_qdm9vxll...@mail.gmail.com%3E





***UNCHECKED*** [jira] [Commented] (BEAM-5036) Optimize FileBasedSink's WriteOperation.moveToOutput()

2018-09-19 Thread Tim Robertson (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16620565#comment-16620565
 ] 

Tim Robertson commented on BEAM-5036:
-

BEAM-5429 is created for the GCS implementation ( CC [~sinisa_lyh] )

> Optimize FileBasedSink's WriteOperation.moveToOutput()
> --
>
> Key: BEAM-5036
> URL: https://issues.apache.org/jira/browse/BEAM-5036
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-files
>Affects Versions: 2.5.0
>Reporter: Jozef Vilcek
>Assignee: Tim Robertson
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> The moveToOutput() method in FileBasedSink.WriteOperation implements move as 
> copy+delete. It would be better to use rename(), which can be much more 
> efficient on some filesystems.
> The filesystem must support cross-directory rename. BEAM-4861 is related to 
> this for the case of the HDFS filesystem.
> Feature was discussed here:
> http://mail-archives.apache.org/mod_mbox/beam-dev/201807.mbox/%3CCAF9t7_4Mp54pQ+vRrJrBh9Vx0=uaknupzd_qdh_qdm9vxll...@mail.gmail.com%3E





[jira] [Work logged] (BEAM-3089) Issue with setting the parallelism at client level using Flink runner

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3089?focusedWorklogId=145658&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145658
 ]

ASF GitHub Bot logged work on BEAM-3089:


Author: ASF GitHub Bot
Created on: 19/Sep/18 13:41
Start Date: 19/Sep/18 13:41
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #6426: [BEAM-3089] Fix default 
values in FlinkPipelineOptions / Add tests
URL: https://github.com/apache/beam/pull/6426#issuecomment-422808148
 
 
   Thanks for the review @angoenka. 




Issue Time Tracking
---

Worklog Id: (was: 145658)
Time Spent: 2h 40m  (was: 2.5h)

> Issue with setting the parallelism at client level using Flink runner
> -
>
> Key: BEAM-3089
> URL: https://issues.apache.org/jira/browse/BEAM-3089
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 2.0.0
> Environment: I am using Flink 1.2.1 running on Docker, with Task 
> Managers distributed across different VMs as part of a Docker Swarm.
>Reporter: Thalita Vergilio
>Assignee: Grzegorz Kołakowski
>Priority: Major
>  Labels: docker, flink, parallel-deployment
> Fix For: 2.8.0
>
> Attachments: flink-ui-parallelism.png
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> When uploading an Apache Beam application using the Flink Web UI, the 
> parallelism set at job submission doesn't get picked up. The same happens 
> when submitting a job using the Flink CLI.
> In both cases, the parallelism ends up defaulting to 1.
> When I set the parallelism programmatically within the Apache Beam code, it 
> works: {{flinkPipelineOptions.setParallelism(4);}}
> I suspect the root of the problem may be in the 
> org.apache.beam.runners.flink.DefaultParallelismFactory class, as it checks 
> for Flink's GlobalConfiguration, which may not pick up runtime values passed 
> to Flink, then defaults to 1 if it doesn't find anything.
> Any ideas on how this could be fixed or worked around? I need to be able to 
> change the parallelism dynamically, so the programmatic approach won't really 
> work for me, nor will setting the Flink configuration at the system level.
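The precedence problem described above can be sketched in a few lines of plain Python. This is a hypothetical illustration with invented names, not the actual Beam/Flink resolution code: if the factory consults only the static global configuration, a parallelism value supplied at job submission is never seen.

```python
def resolve_parallelism(global_config, submission_args, default=1):
    """Illustrates the reported bug: only the static global configuration
    is consulted, so a -p value passed at job submission is ignored."""
    return global_config.get("parallelism.default", default)


def resolve_parallelism_fixed(global_config, submission_args, default=1):
    """The expected behaviour: runtime submission values take precedence
    over the static configuration, which takes precedence over the default."""
    if "parallelism" in submission_args:
        return submission_args["parallelism"]
    return global_config.get("parallelism.default", default)


# A job submitted with `flink run -p 4` against an empty global config:
buggy = resolve_parallelism({}, {"parallelism": 4})        # -> 1
fixed = resolve_parallelism_fixed({}, {"parallelism": 4})  # -> 4
```

With the fixed ordering, the UI/CLI value wins whenever it is present, and the global configuration only acts as a fallback.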



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5324) Finish Python 3 porting for unpackaged files

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5324?focusedWorklogId=145660&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145660
 ]

ASF GitHub Bot logged work on BEAM-5324:


Author: ASF GitHub Bot
Created on: 19/Sep/18 13:55
Start Date: 19/Sep/18 13:55
Worklog Time Spent: 10m 
  Work Description: RobbeSneyders commented on a change in pull request 
#6424: [BEAM-5324] Partially port unpackaged modules to Python 3
URL: https://github.com/apache/beam/pull/6424#discussion_r218812794
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/bundle_processor.py
 ##
 @@ -108,7 +106,7 @@ def __init__(self, operation_name, step_name, consumers, 
counter_factory,
 self.receivers = [
 operations.ConsumerSet(
 self.counter_factory, self.name_context.step_name, 0,
-next(itervalues(consumers)), self.windowed_coder)]
+next(iter(consumers.values())), self.windowed_coder)]
 
 Review comment:
   You're right. Changed it.
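For context, the idiom adopted in the diff above can be shown standalone. On Python 2, `consumers.values()` builds a list; on Python 3 it returns a lazy view. Wrapping it in `iter()` lets `next()` grab the first value in either case, without the `six`/`future` `itervalues()` helper:

```python
# Hypothetical stand-in for the bundle_processor consumers mapping.
consumers = {"step_0": ["consumer_a", "consumer_b"]}

# Works unchanged on Python 2 (list) and Python 3 (view):
first_consumers = next(iter(consumers.values()))
```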




Issue Time Tracking
---

Worklog Id: (was: 145660)
Time Spent: 1h 20m  (was: 1h 10m)

> Finish Python 3 porting for unpackaged files
> 
>
> Key: BEAM-5324
> URL: https://issues.apache.org/jira/browse/BEAM-5324
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Robbe
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3446) RedisIO non-prefix read operations

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3446?focusedWorklogId=145667&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145667
 ]

ASF GitHub Bot logged work on BEAM-3446:


Author: ASF GitHub Bot
Created on: 19/Sep/18 14:08
Start Date: 19/Sep/18 14:08
Worklog Time Spent: 10m 
  Work Description: vvarma commented on a change in pull request #5841: 
[BEAM-3446] Fixes RedisIO non-prefix read operations
URL: https://github.com/apache/beam/pull/5841#discussion_r218818530
 
 

 ##
 File path: 
sdks/java/io/redis/src/main/java/org/apache/beam/sdk/io/redis/RedisIO.java
 ##
 @@ -280,28 +316,77 @@ public void processElement(ProcessContext 
processContext) throws Exception {
   while (!finished) {
 ScanResult scanResult = jedis.scan(cursor, scanParams);
List<String> keys = scanResult.getResult();
-
-Pipeline pipeline = jedis.pipelined();
-if (keys != null) {
-  for (String key : keys) {
-pipeline.get(key);
-  }
-  List<Object> values = pipeline.syncAndReturnAll();
-  for (int i = 0; i < values.size(); i++) {
-processContext.output(KV.of(keys.get(i), (String) values.get(i)));
-  }
+for (String k : keys) {
+  processContext.output(k);
 }
-
 cursor = scanResult.getStringCursor();
 if ("0".equals(cursor)) {
   finished = true;
 }
   }
 }
+  }
+  /** A {@link DoFn} requesting Redis server to get key/value pairs. */
+  private static class ReadFn extends BaseReadFn<KV<String, String>> {
+@Nullable transient Multimap<BoundedWindow, String> bundles = null;
+@Nullable AtomicInteger batchCount = null;
+private final int batchSize;
+
+@StartBundle
+public void startBundle(StartBundleContext context) {
+  bundles = ArrayListMultimap.create();
+  batchCount = new AtomicInteger();
+}
 
-@Teardown
-public void teardown() {
-  jedis.close();
+ReadFn(RedisConnectionConfiguration connectionConfiguration, int 
batchSize) {
+  super(connectionConfiguration);
+  this.batchSize = batchSize;
+}
+
+private int getBatchSize() {
+  return batchSize;
+}
+
+@ProcessElement
+public void processElement(ProcessContext processContext, BoundedWindow 
window)
+throws Exception {
+  String key = processContext.element();
+  bundles.put(window, key);
+  if (batchCount.incrementAndGet() > getBatchSize()) {
+Multimap<BoundedWindow, KV<String, String>> kvs = fetchAndFlush();
+for (BoundedWindow w : kvs.keySet()) {
+  for (KV<String, String> kv : kvs.get(w)) {
+processContext.output(kv);
+  }
+}
+  }
+}
+
+private Multimap<BoundedWindow, KV<String, String>> fetchAndFlush() {
+  Multimap<BoundedWindow, KV<String, String>> kvs = ArrayListMultimap.create();
+  for (BoundedWindow w : bundles.keySet()) {
+String[] keys = new String[bundles.get(w).size()];
+bundles.get(w).toArray(keys);
+List<String> results = jedis.mget(keys);
+for (int i = 0; i < results.size(); i++) {
+  if (results.get(i) != null) {
+kvs.put(w, KV.of(keys[i], results.get(i)));
+  }
+}
+  }
+  bundles = ArrayListMultimap.create();
+  batchCount.set(0);
+  return kvs;
+}
+
+@FinishBundle
+public void finishBundle(FinishBundleContext context) throws Exception {
+  Multimap<BoundedWindow, KV<String, String>> kvs = fetchAndFlush();
 
 Review comment:
   Thanks @iemejia . Do suggest if there any other changes needed.




Issue Time Tracking
---

Worklog Id: (was: 145667)
Time Spent: 6h  (was: 5h 50m)

> RedisIO non-prefix read operations
> --
>
> Key: BEAM-3446
> URL: https://issues.apache.org/jira/browse/BEAM-3446
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-redis
>Reporter: Vinay varma
>Assignee: Vinay varma
>Priority: Major
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> The read operation in RedisIO is for prefix-based lookups. While it can be
> used for exact key matches as well, the number of operations limits the
> throughput of the function.
> I suggest exposing the current readAll operation as readByPrefix and using
> simpler operations for the readAll functionality.
> ex:
> {code:java}
> String output = jedis.get(element);
> if (output != null) {
>   processContext.output(KV.of(element, output));
> }
> {code}
> instead of:
> https://github.com/apache/beam/blob/7d240c0bb171af6868f1a6e95196c9dcfc9ac640/sdks/java/io/redis/src/main/java/org/a
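The batched approach the PR above settles on can be sketched independently of Redis. This is a plain-Python illustration with invented names; jedis' `mget` is stood in for by any callable taking a list of keys. Keys are buffered per window and flushed as one multi-get once the batch size is exceeded, turning N round trips into roughly N / batch_size:

```python
from collections import defaultdict


class BatchedReader:
    """Buffers keys per window and fetches them with one multi-get per
    flush, mirroring the ReadFn batching pattern discussed above."""

    def __init__(self, mget, batch_size=1000):
        self.mget = mget          # callable: list of keys -> list of values (None if absent)
        self.batch_size = batch_size
        self.bundles = defaultdict(list)   # window -> buffered keys
        self.count = 0

    def process(self, window, key):
        self.bundles[window].append(key)
        self.count += 1
        if self.count > self.batch_size:
            return self.flush()
        return []

    def flush(self):
        out = []
        for window, keys in self.bundles.items():
            # One multi-get per window instead of one get per key.
            for key, value in zip(keys, self.mget(keys)):
                if value is not None:      # skip absent keys, like the Java code
                    out.append((window, key, value))
        self.bundles = defaultdict(list)
        self.count = 0
        return out


# A dict-backed fake store in place of a Redis connection:
store = {"a": "1", "b": "2"}
reader = BatchedReader(lambda keys: [store.get(k) for k in keys], batch_size=2)
reader.process("w0", "a")
reader.process("w0", "missing")
emitted = reader.process("w0", "b")   # third key exceeds batch_size=2 -> flush
```

As in the Java `ReadFn`, a final `flush()` (the `@FinishBundle` analogue) emits whatever is still buffered when the bundle ends.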

[jira] [Work logged] (BEAM-2687) Python SDK support for Stateful Processing

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2687?focusedWorklogId=145669&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145669
 ]

ASF GitHub Bot logged work on BEAM-2687:


Author: ASF GitHub Bot
Created on: 19/Sep/18 14:14
Start Date: 19/Sep/18 14:14
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #6349: [BEAM-2687] 
Implement State over the Fn API
URL: https://github.com/apache/beam/pull/6349#issuecomment-422820431
 
 
   #6433  builds on top of this
   (separated out because this was straightforward, the other less so, and
   also has other dependencies).
   
   On Wed, Sep 19, 2018 at 3:00 PM Thomas Weise 
   wrote:
   
   > Nice to see this coming together! Is this PR still required since the
   > changes are now also in #6433  ?
   >
   > —
   > You are receiving this because you authored the thread.
   > Reply to this email directly, view it on GitHub
   > , or mute
   > the thread
   > 

   > .
   >
   




Issue Time Tracking
---

Worklog Id: (was: 145669)
Time Spent: 2h 50m  (was: 2h 40m)

> Python SDK support for Stateful Processing
> --
>
> Key: BEAM-2687
> URL: https://issues.apache.org/jira/browse/BEAM-2687
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Assignee: Charles Chen
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Python SDK should support stateful processing 
> (https://beam.apache.org/blog/2017/02/13/stateful-processing.html)
> In the meantime, runner capability matrix should be updated to show the lack 
> of this feature 
> (https://beam.apache.org/documentation/runners/capability-matrix/)
> Use this as an umbrella issue for all related issues.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2687) Python SDK support for Stateful Processing

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2687?focusedWorklogId=145671&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145671
 ]

ASF GitHub Bot logged work on BEAM-2687:


Author: ASF GitHub Bot
Created on: 19/Sep/18 14:16
Start Date: 19/Sep/18 14:16
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #6433: [BEAM-2687] 
Implement Timers over the Fn API.
URL: https://github.com/apache/beam/pull/6433#issuecomment-422821064
 
 
   Yes, having one is the eventual goal. The FnAPI runner doesn't fully
   support streaming yet, though.
   
   On Wed, Sep 19, 2018 at 2:44 PM Thomas Weise 
   wrote:
   
   > @robertwb  thanks for the ping. I did not
   > pay attention to fn_api_runner so far - this gives a good idea for the
   > wiring work we need to do for user state and timers in the Flink runner.
   > Just curious, why do we have a direct runner and an fn_api_runner for
   > Python, why not just have the latter? Or perhaps that is the (undocumented)
   > goal?
   >
   > —
   > You are receiving this because you were mentioned.
   > Reply to this email directly, view it on GitHub
   > , or mute
   > the thread
   > 

   > .
   >
   




Issue Time Tracking
---

Worklog Id: (was: 145671)
Time Spent: 3h  (was: 2h 50m)

> Python SDK support for Stateful Processing
> --
>
> Key: BEAM-2687
> URL: https://issues.apache.org/jira/browse/BEAM-2687
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Assignee: Charles Chen
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Python SDK should support stateful processing 
> (https://beam.apache.org/blog/2017/02/13/stateful-processing.html)
> In the meantime, runner capability matrix should be updated to show the lack 
> of this feature 
> (https://beam.apache.org/documentation/runners/capability-matrix/)
> Use this as an umbrella issue for all related issues.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4176) Java: Portable batch runner passes all ValidatesRunner tests that non-portable runner passes

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4176?focusedWorklogId=145690&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145690
 ]

ASF GitHub Bot logged work on BEAM-4176:


Author: ASF GitHub Bot
Created on: 19/Sep/18 15:03
Start Date: 19/Sep/18 15:03
Worklog Time Spent: 10m 
  Work Description: ryan-williams commented on issue #6225: [BEAM-4176] 
move some artifact-staging logs from info to debug
URL: https://github.com/apache/beam/pull/6225#issuecomment-422838244
 
 
   several tests failed that I don't think are related to my changes here:
   
   `:beam-sdks-java-io-hadoop-input-format:test`:
   ```
   19:51:52 localhost/127.0.0.1:7000 is in use by another process.  Change 
listen_address:storage_port in cassandra.yaml to values that do not conflict 
with other services
   19:51:52 Fatal configuration error; unable to start server.  See log for 
stacktrace.
   19:51:52 Sep 18, 2018 11:51:51 PM 
org.apache.cassandra.service.CassandraDaemon exitOrFail
   19:51:52 SEVERE: Fatal configuration error
   19:51:52 org.apache.cassandra.exceptions.ConfigurationException: 
localhost/127.0.0.1:7000 is in use by another process.  Change 
listen_address:storage_port in cassandra.yaml to values that do not conflict 
with other services
   19:51:52 at 
org.apache.cassandra.net.MessagingService.getServerSockets(MessagingService.java:557)
   19:51:52 at 
org.apache.cassandra.net.MessagingService.listen(MessagingService.java:501)
   19:51:52 at 
org.apache.cassandra.net.MessagingService.listen(MessagingService.java:485)
   19:51:52 at 
org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:745)
   ```
   
   `:beam-sdks-java-io-elasticsearch-tests-2:test`:
   ```
   19:45:44 org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTest > 
testWriteWithIndexFn FAILED
   19:45:44 java.lang.AssertionError: Einstein index holds incorrect count 
expected:<10> but was:<20>
   19:45:44 at org.junit.Assert.fail(Assert.java:88)
   19:45:44 at org.junit.Assert.failNotEquals(Assert.java:834)
   19:45:44 at org.junit.Assert.assertEquals(Assert.java:645)
   19:45:44 at 
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTestCommon.testWriteWithIndexFn(ElasticsearchIOTestCommon.java:439)
   19:45:44 at 
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTest.testWriteWithIndexFn(ElasticsearchIOTest.java:175)
   ```
   
   `:beam-sdks-java-io-amazon-web-services:test`:
   ```
   19:43:59 org.apache.beam.sdk.io.aws.sqs.SqsIOTest > testRead FAILED
   19:43:59 akka.stream.impl.io.ConnectionSourceStage$$anon$2$$anon$1: Bind 
failed because of Address already in use
   19:43:59 
   19:43:59 Caused by:
   19:43:59 java.net.BindException: Address already in use
   19:44:07 
   19:44:07 org.apache.beam.sdk.io.aws.sns.SnsIOTest > testRetries 
STANDARD_ERROR
   19:44:07 Sep 18, 2018 11:44:07 PM 
org.apache.beam.sdk.io.aws.sns.SnsIO$Write$SnsWriterFn processElement
   19:44:07 WARNING: Error writing to SNS. Retry attempt[1]
   19:44:07 com.amazonaws.services.sns.model.InternalErrorException: 
Service unavailable (Service: null; Status Code: 0; Error Code: null; Request 
ID: null)
   19:44:07 at 
org.apache.beam.sdk.io.aws.sns.AmazonSNSMockErrors.publish(AmazonSNSMockErrors.java:29)
   19:44:07 at 
org.apache.beam.sdk.io.aws.sns.SnsIO$Write$SnsWriterFn.processElement(SnsIO.java:314)
   ```
   
   perhaps the Hadoop and AWS modules were conflicting over binding to the same
   port?
   
   I'm not sure what happened to the elasticsearch test




Issue Time Tracking
---

Worklog Id: (was: 145690)
Time Spent: 22.5h  (was: 22h 20m)

> Java: Portable batch runner passes all ValidatesRunner tests that 
> non-portable runner passes
> 
>
> Key: BEAM-4176
> URL: https://issues.apache.org/jira/browse/BEAM-4176
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Ben Sidhom
>Priority: Major
> Attachments: Screen Shot 2018-08-14 at 4.18.31 PM.png, Screen Shot 
> 2018-09-03 at 11.07.38 AM.png
>
>  Time Spent: 22.5h
>  Remaining Estimate: 0h
>
> We need this as a sanity check that runner execution is correct.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[beam] branch master updated (c49a97e -> 6cfaac5)

2018-09-19 Thread iemejia
This is an automated email from the ASF dual-hosted git repository.

iemejia pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from c49a97e  [BEAM-4176] enumerate primitive transforms in portable 
construction
 add c31ea1f  [BEAM-4861] Autocreate absent parent directories when doing 
an HDFS rename, and alert on error when doing an HDFS copy
 add 7ea7127  [BEAM-4861] Improve clarity around exceptions thrown in rename
 add fd16052  [BEAM-4861] Typo in comment
 new 6cfaac5  Merge pull request #6285: [BEAM-4861] Autocreate directories 
when doing an HDFS rename

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../apache/beam/sdk/io/hdfs/HadoopFileSystem.java  | 89 +++---
 .../beam/sdk/io/hdfs/HadoopFileSystemTest.java | 69 +
 2 files changed, 149 insertions(+), 9 deletions(-)



[jira] [Work logged] (BEAM-4861) Hadoop Filesystem silently fails

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4861?focusedWorklogId=145691&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145691
 ]

ASF GitHub Bot logged work on BEAM-4861:


Author: ASF GitHub Bot
Created on: 19/Sep/18 15:06
Start Date: 19/Sep/18 15:06
Worklog Time Spent: 10m 
  Work Description: iemejia closed pull request #6285: [BEAM-4861] 
Autocreate directories when doing an HDFS rename
URL: https://github.com/apache/beam/pull/6285
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/sdks/java/io/hadoop-file-system/src/main/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystem.java
 
b/sdks/java/io/hadoop-file-system/src/main/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystem.java
index 8230c699836..b08a70fc943 100644
--- 
a/sdks/java/io/hadoop-file-system/src/main/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystem.java
+++ 
b/sdks/java/io/hadoop-file-system/src/main/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystem.java
@@ -19,6 +19,7 @@
 
 import com.google.common.annotations.VisibleForTesting;
 import com.google.common.collect.ImmutableList;
+import java.io.FileNotFoundException;
 import java.io.IOException;
 import java.net.URI;
 import java.nio.ByteBuffer;
@@ -26,6 +27,7 @@
 import java.nio.channels.ReadableByteChannel;
 import java.nio.channels.SeekableByteChannel;
 import java.nio.channels.WritableByteChannel;
+import java.nio.file.FileAlreadyExistsException;
 import java.util.ArrayList;
 import java.util.Collection;
 import java.util.Collections;
@@ -41,6 +43,8 @@
 import org.apache.hadoop.fs.FileStatus;
 import org.apache.hadoop.fs.FileUtil;
 import org.apache.hadoop.fs.Path;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 
 /**
  * Adapts {@link org.apache.hadoop.fs.FileSystem} connectors to be used as 
Apache Beam {@link
@@ -68,6 +72,9 @@
  * 
  */
 class HadoopFileSystem extends FileSystem {
+  private static final Logger LOG = 
LoggerFactory.getLogger(HadoopFileSystem.class);
+
+  @VisibleForTesting static final String LOG_CREATE_DIRECTORY = "Creating 
directory %s";
   @VisibleForTesting final org.apache.hadoop.fs.FileSystem fileSystem;
 
   HadoopFileSystem(Configuration configuration) throws IOException {
@@ -129,29 +136,93 @@ protected void copy(List<ResourceId> srcResourceIds, List<ResourceId> destResourceIds)
+   * <p>The number of source resources must equal the number of destination resources. Destination
+   * resources will be created recursively.
+   *
+   * @param srcResourceIds the references of the source resources
+   * @param destResourceIds the references of the destination resources
+   * @throws FileNotFoundException if the source resources are missing. When 
rename throws, the
+   * state of the resources is unknown but safe: for every (source, 
destination) pair of
+   * resources, the following are possible: a) source exists, b) 
destination exists, c) source
+   * and destination both exist. Thus no data is lost, however, duplicated 
resource are
+   * possible. In such scenarios, callers can use {@code match()} to 
determine the state of the
+   * resource.
+   * @throws FileAlreadyExistsException if the target resources already exist.
+   * @throws IOException if the underlying filesystem indicates the rename was 
not performed but no
+   * other errors were thrown.
+   */
   @Override
   protected void rename(
    List<ResourceId> srcResourceIds, List<ResourceId> destResourceIds)
   throws IOException {
 for (int i = 0; i < srcResourceIds.size(); ++i) {
-  fileSystem.rename(srcResourceIds.get(i).toPath(), 
destResourceIds.get(i).toPath());
+
+  // rename in HDFS requires the target directory to exist or silently 
fails (BEAM-4861)
+  Path targetDirectory = destResourceIds.get(i).toPath().getParent();
+  if (!fileSystem.exists(targetDirectory)) {
+LOG.debug(
+String.format(
+LOG_CREATE_DIRECTORY, 
Path.getPathWithoutSchemeAndAuthority(targetDirectory)));
+boolean success = fileSystem.mkdirs(targetDirectory);
+if (!success) {
+  throw new IOException(
+  String.format(
+  "Unable to create target directory %s. No further 
information provided by underlying filesystem.",
+  targetDirectory));
+}
+  }
+
+  boolean success =
+  fileSystem.rename(srcResourceIds.get(i).toPath(), 
destResourceIds.get(i).toPath());
+  if (!success) {
+if (!fileSystem.exists(srcResourceIds.get(i).toPath())) {
+  throw new FileNotFoundException(
+  String.format(
+  "Unable to rename resource %s to %s as source not found.",
+  srcResourceIds.get(i).toPath(),

[beam] 01/01: Merge pull request #6285: [BEAM-4861] Autocreate directories when doing an HDFS rename

2018-09-19 Thread iemejia
This is an automated email from the ASF dual-hosted git repository.

iemejia pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit 6cfaac5d1cf61129651f91f68d2b9314efed72cf
Merge: c49a97e fd16052
Author: Ismaël Mejía 
AuthorDate: Wed Sep 19 17:06:54 2018 +0200

Merge pull request #6285: [BEAM-4861] Autocreate directories when doing an 
HDFS rename

 .../apache/beam/sdk/io/hdfs/HadoopFileSystem.java  | 89 +++---
 .../beam/sdk/io/hdfs/HadoopFileSystemTest.java | 69 +
 2 files changed, 149 insertions(+), 9 deletions(-)



[jira] [Resolved] (BEAM-4861) Hadoop Filesystem silently fails

2018-09-19 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-4861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía resolved BEAM-4861.

   Resolution: Fixed
Fix Version/s: 2.8.0

> Hadoop Filesystem silently fails
> 
>
> Key: BEAM-4861
> URL: https://issues.apache.org/jira/browse/BEAM-4861
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-hadoop
>Reporter: Jozef Vilcek
>Assignee: Tim Robertson
>Priority: Major
> Fix For: 2.8.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Hi,
> Beam FileSystem operations copy, rename and delete are void in the SDK. Hadoop's
> native filesystem operations are not: they return a boolean. The current
> implementation in Beam ignores the result and passes as long as no exception is thrown.
> I got burned by this when using 'rename' to do a 'move' operation on HDFS. If the
> target directory does not exist, the operation returns false and does not touch
> the file.
> [https://github.com/apache/beam/blob/master/sdks/java/io/hadoop-file-system/src/main/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystem.java#L148]
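The fix merged above follows a general pattern that can be sketched against the local filesystem. This is a plain-Python analogy, not HDFS code: create the destination's parent directories before renaming, and raise instead of relying on a boolean return value that callers may ignore.

```python
import os
import tempfile


def rename_with_parents(src, dst):
    """Create dst's parent directory if needed, then rename. Raises instead
    of silently returning False, as the raw HDFS rename does."""
    parent = os.path.dirname(dst)
    if parent and not os.path.isdir(parent):
        os.makedirs(parent)  # analogous to fileSystem.mkdirs(targetDirectory)
    if not os.path.exists(src):
        raise FileNotFoundError(
            "Unable to rename %s to %s as source not found." % (src, dst))
    os.replace(src, dst)


# Usage: a move into a directory tree that does not exist yet.
with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "part-0")
    open(src, "w").close()
    dst = os.path.join(d, "output", "nested", "part-0")  # parents missing
    rename_with_parents(src, dst)
    assert os.path.exists(dst)
```

The same two ingredients appear in the merged patch: `mkdirs` before `rename`, and an explicit exception whenever the underlying call reports failure.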



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: beam_PostCommit_Python_PVR_Flink_Gradle #64

2018-09-19 Thread Apache Jenkins Server
See 


Changes:

[timrobertson100] [BEAM-4861] Autocreate absent parent directories when doing 
an HDFS

[timrobertson100] [BEAM-4861] Improve clarity around exceptions thrown in rename

[timrobertson100] [BEAM-4861] Typo in comment

--
[...truncated 554.96 KB...]
_Rendezvous: <_Rendezvous of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "Socket closed"
debug_error_string = 
"{"created":"@1537370048.314034455","description":"Error received from 
peer","file":"src/core/lib/surface/call.cc","file_line":1099,"grpc_message":"Socket
 closed","grpc_status":14}"
>



==
ERROR: test_combine_per_key (__main__.FlinkRunnerTest)
--
Traceback (most recent call last):
  File "apache_beam/runners/portability/fn_api_runner_test.py", line 251, in 
test_combine_per_key
assert_that(res, equal_to([('a', 1.5), ('b', 3.0)]))
  File "apache_beam/pipeline.py", line 414, in __exit__
self.run().wait_until_finish()
  File "apache_beam/runners/portability/portable_runner.py", line 209, in 
wait_until_finish
'Pipeline %s failed in state %s.' % (self._job_id, self._state))
RuntimeError: Pipeline 
test_combine_per_key_1537370040.72_114ff8f0-11f8-4afe-9c21-7f31e758380c failed 
in state FAILED.

==
ERROR: test_create (__main__.FlinkRunnerTest)
--
Traceback (most recent call last):
  File "apache_beam/runners/portability/fn_api_runner_test.py", line 62, in 
test_create
assert_that(p | beam.Create(['a', 'b']), equal_to(['a', 'b']))
  File "apache_beam/pipeline.py", line 414, in __exit__
self.run().wait_until_finish()
  File "apache_beam/runners/portability/portable_runner.py", line 209, in 
wait_until_finish
'Pipeline %s failed in state %s.' % (self._job_id, self._state))
RuntimeError: Pipeline 
test_create_1537370041.12_a56a1fe9-9480-4fde-954b-9543e0b0b9f6 failed in state 
FAILED.

==
ERROR: test_flatten (__main__.FlinkRunnerTest)
--
Traceback (most recent call last):
  File "apache_beam/runners/portability/fn_api_runner_test.py", line 244, in 
test_flatten
assert_that(res, equal_to(['a', 'b', 'c', 'd']))
  File "apache_beam/pipeline.py", line 414, in __exit__
self.run().wait_until_finish()
  File "apache_beam/runners/portability/portable_runner.py", line 209, in 
wait_until_finish
'Pipeline %s failed in state %s.' % (self._job_id, self._state))
RuntimeError: Pipeline 
test_flatten_1537370041.6_486212e6-394d-48c8-bfc1-88623f9cad86 failed in state 
FAILED.

==
ERROR: test_flattened_side_input (__main__.FlinkRunnerTest)
--
Traceback (most recent call last):
  File "apache_beam/runners/portability/fn_api_runner_test.py", line 190, in 
test_flattened_side_input
equal_to([(None, {'a': 1, 'b': 2})]))
  File "apache_beam/pipeline.py", line 414, in __exit__
self.run().wait_until_finish()
  File "apache_beam/runners/portability/portable_runner.py", line 209, in 
wait_until_finish
'Pipeline %s failed in state %s.' % (self._job_id, self._state))
RuntimeError: Pipeline 
test_flattened_side_input_1537370042.07_253b8c5e-c8d3-438c-b670-7ae80431e60b 
failed in state FAILED.

==
ERROR: test_gbk_side_input (__main__.FlinkRunnerTest)
--
Traceback (most recent call last):
  File "apache_beam/runners/portability/fn_api_runner_test.py", line 198, in 
test_gbk_side_input
equal_to([(None, {'a': [1]})]))
  File "apache_beam/pipeline.py", line 414, in __exit__
self.run().wait_until_finish()
  File "apache_beam/runners/portability/portable_runner.py", line 209, in 
wait_until_finish
'Pipeline %s failed in state %s.' % (self._job_id, self._state))
RuntimeError: Pipeline 
test_gbk_side_input_1537370042.56_8c2cda9b-e139-4329-b3d0-534bddcea16c failed 
in state FAILED.

==
ERROR: test_group_by_key (__main__.FlinkRunnerTest)
--
Traceback (most recent call last):
  File "apache_beam/runners/portability/fn_api_runner_test.py", line 237, in 
test_group_by_key
assert_that(res, equal_to([('a', [1, 2]), ('b', [3])]))
  File "apache_beam/pipeline.py", line 414, in __exit__
self.run().wait_until_finish()
  File "apache_beam/ru

[beam] branch master updated (6cfaac5 -> b7c2975)

2018-09-19 Thread iemejia
This is an automated email from the ASF dual-hosted git repository.

iemejia pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 6cfaac5  Merge pull request #6285: [BEAM-4861] Autocreate directories 
when doing an HDFS rename
 add 6863ccb  Endpoint host port configuration reset issue
 add 5dfe848  Endpoint host port configuration reset issue
 add be2daa9  Fixes https://issues.apache.org/jira/browse/BEAM-3446. 
BaseReadFn to abstract general jedis operations. Separated key fetch using 
prefix and get by key into serparate DoFn.
 add 88cc752  package private and typo fix
 add a498446  Using mget with configurable batch size to increase 
efficiency of read ops
 add 4249bea  fixing equality of batch size
 add 728776f  fixing order of test arguments, to fix 
:beam-sdks-java-io-redis:compileTestJava
 add eb1e084  Aggregating on per window, applyiong redis get on per window 
batch and sequentially pushing to output collector
 add b7c2975  Merge pull request #5841: [BEAM-3446] Fixes RedisIO 
non-prefix read operations

No new revisions were added by this update.

Summary of changes:
 .../java/org/apache/beam/sdk/io/redis/RedisIO.java | 129 +
 .../org/apache/beam/sdk/io/redis/RedisIOTest.java  |  33 ++
 2 files changed, 140 insertions(+), 22 deletions(-)



[jira] [Work logged] (BEAM-3446) RedisIO non-prefix read operations

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3446?focusedWorklogId=145693&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145693
 ]

ASF GitHub Bot logged work on BEAM-3446:


Author: ASF GitHub Bot
Created on: 19/Sep/18 15:14
Start Date: 19/Sep/18 15:14
Worklog Time Spent: 10m 
  Work Description: iemejia closed pull request #5841: [BEAM-3446] Fixes 
RedisIO non-prefix read operations
URL: https://github.com/apache/beam/pull/5841
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/sdks/java/io/redis/src/main/java/org/apache/beam/sdk/io/redis/RedisIO.java 
b/sdks/java/io/redis/src/main/java/org/apache/beam/sdk/io/redis/RedisIO.java
index 279ca46db2b..57d0b77af1f 100644
--- a/sdks/java/io/redis/src/main/java/org/apache/beam/sdk/io/redis/RedisIO.java
+++ b/sdks/java/io/redis/src/main/java/org/apache/beam/sdk/io/redis/RedisIO.java
@@ -20,7 +20,10 @@
 import static com.google.common.base.Preconditions.checkArgument;
 
 import com.google.auto.value.AutoValue;
+import com.google.common.collect.ArrayListMultimap;
+import com.google.common.collect.Multimap;
 import java.util.List;
+import java.util.concurrent.atomic.AtomicInteger;
 import javax.annotation.Nullable;
 import org.apache.beam.sdk.annotations.Experimental;
 import org.apache.beam.sdk.coders.KvCoder;
@@ -34,6 +37,7 @@
 import org.apache.beam.sdk.transforms.SerializableFunctions;
 import org.apache.beam.sdk.transforms.View;
 import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
 import org.apache.beam.sdk.values.KV;
 import org.apache.beam.sdk.values.PBegin;
 import org.apache.beam.sdk.values.PCollection;
@@ -109,6 +113,7 @@ public static Read read() {
 return new AutoValue_RedisIO_Read.Builder()
 .setConnectionConfiguration(RedisConnectionConfiguration.create())
 .setKeyPattern("*")
+.setBatchSize(1000)
 .build();
   }
 
@@ -119,6 +124,7 @@ public static Read read() {
   public static ReadAll readAll() {
 return new AutoValue_RedisIO_ReadAll.Builder()
 .setConnectionConfiguration(RedisConnectionConfiguration.create())
+.setBatchSize(1000)
 .build();
   }
 
@@ -142,6 +148,8 @@ private RedisIO() {}
 @Nullable
 abstract String keyPattern();
 
+abstract int batchSize();
+
 abstract Builder builder();
 
 @AutoValue.Builder
@@ -152,6 +160,8 @@ private RedisIO() {}
   @Nullable
   abstract Builder setKeyPattern(String keyPattern);
 
+  abstract Builder setBatchSize(int batchSize);
+
   abstract Read build();
 }
 
@@ -185,6 +195,10 @@ public Read 
withConnectionConfiguration(RedisConnectionConfiguration connection)
   return builder().setConnectionConfiguration(connection).build();
 }
 
+public Read withBatchSize(int batchSize) {
+  return builder().setBatchSize(batchSize).build();
+}
+
 @Override
 public void populateDisplayData(DisplayData.Builder builder) {
   connectionConfiguration().populateDisplayData(builder);
@@ -196,7 +210,11 @@ public void populateDisplayData(DisplayData.Builder 
builder) {
 
   return input
   .apply(Create.of(keyPattern()))
-  .apply(RedisIO.readAll().withConnectionConfiguration(connectionConfiguration()));
+  .apply(ParDo.of(new ReadKeysWithPattern(connectionConfiguration())))
+  .apply(
+  RedisIO.readAll()
+  .withConnectionConfiguration(connectionConfiguration())
+  .withBatchSize(batchSize()));
 }
   }
 
@@ -208,6 +226,8 @@ public void populateDisplayData(DisplayData.Builder 
builder) {
 @Nullable
 abstract RedisConnectionConfiguration connectionConfiguration();
 
+abstract int batchSize();
+
 abstract ReadAll.Builder builder();
 
 @AutoValue.Builder
@@ -215,6 +235,8 @@ public void populateDisplayData(DisplayData.Builder 
builder) {
   @Nullable
   abstract ReadAll.Builder 
setConnectionConfiguration(RedisConnectionConfiguration connection);
 
+  abstract ReadAll.Builder setBatchSize(int batchSize);
+
   abstract ReadAll build();
 }
 
@@ -243,25 +265,27 @@ public ReadAll 
withConnectionConfiguration(RedisConnectionConfiguration connecti
   return builder().setConnectionConfiguration(connection).build();
 }
 
+public ReadAll withBatchSize(int batchSize) {
+  return builder().setBatchSize(batchSize).build();
+}
+
 @Override
 public PCollection> expand(PCollection input) {
   checkArgument(connectionConfiguration() != null, 
"withConnectionConfiguration() is required");
 
   return input
-  .apply(ParDo.o

[jira] [Resolved] (BEAM-3446) RedisIO non-prefix read operations

2018-09-19 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía resolved BEAM-3446.

   Resolution: Fixed
Fix Version/s: 2.8.0

> RedisIO non-prefix read operations
> --
>
> Key: BEAM-3446
> URL: https://issues.apache.org/jira/browse/BEAM-3446
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-redis
>Reporter: Vinay varma
>Assignee: Vinay varma
>Priority: Major
> Fix For: 2.8.0
>
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> The read operation in RedisIO is for prefix-based lookups. While this can be 
> used for exact key matches as well, the number of operations limits the 
> throughput of the function.
> I suggest exposing the current readAll operation as readbyprefix and using 
> simpler operations for readAll functionality.
> ex:
> {code:java}
> String output = jedis.get(element);
> if (output != null) {
> processContext.output(KV.of(element, output));
> }
> {code}
> instead of:
> https://github.com/apache/beam/blob/7d240c0bb171af6868f1a6e95196c9dcfc9ac640/sdks/java/io/redis/src/main/java/org/apache/beam/sdk/io/redis/RedisIO.java#L292
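The patch above addresses this by grouping keys into fixed-size batches (default 1000) so a single MGET round-trip can fetch many values at once, instead of one GET per key. A minimal sketch of the batching idea follows; the `partition` helper is illustrative only and is not Beam's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchedKeys {
  // Split the key list into consecutive batches of at most batchSize keys;
  // each batch would then be fetched with one MGET call.
  static List<List<String>> partition(List<String> keys, int batchSize) {
    List<List<String>> batches = new ArrayList<>();
    for (int i = 0; i < keys.size(); i += batchSize) {
      batches.add(new ArrayList<>(keys.subList(i, Math.min(i + batchSize, keys.size()))));
    }
    return batches;
  }

  public static void main(String[] args) {
    List<String> keys = new ArrayList<>();
    for (int i = 0; i < 2500; i++) {
      keys.add("key-" + i);
    }
    // With the default batch size of 1000, 2500 keys become 3 batches.
    List<List<String>> batches = partition(keys, 1000);
    System.out.println(batches.size());        // 3
    System.out.println(batches.get(2).size()); // 500 (last, partial batch)
  }
}
```

Each batch maps to one network round-trip, which is why the throughput gain grows with the number of exact-match keys being read.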



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: beam_PostCommit_Python_PVR_Flink_Gradle #65

2018-09-19 Thread Apache Jenkins Server
See 


Changes:

[vin1990] Endpoint host port configuration reset issue

[vin1990] Endpoint host port configuration reset issue

[vin1990] Fixes https://issues.apache.org/jira/browse/BEAM-3446. BaseReadFn to

[vin1990] package private and typo fix

[vin1990] Using mget with configurable batch size to increase efficiency of read

[vin1990] fixing equality of batch size

[vin1990] fixing order of test arguments, to fix

[vin1990] Aggregating on per window, applying redis get on per window batch and

--
[...truncated 554.12 KB...]
raise self
_Rendezvous: <_Rendezvous of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "Socket closed"
debug_error_string = 
"{"created":"@1537370417.520571380","description":"Error received from 
peer","file":"src/core/lib/surface/call.cc","file_line":1099,"grpc_message":"Socket
 closed","grpc_status":14}"
>


==
ERROR: test_combine_per_key (__main__.FlinkRunnerTest)
--
Traceback (most recent call last):
  File "apache_beam/runners/portability/fn_api_runner_test.py", line 251, in 
test_combine_per_key
assert_that(res, equal_to([('a', 1.5), ('b', 3.0)]))
  File "apache_beam/pipeline.py", line 414, in __exit__
self.run().wait_until_finish()
  File "apache_beam/runners/portability/portable_runner.py", line 209, in 
wait_until_finish
'Pipeline %s failed in state %s.' % (self._job_id, self._state))
RuntimeError: Pipeline 
test_combine_per_key_1537370409.95_ec03f2e6-b8ac-4a83-8fe2-893f776f6a91 failed 
in state FAILED.

==
ERROR: test_create (__main__.FlinkRunnerTest)
--
Traceback (most recent call last):
  File "apache_beam/runners/portability/fn_api_runner_test.py", line 62, in 
test_create
assert_that(p | beam.Create(['a', 'b']), equal_to(['a', 'b']))
  File "apache_beam/pipeline.py", line 414, in __exit__
self.run().wait_until_finish()
  File "apache_beam/runners/portability/portable_runner.py", line 209, in 
wait_until_finish
'Pipeline %s failed in state %s.' % (self._job_id, self._state))
RuntimeError: Pipeline 
test_create_1537370410.39_d60b4bc5-d144-45fb-a764-00b3a669faa9 failed in state 
FAILED.

==
ERROR: test_flatten (__main__.FlinkRunnerTest)
--
Traceback (most recent call last):
  File "apache_beam/runners/portability/fn_api_runner_test.py", line 244, in 
test_flatten
assert_that(res, equal_to(['a', 'b', 'c', 'd']))
  File "apache_beam/pipeline.py", line 414, in __exit__
self.run().wait_until_finish()
  File "apache_beam/runners/portability/portable_runner.py", line 209, in 
wait_until_finish
'Pipeline %s failed in state %s.' % (self._job_id, self._state))
RuntimeError: Pipeline 
test_flatten_1537370411.0_ed08efb8-128d-43a7-a968-e0551a718c54 failed in state 
FAILED.

==
ERROR: test_flattened_side_input (__main__.FlinkRunnerTest)
--
Traceback (most recent call last):
  File "apache_beam/runners/portability/fn_api_runner_test.py", line 190, in 
test_flattened_side_input
equal_to([(None, {'a': 1, 'b': 2})]))
  File "apache_beam/pipeline.py", line 414, in __exit__
self.run().wait_until_finish()
  File "apache_beam/runners/portability/portable_runner.py", line 209, in 
wait_until_finish
'Pipeline %s failed in state %s.' % (self._job_id, self._state))
RuntimeError: Pipeline 
test_flattened_side_input_1537370411.46_5478fe71-7fc5-4cc3-8770-271b95ca3c01 
failed in state FAILED.

==
ERROR: test_gbk_side_input (__main__.FlinkRunnerTest)
--
Traceback (most recent call last):
  File "apache_beam/runners/portability/fn_api_runner_test.py", line 198, in 
test_gbk_side_input
equal_to([(None, {'a': [1]})]))
  File "apache_beam/pipeline.py", line 414, in __exit__
self.run().wait_until_finish()
  File "apache_beam/runners/portability/portable_runner.py", line 209, in 
wait_until_finish
'Pipeline %s failed in state %s.' % (self._job_id, self._state))
RuntimeError: Pipeline 
test_gbk_side_input_1537370411.94_f7b80bf9-cbfa-4c78-8893-165747e22740 failed 
in state FAILED.

==
ERROR: test_group_by_key (__main__.FlinkRunnerTest)
--
Traceback (most rece

[jira] [Work logged] (BEAM-4176) Java: Portable batch runner passes all ValidatesRunner tests that non-portable runner passes

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4176?focusedWorklogId=145695&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145695
 ]

ASF GitHub Bot logged work on BEAM-4176:


Author: ASF GitHub Bot
Created on: 19/Sep/18 15:20
Start Date: 19/Sep/18 15:20
Worklog Time Spent: 10m 
  Work Description: ryan-williams commented on issue #6225: [BEAM-4176] 
move some artifact-staging logs from info to debug
URL: https://github.com/apache/beam/pull/6225#issuecomment-422844406
 
 
   Run Java PreCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 145695)
Time Spent: 22h 40m  (was: 22.5h)

> Java: Portable batch runner passes all ValidatesRunner tests that 
> non-portable runner passes
> 
>
> Key: BEAM-4176
> URL: https://issues.apache.org/jira/browse/BEAM-4176
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Ben Sidhom
>Priority: Major
> Attachments: Screen Shot 2018-08-14 at 4.18.31 PM.png, Screen Shot 
> 2018-09-03 at 11.07.38 AM.png
>
>  Time Spent: 22h 40m
>  Remaining Estimate: 0h
>
> We need this as a sanity check that runner execution is correct.





[jira] [Work logged] (BEAM-3446) RedisIO non-prefix read operations

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3446?focusedWorklogId=145696&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145696
 ]

ASF GitHub Bot logged work on BEAM-3446:


Author: ASF GitHub Bot
Created on: 19/Sep/18 15:42
Start Date: 19/Sep/18 15:42
Worklog Time Spent: 10m 
  Work Description: vvarma commented on issue #5841: [BEAM-3446] Fixes 
RedisIO non-prefix read operations
URL: https://github.com/apache/beam/pull/5841#issuecomment-422852677
 
 
   @iemejia Thank you!




Issue Time Tracking
---

Worklog Id: (was: 145696)
Time Spent: 6h 20m  (was: 6h 10m)

> RedisIO non-prefix read operations
> --
>
> Key: BEAM-3446
> URL: https://issues.apache.org/jira/browse/BEAM-3446
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-redis
>Reporter: Vinay varma
>Assignee: Vinay varma
>Priority: Major
> Fix For: 2.8.0
>
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> The read operation in RedisIO is for prefix-based lookups. While this can be 
> used for exact key matches as well, the number of operations limits the 
> throughput of the function.
> I suggest exposing the current readAll operation as readbyprefix and using 
> simpler operations for readAll functionality.
> ex:
> {code:java}
> String output = jedis.get(element);
> if (output != null) {
> processContext.output(KV.of(element, output));
> }
> {code}
> instead of:
> https://github.com/apache/beam/blob/7d240c0bb171af6868f1a6e95196c9dcfc9ac640/sdks/java/io/redis/src/main/java/org/apache/beam/sdk/io/redis/RedisIO.java#L292





[jira] [Work logged] (BEAM-4861) Hadoop Filesystem silently fails

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4861?focusedWorklogId=145698&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145698
 ]

ASF GitHub Bot logged work on BEAM-4861:


Author: ASF GitHub Bot
Created on: 19/Sep/18 15:50
Start Date: 19/Sep/18 15:50
Worklog Time Spent: 10m 
  Work Description: timrobertson100 commented on issue #6285: [BEAM-4861] 
Autocreate directories when doing an HDFS rename
URL: https://github.com/apache/beam/pull/6285#issuecomment-422855400
 
 
   Thanks Ismaël, Reuven
   
   > On 19 Sep 2018, at 17:07, Ismaël Mejía  wrote:
   > 
   > Merged #6285 into master.
   > 
   > —
   > You are receiving this because you authored the thread.
   > Reply to this email directly, view it on GitHub, or mute the thread.
   




Issue Time Tracking
---

Worklog Id: (was: 145698)
Time Spent: 1h 10m  (was: 1h)

> Hadoop Filesystem silently fails
> 
>
> Key: BEAM-4861
> URL: https://issues.apache.org/jira/browse/BEAM-4861
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-hadoop
>Reporter: Jozef Vilcek
>Assignee: Tim Robertson
>Priority: Major
> Fix For: 2.8.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Hi,
> Beam Filesystem operations copy, rename, and delete are void in the SDK. The 
> Hadoop native filesystem operations are not: they return a boolean. The current 
> implementation in Beam ignores that result and passes as long as no exception is thrown.
> I got burned by this when using 'rename' to do a 'move' operation on HDFS. If 
> the target directory does not exist, the operation returns false and does not 
> touch the file.
> [https://github.com/apache/beam/blob/master/sdks/java/io/hadoop-file-system/src/main/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystem.java#L148]
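The defensive pattern behind the fix is simply to check the boolean and fail loudly. As a stand-in for Hadoop's `FileSystem.rename()` (which returns `false` rather than throwing), the sketch below uses `java.io.File.renameTo()`, which has the same boolean contract; `renameChecked` is an illustrative name, not Beam's API.

```java
import java.io.File;

public class CheckedRename {
  // Wrap a boolean-returning rename so a failure surfaces as an exception
  // instead of being silently ignored.
  static void renameChecked(File src, File dst) {
    if (!src.renameTo(dst)) {
      throw new IllegalStateException("Unable to rename " + src + " to " + dst);
    }
  }

  public static void main(String[] args) {
    // Renaming a nonexistent file returns false; the wrapper turns that
    // silent failure into a visible error.
    File missing = new File("does-not-exist.tmp");
    try {
      renameChecked(missing, new File("target.tmp"));
    } catch (IllegalStateException e) {
      System.out.println("rename failed loudly: " + e.getMessage());
    }
  }
}
```

Without the check, the missing-target-directory case described above passes the pipeline step while leaving the file untouched.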





[jira] [Work logged] (BEAM-5036) Optimize FileBasedSink's WriteOperation.moveToOutput()

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5036?focusedWorklogId=145699&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145699
 ]

ASF GitHub Bot logged work on BEAM-5036:


Author: ASF GitHub Bot
Created on: 19/Sep/18 15:53
Start Date: 19/Sep/18 15:53
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #6289: [BEAM-5036] Optimize 
the FileBasedSink WriteOperation.moveToOutput()
URL: https://github.com/apache/beam/pull/6289#issuecomment-422856503
 
 
   Interesting question; it would be ideal to define and document a clear default 
behavior for rename in all Beam filesystems (since there are no options allowed 
in the API).
   HDFS users probably will expect that the default rename behavior does NOT 
overwrite (as HDFS works), and also because this implies possible data loss, 
but I am not sure if there is a strong reason for other Filesystems to do 
overwrite by default (e.g. Local).
   cc @lukecwik too for eventual extra feedback since the original authors of 
Beam FileSystems are not in the project anymore.




Issue Time Tracking
---

Worklog Id: (was: 145699)
Time Spent: 2h 20m  (was: 2h 10m)

> Optimize FileBasedSink's WriteOperation.moveToOutput()
> --
>
> Key: BEAM-5036
> URL: https://issues.apache.org/jira/browse/BEAM-5036
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-files
>Affects Versions: 2.5.0
>Reporter: Jozef Vilcek
>Assignee: Tim Robertson
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> The moveToOutput() method in FileBasedSink.WriteOperation implements move by 
> copy+delete. It would be better to use rename(), which can be much more 
> efficient on some filesystems.
> The filesystem must support cross-directory rename. BEAM-4861 is related to 
> this for the case of the HDFS filesystem.
> Feature was discussed here:
> http://mail-archives.apache.org/mod_mbox/beam-dev/201807.mbox/%3CCAF9t7_4Mp54pQ+vRrJrBh9Vx0=uaknupzd_qdh_qdm9vxll...@mail.gmail.com%3E
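The difference between the two strategies can be shown with `java.nio.file`: copy+delete writes every byte a second time, while a rename can be a metadata-only operation on filesystems that support it. This is an illustrative sketch, not Beam's code; the method names are invented for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class MoveStrategies {
  // Move implemented as copy+delete: the file's contents are rewritten in full.
  static void moveByCopyDelete(Path src, Path dst) {
    try {
      Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
      Files.delete(src);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Move implemented as a rename: typically metadata-only when source and
  // destination are on the same filesystem. ATOMIC_MOVE fails fast if the
  // filesystem cannot rename atomically, instead of degrading to a copy.
  static void moveByRename(Path src, Path dst) {
    try {
      Files.move(src, dst, StandardCopyOption.ATOMIC_MOVE);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) throws IOException {
    Path src = Files.createTempFile("shard", ".tmp");
    Path dst = src.resolveSibling("output-" + src.getFileName());
    moveByRename(src, dst);
    System.out.println(Files.exists(dst) && !Files.exists(src)); // true
    Files.delete(dst);
  }
}
```

The overwrite question raised above also shows up here: `REPLACE_EXISTING` makes the semantics explicit, whereas whether a bare rename overwrites an existing target differs between filesystems (HDFS, for instance, refuses).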





[jira] [Work logged] (BEAM-2687) Python SDK support for Stateful Processing

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2687?focusedWorklogId=145708&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145708
 ]

ASF GitHub Bot logged work on BEAM-2687:


Author: ASF GitHub Bot
Created on: 19/Sep/18 16:37
Start Date: 19/Sep/18 16:37
Worklog Time Spent: 10m 
  Work Description: tweise commented on issue #6349: [BEAM-2687] Implement 
State over the Fn API
URL: https://github.com/apache/beam/pull/6349#issuecomment-422872034
 
 
   Makes sense to commit this first and separately from the timer related 
discussion (I looked at this before I saw the ML thread).




Issue Time Tracking
---

Worklog Id: (was: 145708)
Time Spent: 3h 10m  (was: 3h)

> Python SDK support for Stateful Processing
> --
>
> Key: BEAM-2687
> URL: https://issues.apache.org/jira/browse/BEAM-2687
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Assignee: Charles Chen
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Python SDK should support stateful processing 
> (https://beam.apache.org/blog/2017/02/13/stateful-processing.html)
> In the meantime, the runner capability matrix should be updated to show the 
> lack of this feature 
> (https://beam.apache.org/documentation/runners/capability-matrix/)
> Use this as an umbrella issue for all related issues.





[jira] [Work logged] (BEAM-4496) Create Jenkins job to push generated HTML to asf-site branch

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4496?focusedWorklogId=145718&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145718
 ]

ASF GitHub Bot logged work on BEAM-4496:


Author: ASF GitHub Bot
Created on: 19/Sep/18 16:48
Start Date: 19/Sep/18 16:48
Worklog Time Spent: 10m 
  Work Description: swegner commented on a change in pull request #6431: 
[BEAM-4496] Website merge
URL: https://github.com/apache/beam/pull/6431#discussion_r218879954
 
 

 ##
 File path: website/build.gradle
 ##
 @@ -104,3 +106,32 @@ task testWebsite(type: Exec) {
 task preCommit {
   dependsOn testWebsite
 }
+
+task mergeWebsite << {
+  /* Disabled for now, for testing
+  exec {
+executable 'git'
+args 'checkout', 'asf-site'
+  }
+  copy {
+from contentBuildDir
+into contentRepoDir
+  }
+  exec {
+executable 'git'
+args 'add', contentRepoDir
+  }
+  exec {
+executable 'git'
+args 'commit', '-m', 'Update website ' + new Date().format('yyyy/MM/dd HH:mm:ss')
+  }
+  exec {
 
 Review comment:
   It would be useful for testing to split out everything before this to a 
separate task. Then we could run the prerequisite task locally to verify 
behavior or repro bugs.




Issue Time Tracking
---

Worklog Id: (was: 145718)
Time Spent: 2h 10m  (was: 2h)

> Create Jenkins job to push generated HTML to asf-site branch
> 
>
> Key: BEAM-4496
> URL: https://issues.apache.org/jira/browse/BEAM-4496
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system, website
>Reporter: Scott Wegner
>Assignee: Scott Wegner
>Priority: Major
>  Labels: beam-site-automation-reliability
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-4496) Create Jenkins job to push generated HTML to asf-site branch

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4496?focusedWorklogId=145719&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145719
 ]

ASF GitHub Bot logged work on BEAM-4496:


Author: ASF GitHub Bot
Created on: 19/Sep/18 16:48
Start Date: 19/Sep/18 16:48
Worklog Time Spent: 10m 
  Work Description: swegner commented on a change in pull request #6431: 
[BEAM-4496] Website merge
URL: https://github.com/apache/beam/pull/6431#discussion_r218875886
 
 

 ##
 File path: .test-infra/jenkins/job_PostCommit_Website_Merge.groovy
 ##
 @@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import CommonJobProperties as commonJobProperties
+import PostcommitJobBuilder
+
+
+// This job builds and merges the website.
+PostcommitJobBuilder.postCommitJob('beam_PostCommit_Website_Merge', 'Run 
Website Merge',
 
 Review comment:
   I wonder if this job shouldn't have a trigger phrase. Triggering would allow 
pushing unmerged content to the publishing branch.




Issue Time Tracking
---

Worklog Id: (was: 145719)
Time Spent: 2h 20m  (was: 2h 10m)

> Create Jenkins job to push generated HTML to asf-site branch
> 
>
> Key: BEAM-4496
> URL: https://issues.apache.org/jira/browse/BEAM-4496
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system, website
>Reporter: Scott Wegner
>Assignee: Scott Wegner
>Priority: Major
>  Labels: beam-site-automation-reliability
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-4496) Create Jenkins job to push generated HTML to asf-site branch

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4496?focusedWorklogId=145713&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145713
 ]

ASF GitHub Bot logged work on BEAM-4496:


Author: ASF GitHub Bot
Created on: 19/Sep/18 16:48
Start Date: 19/Sep/18 16:48
Worklog Time Spent: 10m 
  Work Description: swegner commented on a change in pull request #6431: 
[BEAM-4496] Website merge
URL: https://github.com/apache/beam/pull/6431#discussion_r218879411
 
 

 ##
 File path: website/build.gradle
 ##
 @@ -104,3 +106,32 @@ task testWebsite(type: Exec) {
 task preCommit {
   dependsOn testWebsite
 }
+
+task mergeWebsite << {
 
 Review comment:
   Here's the script used by another Apache project to manage the same process: 
https://github.com/apache/guacamole-website/blob/22043e1c01a1808a3c95b87a901fee45737259ed/build.sh#L136
   
   There are a few additional steps which they perform which I believe would 
make sense to include:
   
   - Move generated contents to a temp directory outside the enlistment. This 
would ensure we don't hit any issues when switching branches.
   - Nuke previous contents from `contentRepoDir` before copying new contents. 
This ensures that any removed files also get removed from the branch.




Issue Time Tracking
---

Worklog Id: (was: 145713)
Time Spent: 1.5h  (was: 1h 20m)

> Create Jenkins job to push generated HTML to asf-site branch
> 
>
> Key: BEAM-4496
> URL: https://issues.apache.org/jira/browse/BEAM-4496
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system, website
>Reporter: Scott Wegner
>Assignee: Scott Wegner
>Priority: Major
>  Labels: beam-site-automation-reliability
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-4496) Create Jenkins job to push generated HTML to asf-site branch

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4496?focusedWorklogId=145720&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145720
 ]

ASF GitHub Bot logged work on BEAM-4496:


Author: ASF GitHub Bot
Created on: 19/Sep/18 16:48
Start Date: 19/Sep/18 16:48
Worklog Time Spent: 10m 
  Work Description: swegner commented on a change in pull request #6431: 
[BEAM-4496] Website merge
URL: https://github.com/apache/beam/pull/6431#discussion_r218875314
 
 

 ##
 File path: .test-infra/jenkins/job_PostCommit_Website_Merge.groovy
 ##
 @@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import CommonJobProperties as commonJobProperties
+import PostcommitJobBuilder
+
+
+// This job builds and merges the website.
+PostcommitJobBuilder.postCommitJob('beam_PostCommit_Website_Merge', 'Run 
Website Merge',
+  'Website Merge', this) {
+
+  description('Builds and merges the web site.')
 
 Review comment:
   A user without prior knowledge might not understand what a 'Website Merge' 
means. How about elaborating a bit in the description? i.e.:
   
   'Merge generated website content into asf-site branch for hosting'




Issue Time Tracking
---

Worklog Id: (was: 145720)
Time Spent: 2h 20m  (was: 2h 10m)

> Create Jenkins job to push generated HTML to asf-site branch
> 
>
> Key: BEAM-4496
> URL: https://issues.apache.org/jira/browse/BEAM-4496
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system, website
>Reporter: Scott Wegner
>Assignee: Scott Wegner
>Priority: Major
>  Labels: beam-site-automation-reliability
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-4496) Create Jenkins job to push generated HTML to asf-site branch

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4496?focusedWorklogId=145717&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145717
 ]

ASF GitHub Bot logged work on BEAM-4496:


Author: ASF GitHub Bot
Created on: 19/Sep/18 16:48
Start Date: 19/Sep/18 16:48
Worklog Time Spent: 10m 
  Work Description: swegner commented on a change in pull request #6431: 
[BEAM-4496] Website merge
URL: https://github.com/apache/beam/pull/6431#discussion_r218880132
 
 

 ##
 File path: website/build.gradle
 ##
 @@ -104,3 +106,32 @@ task testWebsite(type: Exec) {
 task preCommit {
   dependsOn testWebsite
 }
+
+task mergeWebsite << {
+  /* Disabled for now, for testing
+  exec {
+executable 'git'
+args 'checkout', 'asf-site'
+  }
+  copy {
+from contentBuildDir
+into contentRepoDir
+  }
+  exec {
+executable 'git'
+args 'add', contentRepoDir
+  }
+  exec {
+executable 'git'
+args 'commit', '-m', 'Update website ' + new Date().format('yyyy/MM/dd HH:mm:ss')
+  }
+  exec {
+executable 'git'
+args 'push', 'origin', 'asf-site'
+  }
+  */
+  println "Successful test of mergeWebsite"
 
 Review comment:
   Remove / reword before merging




Issue Time Tracking
---

Worklog Id: (was: 145717)

> Create Jenkins job to push generated HTML to asf-site branch
> 
>
> Key: BEAM-4496
> URL: https://issues.apache.org/jira/browse/BEAM-4496
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system, website
>Reporter: Scott Wegner
>Assignee: Scott Wegner
>Priority: Major
>  Labels: beam-site-automation-reliability
>  Time Spent: 2h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-4496) Create Jenkins job to push generated HTML to asf-site branch

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4496?focusedWorklogId=145714&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145714
 ]

ASF GitHub Bot logged work on BEAM-4496:


Author: ASF GitHub Bot
Created on: 19/Sep/18 16:48
Start Date: 19/Sep/18 16:48
Worklog Time Spent: 10m 
  Work Description: swegner commented on a change in pull request #6431: 
[BEAM-4496] Website merge
URL: https://github.com/apache/beam/pull/6431#discussion_r218877387
 
 

 ##
 File path: website/build.gradle
 ##
 @@ -22,6 +22,8 @@ apply plugin: "base"
 def dockerImageTag = 'beam-website'
 def dockerWorkDir = "/repo"
 def buildDir = "$project.rootDir/build/website"
+def contentBuildDir = "$buildDir/content"
+def contentRepoDir = "$project.rootDir/website/content"
 
 Review comment:
   This branch should now *only* have generated website content. Therefore I 
suggest dropping the `website/content` subdirectory. Note that this will 
require updating the githubpubsub publishing config when we're ready to switch 
over.




Issue Time Tracking
---

Worklog Id: (was: 145714)
Time Spent: 1h 40m  (was: 1.5h)

> Create Jenkins job to push generated HTML to asf-site branch
> 
>
> Key: BEAM-4496
> URL: https://issues.apache.org/jira/browse/BEAM-4496
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system, website
>Reporter: Scott Wegner
>Assignee: Scott Wegner
>Priority: Major
>  Labels: beam-site-automation-reliability
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-4496) Create Jenkins job to push generated HTML to asf-site branch

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4496?focusedWorklogId=145716&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145716
 ]

ASF GitHub Bot logged work on BEAM-4496:


Author: ASF GitHub Bot
Created on: 19/Sep/18 16:48
Start Date: 19/Sep/18 16:48
Worklog Time Spent: 10m 
  Work Description: swegner commented on a change in pull request #6431: 
[BEAM-4496] Website merge
URL: https://github.com/apache/beam/pull/6431#discussion_r218879686
 
 

 ##
 File path: website/build.gradle
 ##
 @@ -104,3 +106,32 @@ task testWebsite(type: Exec) {
 task preCommit {
   dependsOn testWebsite
 }
+
+task mergeWebsite << {
+  /* Disabled for now, for testing
+  exec {
+executable 'git'
+args 'checkout', 'asf-site'
+  }
+  copy {
+from contentBuildDir
+into contentRepoDir
+  }
+  exec {
+executable 'git'
+args 'add', contentRepoDir
+  }
+  exec {
+executable 'git'
+args 'commit', '-m', 'Update website ' + new Date().format('yyyy/MM/dd HH:mm:ss')
 
 Review comment:
   If it's easy to grab, it would be useful to also include the master commit 
hash in the commit message. (If it's not easy, then don't worry about it)




Issue Time Tracking
---

Worklog Id: (was: 145716)
Time Spent: 2h  (was: 1h 50m)

> Create Jenkins job to push generated HTML to asf-site branch
> 
>
> Key: BEAM-4496
> URL: https://issues.apache.org/jira/browse/BEAM-4496
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system, website
>Reporter: Scott Wegner
>Assignee: Scott Wegner
>Priority: Major
>  Labels: beam-site-automation-reliability
>  Time Spent: 2h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-4496) Create Jenkins job to push generated HTML to asf-site branch

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4496?focusedWorklogId=145715&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145715
 ]

ASF GitHub Bot logged work on BEAM-4496:


Author: ASF GitHub Bot
Created on: 19/Sep/18 16:48
Start Date: 19/Sep/18 16:48
Worklog Time Spent: 10m 
  Work Description: swegner commented on a change in pull request #6431: 
[BEAM-4496] Website merge
URL: https://github.com/apache/beam/pull/6431#discussion_r218876755
 
 

 ##
 File path: website/build.gradle
 ##
 @@ -104,3 +106,32 @@ task testWebsite(type: Exec) {
 task preCommit {
   dependsOn testWebsite
 }
+
+task mergeWebsite << {
+  /* Disabled for now, for testing
+  exec {
+executable 'git'
 
 Review comment:
   FYI, when I was previously working on the design, I looked into other ASF 
projects, and this is a common pattern. I believe some projects were using a 
Groovy helper library for git interactions. If we find invoking git manually to 
be tedious / flaky, we can look further into the helper library.




Issue Time Tracking
---

Worklog Id: (was: 145715)
Time Spent: 1h 50m  (was: 1h 40m)

> Create Jenkins job to push generated HTML to asf-site branch
> 
>
> Key: BEAM-4496
> URL: https://issues.apache.org/jira/browse/BEAM-4496
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system, website
>Reporter: Scott Wegner
>Assignee: Scott Wegner
>Priority: Major
>  Labels: beam-site-automation-reliability
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-5417) FileSystems.match behaviour diff between GCS and local file system

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5417?focusedWorklogId=145723&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145723
 ]

ASF GitHub Bot logged work on BEAM-5417:


Author: ASF GitHub Bot
Created on: 19/Sep/18 16:51
Start Date: 19/Sep/18 16:51
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #6423: [BEAM-5417] Parity 
between GCS and local match
URL: https://github.com/apache/beam/pull/6423#issuecomment-422876831
 
 
   > @udim I see you touched `LocalFileSystem._list()` recently(ish) in 
[36a2506](https://github.com/apache/beam/commit/36a250615b4a5d72c80b03f8f87957b5c0aaf791).
 Is there a valid reason to not support `foo/*/file.txt` on filesystems with 
directories or was it just overlooked?
   
   Hi @joar. I don't remember why exactly, but it was a deliberate choice: 
https://github.com/apache/beam/blob/b7c2975c16c2a4d3051b0ea21ef5f515a0b8d50b/sdks/python/apache_beam/io/filesystem.py#L554.
   
   We used fnmatch.fnmatch() instead of glob.glob() to support both Unix and 
Windows style paths.
   To fully add support for paths like `foo/*/file.txt` (and 
`d:\foo\*\file.txt`):
   - Convert `*` in patterns to `[!/\]` (see fnmatch docs, I believe this should 
support both Unix and Windows paths)
   - Convert `**` in patterns to `*` (optional, this would be a new feature)
   - Document these changes
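The conversion sketched in the bullets above could look like this at the regex level (a hypothetical Python helper for illustration, not Beam's implementation; `[^/\\]` is the regex form of fnmatch's `[!/\]` class):

```python
import re

def glob_to_regex(pattern):
    # Hypothetical sketch of the proposal above: '*' stops at a path
    # separator ('/' or '\'), while '**' may cross separators.
    out = []
    i = 0
    while i < len(pattern):
        if pattern[i:i + 2] == '**':
            out.append('.*')           # '**' matches across separators
            i += 2
        elif pattern[i] == '*':
            out.append(r'[^/\\]*')     # '*' stays within one path segment
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile(''.join(out) + r'\Z')

print(bool(glob_to_regex('foo/*/file.txt').match('foo/a/file.txt')))     # True
print(bool(glob_to_regex('foo/*/file.txt').match('foo/a/b/file.txt')))   # False
print(bool(glob_to_regex('foo/**/file.txt').match('foo/a/b/file.txt')))  # True
```

The same helper also accepts Windows-style patterns such as `d:\foo\*\file.txt`, since the `[^/\\]` class excludes both separators.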




Issue Time Tracking
---

Worklog Id: (was: 145723)
Time Spent: 40m  (was: 0.5h)

> FileSystems.match behaviour diff between GCS and local file system
> --
>
> Key: BEAM-5417
> URL: https://issues.apache.org/jira/browse/BEAM-5417
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.5.0, 2.6.0
>Reporter: Joar Wandborg
>Assignee: Chamikara Jayalath
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Given the directory structure:
>  
> {noformat}
> .
> ├── filesystem-match-test
> │   ├── a
> │   │   └── file.txt
> │   └── b
> │   └── file.txt
> └── filesystem-match-test.py
> {noformat}
>  
> Where {{filesystem-match-test.py}} contains:
> {code:python}
> from __future__ import print_function
> import os
> import posixpath
> from apache_beam.io.filesystem import MatchResult
> from apache_beam.io.filesystems import FileSystems
> BASES = [
> os.path.join(os.path.dirname(__file__), "./"),
> "gs://my-bucket/test/",
> ]
> pattern = "filesystem-match-test/*/file.txt"
> for base_path in BASES:
> full_pattern = posixpath.join(base_path, pattern)
> print("full_pattern: {}".format(full_pattern))
> match_result = FileSystems.match([full_pattern])[0]  # type: MatchResult
> print("metadata list: {}".format(match_result.metadata_list))
> {code}
> Running {{python filesystem-match-test.py}} does not match any files locally, 
> but does match files on GCS:
> {noformat}
> full_pattern: ./filesystem-match-test/*/file.txt
> metadata list: []
> full_pattern: gs://my-bucket/test/filesystem-match-test/*/file.txt
> metadata list: 
> [FileMetadata(gs://my-bucket/test/filesystem-match-test/a/file.txt, 6), 
> FileMetadata(gs://my-bucket/test/filesystem-match-test/b/file.txt, 6)]
> {noformat}
> The expected result is that a/file.txt and b/file.txt should be matched for 
> both patterns.
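As a string-level sanity check of the report above (a stand-alone snippet using the paths from the tree; not Beam code): `fnmatch.fnmatch`, which the local matching is built on per the comment earlier in this thread, does accept the pattern for both files, because fnmatch translates `*` to `.*` and therefore matches across `/`:

```python
import fnmatch

pattern = 'filesystem-match-test/*/file.txt'
paths = ['filesystem-match-test/a/file.txt',
         'filesystem-match-test/b/file.txt']

# fnmatch's '*' becomes '.*' in the translated regex, so it also
# crosses '/', and both paths satisfy the pattern at the string level.
print([p for p in paths if fnmatch.fnmatch(p, pattern)])
```

This suggests the local miss is not in the pattern semantics themselves but elsewhere in the local filesystem's listing/matching path.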





[jira] [Work logged] (BEAM-2687) Python SDK support for Stateful Processing

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2687?focusedWorklogId=145726&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145726
 ]

ASF GitHub Bot logged work on BEAM-2687:


Author: ASF GitHub Bot
Created on: 19/Sep/18 17:00
Start Date: 19/Sep/18 17:00
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on issue #6349: [BEAM-2687] 
Implement State over the Fn API
URL: https://github.com/apache/beam/pull/6349#issuecomment-422880038
 
 
   This LGTM.  However, presubmits are failing because of lint:
   
   ```
   02:35:23 Running pylint for module apache_beam  gen_protos.py  setup.py  
test_config.py:
   02:35:23 Using config file 
/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit@2/src/sdks/python/.pylintrc
   02:36:15 * Module apache_beam.runners.worker.bundle_processor
   02:36:15 C:168, 0: Line too long (86/80) (line-too-long)
   02:36:15 
   02:36:15 
   02:36:15 Your code has been rated at 10.00/10 (previous run: 10.00/10, -0.00)
   02:36:15 
   02:36:15 Command exited with non-zero status 16
   02:36:15 370.00user 9.62system 0:51.58elapsed 735%CPU (0avgtext+0avgdata 
319104maxresident)k
   02:36:15 0inputs+200outputs (0major+668358minor)pagefaults 0swaps
   02:36:15 ERROR: InvocationError for command '/usr/bin/time 
/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit@2/src/sdks/python/scripts/run_pylint.sh'
 (exited with code 16)
   02:36:15 ___ summary 

   02:36:15 ERROR:   py27-lint: commands failed
   02:36:15 
   02:36:15 > Task :beam-sdks-python:lintPy27 FAILED
   02:36:15 :beam-sdks-python:lintPy27 (Thread[Task worker for ':' Thread 
5,5,main]) completed. Took 1 mins 10.419 secs.
   ```
   
   Please fix this and run postcommits before submitting. 
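The failing check quoted above is pylint's line-too-long rule (C0301, 80-column limit from Beam's .pylintrc). As a minimal stand-alone illustration of what that rule measures (a sketch, not pylint itself):

```python
def lines_over_limit(source, limit=80):
    # Minimal stand-in for pylint's C0301 line-too-long check that
    # produced the "Line too long (86/80)" report above.
    return [(lineno, len(line))
            for lineno, line in enumerate(source.splitlines(), 1)
            if len(line) > limit]

sample = 'short = 1\n' + 'long_line = "' + 'x' * 80 + '"\n'
print(lines_over_limit(sample))  # [(2, 94)]
```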




Issue Time Tracking
---

Worklog Id: (was: 145726)
Time Spent: 3h 20m  (was: 3h 10m)

> Python SDK support for Stateful Processing
> --
>
> Key: BEAM-2687
> URL: https://issues.apache.org/jira/browse/BEAM-2687
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Assignee: Charles Chen
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Python SDK should support stateful processing 
> (https://beam.apache.org/blog/2017/02/13/stateful-processing.html)
> In the meantime, runner capability matrix should be updated to show the lack 
> of this feature 
> (https://beam.apache.org/documentation/runners/capability-matrix/)
> Use this as an umbrella issue for all related issues.





[beam] branch master updated (b7c2975 -> 38e3a81)

2018-09-19 Thread thw
This is an automated email from the ASF dual-hosted git repository.

thw pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from b7c2975  Merge pull request #5841: [BEAM-3446] Fixes RedisIO 
non-prefix read operations
 add 8f94a03  [BEAM-3089] Use Flink cluster parallelism if no parallelism 
provided
 add 33ac2b0  [flink] Revert default checkpointing mode to EXACTLY_ONCE
 add 9ea1120  [flink] Use default value for checkpoint timeout
 add 3f75e89  [flink] Set default master url to [auto]
 add 7444755  [BEAM-3089] Test default values of FlinkPipelineOptions
 new 38e3a81  Merge pull request #6426: [BEAM-3089] Fix default values in 
FlinkPipelineOptions / Add tests

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../runners/flink/DefaultParallelismFactory.java   |  39 -
 .../runners/flink/FlinkExecutionEnvironments.java  |  76 --
 .../apache/beam/runners/flink/FlinkJobInvoker.java |   5 +-
 .../beam/runners/flink/FlinkPipelineOptions.java   |  11 +-
 .../org/apache/beam/runners/flink/FlinkRunner.java |   5 -
 .../flink/FlinkExecutionEnvironmentsTest.java  | 162 +
 .../beam/runners/flink/PipelineOptionsTest.java|  27 
 .../flink/streaming/GroupByNullKeyTest.java|   2 +
 runners/flink/src/test/resources/flink-conf.yaml   |  19 +++
 9 files changed, 285 insertions(+), 61 deletions(-)
 delete mode 100644 
runners/flink/src/main/java/org/apache/beam/runners/flink/DefaultParallelismFactory.java
 create mode 100644 
runners/flink/src/test/java/org/apache/beam/runners/flink/FlinkExecutionEnvironmentsTest.java
 create mode 100644 runners/flink/src/test/resources/flink-conf.yaml



[beam] 01/01: Merge pull request #6426: [BEAM-3089] Fix default values in FlinkPipelineOptions / Add tests

2018-09-19 Thread thw
This is an automated email from the ASF dual-hosted git repository.

thw pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit 38e3a8101a4c2269010fa2544be0acef5e740368
Merge: b7c2975 7444755
Author: Thomas Weise 
AuthorDate: Wed Sep 19 10:19:31 2018 -0700

Merge pull request #6426: [BEAM-3089] Fix default values in 
FlinkPipelineOptions / Add tests

 .../runners/flink/DefaultParallelismFactory.java   |  39 -
 .../runners/flink/FlinkExecutionEnvironments.java  |  76 --
 .../apache/beam/runners/flink/FlinkJobInvoker.java |   5 +-
 .../beam/runners/flink/FlinkPipelineOptions.java   |  11 +-
 .../org/apache/beam/runners/flink/FlinkRunner.java |   5 -
 .../flink/FlinkExecutionEnvironmentsTest.java  | 162 +
 .../beam/runners/flink/PipelineOptionsTest.java|  27 
 .../flink/streaming/GroupByNullKeyTest.java|   2 +
 runners/flink/src/test/resources/flink-conf.yaml   |  19 +++
 9 files changed, 285 insertions(+), 61 deletions(-)



[jira] [Work logged] (BEAM-3089) Issue with setting the parallelism at client level using Flink runner

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3089?focusedWorklogId=145731&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145731
 ]

ASF GitHub Bot logged work on BEAM-3089:


Author: ASF GitHub Bot
Created on: 19/Sep/18 17:19
Start Date: 19/Sep/18 17:19
Worklog Time Spent: 10m 
  Work Description: tweise closed pull request #6426: [BEAM-3089] Fix 
default values in FlinkPipelineOptions / Add tests
URL: https://github.com/apache/beam/pull/6426
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/runners/flink/src/main/java/org/apache/beam/runners/flink/DefaultParallelismFactory.java
 
b/runners/flink/src/main/java/org/apache/beam/runners/flink/DefaultParallelismFactory.java
deleted file mode 100644
index d448bed2333..000
--- 
a/runners/flink/src/main/java/org/apache/beam/runners/flink/DefaultParallelismFactory.java
+++ /dev/null
@@ -1,39 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.beam.runners.flink;
-
-import org.apache.beam.sdk.options.DefaultValueFactory;
-import org.apache.beam.sdk.options.PipelineOptions;
-import org.apache.flink.configuration.ConfigConstants;
-import org.apache.flink.configuration.GlobalConfiguration;
-
-/**
- * {@link DefaultValueFactory} for getting a default value for the parallelism 
option on {@link
- * FlinkPipelineOptions}.
- *
- * This will return either the default value from {@link 
GlobalConfiguration} or {@code 1}. A
- * valid {@link GlobalConfiguration} is only available if the program is 
executed by the Flink run
- * scripts.
- */
-public class DefaultParallelismFactory implements DefaultValueFactory<Integer> {
-  @Override
-  public Integer create(PipelineOptions options) {
-return GlobalConfiguration.loadConfiguration()
-.getInteger(ConfigConstants.DEFAULT_PARALLELISM_KEY, 1);
-  }
-}
diff --git 
a/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkExecutionEnvironments.java
 
b/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkExecutionEnvironments.java
index 4ace1eccc37..40a8d51ee40 100644
--- 
a/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkExecutionEnvironments.java
+++ 
b/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkExecutionEnvironments.java
@@ -17,12 +17,16 @@
  */
 package org.apache.beam.runners.flink;
 
+import com.google.common.annotations.VisibleForTesting;
 import com.google.common.base.Splitter;
 import java.util.List;
+import javax.annotation.Nullable;
 import org.apache.flink.api.common.ExecutionConfig;
 import org.apache.flink.api.java.CollectionEnvironment;
 import org.apache.flink.api.java.ExecutionEnvironment;
 import org.apache.flink.configuration.Configuration;
+import org.apache.flink.configuration.CoreOptions;
+import org.apache.flink.configuration.GlobalConfiguration;
 import org.apache.flink.configuration.RestOptions;
 import org.apache.flink.runtime.state.StateBackend;
 import org.apache.flink.streaming.api.TimeCharacteristic;
@@ -42,6 +46,12 @@
*/
   public static ExecutionEnvironment createBatchExecutionEnvironment(
    FlinkPipelineOptions options, List<String> filesToStage) {
+return createBatchExecutionEnvironment(options, filesToStage, null);
+  }
+
+  @VisibleForTesting
+  static ExecutionEnvironment createBatchExecutionEnvironment(
+  FlinkPipelineOptions options, List<String> filesToStage, @Nullable 
String confDir) {
 
 LOG.info("Creating a Batch Execution Environment.");
 
@@ -71,9 +81,18 @@ public static ExecutionEnvironment 
createBatchExecutionEnvironment(
 if (options.getParallelism() != -1 && !(flinkBatchEnv instanceof 
CollectionEnvironment)) {
   flinkBatchEnv.setParallelism(options.getParallelism());
 }
+// Set the correct parallelism, required by UnboundedSourceWrapper to 
generate consistent splits.
+final int parallelism;
+if (flinkBatchEnv instanceof CollectionEnvironment)

Build failed in Jenkins: beam_PostCommit_Python_PVR_Flink_Gradle #66

2018-09-19 Thread Apache Jenkins Server
See 


Changes:

[mxm] [BEAM-3089] Use Flink cluster parallelism if no parallelism provided

[mxm] [flink] Revert default checkpointing mode to EXACTLY_ONCE

[mxm] [flink] Use default value for checkpoint timeout

[mxm] [flink] Set default master url to [auto]

[mxm] [BEAM-3089] Test default values of FlinkPipelineOptions

--
[...truncated 554.29 KB...]
_Rendezvous: <_Rendezvous of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "Socket closed"
debug_error_string = 
"{"created":"@1537377970.066969223","description":"Error received from 
peer","file":"src/core/lib/surface/call.cc","file_line":1099,"grpc_message":"Socket
 closed","grpc_status":14}"
>



==
ERROR: test_combine_per_key (__main__.FlinkRunnerTest)
--
Traceback (most recent call last):
  File "apache_beam/runners/portability/fn_api_runner_test.py", line 251, in 
test_combine_per_key
assert_that(res, equal_to([('a', 1.5), ('b', 3.0)]))
  File "apache_beam/pipeline.py", line 414, in __exit__
self.run().wait_until_finish()
  File "apache_beam/runners/portability/portable_runner.py", line 209, in 
wait_until_finish
'Pipeline %s failed in state %s.' % (self._job_id, self._state))
RuntimeError: Pipeline 
test_combine_per_key_1537377962.55_9dc5c2c9-5a89-4d3a-afab-4abde7aff14b failed 
in state FAILED.

==
ERROR: test_create (__main__.FlinkRunnerTest)
--
Traceback (most recent call last):
  File "apache_beam/runners/portability/fn_api_runner_test.py", line 62, in 
test_create
assert_that(p | beam.Create(['a', 'b']), equal_to(['a', 'b']))
  File "apache_beam/pipeline.py", line 414, in __exit__
self.run().wait_until_finish()
  File "apache_beam/runners/portability/portable_runner.py", line 209, in 
wait_until_finish
'Pipeline %s failed in state %s.' % (self._job_id, self._state))
RuntimeError: Pipeline 
test_create_1537377962.9_f44d3e9f-e0db-4303-9220-efc74eef530e failed in state 
FAILED.

==
ERROR: test_flatten (__main__.FlinkRunnerTest)
--
Traceback (most recent call last):
  File "apache_beam/runners/portability/fn_api_runner_test.py", line 244, in 
test_flatten
assert_that(res, equal_to(['a', 'b', 'c', 'd']))
  File "apache_beam/pipeline.py", line 414, in __exit__
self.run().wait_until_finish()
  File "apache_beam/runners/portability/portable_runner.py", line 209, in 
wait_until_finish
'Pipeline %s failed in state %s.' % (self._job_id, self._state))
RuntimeError: Pipeline 
test_flatten_1537377963.37_5847e5e1-9dcb-4d98-b7e8-7c065f711fe4 failed in state 
FAILED.

==
ERROR: test_flattened_side_input (__main__.FlinkRunnerTest)
--
Traceback (most recent call last):
  File "apache_beam/runners/portability/fn_api_runner_test.py", line 190, in 
test_flattened_side_input
equal_to([(None, {'a': 1, 'b': 2})]))
  File "apache_beam/pipeline.py", line 414, in __exit__
self.run().wait_until_finish()
  File "apache_beam/runners/portability/portable_runner.py", line 209, in 
wait_until_finish
'Pipeline %s failed in state %s.' % (self._job_id, self._state))
RuntimeError: Pipeline 
test_flattened_side_input_1537377963.85_850b5b4c-7ffc-430c-b857-74997d7a0ece 
failed in state FAILED.

==
ERROR: test_gbk_side_input (__main__.FlinkRunnerTest)
--
Traceback (most recent call last):
  File "apache_beam/runners/portability/fn_api_runner_test.py", line 198, in 
test_gbk_side_input
equal_to([(None, {'a': [1]})]))
  File "apache_beam/pipeline.py", line 414, in __exit__
self.run().wait_until_finish()
  File "apache_beam/runners/portability/portable_runner.py", line 209, in 
wait_until_finish
'Pipeline %s failed in state %s.' % (self._job_id, self._state))
RuntimeError: Pipeline 
test_gbk_side_input_1537377964.35_fa5b3274-b981-4be7-8a01-3dd928cbc7dc failed 
in state FAILED.

==
ERROR: test_group_by_key (__main__.FlinkRunnerTest)
--
Traceback (most recent call last):
  File "apache_beam/runners/portability/fn_api_runner_test.py", line 237, in 
test_group_by_key
assert_that(res, equal_to([('a', [1, 2]), ('b', [3])]))
  File "apache_beam/pi

[jira] [Work logged] (BEAM-4780) Entry point for ULR JobService compatible with TestPortableRunner

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4780?focusedWorklogId=145736&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145736
 ]

ASF GitHub Bot logged work on BEAM-4780:


Author: ASF GitHub Bot
Created on: 19/Sep/18 17:38
Start Date: 19/Sep/18 17:38
Worklog Time Spent: 10m 
  Work Description: youngoli commented on issue #6151: [BEAM-4780] Updating 
to DockerJobBundleFactory in ReferenceRunner.
URL: https://github.com/apache/beam/pull/6151#issuecomment-422893399
 
 
   Thanks Ben.
   
   @lukecwik for committer approval.




Issue Time Tracking
---

Worklog Id: (was: 145736)
Time Spent: 1h 40m  (was: 1.5h)

> Entry point for ULR JobService compatible with TestPortableRunner
> -
>
> Key: BEAM-4780
> URL: https://issues.apache.org/jira/browse/BEAM-4780
> Project: Beam
>  Issue Type: Bug
>  Components: runner-direct
>Reporter: Eugene Kirpichov
>Assignee: Daniel Oliveira
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> [https://github.com/apache/beam/pull/5935] introduces a TestPortableRunner 
> that can run ValidatesRunner tests against a given portable runner, 
> identified by a class name of a shim that can start/stop its JobService 
> endpoint.
> For ULR to run VR tests, it needs to provide such a shim.





Build failed in Jenkins: beam_PostCommit_Java_ValidatesRunner_Flink_Gradle #1544

2018-09-19 Thread Apache Jenkins Server
See 


Changes:

[mxm] [BEAM-3089] Use Flink cluster parallelism if no parallelism provided

[mxm] [flink] Revert default checkpointing mode to EXACTLY_ONCE

[mxm] [flink] Use default value for checkpoint timeout

[mxm] [flink] Set default master url to [auto]

[mxm] [BEAM-3089] Test default values of FlinkPipelineOptions

--
[...truncated 762.05 MB...]
INFO: 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/GroupByKey 
-> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/ExpandIterable/ParMultiDo(Anonymous)
 -> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/RestoreOriginalTimestamps/ReifyTimestamps.RemoveWildcard/ParDo(Anonymous)/ParMultiDo(Anonymous)
 -> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/RestoreOriginalTimestamps/Reify.ExtractTimestampsFromValues/ParDo(Anonymous)/ParMultiDo(Anonymous)
 -> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Values/Values/Map/ParMultiDo(Anonymous)
 -> GenerateSequence/Read(UnboundedCountingSource)/Read/ParMultiDo(Read) -> 
GenerateSequence/Read(UnboundedCountingSource)/StripIds/ParMultiDo(StripIds) -> 
ParDo(Counting)/ParMultiDo(Counting) (13/16) (f7fd1e0b90a2e2d74a01da4a410127df) 
switched from RUNNING to FINISHED.
Sep 19, 2018 5:42:37 PM org.apache.flink.runtime.taskmanager.Task run
INFO: Freeing task resources for 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/GroupByKey 
-> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/ExpandIterable/ParMultiDo(Anonymous)
 -> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/RestoreOriginalTimestamps/ReifyTimestamps.RemoveWildcard/ParDo(Anonymous)/ParMultiDo(Anonymous)
 -> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/RestoreOriginalTimestamps/Reify.ExtractTimestampsFromValues/ParDo(Anonymous)/ParMultiDo(Anonymous)
 -> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Values/Values/Map/ParMultiDo(Anonymous)
 -> GenerateSequence/Read(UnboundedCountingSource)/Read/ParMultiDo(Read) -> 
GenerateSequence/Read(UnboundedCountingSource)/StripIds/ParMultiDo(StripIds) -> 
ParDo(Counting)/ParMultiDo(Counting) (15/16) (ec021a14e50a84d048377aa2163fa9b2).
Sep 19, 2018 5:42:37 PM org.apache.flink.runtime.taskmanager.Task run
INFO: Freeing task resources for 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/GroupByKey 
-> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/ExpandIterable/ParMultiDo(Anonymous)
 -> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/RestoreOriginalTimestamps/ReifyTimestamps.RemoveWildcard/ParDo(Anonymous)/ParMultiDo(Anonymous)
 -> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/RestoreOriginalTimestamps/Reify.ExtractTimestampsFromValues/ParDo(Anonymous)/ParMultiDo(Anonymous)
 -> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Values/Values/Map/ParMultiDo(Anonymous)
 -> GenerateSequence/Read(UnboundedCountingSource)/Read/ParMultiDo(Read) -> 
GenerateSequence/Read(UnboundedCountingSource)/StripIds/ParMultiDo(StripIds) -> 
ParDo(Counting)/ParMultiDo(Counting) (13/16) (f7fd1e0b90a2e2d74a01da4a410127df).
Sep 19, 2018 5:42:37 PM org.apache.flink.runtime.taskmanager.Task run
INFO: Ensuring all FileSystem streams are closed for task 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/GroupByKey 
-> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/ExpandIterable/ParMultiDo(Anonymous)
 -> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/RestoreOriginalTimestamps/ReifyTimestamps.RemoveWildcard/ParDo(Anonymous)/ParMultiDo(Anonymous)
 -> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/RestoreOriginalTimestamps/Reify.ExtractTimestampsFromValues/ParDo(Anonymous)/ParMultiDo(Anonymous)
 -> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Values/Values/Map/ParMultiDo(Anonymous)
 -> GenerateSequence/Read(UnboundedCountingSource)/Read/ParMultiDo(Read) -> 
GenerateSequence/Read(UnboundedCountingSource)/StripIds/ParMultiDo(StripIds) -> 
ParDo(Counting)/ParMultiDo(Counting) (15/16) (ec021a14e50a84d048377aa2163fa9b2) 
[FINISHED]
Sep 19, 2018 5:42:37 PM org.apache.flink.runtime.taskmanager.Task run
INFO: Freeing task resources for 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/GroupByKey 
-> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/ExpandIterable/ParMultiDo(Anonymous)
 -> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/RestoreOriginalTimestamps/ReifyTimestamps.RemoveWildcard/ParDo(Anonymous)/ParMultiDo(Anonymous)
 -> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/RestoreOriginalTimestamps/Reify.ExtractTimestampsFromValues/Pa

Build failed in Jenkins: beam_PreCommit_Website_Cron #79

2018-09-19 Thread Apache Jenkins Server
See 


Changes:

[timrobertson100] [BEAM-4861] Autocreate absent parent directories when doing 
an HDFS

[timrobertson100] [BEAM-4861] Improve clarity around exceptions thrown in rename

[vin1990] Endpoint host port configuration reset issue

[vin1990] Endpoint host port configuration reset issue

[vin1990] Fixes https://issues.apache.org/jira/browse/BEAM-3446. BaseReadFn to

[vin1990] package private and typo fix

[vin1990] Using mget with configurable batch size to increase efficiency of read

[vin1990] fixing equality of batch size

[vin1990] fixing order of test arguments, to fix

[vin1990] Aggregating on per window, applyiong redis get on per window batch and

[timrobertson100] [BEAM-4861] Typo in comment

[mxm] [BEAM-3089] Use Flink cluster parallelism if no parallelism provided

[mxm] [flink] Revert default checkpointing mode to EXACTLY_ONCE

[mxm] [flink] Use default value for checkpoint timeout

[mxm] [flink] Set default master url to [auto]

[mxm] [BEAM-3089] Test default values of FlinkPipelineOptions

--
[...truncated 7.89 KB...]

> Task :buildSrc:assemble
Skipping task ':buildSrc:assemble' as it has no actions.
:assemble (Thread[Task worker for ':buildSrc' Thread 5,5,main]) completed. Took 
0.0 secs.
:spotlessGroovy (Thread[Task worker for ':buildSrc' Thread 5,5,main]) started.

> Task :buildSrc:spotlessGroovy
file or directory 
'
 not found
file or directory 
'
 not found
file or directory 
'
 not found
Caching disabled for task ':buildSrc:spotlessGroovy': Caching has not been 
enabled for the task
Task ':buildSrc:spotlessGroovy' is not up-to-date because:
  No history is available.
All input files are considered out-of-date for incremental task 
':buildSrc:spotlessGroovy'.
file or directory 
'
 not found
:spotlessGroovy (Thread[Task worker for ':buildSrc' Thread 5,5,main]) 
completed. Took 1.53 secs.
:spotlessGroovyCheck (Thread[Task worker for ':buildSrc' Thread 5,5,main]) 
started.

> Task :buildSrc:spotlessGroovyCheck
Skipping task ':buildSrc:spotlessGroovyCheck' as it has no actions.
:spotlessGroovyCheck (Thread[Task worker for ':buildSrc' Thread 5,5,main]) 
completed. Took 0.0 secs.
:spotlessGroovyGradle (Thread[Task worker for ':buildSrc' Thread 5,5,main]) 
started.

> Task :buildSrc:spotlessGroovyGradle
Caching disabled for task ':buildSrc:spotlessGroovyGradle': Caching has not 
been enabled for the task
Task ':buildSrc:spotlessGroovyGradle' is not up-to-date because:
  No history is available.
All input files are considered out-of-date for incremental task 
':buildSrc:spotlessGroovyGradle'.
:spotlessGroovyGradle (Thread[Task worker for ':buildSrc' Thread 5,5,main]) 
completed. Took 0.039 secs.
:spotlessGroovyGradleCheck (Thread[Task worker for ':buildSrc' Thread 
5,5,main]) started.

> Task :buildSrc:spotlessGroovyGradleCheck
Skipping task ':buildSrc:spotlessGroovyGradleCheck' as it has no actions.
:spotlessGroovyGradleCheck (Thread[Task worker for ':buildSrc' Thread 
5,5,main]) completed. Took 0.0 secs.
:spotlessCheck (Thread[Task worker for ':buildSrc' Thread 5,5,main]) started.

> Task :buildSrc:spotlessCheck
Skipping task ':buildSrc:spotlessCheck' as it has no actions.
:spotlessCheck (Thread[Task worker for ':buildSrc' Thread 5,5,main]) completed. 
Took 0.0 secs.
:compileTestJava (Thread[Task worker for ':buildSrc' Thread 5,5,main]) started.

> Task :buildSrc:compileTestJava NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:compileTestJava' as it has no source files and no 
previous output files.
:compileTestJava (Thread[Task worker for ':buildSrc' Thread 5,5,main]) 
completed. Took 0.003 secs.
:compileTestGroovy (Thread[Task worker for ':buildSrc' Thread 5,5,main]) 
started.

> Task :buildSrc:compileTestGroovy NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:compileTestGroovy' as it has no source files and no 
previous output files.
:compileTestGroovy (Thread[Task worker for ':buildSrc' Thread 5,5,main]) 
completed. Took 0.003 secs.
:processTestResources (Thread[Task worker for ':buildSrc' Thread 5,5,main]) 
started.

> Task :buildSrc:processTestResources NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:processTestResources' as it has no source files and no 
previous output files.
[beam] branch master updated (38e3a81 -> 6d7cc57)

2018-09-19 Thread herohde
This is an automated email from the ASF dual-hosted git repository.

herohde pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 38e3a81  Merge pull request #6426: [BEAM-3089] Fix default values in 
FlinkPipelineOptions / Add tests
 add 633d457  Fix Go index parsing in the face of runner-gernerated keys
 add e5ef9b5  Fix use of wrong key in Dataflow side input
 new 6d7cc57  [BEAM-3286] Add Dataflow support for side input

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 sdks/go/pkg/beam/core/runtime/exec/sideinput.go|  2 +-
 sdks/go/pkg/beam/core/runtime/exec/translate.go| 48 +++---
 .../exec/translate_test.go}| 46 -
 .../beam/runners/dataflow/dataflowlib/translate.go |  2 +-
 4 files changed, 53 insertions(+), 45 deletions(-)
 copy sdks/go/pkg/beam/core/{funcx/output_test.go => 
runtime/exec/translate_test.go} (52%)



[beam] 01/01: [BEAM-3286] Add Dataflow support for side input

2018-09-19 Thread herohde
This is an automated email from the ASF dual-hosted git repository.

herohde pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit 6d7cc57d88a422c2e8fbfdbc41bcbf68b73f4950
Merge: 38e3a81 e5ef9b5
Author: Henning Rohde 
AuthorDate: Wed Sep 19 11:02:59 2018 -0700

[BEAM-3286] Add Dataflow support for side input

 sdks/go/pkg/beam/core/runtime/exec/sideinput.go|  2 +-
 sdks/go/pkg/beam/core/runtime/exec/translate.go| 48 +--
 .../pkg/beam/core/runtime/exec/translate_test.go   | 56 ++
 .../beam/runners/dataflow/dataflowlib/translate.go |  2 +-
 4 files changed, 82 insertions(+), 26 deletions(-)



[jira] [Resolved] (BEAM-3286) Go SDK support for portable side input

2018-09-19 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde resolved BEAM-3286.
-
   Resolution: Fixed
Fix Version/s: 2.8.0

> Go SDK support for portable side input
> --
>
> Key: BEAM-3286
> URL: https://issues.apache.org/jira/browse/BEAM-3286
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-go
>Reporter: Henning Rohde
>Assignee: Henning Rohde
>Priority: Major
>  Labels: portability
> Fix For: 2.8.0
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3286) Go SDK support for portable side input

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3286?focusedWorklogId=145747&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145747
 ]

ASF GitHub Bot logged work on BEAM-3286:


Author: ASF GitHub Bot
Created on: 19/Sep/18 18:03
Start Date: 19/Sep/18 18:03
Worklog Time Spent: 10m 
  Work Description: herohde closed pull request #6402: [BEAM-3286] Add 
Dataflow support for side input
URL: https://github.com/apache/beam/pull/6402
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/go/pkg/beam/core/runtime/exec/sideinput.go 
b/sdks/go/pkg/beam/core/runtime/exec/sideinput.go
index 4d4a247ef5f..5e55dc0d9f0 100644
--- a/sdks/go/pkg/beam/core/runtime/exec/sideinput.go
+++ b/sdks/go/pkg/beam/core/runtime/exec/sideinput.go
@@ -46,7 +46,7 @@ type sideInputAdapter struct {
 // It expects a W> coder, because the protocol supports MultiSet 
access only.
 func NewSideInputAdapter(sid StreamID, c *coder.Coder) SideInputAdapter {
if !coder.IsW(c) || !coder.IsKV(coder.SkipW(c)) {
-   panic(fmt.Sprintf("expected WKV coder for side input: %v", c))
+   panic(fmt.Sprintf("expected WKV coder for side input %v: %v", 
sid, c))
}
 
wc := MakeWindowEncoder(c.Window)
diff --git a/sdks/go/pkg/beam/core/runtime/exec/translate.go 
b/sdks/go/pkg/beam/core/runtime/exec/translate.go
index 696297cea67..2829b25e3e6 100644
--- a/sdks/go/pkg/beam/core/runtime/exec/translate.go
+++ b/sdks/go/pkg/beam/core/runtime/exec/translate.go
@@ -28,6 +28,7 @@ import (
"github.com/apache/beam/sdks/go/pkg/beam/core/runtime/graphx/v1"
"github.com/apache/beam/sdks/go/pkg/beam/core/typex"
"github.com/apache/beam/sdks/go/pkg/beam/core/util/protox"
+   "github.com/apache/beam/sdks/go/pkg/beam/core/util/stringx"
fnpb "github.com/apache/beam/sdks/go/pkg/beam/model/fnexecution_v1"
pb "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1"
"github.com/golang/protobuf/proto"
@@ -515,44 +516,43 @@ func (b *builder) makeLink(from string, id linkID) (Node, 
error) {
 
 // unmarshalKeyedValues converts a map {"i1": "b", ""i0": "a"} into an ordered 
list of
 // of values: {"a", "b"}. If the keys are not in the expected format, the 
returned
-// list does not guarantee any order.
+// list does not guarantee any order, but will respect ordered values.
 func unmarshalKeyedValues(m map[string]string) []string {
if len(m) == 0 {
return nil
}
+   if len(m) == 1 && stringx.Keys(m)[0] == "bogus" {
+   return nil // Ignore special bogus node for legacy Dataflow.
+   }
 
// (1) Compute index. If generated by the marshaller, we have
// a "iN" name that directly indicates the position.
 
-   index := make(map[string]int)
-   complete := true
+   ordered := make(map[int]string)
+   var unordered []string
 
for key := range m {
-   if i, err := strconv.Atoi(strings.TrimPrefix(key, "i")); 
!strings.HasPrefix(key, "i") || err != nil {
-   complete = false
-   break
-   } else {
-   index[key] = i
-   }
+   if i, err := strconv.Atoi(strings.TrimPrefix(key, "i")); 
strings.HasPrefix(key, "i") && err == nil {
+   if i < len(m) {
+   ordered[i] = key
+   continue
+   } // else: out-of-range index.
+   } // else: not in "iN" form.
+
+   unordered = append(unordered, key)
}
 
-   // (2) Impose order, if present, on values.
-
-   if !complete {
-   // Inserted node or fallback. Assume any order is ok.
-   var ret []string
-   for key, value := range m {
-   if key == "bogus" {
-   continue // Ignore special bogus node for 
legacy Dataflow.
-   }
-   ret = append(ret, value)
-   }
-   return ret
-   }
+   // (2) Impose order, to the extent present, on values.
 
ret := make([]string, len(m))
-   for key, value := range m {
-   ret[index[key]] = value
+   k := 0
+   for i := 0; i < len(ret); i++ {
+   if key, ok := ordered[i]; ok {
+   ret[i] = m[key]
+   } else {
+   ret[i] = m[unordered[k]]
+   k++
+   }
}
return ret
 }
diff --git a/sdks/go/pkg/beam/core/runtime/exec/translate_test.go 
b/sdks/go/pkg/beam/core

Build failed in Jenkins: beam_PerformanceTests_AvroIOIT_HDFS #678

2018-09-19 Thread Apache Jenkins Server
See 


Changes:

[timrobertson100] [BEAM-4861] Autocreate absent parent directories when doing 
an HDFS

[timrobertson100] [BEAM-4861] Improve clarity around exceptions thrown in rename

[vin1990] Endpoint host port configuration reset issue

[vin1990] Endpoint host port configuration reset issue

[vin1990] Fixes https://issues.apache.org/jira/browse/BEAM-3446. BaseReadFn to

[vin1990] package private and typo fix

[vin1990] Using mget with configurable batch size to increase efficiency of read

[vin1990] fixing equality of batch size

[vin1990] fixing order of test arguments, to fix

[vin1990] Aggregating on per window, applyiong redis get on per window batch and

[herohde] Fix Go index parsing in the face of runner-gernerated keys

[herohde] Fix use of wrong key in Dataflow side input

[timrobertson100] [BEAM-4861] Typo in comment

[mxm] [BEAM-3089] Use Flink cluster parallelism if no parallelism provided

[mxm] [flink] Revert default checkpointing mode to EXACTLY_ONCE

[mxm] [flink] Use default value for checkpoint timeout

[mxm] [flink] Set default master url to [auto]

[mxm] [BEAM-3089] Test default values of FlinkPipelineOptions

--
Started by timer
[EnvInject] - Loading node environment variables.
Building remotely on beam7 (beam) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/apache/beam.git # timeout=10
Fetching upstream changes from https://github.com/apache/beam.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/beam.git 
 > +refs/heads/*:refs/remotes/origin/* 
 > +refs/pull/${ghprbPullId}/*:refs/remotes/origin/pr/${ghprbPullId}/*
 > git rev-parse origin/master^{commit} # timeout=10
Checking out Revision 6d7cc57d88a422c2e8fbfdbc41bcbf68b73f4950 (origin/master)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 6d7cc57d88a422c2e8fbfdbc41bcbf68b73f4950
Commit message: "[BEAM-3286] Add Dataflow support for side input"
 > git rev-list --no-walk c49a97ecbf815b320926285dcddba993590e3073 # timeout=10
Cleaning workspace
 > git rev-parse --verify HEAD # timeout=10
Resetting working tree
 > git reset --hard # timeout=10
 > git clean -fdx # timeout=10
[EnvInject] - Executing scripts and injecting environment variables after the 
SCM step.
[EnvInject] - Injecting as environment variables the properties content 
SPARK_LOCAL_IP=127.0.0.1

[EnvInject] - Variables injected successfully.
[beam_PerformanceTests_AvroIOIT_HDFS] $ /bin/bash -xe 
/tmp/jenkins9120965464369815147.sh
+ gcloud container clusters get-credentials io-datastores --zone=us-central1-a 
--verbosity=debug
DEBUG: (gcloud.container.clusters.get-credentials) timed out
This may be due to network connectivity issues. Please check your network 
settings, and the status of the service you are trying to reach.
Traceback (most recent call last):
  File "/usr/lib/google-cloud-sdk/lib/googlecloudsdk/calliope/cli.py", line 
797, in Execute
resources = calliope_command.Run(cli=self, args=args)
  File "/usr/lib/google-cloud-sdk/lib/googlecloudsdk/calliope/backend.py", line 
748, in Run
self._parent_group.RunGroupFilter(tool_context, args)
  File "/usr/lib/google-cloud-sdk/lib/googlecloudsdk/calliope/backend.py", line 
689, in RunGroupFilter
self._parent_group.RunGroupFilter(context, args)
  File "/usr/lib/google-cloud-sdk/lib/googlecloudsdk/calliope/backend.py", line 
690, in RunGroupFilter
self._common_type().Filter(context, args)
  File "/usr/lib/google-cloud-sdk/lib/surface/container/__init__.py", line 71, 
in Filter
context['api_adapter'] = api_adapter.NewAPIAdapter('v1')
  File 
"/usr/lib/google-cloud-sdk/lib/googlecloudsdk/api_lib/container/api_adapter.py",
 line 143, in NewAPIAdapter
return NewV1APIAdapter()
  File 
"/usr/lib/google-cloud-sdk/lib/googlecloudsdk/api_lib/container/api_adapter.py",
 line 147, in NewV1APIAdapter
return InitAPIAdapter('v1', V1Adapter)
  File 
"/usr/lib/google-cloud-sdk/lib/googlecloudsdk/api_lib/container/api_adapter.py",
 line 168, in InitAPIAdapter
api_client = core_apis.GetClientInstance('container', api_version)
  File "/usr/lib/google-cloud-sdk/lib/googlecloudsdk/api_lib/util/apis.py", 
line 295, in GetClientInstance
api_name, api_version, no_http, _CheckResponse, enable_resource_quota)
  File 
"/usr/lib/google-cloud-sdk/lib/googlecloudsdk/api_lib/util/apis_internal.py", 
line 149, in _GetClientInstance
http_client = http.Http(enable_resource_quota=enable_resource_quota)
  File "/usr/lib/google-cloud-sdk/lib/googlecloudsdk/core/credentials/http.py", 
line 62, in Http
creds = store.LoadIfEnabled()
  File 
"/usr/lib/google-cloud-sdk/lib/googlecloudsdk/core/credentials/store.py", line 
278, in LoadIf

Build failed in Jenkins: beam_PostCommit_Python_PVR_Flink_Gradle #67

2018-09-19 Thread Apache Jenkins Server
See 


--
[...truncated 554.51 KB...]
raise self
_Rendezvous: <_Rendezvous of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "Socket closed"
debug_error_string = 
"{"created":"@1537380361.560169747","description":"Error received from 
peer","file":"src/core/lib/surface/call.cc","file_line":1099,"grpc_message":"Socket
 closed","grpc_status":14}"
>


==
ERROR: test_combine_per_key (__main__.FlinkRunnerTest)
--
Traceback (most recent call last):
  File "apache_beam/runners/portability/fn_api_runner_test.py", line 251, in 
test_combine_per_key
assert_that(res, equal_to([('a', 1.5), ('b', 3.0)]))
  File "apache_beam/pipeline.py", line 414, in __exit__
self.run().wait_until_finish()
  File "apache_beam/runners/portability/portable_runner.py", line 209, in 
wait_until_finish
'Pipeline %s failed in state %s.' % (self._job_id, self._state))
RuntimeError: Pipeline 
test_combine_per_key_1537380353.87_ae2076e5-621f-462a-b5ba-f7b9fc914f05 failed 
in state FAILED.

==
ERROR: test_create (__main__.FlinkRunnerTest)
--
Traceback (most recent call last):
  File "apache_beam/runners/portability/fn_api_runner_test.py", line 62, in 
test_create
assert_that(p | beam.Create(['a', 'b']), equal_to(['a', 'b']))
  File "apache_beam/pipeline.py", line 414, in __exit__
self.run().wait_until_finish()
  File "apache_beam/runners/portability/portable_runner.py", line 209, in 
wait_until_finish
'Pipeline %s failed in state %s.' % (self._job_id, self._state))
RuntimeError: Pipeline 
test_create_1537380354.23_d30f7de1-43be-4f28-8e5b-a956c8181b70 failed in state 
FAILED.

==
ERROR: test_flatten (__main__.FlinkRunnerTest)
--
Traceback (most recent call last):
  File "apache_beam/runners/portability/fn_api_runner_test.py", line 244, in 
test_flatten
assert_that(res, equal_to(['a', 'b', 'c', 'd']))
  File "apache_beam/pipeline.py", line 414, in __exit__
self.run().wait_until_finish()
  File "apache_beam/runners/portability/portable_runner.py", line 209, in 
wait_until_finish
'Pipeline %s failed in state %s.' % (self._job_id, self._state))
RuntimeError: Pipeline 
test_flatten_1537380354.7_b745f626-a99c-4539-bcbd-344aefbbfc5d failed in state 
FAILED.

==
ERROR: test_flattened_side_input (__main__.FlinkRunnerTest)
--
Traceback (most recent call last):
  File "apache_beam/runners/portability/fn_api_runner_test.py", line 190, in 
test_flattened_side_input
equal_to([(None, {'a': 1, 'b': 2})]))
  File "apache_beam/pipeline.py", line 414, in __exit__
self.run().wait_until_finish()
  File "apache_beam/runners/portability/portable_runner.py", line 209, in 
wait_until_finish
'Pipeline %s failed in state %s.' % (self._job_id, self._state))
RuntimeError: Pipeline 
test_flattened_side_input_1537380355.17_9209e5a9-735f-486e-af98-c3471fb2fc6e 
failed in state FAILED.

==
ERROR: test_gbk_side_input (__main__.FlinkRunnerTest)
--
Traceback (most recent call last):
  File "apache_beam/runners/portability/fn_api_runner_test.py", line 198, in 
test_gbk_side_input
equal_to([(None, {'a': [1]})]))
  File "apache_beam/pipeline.py", line 414, in __exit__
self.run().wait_until_finish()
  File "apache_beam/runners/portability/portable_runner.py", line 209, in 
wait_until_finish
'Pipeline %s failed in state %s.' % (self._job_id, self._state))
RuntimeError: Pipeline 
test_gbk_side_input_1537380355.66_662eb18a-446a-4aca-8231-e265a79036d2 failed 
in state FAILED.

==
ERROR: test_group_by_key (__main__.FlinkRunnerTest)
--
Traceback (most recent call last):
  File "apache_beam/runners/portability/fn_api_runner_test.py", line 237, in 
test_group_by_key
assert_that(res, equal_to([('a', [1, 2]), ('b', [3])]))
  File "apache_beam/pipeline.py", line 414, in __exit__
self.run().wait_until_finish()
  File "apache_beam/runners/portability/portable_runner.py", line 209, in 
wait_until_finish
'Pipeline %s failed in state %s.' % (self._job_id, self._state))
RuntimeError: Pipeline 
test_group_by_key_1537380356.14_8219e308-27f3-4a71-9842-199a4e

Build failed in Jenkins: beam_PostCommit_Java_GradleBuild #1501

2018-09-19 Thread Apache Jenkins Server
See 


Changes:

[timrobertson100] [BEAM-4861] Autocreate absent parent directories when doing 
an HDFS

[timrobertson100] [BEAM-4861] Improve clarity around exceptions thrown in rename

[timrobertson100] [BEAM-4861] Typo in comment

--
[...truncated 24.87 MB...]
at 
com.google.cloud.spanner.SpannerImpl$SessionImpl$2.call(SpannerImpl.java:797)
at 
com.google.cloud.spanner.SpannerImpl$SessionImpl$2.call(SpannerImpl.java:794)
at 
com.google.cloud.spanner.SpannerImpl.runWithRetries(SpannerImpl.java:227)
at 
com.google.cloud.spanner.SpannerImpl$SessionImpl.writeAtLeastOnce(SpannerImpl.java:793)
at 
com.google.cloud.spanner.SessionPool$PooledSession.writeAtLeastOnce(SessionPool.java:319)
at 
com.google.cloud.spanner.DatabaseClientImpl.writeAtLeastOnce(DatabaseClientImpl.java:60)
at 
org.apache.beam.sdk.io.gcp.spanner.SpannerIO$WriteToSpannerFn.processElement(SpannerIO.java:1103)
at 
org.apache.beam.sdk.io.gcp.spanner.SpannerIO$WriteToSpannerFn$DoFnInvoker.invokeProcessElement(Unknown
 Source)
at 
org.apache.beam.repackaged.beam_runners_direct_java.runners.core.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:275)
at 
org.apache.beam.repackaged.beam_runners_direct_java.runners.core.SimpleDoFnRunner.processElement(SimpleDoFnRunner.java:240)
at 
org.apache.beam.repackaged.beam_runners_direct_java.runners.core.SimplePushbackSideInputDoFnRunner.processElementInReadyWindows(SimplePushbackSideInputDoFnRunner.java:78)
at 
org.apache.beam.runners.direct.ParDoEvaluator.processElement(ParDoEvaluator.java:207)
at 
org.apache.beam.runners.direct.DoFnLifecycleManagerRemovingTransformEvaluator.processElement(DoFnLifecycleManagerRemovingTransformEvaluator.java:55)
at 
org.apache.beam.runners.direct.DirectTransformExecutor.processElements(DirectTransformExecutor.java:160)
at 
org.apache.beam.runners.direct.DirectTransformExecutor.run(DirectTransformExecutor.java:124)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.ExecutionException: 
io.grpc.StatusRuntimeException: FAILED_PRECONDITION: Value must not be NULL in 
table users.
at 
com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:500)
at 
com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:479)
at 
com.google.cloud.spanner.spi.v1.GrpcSpannerRpc.get(GrpcSpannerRpc.java:450)
... 21 more
Caused by: io.grpc.StatusRuntimeException: FAILED_PRECONDITION: Value must 
not be NULL in table users.
at io.grpc.Status.asRuntimeException(Status.java:526)
at 
io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose(ClientCalls.java:468)
at 
io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
at 
io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
at 
io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
at 
com.google.cloud.spanner.spi.v1.SpannerErrorInterceptor$1$1.onClose(SpannerErrorInterceptor.java:100)
at 
io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
at 
io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
at 
io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
at 
com.google.cloud.spanner.spi.v1.WatchdogInterceptor$MonitoredCall$1.onClose(WatchdogInterceptor.java:190)
at 
io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
at 
io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
at 
io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
at 
io.grpc.internal.CensusStatsModule$StatsClientInterceptor$1$1.onClose(CensusStatsModule.java:684)
at 
io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
at 
io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
at 
io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
at 
io.grpc.internal.CensusTracingModule$TracingClientInterceptor$1$1.onClos

Build failed in Jenkins: beam_PostCommit_Python_PVR_Flink_Gradle #68

2018-09-19 Thread Apache Jenkins Server
See 


Changes:

[herohde] Fix Go index parsing in the face of runner-gernerated keys

[herohde] Fix use of wrong key in Dataflow side input

--
[...truncated 553.39 KB...]
raise self
_Rendezvous: <_Rendezvous of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "Socket closed"
debug_error_string = 
"{"created":"@1537380746.564796125","description":"Error received from 
peer","file":"src/core/lib/surface/call.cc","file_line":1099,"grpc_message":"Socket
 closed","grpc_status":14}"
>


==
ERROR: test_combine_per_key (__main__.FlinkRunnerTest)
--
Traceback (most recent call last):
  File "apache_beam/runners/portability/fn_api_runner_test.py", line 251, in 
test_combine_per_key
assert_that(res, equal_to([('a', 1.5), ('b', 3.0)]))
  File "apache_beam/pipeline.py", line 414, in __exit__
self.run().wait_until_finish()
  File "apache_beam/runners/portability/portable_runner.py", line 209, in 
wait_until_finish
'Pipeline %s failed in state %s.' % (self._job_id, self._state))
RuntimeError: Pipeline 
test_combine_per_key_1537380738.83_e3da1a48-cf63-4fef-b559-9203eae9709b failed 
in state FAILED.

==
ERROR: test_create (__main__.FlinkRunnerTest)
--
Traceback (most recent call last):
  File "apache_beam/runners/portability/fn_api_runner_test.py", line 62, in 
test_create
assert_that(p | beam.Create(['a', 'b']), equal_to(['a', 'b']))
  File "apache_beam/pipeline.py", line 414, in __exit__
self.run().wait_until_finish()
  File "apache_beam/runners/portability/portable_runner.py", line 209, in 
wait_until_finish
'Pipeline %s failed in state %s.' % (self._job_id, self._state))
RuntimeError: Pipeline 
test_create_1537380739.18_c1f6a967-9386-42da-865c-5be4198c39f9 failed in state 
FAILED.

==
ERROR: test_flatten (__main__.FlinkRunnerTest)
--
Traceback (most recent call last):
  File "apache_beam/runners/portability/fn_api_runner_test.py", line 244, in 
test_flatten
assert_that(res, equal_to(['a', 'b', 'c', 'd']))
  File "apache_beam/pipeline.py", line 414, in __exit__
self.run().wait_until_finish()
  File "apache_beam/runners/portability/portable_runner.py", line 209, in 
wait_until_finish
'Pipeline %s failed in state %s.' % (self._job_id, self._state))
RuntimeError: Pipeline 
test_flatten_1537380739.65_e7a3fa53-435e-4249-bc5c-8e244e1fa461 failed in state 
FAILED.

==
ERROR: test_flattened_side_input (__main__.FlinkRunnerTest)
--
Traceback (most recent call last):
  File "apache_beam/runners/portability/fn_api_runner_test.py", line 190, in 
test_flattened_side_input
equal_to([(None, {'a': 1, 'b': 2})]))
  File "apache_beam/pipeline.py", line 414, in __exit__
self.run().wait_until_finish()
  File "apache_beam/runners/portability/portable_runner.py", line 209, in 
wait_until_finish
'Pipeline %s failed in state %s.' % (self._job_id, self._state))
RuntimeError: Pipeline 
test_flattened_side_input_1537380740.13_40a0433a-3ffe-43ae-aab3-361491ae46f8 
failed in state FAILED.

==
ERROR: test_gbk_side_input (__main__.FlinkRunnerTest)
--
Traceback (most recent call last):
  File "apache_beam/runners/portability/fn_api_runner_test.py", line 198, in 
test_gbk_side_input
equal_to([(None, {'a': [1]})]))
  File "apache_beam/pipeline.py", line 414, in __exit__
self.run().wait_until_finish()
  File "apache_beam/runners/portability/portable_runner.py", line 209, in 
wait_until_finish
'Pipeline %s failed in state %s.' % (self._job_id, self._state))
RuntimeError: Pipeline 
test_gbk_side_input_1537380740.64_7d4d12c5-5061-4edd-9a8a-2e7d7ed36464 failed 
in state FAILED.

==
ERROR: test_group_by_key (__main__.FlinkRunnerTest)
--
Traceback (most recent call last):
  File "apache_beam/runners/portability/fn_api_runner_test.py", line 237, in 
test_group_by_key
assert_that(res, equal_to([('a', [1, 2]), ('b', [3])]))
  File "apache_beam/pipeline.py", line 414, in __exit__
self.run().wait_until_finish()
  File "apache_beam/runners/portability/portable_runner.py", line 209, in 
wait_until_finish
'P

Build failed in Jenkins: beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle #1078

2018-09-19 Thread Apache Jenkins Server
See 


Changes:

[timrobertson100] [BEAM-4861] Autocreate absent parent directories when doing 
an HDFS

[timrobertson100] [BEAM-4861] Improve clarity around exceptions thrown in rename

[timrobertson100] [BEAM-4861] Typo in comment

--
[...truncated 19.13 MB...]
INFO: Adding 
View.AsSingleton/Combine.GloballyAsSingletonView/CreateDataflowView as step s9
Sep 19, 2018 6:12:37 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding Create123/Read(CreateSource) as step s10
Sep 19, 2018 6:12:37 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding OutputSideInputs as step s11
Sep 19, 2018 6:12:37 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding PAssert$33/GroupGlobally/Window.Into()/Window.Assign as step 
s12
Sep 19, 2018 6:12:37 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding 
PAssert$33/GroupGlobally/GatherAllOutputs/Reify.Window/ParDo(Anonymous) as step 
s13
Sep 19, 2018 6:12:37 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding PAssert$33/GroupGlobally/GatherAllOutputs/WithKeys/AddKeys/Map 
as step s14
Sep 19, 2018 6:12:37 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding 
PAssert$33/GroupGlobally/GatherAllOutputs/Window.Into()/Window.Assign as step 
s15
Sep 19, 2018 6:12:37 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding PAssert$33/GroupGlobally/GatherAllOutputs/GroupByKey as step 
s16
Sep 19, 2018 6:12:37 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding PAssert$33/GroupGlobally/GatherAllOutputs/Values/Values/Map as 
step s17
Sep 19, 2018 6:12:37 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding PAssert$33/GroupGlobally/RewindowActuals/Window.Assign as step 
s18
Sep 19, 2018 6:12:37 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding PAssert$33/GroupGlobally/KeyForDummy/AddKeys/Map as step s19
Sep 19, 2018 6:12:37 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding 
PAssert$33/GroupGlobally/RemoveActualsTriggering/Flatten.PCollections as step 
s20
Sep 19, 2018 6:12:37 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding PAssert$33/GroupGlobally/Create.Values/Read(CreateSource) as 
step s21
Sep 19, 2018 6:12:37 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding PAssert$33/GroupGlobally/WindowIntoDummy/Window.Assign as step 
s22
Sep 19, 2018 6:12:37 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding 
PAssert$33/GroupGlobally/RemoveDummyTriggering/Flatten.PCollections as step s23
Sep 19, 2018 6:12:37 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding PAssert$33/GroupGlobally/FlattenDummyAndContents as step s24
Sep 19, 2018 6:12:37 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding PAssert$33/GroupGlobally/NeverTrigger/Flatten.PCollections as 
step s25
Sep 19, 2018 6:12:37 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding PAssert$33/GroupGlobally/GroupDummyAndContents as step s26
Sep 19, 2018 6:12:37 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding PAssert$33/GroupGlobally/Values/Values/Map as step s27
Sep 19, 2018 6:12:37 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding PAssert$33/GroupGlobally/ParDo(Concat) as step s28
Sep 19, 2018 6:12:37 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding PAssert$33/GetPane/Map as step s29
Sep 19, 2018 6:12:37 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding PAssert$33/RunChecks as step s30
Sep 19, 2018 6:12:37 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding PAssert$33/VerifyAssertions/ParDo(DefaultConclude) as step s31
Sep 19, 2018 6:12:37 PM org.apache.beam.runners.dataflow.DataflowRunner run
INFO: Staging pipeline description to 
gs://temp-storage-for-validates-runner-tests//viewtest0testsingletonsideinput-jenkins-0919181231-d8760bd/output/results/staging/
Sep 19, 2018 6:12:37 PM org.apache.beam.runners.dataflow.util.PackageUtil 
tryStagePackage

[jira] [Work logged] (BEAM-3089) Issue with setting the parallelism at client level using Flink runner

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3089?focusedWorklogId=145752&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145752
 ]

ASF GitHub Bot logged work on BEAM-3089:


Author: ASF GitHub Bot
Created on: 19/Sep/18 18:16
Start Date: 19/Sep/18 18:16
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #6426: [BEAM-3089] Fix default 
values in FlinkPipelineOptions / Add tests
URL: https://github.com/apache/beam/pull/6426#issuecomment-422906216
 
 
   @tweise As far as I can see, we already expose all the checkpointing-related 
options. The problem is indeed that there might be more options that users want 
to configure. It could make sense to expose an interface for configuring all 
Flink options.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 145752)
Time Spent: 3h  (was: 2h 50m)

> Issue with setting the parallelism at client level using Flink runner
> -
>
> Key: BEAM-3089
> URL: https://issues.apache.org/jira/browse/BEAM-3089
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 2.0.0
> Environment: I am using Flink 1.2.1 running on Docker, with Task 
> Managers distributed across different VMs as part of a Docker Swarm.
>Reporter: Thalita Vergilio
>Assignee: Grzegorz Kołakowski
>Priority: Major
>  Labels: docker, flink, parallel-deployment
> Fix For: 2.8.0
>
> Attachments: flink-ui-parallelism.png
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> When uploading an Apache Beam application using the Flink Web UI, the 
> parallelism set at job submission doesn't get picked up. The same happens 
> when submitting a job using the Flink CLI.
> In both cases, the parallelism ends up defaulting to 1.
> When I set the parallelism programmatically within the Apache Beam code, it 
> works: {{flinkPipelineOptions.setParallelism(4);}}
> I suspect the root of the problem may be in the 
> org.apache.beam.runners.flink.DefaultParallelismFactory class, as it checks 
> for Flink's GlobalConfiguration, which may not pick up runtime values passed 
> to Flink, then defaults to 1 if it doesn't find anything.
> Any ideas on how this could be fixed or worked around? I need to be able to 
> change the parallelism dynamically, so the programmatic approach won't really 
> work for me, nor will setting the Flink configuration at system level.





Build failed in Jenkins: beam_PostCommit_Go_GradleBuild #985

2018-09-19 Thread Apache Jenkins Server
See 


Changes:

[mxm] [BEAM-3089] Use Flink cluster parallelism if no parallelism provided

[mxm] [flink] Revert default checkpointing mode to EXACTLY_ONCE

[mxm] [flink] Use default value for checkpoint timeout

[mxm] [flink] Set default master url to [auto]

[mxm] [BEAM-3089] Test default values of FlinkPipelineOptions

--
[...truncated 538.45 KB...]
2018/09/19 18:16:42 Plan[plan]:
12: Impulse[0]
13: Impulse[0]
1: ParDo[passert.failFn] Out:[]
2: Discard
3: ParDo[passert.failFn] Out:[]
4: ParDo[passert.diffFn] Out:[1 2 3]
5: wait[2] Out:4
6: buffer[6]. wait:5 Out:4
7: buffer[7]. wait:5 Out:4
8: Flatten[7]. Out:buffer[6]. wait:5 Out:4
9: ParDo[beam.partitionFn] Out:[8 8 8 8 8 8 8]
10: Multiplex. Out:[9 7]
11: ParDo[beam.createFn] Out:[10]
2018/09/19 18:16:42 wait[5] unblocked w/ 1 [false]
2018/09/19 18:16:42 wait[5] done
--- PASS: TestPartitionFlattenIdentity (0.00s)
=== RUN   Example_metricsDeclaredAnywhere
--- PASS: Example_metricsDeclaredAnywhere (0.00s)
=== RUN   Example_metricsReusable
--- PASS: Example_metricsReusable (0.00s)
PASS
coverage: 44.8% of statements
ok  github.com/apache/beam/sdks/go/pkg/beam 0.015s  coverage: 44.8% of 
statements
=== RUN   TestOptions
--- PASS: TestOptions (0.00s)
=== RUN   TestKey
--- PASS: TestKey (0.00s)
=== RUN   TestRegister
--- PASS: TestRegister (0.00s)
PASS
coverage: 47.1% of statements
ok  github.com/apache/beam/sdks/go/pkg/beam/core/runtime0.003s  
coverage: 47.1% of statements

> Task :beam-sdks-go:test
Test for github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness finished, 
1 completed, 0 failed
Result of package github.com/apache/beam/sdks/go/pkg/beam:
Test for github.com/apache/beam/sdks/go/pkg/beam finished, 7 completed, 0 failed
Result of package github.com/apache/beam/sdks/go/pkg/beam/core/runtime:
Test for github.com/apache/beam/sdks/go/pkg/beam/core/runtime finished, 3 
completed, 0 failed
Result of package github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx:

=== RUN   TestMergeMaps
--- PASS: TestMergeMaps (0.00s)
=== RUN   TestShallowClone
--- PASS: TestShallowClone (0.00s)
=== RUN   TestShallowCloneNil
--- PASS: TestShallowCloneNil (0.00s)
PASS
coverage: 6.4% of statements
ok  github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx  0.004s  
coverage: 6.4% of statements

> Task :beam-sdks-go:test
Test for github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx finished, 3 
completed, 0 failed
Generating HTML test report...
Finished generating test html results (0.119 secs) into: 

:beam-sdks-go:test (Thread[Task worker for ':' Thread 7,5,main]) completed. 
Took 27.196 secs.
:beam-sdks-go-container:prepare (Thread[Task worker for ':' Thread 7,5,main]) 
started.

> Task :beam-sdks-go-container:prepare
Caching disabled for task ':beam-sdks-go-container:prepare': Caching has not 
been enabled for the task
Task ':beam-sdks-go-container:prepare' is not up-to-date because:
  Task has not declared any outputs despite executing actions.
Use project GOPATH: 

:beam-sdks-go-container:prepare (Thread[Task worker for ':' Thread 7,5,main]) 
completed. Took 0.001 secs.
:beam-sdks-go-container:resolveBuildDependencies (Thread[Task worker for ':' 
Thread 7,5,main]) started.

> Task :beam-sdks-go-container:resolveBuildDependencies UP-TO-DATE
Build cache key for task ':beam-sdks-go-container:resolveBuildDependencies' is 
cdc6da84ef15e59cab3378ce2ab5a6d0
Caching disabled for task ':beam-sdks-go-container:resolveBuildDependencies': 
Caching has not been enabled for the task
Skipping task ':beam-sdks-go-container:resolveBuildDependencies' as it is 
up-to-date.
:beam-sdks-go-container:resolveBuildDependencies (Thread[Task worker for ':' 
Thread 7,5,main]) completed. Took 0.033 secs.
:beam-sdks-go-container:installDependencies (Thread[Task worker for ':' Thread 
7,5,main]) started.

> Task :beam-sdks-go-container:installDependencies
Caching disabled for task ':beam-sdks-go-container:installDependencies': 
Caching has not been enabled for the task
Task ':beam-sdks-go-container:installDependencies' is not up-to-date because:
  Task has not declared any outputs despite executing actions.
:beam-sdks-go-container:installDependencies (Thread[Task worker for ':' Thread 
7,5,main]) completed. Took 0.711 secs.
:beam-sdks-go-container:buildLinuxAmd64 (Thread[Task worker for ':' Thread 
7,5,main]) started.

> Task :beam-sdks-go-container:buildLinuxAmd64 UP-TO-DATE
Build cache key for task ':beam-sdks-go-container:buildLinuxAmd64' is 
2751989ee3d8907a0d173c98e97f36b6
Caching disabled for task ':beam-sdks-go-container:buildLinuxAmd64': Caching 
has not been enabled for the task
Skipping task ':beam-sdks-

Build failed in Jenkins: beam_PostCommit_Java_Nexmark_Dataflow #438

2018-09-19 Thread Apache Jenkins Server
See 


Changes:

[mxm] [BEAM-3089] Use Flink cluster parallelism if no parallelism provided

[mxm] [flink] Revert default checkpointing mode to EXACTLY_ONCE

[mxm] [flink] Use default value for checkpoint timeout

[mxm] [flink] Set default master url to [auto]

[mxm] [BEAM-3089] Test default values of FlinkPipelineOptions

--
[...truncated 1.27 MB...]
2018-09-19T18:17:47.898Z DONE Query12

==
Run started 2018-09-19T17:23:21.974Z and ran for PT3265.961S

Default configuration:
{"debug":true,"query":0,"sourceType":"DIRECT","sinkType":"DEVNULL","exportSummaryToBigQuery":false,"pubSubMode":"COMBINED","numEvents":10,"numEventGenerators":100,"rateShape":"SINE","firstEventRate":1,"nextEventRate":1,"rateUnit":"PER_SECOND","ratePeriodSec":600,"preloadSeconds":0,"streamTimeout":240,"isRateLimited":false,"useWallclockEventTime":false,"avgPersonByteSize":200,"avgAuctionByteSize":500,"avgBidByteSize":100,"hotAuctionRatio":2,"hotSellersRatio":4,"hotBiddersRatio":4,"windowSizeSec":10,"windowPeriodSec":5,"watermarkHoldbackSec":0,"numInFlightAuctions":100,"numActivePeople":1000,"coderStrategy":"HAND","cpuDelayMs":0,"diskBusyBytes":0,"auctionSkip":123,"fanout":5,"maxAuctionsWaitingTime":600,"occasionalDelaySec":3,"probDelayedEvent":0.1,"maxLogEvents":10,"usePubsubPublishTime":false,"outOfOrderGroupSize":1}

Configurations:
  Conf  Description
    query:0; exportSummaryToBigQuery:true; numEvents:1000
  0001  query:1; exportSummaryToBigQuery:true; numEvents:1000
  0002  query:2; exportSummaryToBigQuery:true; numEvents:1000
  0003  query:3; exportSummaryToBigQuery:true; numEvents:1000
  0004  query:4; exportSummaryToBigQuery:true; numEvents:100
  0005  query:5; exportSummaryToBigQuery:true; numEvents:1000
  0006  query:6; exportSummaryToBigQuery:true; numEvents:100
  0007  query:7; exportSummaryToBigQuery:true; numEvents:1000
  0008  query:8; exportSummaryToBigQuery:true; numEvents:1000
  0009  query:9; exportSummaryToBigQuery:true; numEvents:100
  0010  query:10; exportSummaryToBigQuery:true; numEvents:1000
  0011  query:11; exportSummaryToBigQuery:true; numEvents:1000
  0012  query:12; exportSummaryToBigQuery:true; numEvents:1000

Performance:
  Conf  Runtime(sec) (Baseline)  Events(/sec) (Baseline)  Results (Baseline)
        59.6                     167897.9                 1000
  0001  -1.0                     -1.0                     -1
*** Job was unexpectedly updated ***
  0002  27.6                     362594.7                 83527
  0003  36.3                     275178.9                 59507
  0004  39.2                     25516.7                  140
  0005  70.8                     141250.9                 4567
  0006  39.2                     25525.2                  8902
  0007  78.6                     127179.5                 100
  0008  41.7                     240067.2                 92833
  0009  41.6                     24059.3                  32548
  0010  -1.0                     -1.0                     -1
*** Job was unexpectedly updated ***
  0011  54.2                     184481.4                 1839657
  0012  143.8                    69564.7                  199929
==

Sep 19, 2018 6:17:48 PM 
org.apache.beam.runners.dataflow.options.DataflowPipelineOptions$StagingLocationFactory
 create
INFO: No stagingLocation provided, falling back to gcpTempLocation
Sep 19, 2018 6:17:48 PM org.apache.beam.runners.dataflow.DataflowRunner 
fromOptions
INFO: PipelineOptions.filesToStage was not specified. Defaulting to files from 
the classpath: will stage 127 files. Enable logging at DEBUG level to see which 
files will be staged.
Sep 19, 2018 6:17:48 PM org.apache.beam.runners.dataflow.DataflowRunner run
INFO: Executing pipeline on the Dataflow Service, which will have billing 
implications related to Google Compute Engine usage and other Google Cloud 
Services.
Sep 19, 2018 6:17:48 PM org.apache.beam.runners.dataflow.util.PackageUtil 
stageClasspathElements
INFO: Uploading 127 files from PipelineOptions.filesToStage to staging location 
to prepare for execution.
Sep 19, 2018 6:17:48 PM org.apache.beam.runners.dataflow.util.PackageUtil 
stageClasspathElements
INFO: Staging files complete: 127 files cached, 0 files newly uploa

Build failed in Jenkins: beam_PerformanceTests_Python #1457

2018-09-19 Thread Apache Jenkins Server
See 


Changes:

[timrobertson100] [BEAM-4861] Autocreate absent parent directories when doing 
an HDFS

[timrobertson100] [BEAM-4861] Improve clarity around exceptions thrown in rename

[vin1990] Endpoint host port configuration reset issue

[vin1990] Endpoint host port configuration reset issue

[vin1990] Fixes https://issues.apache.org/jira/browse/BEAM-3446. BaseReadFn to

[vin1990] package private and typo fix

[vin1990] Using mget with configurable batch size to increase efficiency of read

[vin1990] fixing equality of batch size

[vin1990] fixing order of test arguments, to fix

[vin1990] Aggregating on per window, applyiong redis get on per window batch and

[herohde] Fix Go index parsing in the face of runner-gernerated keys

[herohde] Fix use of wrong key in Dataflow side input

[timrobertson100] [BEAM-4861] Typo in comment

[mxm] [BEAM-3089] Use Flink cluster parallelism if no parallelism provided

[mxm] [flink] Revert default checkpointing mode to EXACTLY_ONCE

[mxm] [flink] Use default value for checkpoint timeout

[mxm] [flink] Set default master url to [auto]

[mxm] [BEAM-3089] Test default values of FlinkPipelineOptions

--
Started by timer
[EnvInject] - Loading node environment variables.
Building remotely on beam15 (beam) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/apache/beam.git # timeout=10
Fetching upstream changes from https://github.com/apache/beam.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/beam.git 
 > +refs/heads/*:refs/remotes/origin/* 
 > +refs/pull/${ghprbPullId}/*:refs/remotes/origin/pr/${ghprbPullId}/*
 > git rev-parse origin/master^{commit} # timeout=10
Checking out Revision 6d7cc57d88a422c2e8fbfdbc41bcbf68b73f4950 (origin/master)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 6d7cc57d88a422c2e8fbfdbc41bcbf68b73f4950
Commit message: "[BEAM-3286] Add Dataflow support for side input"
 > git rev-list --no-walk c49a97ecbf815b320926285dcddba993590e3073 # timeout=10
Cleaning workspace
 > git rev-parse --verify HEAD # timeout=10
Resetting working tree
 > git reset --hard # timeout=10
 > git clean -fdx # timeout=10
[EnvInject] - Executing scripts and injecting environment variables after the 
SCM step.
[EnvInject] - Injecting as environment variables the properties content 
SPARK_LOCAL_IP=127.0.0.1

[EnvInject] - Variables injected successfully.
[beam_PerformanceTests_Python] $ /bin/bash -xe 
/tmp/jenkins7076482636507675621.sh
+ rm -rf 

[beam_PerformanceTests_Python] $ /bin/bash -xe 
/tmp/jenkins6190556130135514082.sh
+ rm -rf 

[beam_PerformanceTests_Python] $ /bin/bash -xe 
/tmp/jenkins7458117137086239896.sh
+ virtualenv 

New python executable in 

Also creating executable in 

Installing setuptools, pkg_resources, pip, wheel...done.
Running virtualenv with interpreter /usr/bin/python2
[beam_PerformanceTests_Python] $ /bin/bash -xe 
/tmp/jenkins7392621078541491404.sh
+ 

 install --upgrade setuptools pip
Requirement already up-to-date: setuptools in 
./env/.perfkit_env/lib/python2.7/site-packages (40.4.1)
Requirement already up-to-date: pip in 
./env/.perfkit_env/lib/python2.7/site-packages (18.0)
[beam_PerformanceTests_Python] $ /bin/bash -xe 
/tmp/jenkins1040844573174552279.sh
+ git clone https://github.com/GoogleCloudPlatform/PerfKitBenchmarker.git 

Cloning into 
'
[beam_PerformanceTests_Python] $ /bin/bash -xe 
/tmp/jenkins7974748427869100884.sh
+ 

 install -r 

Collecting absl-py (from -r 

 (line 14))
Collecting jinja2>=2.7 (from -r 

 (line 15))
  Using cached 
https://files.pythonhosted.org/packages/7f/ff/ae64bacdfc95f27a016a7bed8e8686763ba4d27

[jira] [Assigned] (BEAM-5382) Combiner panics at runtime if MergeAccumulators has a context parameter

2018-09-19 Thread Cody Schroeder (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cody Schroeder reassigned BEAM-5382:


Assignee: Cody Schroeder

> Combiner panics at runtime if MergeAccumulators has a context parameter
> ---
>
> Key: BEAM-5382
> URL: https://issues.apache.org/jira/browse/BEAM-5382
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Cody Schroeder
>Assignee: Cody Schroeder
>Priority: Major
>
> [combine.go#L62|https://github.com/apache/beam/blob/14ef23c/sdks/go/pkg/beam/core/runtime/exec/combine.go#L62]
>  assumes that a combiner's {{MergeAccumulators}} function must be 2x1 but 
> {{TryCombinePerKey}} accepts combiner functions with context parameters.  I 
> believe accepting context parameters is the correct behavior overall.
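The arity problem can be illustrated with a minimal sketch (a hypothetical Python analogue of the Go runtime's dispatch; `invoke_merge` and the other names are illustrative, not Beam API):

```python
import inspect

# Hypothetical analogue of the reported Go panic: a runner that assumes
# MergeAccumulators is strictly binary (two accumulators in, one out) breaks
# when the user's function also takes a context parameter. Inspecting the
# arity and keeping the binary form as a fast path is one way out.

class Context(object):
    """Stand-in for a pipeline context value."""

def invoke_merge(fn, ctx, a, b):
    arity = len(inspect.signature(fn).parameters)
    if arity == 2:
        return fn(a, b)       # fast path: plain binary merge
    return fn(ctx, a, b)      # fallback: context-aware merge

def merge_binary(a, b):
    return a + b

def merge_with_ctx(ctx, a, b):
    return a + b

print(invoke_merge(merge_binary, Context(), 1, 2))    # 3
print(invoke_merge(merge_with_ctx, Context(), 1, 2))  # 3
```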





Build failed in Jenkins: beam_PostCommit_Java_ValidatesRunner_Flink_Gradle #1545

2018-09-19 Thread Apache Jenkins Server
See 


--
[...truncated 766.86 MB...]
INFO: 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/GroupByKey 
-> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/ExpandIterable/ParMultiDo(Anonymous)
 -> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/RestoreOriginalTimestamps/ReifyTimestamps.RemoveWildcard/ParDo(Anonymous)/ParMultiDo(Anonymous)
 -> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/RestoreOriginalTimestamps/Reify.ExtractTimestampsFromValues/ParDo(Anonymous)/ParMultiDo(Anonymous)
 -> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Values/Values/Map/ParMultiDo(Anonymous)
 -> GenerateSequence/Read(UnboundedCountingSource)/Read/ParMultiDo(Read) -> 
GenerateSequence/Read(UnboundedCountingSource)/StripIds/ParMultiDo(StripIds) -> 
ParDo(Counting)/ParMultiDo(Counting) (12/16) (9973c4c4aa4f01153e29c858ec68b47f) 
switched from RUNNING to FINISHED.
Sep 19, 2018 6:25:24 PM org.apache.flink.runtime.taskmanager.Task run
INFO: Ensuring all FileSystem streams are closed for task 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/GroupByKey 
-> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/ExpandIterable/ParMultiDo(Anonymous)
 -> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/RestoreOriginalTimestamps/ReifyTimestamps.RemoveWildcard/ParDo(Anonymous)/ParMultiDo(Anonymous)
 -> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/RestoreOriginalTimestamps/Reify.ExtractTimestampsFromValues/ParDo(Anonymous)/ParMultiDo(Anonymous)
 -> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Values/Values/Map/ParMultiDo(Anonymous)
 -> GenerateSequence/Read(UnboundedCountingSource)/Read/ParMultiDo(Read) -> 
GenerateSequence/Read(UnboundedCountingSource)/StripIds/ParMultiDo(StripIds) -> 
ParDo(Counting)/ParMultiDo(Counting) (6/16) (63fc5a176c23123367329e0fe6c10d59) 
[FINISHED]
Sep 19, 2018 6:25:24 PM org.apache.flink.runtime.taskexecutor.TaskExecutor 
unregisterTaskAndNotifyFinalState
INFO: Un-registering task and sending final execution state FINISHED to 
JobManager for task 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/GroupByKey 
-> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/ExpandIterable/ParMultiDo(Anonymous)
 -> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/RestoreOriginalTimestamps/ReifyTimestamps.RemoveWildcard/ParDo(Anonymous)/ParMultiDo(Anonymous)
 -> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/RestoreOriginalTimestamps/Reify.ExtractTimestampsFromValues/ParDo(Anonymous)/ParMultiDo(Anonymous)
 -> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Values/Values/Map/ParMultiDo(Anonymous)
 -> GenerateSequence/Read(UnboundedCountingSource)/Read/ParMultiDo(Read) -> 
GenerateSequence/Read(UnboundedCountingSource)/StripIds/ParMultiDo(StripIds) -> 
ParDo(Counting)/ParMultiDo(Counting) 649c865235beb1932e0af2f62f5730e6.
Sep 19, 2018 6:25:24 PM org.apache.flink.runtime.executiongraph.Execution 
transitionState
INFO: 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/GroupByKey 
-> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/ExpandIterable/ParMultiDo(Anonymous)
 -> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/RestoreOriginalTimestamps/ReifyTimestamps.RemoveWildcard/ParDo(Anonymous)/ParMultiDo(Anonymous)
 -> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/RestoreOriginalTimestamps/Reify.ExtractTimestampsFromValues/ParDo(Anonymous)/ParMultiDo(Anonymous)
 -> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Values/Values/Map/ParMultiDo(Anonymous)
 -> GenerateSequence/Read(UnboundedCountingSource)/Read/ParMultiDo(Read) -> 
GenerateSequence/Read(UnboundedCountingSource)/StripIds/ParMultiDo(StripIds) -> 
ParDo(Counting)/ParMultiDo(Counting) (2/16) (c6bd70eedd1067bd7afa0f9dcd318325) 
switched from RUNNING to FINISHED.
Sep 19, 2018 6:25:24 PM org.apache.flink.runtime.taskexecutor.TaskExecutor 
unregisterTaskAndNotifyFinalState
INFO: Un-registering task and sending final execution state FINISHED to 
JobManager for task 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/GroupByKey 
-> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/ExpandIterable/ParMultiDo(Anonymous)
 -> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/RestoreOriginalTimestamps/ReifyTimestamps.RemoveWildcard/ParDo(Anonymous)/ParMultiDo(Anonymous)
 -> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/RestoreOriginalTimestamps/Reify.ExtractTimestampsFromValues/ParDo(Anonymous)/ParMultiDo(Anonymous)
 -> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Values/Values/M

Jenkins build is back to normal : beam_PostCommit_Go_GradleBuild #986

2018-09-19 Thread Apache Jenkins Server
See 




[jira] [Work logged] (BEAM-2687) Python SDK support for Stateful Processing

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2687?focusedWorklogId=145756&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145756
 ]

ASF GitHub Bot logged work on BEAM-2687:


Author: ASF GitHub Bot
Created on: 19/Sep/18 18:30
Start Date: 19/Sep/18 18:30
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on issue #6433: [BEAM-2687] 
Implement Timers over the Fn API.
URL: https://github.com/apache/beam/pull/6433#issuecomment-422910417
 
 
   Please rebase after https://github.com/apache/beam/pull/6349 is merged.




Issue Time Tracking
---

Worklog Id: (was: 145756)
Time Spent: 3.5h  (was: 3h 20m)

> Python SDK support for Stateful Processing
> --
>
> Key: BEAM-2687
> URL: https://issues.apache.org/jira/browse/BEAM-2687
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Assignee: Charles Chen
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Python SDK should support stateful processing 
> (https://beam.apache.org/blog/2017/02/13/stateful-processing.html)
> In the meantime, runner capability matrix should be updated to show the lack 
> of this feature 
> (https://beam.apache.org/documentation/runners/capability-matrix/)
> Use this as an umbrella issue for all related issues.





[jira] [Work logged] (BEAM-5382) Combiner panics at runtime if MergeAccumulators has a context parameter

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5382?focusedWorklogId=145757&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145757
 ]

ASF GitHub Bot logged work on BEAM-5382:


Author: ASF GitHub Bot
Created on: 19/Sep/18 18:31
Start Date: 19/Sep/18 18:31
Worklog Time Spent: 10m 
  Work Description: schroederc opened a new pull request #6434: [BEAM-5382] 
Add fallback for non-binary MergeAccumulatorsFn
URL: https://github.com/apache/beam/pull/6434
 
 
   Allow non-binary MergeAccumulators functions (e.g. context parameters and 
error return values) while maintaining the existing fast path for binary 
functions.
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/)
 | --- | --- | --- | --- | --- | ---
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)
 | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)
  [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/)
 | --- | --- | --- | ---
   
   
   
   
   




Issue Time Tracking
---

Worklog Id: (was: 145757)
Time Spent: 10m
Remaining Estimate: 0h

> Combiner panics at runtime if MergeAccumulators has a context parameter
> ---
>
> Key: BEAM-5382
> URL: https://issues.apache.org/jira/browse/BEAM-5382
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Cody Schroeder
>Assignee: Cody Schroeder
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> [combine.go#L62|https://github.com/apache/beam/blob/14ef23c/sdks/go/pkg/beam/core/runtime/exec/combine.go#L62]
>  assumes that a combiner's {{MergeAccumulators}} function must be 2x1 but 
> {{TryCombinePerKey}} accepts combiner functions with context parameters.  I 
> believe accepting context parameters is the correct behavior overall.





[jira] [Work logged] (BEAM-5382) Combiner panics at runtime if MergeAccumulators has a context parameter

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5382?focusedWorklogId=145759&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145759
 ]

ASF GitHub Bot logged work on BEAM-5382:


Author: ASF GitHub Bot
Created on: 19/Sep/18 18:32
Start Date: 19/Sep/18 18:32
Worklog Time Spent: 10m 
  Work Description: schroederc commented on issue #6434: [BEAM-5382] Add 
fallback for non-binary MergeAccumulatorsFn
URL: https://github.com/apache/beam/pull/6434#issuecomment-422911177
 
 
   R: @lostluck




Issue Time Tracking
---

Worklog Id: (was: 145759)
Time Spent: 20m  (was: 10m)

> Combiner panics at runtime if MergeAccumulators has a context parameter
> ---
>
> Key: BEAM-5382
> URL: https://issues.apache.org/jira/browse/BEAM-5382
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Cody Schroeder
>Assignee: Cody Schroeder
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> [combine.go#L62|https://github.com/apache/beam/blob/14ef23c/sdks/go/pkg/beam/core/runtime/exec/combine.go#L62]
>  assumes that a combiner's {{MergeAccumulators}} function must be 2x1 but 
> {{TryCombinePerKey}} accepts combiner functions with context parameters.  I 
> believe accepting context parameters is the correct behavior overall.





[jira] [Work logged] (BEAM-1251) Python 3 Support

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-1251?focusedWorklogId=145764&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145764
 ]

ASF GitHub Bot logged work on BEAM-1251:


Author: ASF GitHub Bot
Created on: 19/Sep/18 18:45
Start Date: 19/Sep/18 18:45
Worklog Time Spent: 10m 
  Work Description: swegner commented on a change in pull request #5842: 
[BEAM-1251] Modernize Python 2 code to get ready for Python 3
URL: https://github.com/apache/beam/pull/5842#discussion_r218921060
 
 

 ##
 File path: website/.jenkins/append_index_html_to_internal_links.py
 ##
 @@ -29,12 +29,18 @@
   'sudo apt-get install python-beautifulsoup4'.
 
 """
+from __future__ import print_function
 
 Review comment:
   The website source code is currently being migrated from 
https://github.com/apache/beam-site, but is not yet ready. Website changes in 
apache/beam will be overwritten on next merge. Please contribute changes at 
apache/beam-site according to the [website contribution 
guide](https://beam.apache.org/contribute/#contributing-to-the-website). You 
can track migration progress via 
[[BEAM-4493]](https://issues.apache.org/jira/browse/BEAM-4493).
   
   Do you know if this change was also made in the apache/beam-site repository? 
If not, it will need to be migrated.
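For context, the import under discussion behaves as follows (a minimal standalone example, not taken from the Beam codebase):

```python
from __future__ import print_function  # harmless on Python 3; on Python 2 it
# replaces the print *statement* with the Python 3 print() *function*

import io

# Without the import, Python 2 parses `print("hello", "world")` as a print
# statement applied to a tuple and writes `('hello', 'world')`; with it, the
# call is the Python 3 function, so sep/file keyword arguments work.
buf = io.StringIO()
print("hello", "world", sep=", ", file=buf)
print(buf.getvalue().strip())  # hello, world
```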




Issue Time Tracking
---

Worklog Id: (was: 145764)
Time Spent: 20h 20m  (was: 20h 10m)

> Python 3 Support
> 
>
> Key: BEAM-1251
> URL: https://issues.apache.org/jira/browse/BEAM-1251
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Eyad Sibai
>Assignee: Robbe
>Priority: Major
>  Time Spent: 20h 20m
>  Remaining Estimate: 0h
>
> I have been trying to use Google Datalab with Python 3. As I see it, there are
> several packages that Google Datalab depends on which do not yet support
> Python 3. This is one of them.
> https://github.com/GoogleCloudPlatform/DataflowPythonSDK/issues/6





Build failed in Jenkins: beam_PostCommit_Java_ValidatesRunner_Flink_Gradle #1546

2018-09-19 Thread Apache Jenkins Server
See 


Changes:

[herohde] Fix Go index parsing in the face of runner-gernerated keys

[herohde] Fix use of wrong key in Dataflow side input

--
[...truncated 761.06 MB...]
INFO: Ensuring all FileSystem streams are closed for task 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/GroupByKey 
-> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/ExpandIterable/ParMultiDo(Anonymous)
 -> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/RestoreOriginalTimestamps/ReifyTimestamps.RemoveWildcard/ParDo(Anonymous)/ParMultiDo(Anonymous)
 -> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/RestoreOriginalTimestamps/Reify.ExtractTimestampsFromValues/ParDo(Anonymous)/ParMultiDo(Anonymous)
 -> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Values/Values/Map/ParMultiDo(Anonymous)
 -> GenerateSequence/Read(UnboundedCountingSource)/Read/ParMultiDo(Read) -> 
GenerateSequence/Read(UnboundedCountingSource)/StripIds/ParMultiDo(StripIds) -> 
ParDo(Counting)/ParMultiDo(Counting) (5/16) (768e9e5bd77b2b5addee60da3ba9b22e) 
[FINISHED]
Sep 19, 2018 6:46:43 PM org.apache.flink.runtime.executiongraph.Execution 
transitionState
INFO: 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/GroupByKey 
-> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/ExpandIterable/ParMultiDo(Anonymous)
 -> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/RestoreOriginalTimestamps/ReifyTimestamps.RemoveWildcard/ParDo(Anonymous)/ParMultiDo(Anonymous)
 -> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/RestoreOriginalTimestamps/Reify.ExtractTimestampsFromValues/ParDo(Anonymous)/ParMultiDo(Anonymous)
 -> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Values/Values/Map/ParMultiDo(Anonymous)
 -> GenerateSequence/Read(UnboundedCountingSource)/Read/ParMultiDo(Read) -> 
GenerateSequence/Read(UnboundedCountingSource)/StripIds/ParMultiDo(StripIds) -> 
ParDo(Counting)/ParMultiDo(Counting) (12/16) (bceec6f23934d0a3470883d133d1b1c3) 
switched from RUNNING to FINISHED.
Sep 19, 2018 6:46:43 PM org.apache.flink.runtime.taskexecutor.TaskExecutor 
unregisterTaskAndNotifyFinalState
INFO: Un-registering task and sending final execution state FINISHED to 
JobManager for task 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/GroupByKey 
-> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/ExpandIterable/ParMultiDo(Anonymous)
 -> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/RestoreOriginalTimestamps/ReifyTimestamps.RemoveWildcard/ParDo(Anonymous)/ParMultiDo(Anonymous)
 -> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/RestoreOriginalTimestamps/Reify.ExtractTimestampsFromValues/ParDo(Anonymous)/ParMultiDo(Anonymous)
 -> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Values/Values/Map/ParMultiDo(Anonymous)
 -> GenerateSequence/Read(UnboundedCountingSource)/Read/ParMultiDo(Read) -> 
GenerateSequence/Read(UnboundedCountingSource)/StripIds/ParMultiDo(StripIds) -> 
ParDo(Counting)/ParMultiDo(Counting) 776155226b7dd90c08abfe6ceeb41b19.
Sep 19, 2018 6:46:43 PM org.apache.flink.runtime.executiongraph.Execution 
transitionState
INFO: 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/GroupByKey 
-> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/ExpandIterable/ParMultiDo(Anonymous)
 -> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/RestoreOriginalTimestamps/ReifyTimestamps.RemoveWildcard/ParDo(Anonymous)/ParMultiDo(Anonymous)
 -> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/RestoreOriginalTimestamps/Reify.ExtractTimestampsFromValues/ParDo(Anonymous)/ParMultiDo(Anonymous)
 -> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Values/Values/Map/ParMultiDo(Anonymous)
 -> GenerateSequence/Read(UnboundedCountingSource)/Read/ParMultiDo(Read) -> 
GenerateSequence/Read(UnboundedCountingSource)/StripIds/ParMultiDo(StripIds) -> 
ParDo(Counting)/ParMultiDo(Counting) (15/16) (3319dd60b8c7fc6d074a8e1cb338b1c0) 
switched from RUNNING to FINISHED.
Sep 19, 2018 6:46:43 PM org.apache.flink.runtime.taskexecutor.TaskExecutor 
unregisterTaskAndNotifyFinalState
INFO: Un-registering task and sending final execution state FINISHED to 
JobManager for task 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/GroupByKey 
-> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/ExpandIterable/ParMultiDo(Anonymous)
 -> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/RestoreOriginalTimestamps/ReifyTimestamps.RemoveWildcard/ParDo(Anonymous)/ParMultiDo(Anonymous)
 -> 
GenerateSequence/Read(UnboundedCountingSource)/Reshuffle/Reshuffle/Restor

[beam] branch master updated (6d7cc57 -> 3a1e198)

2018-09-19 Thread lcwik
This is an automated email from the ASF dual-hosted git repository.

lcwik pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 6d7cc57  [BEAM-3286] Add Dataflow support for side input
 add 6c4c3da  [BEAM-4780] Updating to DockerJobBundleFactory in 
ReferenceRunner.
 add 3a1e198  [BEAM-4780] Updating to DockerJobBundleFactory in 
ReferenceRunner.

No new revisions were added by this update.

Summary of changes:
 .../runners/direct/portable/ReferenceRunner.java   | 23 +-
 1 file changed, 22 insertions(+), 1 deletion(-)



[jira] [Work logged] (BEAM-4780) Entry point for ULR JobService compatible with TestPortableRunner

2018-09-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4780?focusedWorklogId=145765&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145765
 ]

ASF GitHub Bot logged work on BEAM-4780:


Author: ASF GitHub Bot
Created on: 19/Sep/18 18:46
Start Date: 19/Sep/18 18:46
Worklog Time Spent: 10m 
  Work Description: lukecwik closed pull request #6151: [BEAM-4780] 
Updating to DockerJobBundleFactory in ReferenceRunner.
URL: https://github.com/apache/beam/pull/6151
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/portable/ReferenceRunner.java
 
b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/portable/ReferenceRunner.java
index 079c0d94fa4..ab62c653873 100644
--- 
a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/portable/ReferenceRunner.java
+++ 
b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/portable/ReferenceRunner.java
@@ -69,6 +69,7 @@
 import org.apache.beam.runners.fnexecution.ServerFactory;
 import org.apache.beam.runners.fnexecution.artifact.ArtifactRetrievalService;
 import org.apache.beam.runners.fnexecution.control.ControlClientPool;
+import org.apache.beam.runners.fnexecution.control.DockerJobBundleFactory;
 import 
org.apache.beam.runners.fnexecution.control.FnApiControlClientPoolService;
 import org.apache.beam.runners.fnexecution.control.JobBundleFactory;
 import org.apache.beam.runners.fnexecution.control.MapControlClientPool;
@@ -79,6 +80,7 @@
 import 
org.apache.beam.runners.fnexecution.environment.InProcessEnvironmentFactory;
 import org.apache.beam.runners.fnexecution.logging.GrpcLoggingService;
 import org.apache.beam.runners.fnexecution.logging.Slf4jLogWriter;
+import org.apache.beam.runners.fnexecution.provisioning.JobInfo;
 import 
org.apache.beam.runners.fnexecution.provisioning.StaticGrpcProvisionService;
 import org.apache.beam.runners.fnexecution.state.GrpcStateService;
 import org.apache.beam.runners.fnexecution.wire.LengthPrefixUnknownCoders;
@@ -160,6 +162,7 @@ public void execute() throws Exception {
 ServerFactory serverFactory = createServerFactory();
 ControlClientPool controlClientPool = MapControlClientPool.create();
 ExecutorService dataExecutor = Executors.newCachedThreadPool();
+// TODO(BEAM-5095): Fill out ProvisionInfo/JobInfo properly, without 
placeholder values.
 ProvisionInfo provisionInfo =
 ProvisionInfo.newBuilder()
 .setJobId("id")
@@ -168,6 +171,7 @@ public void execute() throws Exception {
 .setWorkerId("foo")
 .setResourceLimits(Resources.getDefaultInstance())
 .build();
+JobInfo jobInfo = JobInfo.create("id", "reference", "retrieval-token", 
options);
 try (GrpcFnServer logging =
 GrpcFnServer.allocatePortAndCreateFor(
 GrpcLoggingService.forWriter(Slf4jLogWriter.getDefault()), 
serverFactory);
@@ -198,7 +202,7 @@ public void execute() throws Exception {
   createEnvironmentFactory(
   control, logging, artifact, provisioning, 
controlClientPool.getSource());
   JobBundleFactory jobBundleFactory =
-  SingleEnvironmentInstanceJobBundleFactory.create(environmentFactory, 
data, state);
+  createJobBundleFactory(jobInfo, environmentFactory, data, state);
 
   TransformEvaluatorRegistry transformRegistry =
   TransformEvaluatorRegistry.portableRegistry(
@@ -253,6 +257,23 @@ private EnvironmentFactory createEnvironmentFactory(
 }
   }
 
+  private JobBundleFactory createJobBundleFactory(
+  JobInfo jobInfo,
+  EnvironmentFactory environmentFactory,
+  GrpcFnServer data,
+  GrpcFnServer state)
+  throws Exception {
+switch (environmentType) {
+  case DOCKER:
+return DockerJobBundleFactory.create(jobInfo);
+  case IN_PROCESS:
+return 
SingleEnvironmentInstanceJobBundleFactory.create(environmentFactory, data, 
state);
+  default:
+throw new IllegalArgumentException(
+String.format("Unknown %s %s", 
EnvironmentType.class.getSimpleName(), environmentType));
+}
+  }
+
   @VisibleForTesting
   static class PortableGroupByKeyReplacer implements TransformReplacement {
 @Override


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 1457
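The `createJobBundleFactory` method added in the diff above dispatches on an environment-type enum and raises for unknown values. A minimal Python sketch of the same dispatch pattern (the enum members and the string return values are illustrative stand-ins for the Java factory classes, not Beam's actual API):

```python
from enum import Enum, auto


class EnvironmentType(Enum):
    DOCKER = auto()
    IN_PROCESS = auto()


def create_job_bundle_factory(environment_type):
    # Pick a factory based on the environment type, mirroring the
    # switch statement in the ReferenceRunner diff above.
    if environment_type is EnvironmentType.DOCKER:
        return "DockerJobBundleFactory"  # stand-in for the Docker-backed factory
    if environment_type is EnvironmentType.IN_PROCESS:
        # stand-in for SingleEnvironmentInstanceJobBundleFactory
        return "SingleEnvironmentInstanceJobBundleFactory"
    raise ValueError("Unknown EnvironmentType: %s" % environment_type)


print(create_job_bundle_factory(EnvironmentType.DOCKER))  # DockerJobBundleFactory
```

The default branch raising, rather than silently falling back to one factory, matches the diff's `IllegalArgumentException` for unrecognized environment types.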

Build failed in Jenkins: beam_PostCommit_Java_Nexmark_Spark #510

2018-09-19 Thread Apache Jenkins Server
See 


Changes:

[daniel.o.programmer] [BEAM-4780] Updating to DockerJobBundleFactory in 
ReferenceRunner.

--
[...truncated 124.96 KB...]
All input files are considered out-of-date for incremental task 
':beam-sdks-java-fn-execution:compileJava'.
Compiling with error-prone compiler
Note: Some input files use unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
Packing task ':beam-sdks-java-fn-execution:compileJava'
:beam-sdks-java-fn-execution:compileJava (Thread[Task worker for ':' Thread 
10,5,main]) completed. Took 8.031 secs.
:beam-sdks-java-fn-execution:classes (Thread[Task worker for ':' Thread 
10,5,main]) started.

> Task :beam-sdks-java-fn-execution:classes
Skipping task ':beam-sdks-java-fn-execution:classes' as it has no actions.
:beam-sdks-java-fn-execution:classes (Thread[Task worker for ':' Thread 
10,5,main]) completed. Took 0.001 secs.
:beam-sdks-java-fn-execution:shadowJar (Thread[Task worker for ':' Thread 
10,5,main]) started.

> Task :beam-sdks-java-fn-execution:shadowJar
Build cache key for task ':beam-sdks-java-fn-execution:shadowJar' is 
2fecb0aecd1489122a70f52353503c2c
Caching disabled for task ':beam-sdks-java-fn-execution:shadowJar': Caching has 
not been enabled for the task
Task ':beam-sdks-java-fn-execution:shadowJar' is not up-to-date because:
  No history is available.
***
GRADLE SHADOW STATS

Total Jars: 2 (includes project)
Total Time: 2.246s [2246ms]
Average Time/Jar: 1.123s [1123.0ms]
***
:beam-sdks-java-fn-execution:shadowJar (Thread[Task worker for ':' Thread 
10,5,main]) completed. Took 2.34 secs.
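The GRADLE SHADOW STATS blocks report "Average Time/Jar" as total time divided by the jar count; the figures in the block above (2 jars, 2.246s total, 1.123s average) check out. A quick verification in Python:

```python
def average_time_per_jar(total_ms, jar_count):
    # Average Time/Jar as reported in the GRADLE SHADOW STATS block.
    return total_ms / jar_count


# Figures from the stats block above: 2 jars, 2246 ms total.
print(average_time_per_jar(2246, 2))  # 1123.0 ms, i.e. 1.123s as logged
```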

> Task :beam-runners-core-construction-java:compileJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Note: Some input files use unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
Packing task ':beam-runners-core-construction-java:compileJava'
:beam-runners-core-construction-java:compileJava (Thread[Task worker for ':' 
Thread 9,5,main]) completed. Took 12.005 secs.
:beam-runners-core-construction-java:classes (Thread[Task worker for ':' Thread 
9,5,main]) started.

> Task :beam-runners-core-construction-java:classes
Skipping task ':beam-runners-core-construction-java:classes' as it has no 
actions.
:beam-runners-core-construction-java:classes (Thread[Task worker for ':' Thread 
9,5,main]) completed. Took 0.0 secs.
:beam-runners-core-construction-java:shadowJar (Thread[Task worker for ':' 
Thread 9,5,main]) started.

> Task :beam-runners-core-construction-java:shadowJar
Build cache key for task ':beam-runners-core-construction-java:shadowJar' is 
901217581d245f1551c03616b7e83a8b
Caching disabled for task ':beam-runners-core-construction-java:shadowJar': 
Caching has not been enabled for the task
Task ':beam-runners-core-construction-java:shadowJar' is not up-to-date because:
  No history is available.
***
GRADLE SHADOW STATS

Total Jars: 2 (includes project)
Total Time: 2.096s [2096ms]
Average Time/Jar: 1.048s [1048.0ms]
***
:beam-runners-core-construction-java:shadowJar (Thread[Task worker for ':' 
Thread 9,5,main]) completed. Took 2.401 secs.
:beam-runners-core-java:compileJava (Thread[Task worker for ':' Thread 
9,5,main]) started.

> Task :beam-runners-core-java:compileJava
Build cache key for task ':beam-runners-core-java:compileJava' is 
5e3da9cda06a5dad3e401f7b816c757b
Task ':beam-runners-core-java:compileJava' is not up-to-date because:
  No history is available.
Custom actions are attached to task ':beam-runners-core-java:compileJava'.
All input files are considered out-of-date for incremental task 
':beam-runners-core-java:compileJava'.
Compiling with error-prone compiler
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Note: Some input files use unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
Packing task ':beam-runners-core-java:compileJava'
:beam-runners-core-java:compileJava (Thread[Task worker for ':' Thread 
9,5,main]) completed. Took 6.709 secs.
:beam-runners-core-java:classes (Thread[Task worker for ':' Thread 9,5,main]) 
started.

> Task :beam-runners-core-java:classes
Skipping task ':beam-runners-core-java:classes' as it has no actions.
:beam-runners-core-java:classes (Thread[Task worker for ':' Thread 9,5,main]) 
completed. Took 0.0 secs.
:beam-runners-core-java:shadowJar (Thread[Task worker for ':',5,main]) started.

> Task :beam-runners-core-java:shadowJar
Build cache key for task ':beam-runners-core-java:shadowJar' is 
cd4e67fe1e6f8d972cc44e6d70b3865d
Caching disabled for task ':beam-runners-core-java:shadowJar': Caching has not 
been enabled for the task
Task ':beam-runners-core-java:shadowJar' is not up-to-date because:
  No histor

Build failed in Jenkins: beam_PostCommit_Java_Nexmark_Flink #522

2018-09-19 Thread Apache Jenkins Server
See 


Changes:

[daniel.o.programmer] [BEAM-4780] Updating to DockerJobBundleFactory in 
ReferenceRunner.

--
[...truncated 124.76 KB...]
  No history is available.
Custom actions are attached to task ':beam-sdks-java-fn-execution:compileJava'.
All input files are considered out-of-date for incremental task 
':beam-sdks-java-fn-execution:compileJava'.
Compiling with error-prone compiler
Note: Some input files use unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
Packing task ':beam-sdks-java-fn-execution:compileJava'
:beam-sdks-java-fn-execution:compileJava (Thread[Task worker for ':' Thread 
2,5,main]) completed. Took 7.094 secs.
:beam-sdks-java-fn-execution:classes (Thread[Task worker for ':' Thread 
2,5,main]) started.

> Task :beam-sdks-java-fn-execution:classes
Skipping task ':beam-sdks-java-fn-execution:classes' as it has no actions.
:beam-sdks-java-fn-execution:classes (Thread[Task worker for ':' Thread 
2,5,main]) completed. Took 0.0 secs.
:beam-sdks-java-fn-execution:shadowJar (Thread[Task worker for ':' Thread 
2,5,main]) started.

> Task :beam-sdks-java-fn-execution:shadowJar
Build cache key for task ':beam-sdks-java-fn-execution:shadowJar' is 
1f95a14368fcce1084518184f1781033
Caching disabled for task ':beam-sdks-java-fn-execution:shadowJar': Caching has 
not been enabled for the task
Task ':beam-sdks-java-fn-execution:shadowJar' is not up-to-date because:
  No history is available.
***
GRADLE SHADOW STATS

Total Jars: 2 (includes project)
Total Time: 2.492s [2492ms]
Average Time/Jar: 1.246s [1246.0ms]
***
:beam-sdks-java-fn-execution:shadowJar (Thread[Task worker for ':' Thread 
2,5,main]) completed. Took 2.58 secs.

> Task :beam-runners-core-construction-java:compileJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Note: Some input files use unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
Packing task ':beam-runners-core-construction-java:compileJava'
:beam-runners-core-construction-java:compileJava (Thread[Task worker for ':' 
Thread 3,5,main]) completed. Took 11.861 secs.
:beam-runners-core-construction-java:classes (Thread[Task worker for ':' Thread 
3,5,main]) started.

> Task :beam-runners-core-construction-java:classes
Skipping task ':beam-runners-core-construction-java:classes' as it has no 
actions.
:beam-runners-core-construction-java:classes (Thread[Task worker for ':' Thread 
3,5,main]) completed. Took 0.0 secs.
:beam-runners-core-construction-java:shadowJar (Thread[Task worker for ':' 
Thread 3,5,main]) started.

> Task :beam-runners-core-construction-java:shadowJar
Build cache key for task ':beam-runners-core-construction-java:shadowJar' is 
cbdc51c7e8ededa4c3d98e59458590ef
Caching disabled for task ':beam-runners-core-construction-java:shadowJar': 
Caching has not been enabled for the task
Task ':beam-runners-core-construction-java:shadowJar' is not up-to-date because:
  No history is available.
***
GRADLE SHADOW STATS

Total Jars: 2 (includes project)
Total Time: 2.118s [2118ms]
Average Time/Jar: 1.059s [1059.0ms]
***
:beam-runners-core-construction-java:shadowJar (Thread[Task worker for ':' 
Thread 3,5,main]) completed. Took 2.428 secs.
:beam-runners-core-java:compileJava (Thread[Task worker for ':' Thread 
3,5,main]) started.

> Task :beam-runners-core-java:compileJava
Build cache key for task ':beam-runners-core-java:compileJava' is 
8c7c66d2c586302d9b0598e013be5d96
Task ':beam-runners-core-java:compileJava' is not up-to-date because:
  No history is available.
Custom actions are attached to task ':beam-runners-core-java:compileJava'.
All input files are considered out-of-date for incremental task 
':beam-runners-core-java:compileJava'.
Compiling with error-prone compiler
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Note: Some input files use unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
Packing task ':beam-runners-core-java:compileJava'
:beam-runners-core-java:compileJava (Thread[Task worker for ':' Thread 
3,5,main]) completed. Took 7.041 secs.
:beam-runners-core-java:classes (Thread[Task worker for ':' Thread 3,5,main]) 
started.

> Task :beam-runners-core-java:classes
Skipping task ':beam-runners-core-java:classes' as it has no actions.
:beam-runners-core-java:classes (Thread[Task worker for ':' Thread 3,5,main]) 
completed. Took 0.0 secs.
:beam-runners-core-java:shadowJar (Thread[Task worker for ':' Thread 3,5,main]) 
started.

> Task :beam-runners-core-java:shadowJar
Build cache key for task ':beam-runners-core-java:shadowJar' is 
6cae615a7c39f1e39fdddbc549112eee
Caching disabled for task ':beam-runners-core-java:shadowJar': Caching has 

Build failed in Jenkins: beam_PostCommit_Python_PVR_Flink_Gradle #69

2018-09-19 Thread Apache Jenkins Server
See 


Changes:

[daniel.o.programmer] [BEAM-4780] Updating to DockerJobBundleFactory in 
ReferenceRunner.

--
[...truncated 553.58 KB...]
raise self
_Rendezvous: <_Rendezvous of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "Socket closed"
debug_error_string = 
"{"created":"@1537383172.196841765","description":"Error received from 
peer","file":"src/core/lib/surface/call.cc","file_line":1099,"grpc_message":"Socket
 closed","grpc_status":14}"
>
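The `debug_error_string` in the `_Rendezvous` failure above is itself a small JSON document, so the numeric `grpc_status` (14, which is gRPC's UNAVAILABLE) can be pulled out directly. A sketch using the string from the log, with the email line-wrapping removed:

```python
import json

# debug_error_string reported in the _Rendezvous above (line breaks removed).
debug_error_string = (
    '{"created":"@1537383172.196841765",'
    '"description":"Error received from peer",'
    '"file":"src/core/lib/surface/call.cc","file_line":1099,'
    '"grpc_message":"Socket closed","grpc_status":14}'
)

info = json.loads(debug_error_string)
print(info["grpc_status"], info["grpc_message"])  # 14 Socket closed
```

Status 14 here matches the `StatusCode.UNAVAILABLE` shown two lines earlier in the traceback.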


==
ERROR: test_combine_per_key (__main__.FlinkRunnerTest)
--
Traceback (most recent call last):
  File "apache_beam/runners/portability/fn_api_runner_test.py", line 251, in 
test_combine_per_key
assert_that(res, equal_to([('a', 1.5), ('b', 3.0)]))
  File "apache_beam/pipeline.py", line 414, in __exit__
self.run().wait_until_finish()
  File "apache_beam/runners/portability/portable_runner.py", line 209, in 
wait_until_finish
'Pipeline %s failed in state %s.' % (self._job_id, self._state))
RuntimeError: Pipeline 
test_combine_per_key_1537383164.68_80eafaf6-dea7-4903-ae6c-b0ae8d2e172f failed 
in state FAILED.

==
ERROR: test_create (__main__.FlinkRunnerTest)
--
Traceback (most recent call last):
  File "apache_beam/runners/portability/fn_api_runner_test.py", line 62, in 
test_create
assert_that(p | beam.Create(['a', 'b']), equal_to(['a', 'b']))
  File "apache_beam/pipeline.py", line 414, in __exit__
self.run().wait_until_finish()
  File "apache_beam/runners/portability/portable_runner.py", line 209, in 
wait_until_finish
'Pipeline %s failed in state %s.' % (self._job_id, self._state))
RuntimeError: Pipeline 
test_create_1537383165.06_d580bf0b-f413-4749-b507-f58b99e66536 failed in state 
FAILED.

==
ERROR: test_flatten (__main__.FlinkRunnerTest)
--
Traceback (most recent call last):
  File "apache_beam/runners/portability/fn_api_runner_test.py", line 244, in 
test_flatten
assert_that(res, equal_to(['a', 'b', 'c', 'd']))
  File "apache_beam/pipeline.py", line 414, in __exit__
self.run().wait_until_finish()
  File "apache_beam/runners/portability/portable_runner.py", line 209, in 
wait_until_finish
'Pipeline %s failed in state %s.' % (self._job_id, self._state))
RuntimeError: Pipeline 
test_flatten_1537383165.53_2ff59562-4b13-4716-ba41-ad1e82a04c49 failed in state 
FAILED.

==
ERROR: test_flattened_side_input (__main__.FlinkRunnerTest)
--
Traceback (most recent call last):
  File "apache_beam/runners/portability/fn_api_runner_test.py", line 190, in 
test_flattened_side_input
equal_to([(None, {'a': 1, 'b': 2})]))
  File "apache_beam/pipeline.py", line 414, in __exit__
self.run().wait_until_finish()
  File "apache_beam/runners/portability/portable_runner.py", line 209, in 
wait_until_finish
'Pipeline %s failed in state %s.' % (self._job_id, self._state))
RuntimeError: Pipeline 
test_flattened_side_input_1537383166.01_71a08935-026d-4162-8eb7-f50e07c2b51f 
failed in state FAILED.

==
ERROR: test_gbk_side_input (__main__.FlinkRunnerTest)
--
Traceback (most recent call last):
  File "apache_beam/runners/portability/fn_api_runner_test.py", line 198, in 
test_gbk_side_input
equal_to([(None, {'a': [1]})]))
  File "apache_beam/pipeline.py", line 414, in __exit__
self.run().wait_until_finish()
  File "apache_beam/runners/portability/portable_runner.py", line 209, in 
wait_until_finish
'Pipeline %s failed in state %s.' % (self._job_id, self._state))
RuntimeError: Pipeline 
test_gbk_side_input_1537383166.49_9abb594a-ed9b-4664-a5a5-a01c2c99e3c0 failed 
in state FAILED.

==
ERROR: test_group_by_key (__main__.FlinkRunnerTest)
--
Traceback (most recent call last):
  File "apache_beam/runners/portability/fn_api_runner_test.py", line 237, in 
test_group_by_key
assert_that(res, equal_to([('a', [1, 2]), ('b', [3])]))
  File "apache_beam/pipeline.py", line 414, in __exit__
self.run().wait_until_finish()
  File "apache_beam/runners/portability/portable_runner.py", line 209, in 
wait_until_finish
'Pipeline %s failed in state %s.' % 
