[jira] [Work logged] (BEAM-5309) Add streaming support for HadoopOutputFormatIO

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5309?focusedWorklogId=171843&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171843
 ]

ASF GitHub Bot logged work on BEAM-5309:


Author: ASF GitHub Bot
Created on: 04/Dec/18 07:02
Start Date: 04/Dec/18 07:02
Worklog Time Spent: 10m 
  Work Description: b923 commented on issue #6691: WIP:[BEAM-5309] Add 
streaming support for HadoopFormatIO
URL: https://github.com/apache/beam/pull/6691#issuecomment-443993576
 
 
   @aromanenko-dev  thank you.
   I am OK with that.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171843)
Time Spent: 12h 40m  (was: 12.5h)

> Add streaming support for HadoopOutputFormatIO
> --
>
> Key: BEAM-5309
> URL: https://issues.apache.org/jira/browse/BEAM-5309
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-hadoop
>Reporter: Alexey Romanenko
>Assignee: David Hrbacek
>Priority: Minor
>  Time Spent: 12h 40m
>  Remaining Estimate: 0h
>
> design doc: https://s.apache.org/beam-streaming-hofio



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6165) Send metrics to Flink in portable Flink runner

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6165?focusedWorklogId=171839&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171839
 ]

ASF GitHub Bot logged work on BEAM-6165:


Author: ASF GitHub Bot
Created on: 04/Dec/18 05:18
Start Date: 04/Dec/18 05:18
Worklog Time Spent: 10m 
  Work Description: ryan-williams commented on a change in pull request 
#7183: [BEAM-6165] send metrics to Flink in portable Flink runner
URL: https://github.com/apache/beam/pull/7183#discussion_r238532219
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkExecutionEnvironments.java
 ##
 @@ -132,7 +134,9 @@ static StreamExecutionEnvironment 
createStreamExecutionEnvironment(
 
 // depending on the master, create the right environment.
 if ("[local]".equals(masterUrl)) {
-  flinkStreamEnv = StreamExecutionEnvironment.createLocalEnvironment();
+  flinkStreamEnv =
+  StreamExecutionEnvironment.createLocalEnvironment(
+  getDefaultLocalParallelism(), flinkConfig);
 
 Review comment:
   @mxm previously, a Flink `Configuration` in scope was thrown away in favor 
of a `new Configuration()`, dropping settings from `--flink-conf-dir` that were 
[supposed to override 
`$FLINK_CONF_DIR`](https://github.com/apache/beam/pull/7031#issuecomment-442112065),
 and generally going against the patterns I know for building up a config 
object like this from sources with different precedence.
   
   So this seemed like a bug-fix worth doing in general, which also blocked my 
metrics configs from propagating as needed here.
   
   @tweise `$FLINK_CONF_DIR` vs `--flink-conf-dir` is an interesting question… 
is one clearly preferred? I thought the latter was more explicit, easier to 
pass to the job-server, and overrides the former, but I didn't fully follow the 
rationale around it in #7031. 
   
   Also if I factored the metrics YAML file out as a test resource [like you 
suggested below](https://github.com/apache/beam/pull/7183/files#r238293952), I 
don't see it being any easier to point at via `$FLINK_CONF_DIR` vs 
`--flink-conf-dir`, but I may be missing things!
   
   lmk what you think, thanks


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171839)
Time Spent: 3h 20m  (was: 3h 10m)

> Send metrics to Flink in portable Flink runner
> --
>
> Key: BEAM-6165
> URL: https://issues.apache.org/jira/browse/BEAM-6165
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Affects Versions: 2.8.0
>Reporter: Ryan Williams
>Assignee: Ryan Williams
>Priority: Major
>  Labels: metrics, portability, portability-flink
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Metrics are sent from the fn harness to runner in the Python SDK (and likely 
> Java soon), but the portable Flink runner doesn't pass them on to Flink, 
> which it should, so that users can see them in e.g. the Flink UI or via any 
> Flink metrics reporters.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-6173) Build Dataflow containers for Apache Beam 2.9.0 release

2018-12-03 Thread Chamikara Jayalath (JIRA)
Chamikara Jayalath created BEAM-6173:


 Summary: Build Dataflow containers for Apache Beam 2.9.0 release
 Key: BEAM-6173
 URL: https://issues.apache.org/jira/browse/BEAM-6173
 Project: Beam
  Issue Type: Bug
  Components: runner-dataflow
Reporter: Chamikara Jayalath
Assignee: Chamikara Jayalath
 Fix For: 2.9.0


Build the containers when unblocked.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-6162) Python PipelineOptions raises "ambiguous option" error due to argparse behavior

2018-12-03 Thread Chamikara Jayalath (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chamikara Jayalath resolved BEAM-6162.
--
Resolution: Fixed

> Python PipelineOptions raises "ambiguous option" error due to argparse 
> behavior
> ---
>
> Key: BEAM-6162
> URL: https://issues.apache.org/jira/browse/BEAM-6162
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Charles Chen
>Assignee: Charles Chen
>Priority: Blocker
> Fix For: 2.9.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> I am running an Jupyter notebook with an --profile option.  This works in 
> 2.8.0 but is broken after [https://github.com/apache/beam/pull/6847] as the 
> Python PipelineOptions raises "ambiguous option" error due to argparse 
> behavior of trying to autocomplete options by prefix (this behavior is due to 
> [https://bugs.python.org/issue29777] which the Python dev team does not want 
> to fix).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6162) Python PipelineOptions raises "ambiguous option" error due to argparse behavior

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6162?focusedWorklogId=171826&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171826
 ]

ASF GitHub Bot logged work on BEAM-6162:


Author: ASF GitHub Bot
Created on: 04/Dec/18 03:44
Start Date: 04/Dec/18 03:44
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #7190: [BEAM-6162] 
Cherry-pick PR #7176 into release-2.9.0
URL: https://github.com/apache/beam/pull/7190#issuecomment-443959976
 
 
   LGTM. Thanks.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171826)
Time Spent: 1h 40m  (was: 1.5h)

> Python PipelineOptions raises "ambiguous option" error due to argparse 
> behavior
> ---
>
> Key: BEAM-6162
> URL: https://issues.apache.org/jira/browse/BEAM-6162
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Charles Chen
>Assignee: Charles Chen
>Priority: Blocker
> Fix For: 2.9.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> I am running an Jupyter notebook with an --profile option.  This works in 
> 2.8.0 but is broken after [https://github.com/apache/beam/pull/6847] as the 
> Python PipelineOptions raises "ambiguous option" error due to argparse 
> behavior of trying to autocomplete options by prefix (this behavior is due to 
> [https://bugs.python.org/issue29777] which the Python dev team does not want 
> to fix).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6162) Python PipelineOptions raises "ambiguous option" error due to argparse behavior

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6162?focusedWorklogId=171828&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171828
 ]

ASF GitHub Bot logged work on BEAM-6162:


Author: ASF GitHub Bot
Created on: 04/Dec/18 03:44
Start Date: 04/Dec/18 03:44
Worklog Time Spent: 10m 
  Work Description: chamikaramj closed pull request #7190: [BEAM-6162] 
Cherry-pick PR #7176 into release-2.9.0
URL: https://github.com/apache/beam/pull/7190
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/python/apache_beam/options/pipeline_options.py 
b/sdks/python/apache_beam/options/pipeline_options.py
index 39701ddaea06..d6c24bd42efa 100644
--- a/sdks/python/apache_beam/options/pipeline_options.py
+++ b/sdks/python/apache_beam/options/pipeline_options.py
@@ -106,6 +106,14 @@ def add_value_provider_argument(self, *args, **kwargs):
 # have add_argument do most of the work
 self.add_argument(*args, **kwargs)
 
+  # The argparse package by default tries to autocomplete option names. This
+  # results in an "ambiguous option" error from argparse when an unknown option
+  # matching multiple known ones are used. This suppresses that behavior.
+  def error(self, message):
+if message.startswith('ambiguous option: '):
+  return
+super(_BeamArgumentParser, self).error(message)
+
 
 class PipelineOptions(HasDisplayData):
   """Pipeline options class used as container for command line options.
diff --git a/sdks/python/apache_beam/options/pipeline_options_test.py 
b/sdks/python/apache_beam/options/pipeline_options_test.py
index 9c14c25668eb..227dcf3049d0 100644
--- a/sdks/python/apache_beam/options/pipeline_options_test.py
+++ b/sdks/python/apache_beam/options/pipeline_options_test.py
@@ -25,6 +25,8 @@
 import hamcrest as hc
 
 from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.options.pipeline_options import ProfilingOptions
+from apache_beam.options.pipeline_options import TypeOptions
 from apache_beam.options.value_provider import RuntimeValueProvider
 from apache_beam.options.value_provider import StaticValueProvider
 from apache_beam.transforms.display import DisplayData
@@ -281,6 +283,21 @@ def _add_argparse_args(cls, parser):
 with self.assertRaises(RuntimeError):
   options.pot_non_vp_arg1.get()
 
+  # The argparse package by default tries to autocomplete option names. This
+  # results in an "ambiguous option" error from argparse when an unknown option
+  # matching multiple known ones are used. This tests that we suppress this
+  # error.
+  def test_unknown_option_prefix(self):
+# Test that the "ambiguous option" error is suppressed.
+options = PipelineOptions(['--profi', 'val1'])
+options.view_as(ProfilingOptions)
+
+# Test that valid errors are not suppressed.
+with self.assertRaises(SystemExit):
+  # Invalid option choice.
+  options = PipelineOptions(['--type_check_strictness', 'blahblah'])
+  options.view_as(TypeOptions)
+
 
 if __name__ == '__main__':
   logging.getLogger().setLevel(logging.INFO)


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171828)
Time Spent: 1h 50m  (was: 1h 40m)

> Python PipelineOptions raises "ambiguous option" error due to argparse 
> behavior
> ---
>
> Key: BEAM-6162
> URL: https://issues.apache.org/jira/browse/BEAM-6162
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Charles Chen
>Assignee: Charles Chen
>Priority: Blocker
> Fix For: 2.9.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> I am running an Jupyter notebook with an --profile option.  This works in 
> 2.8.0 but is broken after [https://github.com/apache/beam/pull/6847] as the 
> Python PipelineOptions raises "ambiguous option" error due to argparse 
> behavior of trying to autocomplete options by prefix (this behavior is due to 
> [https://bugs.python.org/issue29777] which the Python dev team does not want 
> to fix).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6162) Python PipelineOptions raises "ambiguous option" error due to argparse behavior

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6162?focusedWorklogId=171806&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171806
 ]

ASF GitHub Bot logged work on BEAM-6162:


Author: ASF GitHub Bot
Created on: 04/Dec/18 02:51
Start Date: 04/Dec/18 02:51
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on issue #7190: [BEAM-6162] 
Cherry-pick PR #7176 into release-2.9.0
URL: https://github.com/apache/beam/pull/7190#issuecomment-443951001
 
 
   Run Portable_Python PreCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171806)
Time Spent: 1h 20m  (was: 1h 10m)

> Python PipelineOptions raises "ambiguous option" error due to argparse 
> behavior
> ---
>
> Key: BEAM-6162
> URL: https://issues.apache.org/jira/browse/BEAM-6162
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Charles Chen
>Assignee: Charles Chen
>Priority: Blocker
> Fix For: 2.9.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> I am running an Jupyter notebook with an --profile option.  This works in 
> 2.8.0 but is broken after [https://github.com/apache/beam/pull/6847] as the 
> Python PipelineOptions raises "ambiguous option" error due to argparse 
> behavior of trying to autocomplete options by prefix (this behavior is due to 
> [https://bugs.python.org/issue29777] which the Python dev team does not want 
> to fix).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6162) Python PipelineOptions raises "ambiguous option" error due to argparse behavior

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6162?focusedWorklogId=171807&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171807
 ]

ASF GitHub Bot logged work on BEAM-6162:


Author: ASF GitHub Bot
Created on: 04/Dec/18 02:51
Start Date: 04/Dec/18 02:51
Worklog Time Spent: 10m 
  Work Description: charlesccychen removed a comment on issue #7190: 
[BEAM-6162] Cherry-pick PR #7176 into release-2.9.0
URL: https://github.com/apache/beam/pull/7190#issuecomment-443951001
 
 
   Run Portable_Python PreCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171807)
Time Spent: 1.5h  (was: 1h 20m)

> Python PipelineOptions raises "ambiguous option" error due to argparse 
> behavior
> ---
>
> Key: BEAM-6162
> URL: https://issues.apache.org/jira/browse/BEAM-6162
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Charles Chen
>Assignee: Charles Chen
>Priority: Blocker
> Fix For: 2.9.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> I am running an Jupyter notebook with an --profile option.  This works in 
> 2.8.0 but is broken after [https://github.com/apache/beam/pull/6847] as the 
> Python PipelineOptions raises "ambiguous option" error due to argparse 
> behavior of trying to autocomplete options by prefix (this behavior is due to 
> [https://bugs.python.org/issue29777] which the Python dev team does not want 
> to fix).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6153) PR 7130 causes transforms/util_test to fail

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6153?focusedWorklogId=171803&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171803
 ]

ASF GitHub Bot logged work on BEAM-6153:


Author: ASF GitHub Bot
Created on: 04/Dec/18 02:47
Start Date: 04/Dec/18 02:47
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on issue #7170: [BEAM-6153] 
Re-enable coder optimization.
URL: https://github.com/apache/beam/pull/7170#issuecomment-443950408
 
 
   Thanks! This LGTM.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171803)
Time Spent: 1h 10m  (was: 1h)

> PR 7130 causes transforms/util_test to fail
> ---
>
> Key: BEAM-6153
> URL: https://issues.apache.org/jira/browse/BEAM-6153
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Charles Chen
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> http://github.com/apache/beam/pull/7130 seems to cause transforms/util_test 
> to fail.
> ==
> ERROR: test_reshuffle_sliding_window (__main__.ReshuffleTest)
> --
> BeamAssertException: Failed assert: [(1, [1, 2, 4]), (1, [1, 2, 4]), (2, [1, 
> 2]), (2, [1, 2]), (3, [1]), (3, [1])] == [(1, [1]), (1, [1]), (1, [2]), (1, 
> [2]), (1, [4]), (1, [4]), (2, [1]), (2, [1]), (2, [2]), (2, [2]), (3, [1]), 
> (3, [1])] [while running 'before_reshuffle/Match']



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5167) Use concurrency information from SDK Harness in Flink Portable Runner

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5167?focusedWorklogId=171802&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171802
 ]

ASF GitHub Bot logged work on BEAM-5167:


Author: ASF GitHub Bot
Created on: 04/Dec/18 02:40
Start Date: 04/Dec/18 02:40
Worklog Time Spent: 10m 
  Work Description: angoenka commented on issue #7192: [BEAM-5167] Log 
unscheduled process bundle requests
URL: https://github.com/apache/beam/pull/7192#issuecomment-443948890
 
 
   cc: @tweise @mxm 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171802)
Time Spent: 20m  (was: 10m)

> Use concurrency information from SDK Harness in Flink Portable Runner
> -
>
> Key: BEAM-5167
> URL: https://issues.apache.org/jira/browse/BEAM-5167
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Based on the discussion 
> [https://lists.apache.org/thread.html/0cbf73d696e0d3a5bb8e93618ac9d6bb81daecf2c9c8e11ee220c8ae@%3Cdev.beam.apache.org%3E]
> Use SDK Harness concurrency information in Flink runner to schedule bundles.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5167) Use concurrency information from SDK Harness in Flink Portable Runner

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5167?focusedWorklogId=171801&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171801
 ]

ASF GitHub Bot logged work on BEAM-5167:


Author: ASF GitHub Bot
Created on: 04/Dec/18 02:39
Start Date: 04/Dec/18 02:39
Worklog Time Spent: 10m 
  Work Description: angoenka opened a new pull request #7192: [BEAM-5167] 
Log unscheduled process bundle requests
URL: https://github.com/apache/beam/pull/7192
 
 
   Log a warning message if we are not able to schedule a process bundle 
request for more than a certain amount of time.
   This will be useful in detecting deadlock because of scheduling mismatch 
between runner and sdk. 
   
   
   Follow this checklist to help us incorporate your contribution quickly and 
easily:
   
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   It will help us expedite review of your Pull Request if you tag someone 
(e.g. `@username`) to look at it.
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/)
 | --- | --- | --- | --- | --- | ---
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/)
 [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)
 | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)
  [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/)
 | --- | --- | ---
   
   
   
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171801)
Time Spent: 10m
 

[jira] [Work logged] (BEAM-5884) Allow nested types have null value.

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5884?focusedWorklogId=171799&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171799
 ]

ASF GitHub Bot logged work on BEAM-5884:


Author: ASF GitHub Bot
Created on: 04/Dec/18 02:09
Start Date: 04/Dec/18 02:09
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on issue #7175: [BEAM-5884] Move 
the nullable attribute onto FieldType.
URL: https://github.com/apache/beam/pull/7175#issuecomment-443943014
 
 
   Ok appears to be an easy fix. I think SQL is improperly comparing
   FieldTypes here.
   
   Since other PRs have been merged that depend on this PR, at this point a
   roll-forward fix is probably easier than a rollback.
   
   On Mon, Dec 3, 2018 at 5:57 PM Reuven Lax  wrote:
   
   > Looking
   >
   > On Mon, Dec 3, 2018 at 5:25 PM Udi Meiri  wrote:
   >
   >> This may have caused precommit to time out:
   >> https://issues.apache.org/jira/browse/BEAM-6171
   >>
   >> —
   >> You are receiving this because you modified the open/close state.
   >> Reply to this email directly, view it on GitHub
   >> , or mute
   >> the thread
   >> 

   >> .
   >>
   >
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171799)
Time Spent: 4h 50m  (was: 4h 40m)

> Allow nested types have null value.
> ---
>
> Key: BEAM-5884
> URL: https://issues.apache.org/jira/browse/BEAM-5884
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> We could allow arbitrary combination of nested types have null value.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5462) get rid of .options deprecation warnings in tests

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5462?focusedWorklogId=171798&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171798
 ]

ASF GitHub Bot logged work on BEAM-5462:


Author: ASF GitHub Bot
Created on: 04/Dec/18 02:08
Start Date: 04/Dec/18 02:08
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #6930: [BEAM-5462] get rid of 
.options deprecation warnings in tests
URL: https://github.com/apache/beam/pull/6930#issuecomment-443942813
 
 
   Run Python Flink ValidatesRunner


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171798)
Time Spent: 6h 50m  (was: 6h 40m)

> get rid of .options deprecation warnings in tests
> ---
>
> Key: BEAM-5462
> URL: https://issues.apache.org/jira/browse/BEAM-5462
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Heejong Lee
>Priority: Minor
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> Messages look like:
> {{/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/runners/direct/direct_runner.py:360:
>  DeprecationWarning: options is deprecated since First stable release. 
> References to .options will not be supported}}
> {{pipeline.replace_all(_get_transform_overrides(pipeline.options))}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6171) beam_PostCommit_Java_GradleBuild timing out (4 hours)

2018-12-03 Thread Udi Meiri (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16708064#comment-16708064
 ] 

Udi Meiri commented on BEAM-6171:
-

[~charleschen]

> beam_PostCommit_Java_GradleBuild timing out (4 hours)
> -
>
> Key: BEAM-6171
> URL: https://issues.apache.org/jira/browse/BEAM-6171
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core, testing
>Reporter: Udi Meiri
>Assignee: Reuven Lax
>Priority: Major
>
> I believe that this PR is the culprit: 
> https://github.com/apache/beam/pull/7175
> Example log: 
> https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/2041/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5462) get rid of .options deprecation warnings in tests

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5462?focusedWorklogId=171797&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171797
 ]

ASF GitHub Bot logged work on BEAM-5462:


Author: ASF GitHub Bot
Created on: 04/Dec/18 02:07
Start Date: 04/Dec/18 02:07
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #6930: [BEAM-5462] get rid of 
.options deprecation warnings in tests
URL: https://github.com/apache/beam/pull/6930#issuecomment-443942784
 
 
   Run Python Dataflow ValidatesRunner


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171797)
Time Spent: 6h 40m  (was: 6.5h)

> get rid of .options deprecation warnings in tests
> ---
>
> Key: BEAM-5462
> URL: https://issues.apache.org/jira/browse/BEAM-5462
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Heejong Lee
>Priority: Minor
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> Messages look like:
> {{/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/runners/direct/direct_runner.py:360:
>  DeprecationWarning: options is deprecated since First stable release. 
> References to .options will not be supported}}
> {{pipeline.replace_all(_get_transform_overrides(pipeline.options))}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5462) get rid of .options deprecation warnings in tests

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5462?focusedWorklogId=171796&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171796
 ]

ASF GitHub Bot logged work on BEAM-5462:


Author: ASF GitHub Bot
Created on: 04/Dec/18 02:07
Start Date: 04/Dec/18 02:07
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #6930: [BEAM-5462] get rid of 
.options deprecation warnings in tests
URL: https://github.com/apache/beam/pull/6930#issuecomment-443942704
 
 
   Run Python Dataflow ValidatesRunner
   Run Python Flink ValidatesRunner


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171796)
Time Spent: 6.5h  (was: 6h 20m)

> get rid of .options deprecation warnings in tests
> ---
>
> Key: BEAM-5462
> URL: https://issues.apache.org/jira/browse/BEAM-5462
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Heejong Lee
>Priority: Minor
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> Messages look like:
> {{/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/runners/direct/direct_runner.py:360:
>  DeprecationWarning: options is deprecated since First stable release. 
> References to .options will not be supported}}
> {{pipeline.replace_all(_get_transform_overrides(pipeline.options))}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-6172) Flink metrics are not generated in standard format

2018-12-03 Thread Micah Wylde (JIRA)
Micah Wylde created BEAM-6172:
-

 Summary: Flink metrics are not generated in standard format
 Key: BEAM-6172
 URL: https://issues.apache.org/jira/browse/BEAM-6172
 Project: Beam
  Issue Type: Improvement
  Components: runner-flink
Affects Versions: 2.8.0
Reporter: Micah Wylde


The metrics that the flink runner exports do not follow the standard format 
used by Flink, and doesn't respect Flink metric configuration options. 

For example (with the default metrics configuration) beam produces a metric:

{code}
10-100-209-71.taskmanager.0f29b420b63fea58f6f321bc0cbf45f3.BeamApp-mwylde-1203224439-a7d8fdf6.group.0.__counter__group__org-apache-beam-runners-core-ReduceFnRunner__droppedDueToClosedWindow
{code}

whereas a native Flink metric looks like:

{code}
10-100-209-71.taskmanager.0f29b420b63fea58f6f321bc0cbf45f3.BeamApp-mwylde-1203224439-a7d8fdf6.Source-Custom-Source-7Kinesis-None-beam-env-docker-v1-0-ToKeyedWorkItem.0.numRecordsOut
{code}

In particular, Beam should respect the 
[metric.scope.delimiter|https://ci.apache.org/projects/flink/flink-docs-release-1.5/ops/config.html#metrics-scope-delimiter]
 configuration for separating components of a metric (currently it uses "__"), 
and should not include the type of metric (counter, gauge, etc.).




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5884) Allow nested types have null value.

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5884?focusedWorklogId=171794&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171794
 ]

ASF GitHub Bot logged work on BEAM-5884:


Author: ASF GitHub Bot
Created on: 04/Dec/18 01:57
Start Date: 04/Dec/18 01:57
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on issue #7175: [BEAM-5884] Move 
the nullable attribute onto FieldType.
URL: https://github.com/apache/beam/pull/7175#issuecomment-443940660
 
 
   Looking
   
   On Mon, Dec 3, 2018 at 5:25 PM Udi Meiri  wrote:
   
   > This may have caused precommit to time out:
   > https://issues.apache.org/jira/browse/BEAM-6171
   >
   > —
   > You are receiving this because you modified the open/close state.
   > Reply to this email directly, view it on GitHub
   > , or mute
   > the thread
   > 

   > .
   >
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171794)
Time Spent: 4h 40m  (was: 4.5h)

> Allow nested types have null value.
> ---
>
> Key: BEAM-5884
> URL: https://issues.apache.org/jira/browse/BEAM-5884
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> We could allow arbitrary combination of nested types have null value.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6162) Python PipelineOptions raises "ambiguous option" error due to argparse behavior

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6162?focusedWorklogId=171790&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171790
 ]

ASF GitHub Bot logged work on BEAM-6162:


Author: ASF GitHub Bot
Created on: 04/Dec/18 01:42
Start Date: 04/Dec/18 01:42
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #7190: [BEAM-6162] 
Cherry-pick PR #7176 into release-2.9.0
URL: https://github.com/apache/beam/pull/7190#issuecomment-443937704
 
 
   Retest this please


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171790)
Time Spent: 1h 10m  (was: 1h)

> Python PipelineOptions raises "ambiguous option" error due to argparse 
> behavior
> ---
>
> Key: BEAM-6162
> URL: https://issues.apache.org/jira/browse/BEAM-6162
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Charles Chen
>Assignee: Charles Chen
>Priority: Blocker
> Fix For: 2.9.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> I am running an Jupyter notebook with an --profile option.  This works in 
> 2.8.0 but is broken after [https://github.com/apache/beam/pull/6847] as the 
> Python PipelineOptions raises "ambiguous option" error due to argparse 
> behavior of trying to autocomplete options by prefix (this behavior is due to 
> [https://bugs.python.org/issue29777] which the Python dev team does not want 
> to fix).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6171) beam_PostCommit_Java_GradleBuild timing out (4 hours)

2018-12-03 Thread Udi Meiri (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16708060#comment-16708060
 ] 

Udi Meiri commented on BEAM-6171:
-

Failing tests:
12:05:11 org.apache.beam.sdk.extensions.sql.PubsubToBigqueryIT > 
testSimpleInsert FAILED
12:05:13 org.apache.beam.sdk.extensions.sql.meta.provider.pubsub.PubsubJsonIT > 
testSelectsPayloadContent FAILED

Task beam-sdks-java-extensions-sql:integrationTest seems to hang.

> beam_PostCommit_Java_GradleBuild timing out (4 hours)
> -
>
> Key: BEAM-6171
> URL: https://issues.apache.org/jira/browse/BEAM-6171
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core, testing
>Reporter: Udi Meiri
>Assignee: Reuven Lax
>Priority: Major
>
> I believe that this PR is the culprit: 
> https://github.com/apache/beam/pull/7175
> Example log: 
> https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/2041/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-6171) beam_PostCommit_Java_GradleBuild timing out (4 hours)

2018-12-03 Thread Udi Meiri (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udi Meiri updated BEAM-6171:

Issue Type: Bug  (was: Task)

> beam_PostCommit_Java_GradleBuild timing out (4 hours)
> -
>
> Key: BEAM-6171
> URL: https://issues.apache.org/jira/browse/BEAM-6171
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core, testing
>Reporter: Udi Meiri
>Assignee: Reuven Lax
>Priority: Major
>
> I believe that this PR is the culprit: 
> https://github.com/apache/beam/pull/7175
> Example log: 
> https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/2041/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5514) BigQueryIO doesn't handle quotaExceeded errors properly

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5514?focusedWorklogId=171783&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171783
 ]

ASF GitHub Bot logged work on BEAM-5514:


Author: ASF GitHub Bot
Created on: 04/Dec/18 01:19
Start Date: 04/Dec/18 01:19
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #7189: [BEAM-5514] BigQueryIO 
doesn't handle quotaExceeded errors properly
URL: https://github.com/apache/beam/pull/7189#issuecomment-443933065
 
 
   R: @reuvenlax, @chamikaramj 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171783)
Time Spent: 1h 10m  (was: 1h)

> BigQueryIO doesn't handle quotaExceeded errors properly
> ---
>
> Key: BEAM-5514
> URL: https://issues.apache.org/jira/browse/BEAM-5514
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Kevin Peterson
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> When exceeding a streaming quota for BigQuery insertAll requests, BigQuery 
> returns a 403 with reason "quotaExceeded".
> The current implementation of BigQueryIO does not consider this to be a rate 
> limited exception, and therefore does not perform exponential backoff 
> properly, leading to repeated calls to BQ.
> The actual error is in the 
> [ApiErrorExtractor|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L739]
>  class, which is called from 
> [BigQueryServicesImpl|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/util/src/main/java/com/google/cloud/hadoop/util/ApiErrorExtractor.java#L263]
>  to determine how to retry the failure.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-6171) beam_PostCommit_Java_GradleBuild timing out (4 hours)

2018-12-03 Thread Udi Meiri (JIRA)
Udi Meiri created BEAM-6171:
---

 Summary: beam_PostCommit_Java_GradleBuild timing out (4 hours)
 Key: BEAM-6171
 URL: https://issues.apache.org/jira/browse/BEAM-6171
 Project: Beam
  Issue Type: Task
  Components: sdk-java-core, testing
Reporter: Udi Meiri
Assignee: Reuven Lax


I believe that this PR is the culprit: https://github.com/apache/beam/pull/7175
Example log: 
https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/2041/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5884) Allow nested types have null value.

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5884?focusedWorklogId=171786&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171786
 ]

ASF GitHub Bot logged work on BEAM-5884:


Author: ASF GitHub Bot
Created on: 04/Dec/18 01:25
Start Date: 04/Dec/18 01:25
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #7175: [BEAM-5884] Move the 
nullable attribute onto FieldType.
URL: https://github.com/apache/beam/pull/7175#issuecomment-443934402
 
 
   This may have caused precommit to time out: 
https://issues.apache.org/jira/browse/BEAM-6171


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171786)
Time Spent: 4.5h  (was: 4h 20m)

> Allow nested types have null value.
> ---
>
> Key: BEAM-5884
> URL: https://issues.apache.org/jira/browse/BEAM-5884
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> We could allow arbitrary combination of nested types have null value.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5058) Python precommits should run E2E tests

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5058?focusedWorklogId=171777&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171777
 ]

ASF GitHub Bot logged work on BEAM-5058:


Author: ASF GitHub Bot
Created on: 04/Dec/18 01:02
Start Date: 04/Dec/18 01:02
Worklog Time Spent: 10m 
  Work Description: aaltay closed pull request #7163: [BEAM-5058] Run basic 
ITs in Python Precommit in parallel
URL: https://github.com/apache/beam/pull/7163
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/build.gradle b/build.gradle
index a74817f70e8d..fc11ecc390a2 100644
--- a/build.gradle
+++ b/build.gradle
@@ -240,6 +240,7 @@ task goIntegrationTests() {
 
 task pythonPreCommit() {
   dependsOn ":beam-sdks-python:preCommit"
+  dependsOn ":beam-sdks-python-precommit-dataflow:precommitIT"
 }
 
 task pythonPostCommit() {
diff --git 
a/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy 
b/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
index 63c6729c2d61..329a6edbb3ec 100644
--- a/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
+++ b/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
@@ -1488,5 +1488,81 @@ artifactId=${project.name}
 dependsOn ':beam-sdks-java-container:docker'
   }
 }
+
+/** 
***/
+
+project.ext.applyPythonNature = {
+
+  // Define common lifecycle tasks and artifact types
+  project.apply plugin: "base"
+
+  // For some reason base doesn't define a test task  so we define it 
below and make
+  // check depend on it. This makes the Python project similar to the task 
layout like
+  // Java projects, see 
https://docs.gradle.org/4.2.1/userguide/img/javaPluginTasks.png
+  project.task('test', type: Test) {}
+  project.check.dependsOn project.test
+
+  
project.evaluationDependsOn(":beam-runners-google-cloud-dataflow-java-fn-api-worker")
+
+  // Due to Beam-4256, we need to limit the length of virtualenv path to 
make the
+  // virtualenv activated properly. So instead of include project name in 
the path,
+  // we use the hash value.
+  project.ext.envdir = 
"${project.rootProject.buildDir}/gradleenv/${project.name.hashCode()}"
+  project.ext.pythonRootDir = "${project.rootDir}/sdks/python"
+
+  project.task('setupVirtualenv')  {
+doLast {
+  project.exec { commandLine 'virtualenv', "${project.ext.envdir}" }
+  project.exec {
+executable 'sh'
+args '-c', ". ${project.ext.envdir}/bin/activate && pip install 
--upgrade tox==3.0.0 grpcio-tools==1.3.5"
+  }
+}
+// Gradle will delete outputs whenever it thinks they are stale. 
Putting a
+// specific binary here could make gradle delete it while pip will 
believe
+// the package is fully installed.
+outputs.dirs(project.ext.envdir)
+  }
+
+  project.configurations { distConfig }
+
+  project.task('sdist', dependsOn: 'setupVirtualenv') {
+doLast {
+  project.exec {
+executable 'sh'
+args '-c', ". ${project.ext.envdir}/bin/activate && python 
${project.ext.pythonRootDir}/setup.py sdist --formats zip,gztar --dist-dir 
${project.buildDir}"
+  }
+  def collection = project.fileTree("${project.buildDir}"){ include 
'**/*.tar.gz' exclude '**/apache-beam.tar.gz'}
+  println "sdist archive name: ${collection.singleFile}"
+  // we need a fixed name for the artifact
+  project.copy { from collection.singleFile; into 
"${project.buildDir}"; rename { 'apache-beam.tar.gz' } }
+}
+  }
+
+  project.artifacts {
+distConfig file: 
project.file("${project.buildDir}/apache-beam.tar.gz"), builtBy: project.sdist
+  }
+
+  project.task('installGcpTest', dependsOn: 'setupVirtualenv') {
+doLast {
+  project.exec {
+executable 'sh'
+args '-c', ". ${project.ext.envdir}/bin/activate && pip install -e 
${project.ext.pythonRootDir}/[gcp,test]"
+  }
+}
+  }
+  project.installGcpTest.mustRunAfter project.sdist
+
+  project.task('cleanPython', dependsOn: 'setupVirtualenv') {
+doLast {
+  project.exec {
+executable 'sh'
+args '-c', ". ${project.ext.envdir}/bin/activate && python 
${project.ext.pythonRootDir}/setup.py clean"
+  }
+  project.delete project.buildDir
+}
+  }
+  project.clean.dependsOn project.cleanPython
+}
   }
 }
di

[jira] [Work logged] (BEAM-5514) BigQueryIO doesn't handle quotaExceeded errors properly

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5514?focusedWorklogId=171775&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171775
 ]

ASF GitHub Bot logged work on BEAM-5514:


Author: ASF GitHub Bot
Created on: 04/Dec/18 00:58
Start Date: 04/Dec/18 00:58
Worklog Time Spent: 10m 
  Work Description: ihji commented on a change in pull request #7189: 
[BEAM-5514] BigQueryIO doesn't handle quotaExceeded errors properly
URL: https://github.com/apache/beam/pull/7189#discussion_r238496363
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServices.java
 ##
 @@ -150,4 +156,47 @@ void createDataset(
 Table patchTableDescription(TableReference tableReference, @Nullable 
String tableDescription)
 throws IOException, InterruptedException;
   }
+
+  /** A class for controlling insertAll submission rate. */
 
 Review comment:
   added some comments.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171775)
Time Spent: 1h  (was: 50m)

> BigQueryIO doesn't handle quotaExceeded errors properly
> ---
>
> Key: BEAM-5514
> URL: https://issues.apache.org/jira/browse/BEAM-5514
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Kevin Peterson
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> When exceeding a streaming quota for BigQuery insertAll requests, BigQuery 
> returns a 403 with reason "quotaExceeded".
> The current implementation of BigQueryIO does not consider this to be a rate 
> limited exception, and therefore does not perform exponential backoff 
> properly, leading to repeated calls to BQ.
> The actual error is in the 
> [ApiErrorExtractor|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L739]
>  class, which is called from 
> [BigQueryServicesImpl|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/util/src/main/java/com/google/cloud/hadoop/util/ApiErrorExtractor.java#L263]
>  to determine how to retry the failure.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6170) NexmarkLauncher stall warning causes benchmark failure

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6170?focusedWorklogId=171774&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171774
 ]

ASF GitHub Bot logged work on BEAM-6170:


Author: ASF GitHub Bot
Created on: 04/Dec/18 00:45
Start Date: 04/Dec/18 00:45
Worklog Time Spent: 10m 
  Work Description: scwhittle opened a new pull request #7191: [BEAM-6170] 
Change Nexmark stuckness warnings to not fail pipeline
URL: https://github.com/apache/beam/pull/7191
 
 
   The monitoring in the NexmarkLauncher watches the pipeline results and 
source for events.  If no events are consumed from the source or published to 
the output for over
   STUCK_WARNING_DELAY it prints an warning message:
   
"WARNING: streaming query appears to have been stuck for %d min.",
   
   However it also adds this to the errors list, which causes the pipeline to 
terminate with an exception.  There is a separate STUCK_TERMINATE_DELAY which 
is used to terminate the pipeline.  It seems inconsistent to cause the pipeline 
to fail with the warning timeout.  It is also not entirely accurate to say the 
pipeline is stuck since it may be busy processing stages other than the input 
or output.
   
   R: @echauchot 
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171774)
Time Spent: 10m
Remaining Estimate: 0h  (was: 10m)

> NexmarkLauncher stall warning causes benchmark failure
> --
>
> Key: BEAM-6170
> URL: https://issues.apache.org/jira/browse/BEAM-6170
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Sam Whittle
>Assignee: Sam Whittle
>Priority: Trivial
>   Original Estimate: 10m
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The monitoring in the NexmarkLauncher watches the pipeline results and source 
> for events.  If no events are consumed from the source or published to the 
> output for over
> STUCK_WARNING_DELAY it prints an warning message:
>  "WARNING: streaming query appears to have been stuck for %d min.",
> However it also adds this to the errors list, which causes the pipeline to 
> terminate by throwing an exception.  There is a separate 
> STUCK_TERMINATE_DELAY which is used to terminate the pipeline.  It seems 
> inconsistent to cause the pipeline to fail with the warning timeout.  It is 
> also not 100% accurrate because the pipeline may be busy processing stages 
> other than the input or output.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6162) Python PipelineOptions raises "ambiguous option" error due to argparse behavior

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6162?focusedWorklogId=171773&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171773
 ]

ASF GitHub Bot logged work on BEAM-6162:


Author: ASF GitHub Bot
Created on: 04/Dec/18 00:39
Start Date: 04/Dec/18 00:39
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on issue #7190: [BEAM-6162] 
Cherry-pick PR #7176 into release-2.9.0
URL: https://github.com/apache/beam/pull/7190#issuecomment-443925304
 
 
   R: @chamikaramj 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171773)
Time Spent: 1h  (was: 50m)

> Python PipelineOptions raises "ambiguous option" error due to argparse 
> behavior
> ---
>
> Key: BEAM-6162
> URL: https://issues.apache.org/jira/browse/BEAM-6162
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Charles Chen
>Assignee: Charles Chen
>Priority: Blocker
> Fix For: 2.9.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> I am running an Jupyter notebook with an --profile option.  This works in 
> 2.8.0 but is broken after [https://github.com/apache/beam/pull/6847] as the 
> Python PipelineOptions raises "ambiguous option" error due to argparse 
> behavior of trying to autocomplete options by prefix (this behavior is due to 
> [https://bugs.python.org/issue29777] which the Python dev team does not want 
> to fix).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6162) Python PipelineOptions raises "ambiguous option" error due to argparse behavior

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6162?focusedWorklogId=171772&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171772
 ]

ASF GitHub Bot logged work on BEAM-6162:


Author: ASF GitHub Bot
Created on: 04/Dec/18 00:39
Start Date: 04/Dec/18 00:39
Worklog Time Spent: 10m 
  Work Description: charlesccychen opened a new pull request #7190: 
[BEAM-6162] Cherry-pick PR #7176 into release-2.9.0
URL: https://github.com/apache/beam/pull/7190
 
 
   Fix PipelineOptions argparse behavior.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171772)
Time Spent: 50m  (was: 40m)

> Python PipelineOptions raises "ambiguous option" error due to argparse 
> behavior
> ---
>
> Key: BEAM-6162
> URL: https://issues.apache.org/jira/browse/BEAM-6162
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Charles Chen
>Assignee: Charles Chen
>Priority: Blocker
> Fix For: 2.9.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> I am running an Jupyter notebook with an --profile option.  This works in 
> 2.8.0 but is broken after [https://github.com/apache/beam/pull/6847] as the 
> Python PipelineOptions raises "ambiguous option" error due to argparse 
> behavior of trying to autocomplete options by prefix (this behavior is due to 
> [https://bugs.python.org/issue29777] which the Python dev team does not want 
> to fix).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5462) get rid of .options deprecation warnings in tests

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5462?focusedWorklogId=171767&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171767
 ]

ASF GitHub Bot logged work on BEAM-5462:


Author: ASF GitHub Bot
Created on: 03/Dec/18 23:59
Start Date: 03/Dec/18 23:59
Worklog Time Spent: 10m 
  Work Description: ihji closed pull request #7132: [BEAM-5462] get rid of 
.options deprecation warnings in tests (alt impl)
URL: https://github.com/apache/beam/pull/7132
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/python/apache_beam/io/gcp/pubsub_test.py 
b/sdks/python/apache_beam/io/gcp/pubsub_test.py
index a95ffc6e5423..04f8802287c4 100644
--- a/sdks/python/apache_beam/io/gcp/pubsub_test.py
+++ b/sdks/python/apache_beam/io/gcp/pubsub_test.py
@@ -35,6 +35,7 @@
 from apache_beam.io.gcp.pubsub import WriteToPubSub
 from apache_beam.io.gcp.pubsub import _PubSubSink
 from apache_beam.io.gcp.pubsub import _PubSubSource
+from apache_beam.options.pipeline_options import PipelineOptions
 from apache_beam.options.pipeline_options import StandardOptions
 from apache_beam.runners.direct import transform_evaluator
 from apache_beam.runners.direct.direct_runner import _DirectReadFromPubSub
@@ -109,8 +110,9 @@ def test_repr(self):
 class TestReadFromPubSubOverride(unittest.TestCase):
 
   def test_expand_with_topic(self):
-p = TestPipeline()
-p.options.view_as(StandardOptions).streaming = True
+options = PipelineOptions([])
+options.view_as(StandardOptions).streaming = True
+p = TestPipeline(options=options)
 pcoll = (p
  | ReadFromPubSub('projects/fakeprj/topics/a_topic',
   None, 'a_label', with_attributes=False,
@@ -119,7 +121,7 @@ def test_expand_with_topic(self):
 self.assertEqual(bytes, pcoll.element_type)
 
 # Apply the necessary PTransformOverrides.
-overrides = _get_transform_overrides(p.options)
+overrides = _get_transform_overrides(options)
 p.replace_all(overrides)
 
 # Note that the direct output of ReadFromPubSub will be replaced
@@ -132,8 +134,9 @@ def test_expand_with_topic(self):
 self.assertEqual('a_label', source.id_label)
 
   def test_expand_with_subscription(self):
-p = TestPipeline()
-p.options.view_as(StandardOptions).streaming = True
+options = PipelineOptions([])
+options.view_as(StandardOptions).streaming = True
+p = TestPipeline(options=options)
 pcoll = (p
  | ReadFromPubSub(
  None, 'projects/fakeprj/subscriptions/a_subscription',
@@ -142,7 +145,7 @@ def test_expand_with_subscription(self):
 self.assertEqual(bytes, pcoll.element_type)
 
 # Apply the necessary PTransformOverrides.
-overrides = _get_transform_overrides(p.options)
+overrides = _get_transform_overrides(options)
 p.replace_all(overrides)
 
 # Note that the direct output of ReadFromPubSub will be replaced
@@ -167,8 +170,9 @@ def test_expand_with_both_topic_and_subscription(self):
  with_attributes=False, timestamp_attribute=None)
 
   def test_expand_with_other_options(self):
-p = TestPipeline()
-p.options.view_as(StandardOptions).streaming = True
+options = PipelineOptions([])
+options.view_as(StandardOptions).streaming = True
+p = TestPipeline(options=options)
 pcoll = (p
  | ReadFromPubSub('projects/fakeprj/topics/a_topic',
   None, 'a_label', with_attributes=True,
@@ -177,7 +181,7 @@ def test_expand_with_other_options(self):
 self.assertEqual(PubsubMessage, pcoll.element_type)
 
 # Apply the necessary PTransformOverrides.
-overrides = _get_transform_overrides(p.options)
+overrides = _get_transform_overrides(options)
 p.replace_all(overrides)
 
 # Note that the direct output of ReadFromPubSub will be replaced
@@ -193,15 +197,16 @@ def test_expand_with_other_options(self):
 @unittest.skipIf(pubsub is None, 'GCP dependencies are not installed')
 class TestWriteStringsToPubSubOverride(unittest.TestCase):
   def test_expand_deprecated(self):
-p = TestPipeline()
-p.options.view_as(StandardOptions).streaming = True
+options = PipelineOptions([])
+options.view_as(StandardOptions).streaming = True
+p = TestPipeline(options=options)
 pcoll = (p
  | ReadFromPubSub('projects/fakeprj/topics/baz')
  | WriteStringsToPubSub('projects/fakeprj/topics/a_topic')
  | beam.Map(lambda x: x))
 
 # Apply the necessary PTransformOverrides.
-overrides = _get_transform_overrides(p.options)
+overrides = _get_transform_overrides(options)
 p.replace_all(overrides)

[jira] [Work logged] (BEAM-5514) BigQueryIO doesn't handle quotaExceeded errors properly

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5514?focusedWorklogId=171769&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171769
 ]

ASF GitHub Bot logged work on BEAM-5514:


Author: ASF GitHub Bot
Created on: 04/Dec/18 00:11
Start Date: 04/Dec/18 00:11
Worklog Time Spent: 10m 
  Work Description: rangadi commented on a change in pull request #7189: 
[BEAM-5514] BigQueryIO doesn't handle quotaExceeded errors properly
URL: https://github.com/apache/beam/pull/7189#discussion_r238487327
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServices.java
 ##
 @@ -150,4 +156,47 @@ void createDataset(
 Table patchTableDescription(TableReference tableReference, @Nullable 
String tableDescription)
 throws IOException, InterruptedException;
   }
+
+  /** A class for controlling insertAll submission rate. */
 
 Review comment:
   JavaDoc describing bit more about what it is controlling?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171769)
Time Spent: 50m  (was: 40m)

> BigQueryIO doesn't handle quotaExceeded errors properly
> ---
>
> Key: BEAM-5514
> URL: https://issues.apache.org/jira/browse/BEAM-5514
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Kevin Peterson
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> When exceeding a streaming quota for BigQuery insertAll requests, BigQuery 
> returns a 403 with reason "quotaExceeded".
> The current implementation of BigQueryIO does not consider this to be a rate 
> limited exception, and therefore does not perform exponential backoff 
> properly, leading to repeated calls to BQ.
> The actual error is in the 
> [ApiErrorExtractor|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L739]
>  class, which is called from 
> [BigQueryServicesImpl|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/util/src/main/java/com/google/cloud/hadoop/util/ApiErrorExtractor.java#L263]
>  to determine how to retry the failure.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5462) get rid of .options deprecation warnings in tests

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5462?focusedWorklogId=171768&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171768
 ]

ASF GitHub Bot logged work on BEAM-5462:


Author: ASF GitHub Bot
Created on: 04/Dec/18 00:05
Start Date: 04/Dec/18 00:05
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #6930: [BEAM-5462] get rid of 
.options deprecation warnings in tests
URL: https://github.com/apache/beam/pull/6930#issuecomment-443918138
 
 
   run python postcommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171768)
Time Spent: 6h 20m  (was: 6h 10m)

> get rid of .options deprecation warnings in tests
> ---
>
> Key: BEAM-5462
> URL: https://issues.apache.org/jira/browse/BEAM-5462
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Heejong Lee
>Priority: Minor
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> Messages look like:
> {{/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/runners/direct/direct_runner.py:360:
>  DeprecationWarning: options is deprecated since First stable release. 
> References to .options will not be supported}}
> {{pipeline.replace_all(_get_transform_overrides(pipeline.options))}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5462) get rid of .options deprecation warnings in tests

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5462?focusedWorklogId=171766&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171766
 ]

ASF GitHub Bot logged work on BEAM-5462:


Author: ASF GitHub Bot
Created on: 03/Dec/18 23:59
Start Date: 03/Dec/18 23:59
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #7132: [BEAM-5462] get rid of 
.options deprecation warnings in tests (alt impl)
URL: https://github.com/apache/beam/pull/7132#issuecomment-443916914
 
 
   Move this change to #6930 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171766)
Time Spent: 6h  (was: 5h 50m)

> get rid of .options deprecation warnings in tests
> ---
>
> Key: BEAM-5462
> URL: https://issues.apache.org/jira/browse/BEAM-5462
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Heejong Lee
>Priority: Minor
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> Messages look like:
> {{/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/runners/direct/direct_runner.py:360:
>  DeprecationWarning: options is deprecated since First stable release. 
> References to .options will not be supported}}
> {{pipeline.replace_all(_get_transform_overrides(pipeline.options))}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5462) get rid of .options deprecation warnings in tests

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5462?focusedWorklogId=171765&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171765
 ]

ASF GitHub Bot logged work on BEAM-5462:


Author: ASF GitHub Bot
Created on: 03/Dec/18 23:58
Start Date: 03/Dec/18 23:58
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #6930: [BEAM-5462] get rid of 
.options deprecation warnings in tests
URL: https://github.com/apache/beam/pull/6930#issuecomment-443916677
 
 
   @chamikaramj I'll close the new one and update this.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171765)
Time Spent: 5h 50m  (was: 5h 40m)

> get rid of .options deprecation warnings in tests
> ---
>
> Key: BEAM-5462
> URL: https://issues.apache.org/jira/browse/BEAM-5462
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Heejong Lee
>Priority: Minor
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> Messages look like:
> {{/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/runners/direct/direct_runner.py:360:
>  DeprecationWarning: options is deprecated since First stable release. 
> References to .options will not be supported}}
> {{pipeline.replace_all(_get_transform_overrides(pipeline.options))}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6033) normalize httplib2.Http initialization and usage

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6033?focusedWorklogId=171764&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171764
 ]

ASF GitHub Bot logged work on BEAM-6033:


Author: ASF GitHub Bot
Created on: 03/Dec/18 23:56
Start Date: 03/Dec/18 23:56
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #7032: [BEAM-6033] normalize 
httplib2.Http initialization and usage
URL: https://github.com/apache/beam/pull/7032#issuecomment-443916251
 
 
   run python postcommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171764)
Time Spent: 2.5h  (was: 2h 20m)

> normalize httplib2.Http initialization and usage
> 
>
> Key: BEAM-6033
> URL: https://issues.apache.org/jira/browse/BEAM-6033
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Ideally solve both issues below in one PR, but issue 1 has priority as it can 
> halt a pipeline.
> Issue 1:
> Datastore client (and other httplib2-based clients for GCS, Dataflow, 
> BigQuery, etc.) doesn't set a socket timeout.
> This can cause _flush_batch() in datastoreio.py to block forever waiting for 
> a response.
> This issue is very similar to https://issues.apache.org/jira/browse/BEAM-5915 
> and the solution should be similar.
> Issue 2:
> Standardize use of proxy environment settings, as in gcsio:
> https://github.com/apache/beam/blob/8d3389df78aa2e0a0de06b7c5743ca3530dec4ac/sdks/python/apache_beam/io/gcp/gcsio.py#L136
> Issue for proxy settings: https://issues.apache.org/jira/browse/BEAM-3184



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5514) BigQueryIO doesn't handle quotaExceeded errors properly

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5514?focusedWorklogId=171763&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171763
 ]

ASF GitHub Bot logged work on BEAM-5514:


Author: ASF GitHub Bot
Created on: 03/Dec/18 23:52
Start Date: 03/Dec/18 23:52
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #7189: [BEAM-5514] 
BigQueryIO doesn't handle quotaExceeded errors properly
URL: https://github.com/apache/beam/pull/7189#issuecomment-443915508
 
 
   Is this ready for review ?
   
   If so, R: @rangadi can you take a look ?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171763)
Time Spent: 40m  (was: 0.5h)

> BigQueryIO doesn't handle quotaExceeded errors properly
> ---
>
> Key: BEAM-5514
> URL: https://issues.apache.org/jira/browse/BEAM-5514
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Kevin Peterson
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When exceeding a streaming quota for BigQuery insertAll requests, BigQuery 
> returns a 403 with reason "quotaExceeded".
> The current implementation of BigQueryIO does not consider this to be a rate 
> limited exception, and therefore does not perform exponential backoff 
> properly, leading to repeated calls to BQ.
> The actual error is in the 
> [ApiErrorExtractor|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L739]
>  class, which is called from 
> [BigQueryServicesImpl|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/util/src/main/java/com/google/cloud/hadoop/util/ApiErrorExtractor.java#L263]
>  to determine how to retry the failure.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5462) get rid of .options deprecation warnings in tests

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5462?focusedWorklogId=171762&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171762
 ]

ASF GitHub Bot logged work on BEAM-5462:


Author: ASF GitHub Bot
Created on: 03/Dec/18 23:51
Start Date: 03/Dec/18 23:51
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #7132: [BEAM-5462] get 
rid of .options deprecation warnings in tests (alt impl)
URL: https://github.com/apache/beam/pull/7132#issuecomment-443915242
 
 
   Please add a reviewer (with R: tag or by git reviewers feature) when this is 
ready for review.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171762)
Time Spent: 5h 40m  (was: 5.5h)

> get rid of .options deprecation warnings in tests
> ---
>
> Key: BEAM-5462
> URL: https://issues.apache.org/jira/browse/BEAM-5462
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Heejong Lee
>Priority: Minor
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Messages look like:
> {{/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/runners/direct/direct_runner.py:360:
>  DeprecationWarning: options is deprecated since First stable release. 
> References to .options will not be supported}}
> {{pipeline.replace_all(_get_transform_overrides(pipeline.options))}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5462) get rid of .options deprecation warnings in tests

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5462?focusedWorklogId=171759&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171759
 ]

ASF GitHub Bot logged work on BEAM-5462:


Author: ASF GitHub Bot
Created on: 03/Dec/18 23:50
Start Date: 03/Dec/18 23:50
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #6930: [BEAM-5462] get 
rid of .options deprecation warnings in tests
URL: https://github.com/apache/beam/pull/6930#issuecomment-443914958
 
 
   Can you update the existing PR instead of opening a new one so that review 
can be continued at the same location ?
   
   If above is not possible for some reason please close this PR instead of the 
new one.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171759)
Time Spent: 5.5h  (was: 5h 20m)

> get rid of .options deprecation warnings in tests
> ---
>
> Key: BEAM-5462
> URL: https://issues.apache.org/jira/browse/BEAM-5462
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Heejong Lee
>Priority: Minor
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> Messages look like:
> {{/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/runners/direct/direct_runner.py:360:
>  DeprecationWarning: options is deprecated since First stable release. 
> References to .options will not be supported}}
> {{pipeline.replace_all(_get_transform_overrides(pipeline.options))}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6033) normalize httplib2.Http initialization and usage

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6033?focusedWorklogId=171760&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171760
 ]

ASF GitHub Bot logged work on BEAM-6033:


Author: ASF GitHub Bot
Created on: 03/Dec/18 23:50
Start Date: 03/Dec/18 23:50
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #7032: [BEAM-6033] 
normalize httplib2.Http initialization and usage
URL: https://github.com/apache/beam/pull/7032#issuecomment-443915044
 
 
   Please resolve the conflict.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171760)
Time Spent: 2h 20m  (was: 2h 10m)

> normalize httplib2.Http initialization and usage
> 
>
> Key: BEAM-6033
> URL: https://issues.apache.org/jira/browse/BEAM-6033
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Ideally solve both issues below in one PR, but issue 1 has priority as it can 
> halt a pipeline.
> Issue 1:
> Datastore client (and other httplib2-based clients for GCS, Dataflow, 
> BigQuery, etc.) doesn't set a socket timeout.
> This can cause _flush_batch() in datastoreio.py to block forever waiting for 
> a response.
> This issue is very similar to https://issues.apache.org/jira/browse/BEAM-5915 
> and the solution should be similar.
> Issue 2:
> Standardize use of proxy environment settings, as in gcsio:
> https://github.com/apache/beam/blob/8d3389df78aa2e0a0de06b7c5743ca3530dec4ac/sdks/python/apache_beam/io/gcp/gcsio.py#L136
> Issue for proxy settings: https://issues.apache.org/jira/browse/BEAM-3184



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4444) Parquet IO for Python SDK

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-?focusedWorklogId=171761&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171761
 ]

ASF GitHub Bot logged work on BEAM-:


Author: ASF GitHub Bot
Created on: 03/Dec/18 23:50
Start Date: 03/Dec/18 23:50
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #6763: [BEAM-] 
Parquet IO for Python SDK
URL: https://github.com/apache/beam/pull/6763#issuecomment-443915117
 
 
   Udi, can you do the next round of reviews ?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171761)
Time Spent: 11h 40m  (was: 11.5h)

> Parquet IO for Python SDK
> -
>
> Key: BEAM-
> URL: https://issues.apache.org/jira/browse/BEAM-
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Bruce Arctor
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 11h 40m
>  Remaining Estimate: 0h
>
> Add Parquet Support for the Python SDK.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5514) BigQueryIO doesn't handle quotaExceeded errors properly

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5514?focusedWorklogId=171756&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171756
 ]

ASF GitHub Bot logged work on BEAM-5514:


Author: ASF GitHub Bot
Created on: 03/Dec/18 23:43
Start Date: 03/Dec/18 23:43
Worklog Time Spent: 10m 
  Work Description: ihji removed a comment on issue #7189: [BEAM-5514] 
BigQueryIO doesn't handle quotaExceeded errors properly
URL: https://github.com/apache/beam/pull/7189#issuecomment-443913755
 
 
   run java precommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171756)
Time Spent: 0.5h  (was: 20m)

> BigQueryIO doesn't handle quotaExceeded errors properly
> ---
>
> Key: BEAM-5514
> URL: https://issues.apache.org/jira/browse/BEAM-5514
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Kevin Peterson
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When exceeding a streaming quota for BigQuery insertAll requests, BigQuery 
> returns a 403 with reason "quotaExceeded".
> The current implementation of BigQueryIO does not consider this to be a rate 
> limited exception, and therefore does not perform exponential backoff 
> properly, leading to repeated calls to BQ.
> The actual error is in the 
> [ApiErrorExtractor|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L739]
>  class, which is called from 
> [BigQueryServicesImpl|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/util/src/main/java/com/google/cloud/hadoop/util/ApiErrorExtractor.java#L263]
>  to determine how to retry the failure.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5514) BigQueryIO doesn't handle quotaExceeded errors properly

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5514?focusedWorklogId=171755&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171755
 ]

ASF GitHub Bot logged work on BEAM-5514:


Author: ASF GitHub Bot
Created on: 03/Dec/18 23:43
Start Date: 03/Dec/18 23:43
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #7189: [BEAM-5514] BigQueryIO 
doesn't handle quotaExceeded errors properly
URL: https://github.com/apache/beam/pull/7189#issuecomment-443913755
 
 
   run java precommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171755)
Time Spent: 20m  (was: 10m)

> BigQueryIO doesn't handle quotaExceeded errors properly
> ---
>
> Key: BEAM-5514
> URL: https://issues.apache.org/jira/browse/BEAM-5514
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Kevin Peterson
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When exceeding a streaming quota for BigQuery insertAll requests, BigQuery 
> returns a 403 with reason "quotaExceeded".
> The current implementation of BigQueryIO does not consider this to be a rate 
> limited exception, and therefore does not perform exponential backoff 
> properly, leading to repeated calls to BQ.
> The actual error is in the 
> [ApiErrorExtractor|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L739]
>  class, which is called from 
> [BigQueryServicesImpl|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/util/src/main/java/com/google/cloud/hadoop/util/ApiErrorExtractor.java#L263]
>  to determine how to retry the failure.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4444) Parquet IO for Python SDK

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-?focusedWorklogId=171750&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171750
 ]

ASF GitHub Bot logged work on BEAM-:


Author: ASF GitHub Bot
Created on: 03/Dec/18 23:35
Start Date: 03/Dec/18 23:35
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #6763: [BEAM-] Parquet IO 
for Python SDK
URL: https://github.com/apache/beam/pull/6763#issuecomment-443911916
 
 
   @chamikaramj @udim what are the next steps on this PR?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171750)
Time Spent: 11.5h  (was: 11h 20m)

> Parquet IO for Python SDK
> -
>
> Key: BEAM-
> URL: https://issues.apache.org/jira/browse/BEAM-
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Bruce Arctor
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 11.5h
>  Remaining Estimate: 0h
>
> Add Parquet Support for the Python SDK.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5978) portableWordCount gradle task not working

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5978?focusedWorklogId=171749&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171749
 ]

ASF GitHub Bot logged work on BEAM-5978:


Author: ASF GitHub Bot
Created on: 03/Dec/18 23:28
Start Date: 03/Dec/18 23:28
Worklog Time Spent: 10m 
  Work Description: tweise commented on issue #7174: [BEAM-5978] Changing 
parallelism for wordcount to 1
URL: https://github.com/apache/beam/pull/7174#issuecomment-443910432
 
 
   @angoenka the error posted above applies to Docker execution (the default), 
as shown in the command. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171749)
Time Spent: 6h  (was: 5h 50m)

> portableWordCount gradle task not working
> -
>
> Key: BEAM-5978
> URL: https://issues.apache.org/jira/browse/BEAM-5978
> Project: Beam
>  Issue Type: Bug
>  Components: examples-python
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
> Fix For: 2.10.0
>
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> A few issues with portableWordCount gradle task when running with jobserver 
> docker image
>  # Jobserver docker image docker mount fails on linux.
>  # Docker can not write read and write to local file system.
>  # Docker from jobserver docker container requires libltdl7 lib on linux.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5978) portableWordCount gradle task not working

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5978?focusedWorklogId=171748&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171748
 ]

ASF GitHub Bot logged work on BEAM-5978:


Author: ASF GitHub Bot
Created on: 03/Dec/18 23:27
Start Date: 03/Dec/18 23:27
Worklog Time Spent: 10m 
  Work Description: tweise commented on a change in pull request #7174: 
[BEAM-5978] Changing parallelism for wordcount to 1
URL: https://github.com/apache/beam/pull/7174#discussion_r238478822
 
 

 ##
 File path: sdks/python/build.gradle
 ##
 @@ -283,13 +283,14 @@ def portableWordCountTask(name, streaming, envdir) {
   "--input=/etc/profile",
   "--output=/tmp/py-wordcount-direct",
   "--runner=PortableRunner",
-  "--experiments=worker_threads=100",
   ]
   if (streaming)
 options += ["--streaming"]
   else
   // workaround for local file output in docker container
 options += ["--environment_cache_millis=1"]
+  // [BEAM-5167] Workaround for scheduling issue between SDKHarness and 
Flink
+options += ["--parallelism=1"]
 
 Review comment:
   if the intention is to use it only for batch, you would need to add {..}


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171748)
Time Spent: 5h 50m  (was: 5h 40m)

> portableWordCount gradle task not working
> -
>
> Key: BEAM-5978
> URL: https://issues.apache.org/jira/browse/BEAM-5978
> Project: Beam
>  Issue Type: Bug
>  Components: examples-python
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
> Fix For: 2.10.0
>
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> A few issues with portableWordCount gradle task when running with jobserver 
> docker image
>  # Jobserver docker image docker mount fails on linux.
>  # Docker can not write read and write to local file system.
>  # Docker from jobserver docker container requires libltdl7 lib on linux.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5378) Ensure all Go SDK examples run successfully

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5378?focusedWorklogId=171747&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171747
 ]

ASF GitHub Bot logged work on BEAM-5378:


Author: ASF GitHub Bot
Created on: 03/Dec/18 23:20
Start Date: 03/Dec/18 23:20
Worklog Time Spent: 10m 
  Work Description: stale[bot] closed pull request #6386: [BEAM-5378] - 
Update minimal_wordcount.go to reflect documentation
URL: https://github.com/apache/beam/pull/6386
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/go/examples/minimal_wordcount/minimal_wordcount.go 
b/sdks/go/examples/minimal_wordcount/minimal_wordcount.go
index 628429368b00..419474c01204 100644
--- a/sdks/go/examples/minimal_wordcount/minimal_wordcount.go
+++ b/sdks/go/examples/minimal_wordcount/minimal_wordcount.go
@@ -38,25 +38,39 @@ package main
 
 import (
"context"
+   "flag"
"fmt"
"regexp"
 
"github.com/apache/beam/sdks/go/pkg/beam"
"github.com/apache/beam/sdks/go/pkg/beam/io/textio"
-   "github.com/apache/beam/sdks/go/pkg/beam/runners/direct"
+   "github.com/apache/beam/sdks/go/pkg/beam/log"
"github.com/apache/beam/sdks/go/pkg/beam/transforms/stats"
+   "github.com/apache/beam/sdks/go/pkg/beam/x/beamx"
 
_ "github.com/apache/beam/sdks/go/pkg/beam/io/filesystem/gcs"
-   _ "github.com/apache/beam/sdks/go/pkg/beam/io/filesystem/local"
 )
 
 var wordRE = regexp.MustCompile(`[a-zA-Z]+('[a-z])?`)
 
+var (
+   output = flag.String("output", "", "gs:// path to write output")
+)
+
 func main() {
+   flag.Parse()
+   beam.Init()
+
// Create the Pipeline object and root scope.
p := beam.NewPipeline()
s := p.Root()
 
+   ctx := context.Background()
+
+   if *output == "" {
+   log.Exit(ctx, "No output storage path specified. Use 
--output=gs://gs-storage-name/minimal_wordcount.txt")
+   }
+
// Apply the pipeline's transforms.
 
// Concept #1: Invoke a root transform with the pipeline; in this case,
@@ -95,8 +109,10 @@ func main() {
// Concept #4: Invoke textio.Write at the end of the pipeline to write
// the contents of a PCollection (in this case, our PCollection of
// formatted strings) to a text file.
-   textio.Write(s, "wordcounts.txt", formatted)
+   textio.Write(s, *output, formatted)
 
-   // Run the pipeline on the direct runner.
-   direct.Execute(context.Background(), p)
+   // Run the pipeline
+   if err := beamx.Run(ctx, p); err != nil {
+   log.Exitf(ctx, "Failed to execute job: %v", err)
+   }
 }
diff --git a/website/src/get-started/wordcount-example.md 
b/website/src/get-started/wordcount-example.md
index ab85689fd3b7..74c004164633 100644
--- a/website/src/get-started/wordcount-example.md
+++ b/website/src/get-started/wordcount-example.md
@@ -90,7 +90,7 @@ To view the full code in Python, see
 
 {:.language-go}
 To view the full code in Go, see
-**[wordcount_minimal.go](https://github.com/apache/beam/blob/master/sdks/go/examples/minimal_wordcount/minimal_wordcount.go).**
+**[minimal_wordcount.go](https://github.com/apache/beam/blob/master/sdks/go/examples/minimal_wordcount/minimal_wordcount.go).**
 
 **Key Concepts:**
 
@@ -284,7 +284,7 @@ The MinimalWordCount pipeline contains five transforms:
 %}```
 
 ```go
-textio.Write(s, "wordcounts.txt", formatted)
+textio.Write(s, "gs://my-bucket/counts.txt", formatted)
 ```
 
 {:.language-java}
@@ -314,7 +314,9 @@ p.run().waitUntilFinish();
 %}```
 
 ```go
-direct.Execute(context.Background(), p)
+if err := beamx.Run(ctx, p); err != nil {
+  log.Exitf(ctx, "Failed to execute job: %v", err)
+}
 ```
 
 {:.language-java}


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171747)
Time Spent: 4h 10m  (was: 4h)

> Ensure all Go SDK examples run successfully
> ---
>
> Key: BEAM-5378
> URL: https://issues.apache.org/jira/browse/BEAM-5378
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Affects Versions: Not applicable
>Reporter: Tomas Roos
>Priority: Major
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> I've been spending a day or so r

[jira] [Work logged] (BEAM-5378) Ensure all Go SDK examples run successfully

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5378?focusedWorklogId=171746&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171746
 ]

ASF GitHub Bot logged work on BEAM-5378:


Author: ASF GitHub Bot
Created on: 03/Dec/18 23:20
Start Date: 03/Dec/18 23:20
Worklog Time Spent: 10m 
  Work Description: stale[bot] commented on issue #6386: [BEAM-5378] - 
Update minimal_wordcount.go to reflect documentation
URL: https://github.com/apache/beam/pull/6386#issuecomment-443908425
 
 
   This pull request has been closed due to lack of activity. If you think that 
is incorrect, or the pull request requires review, you can revive the PR at any 
time.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171746)
Time Spent: 4h  (was: 3h 50m)

> Ensure all Go SDK examples run successfully
> ---
>
> Key: BEAM-5378
> URL: https://issues.apache.org/jira/browse/BEAM-5378
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Affects Versions: Not applicable
>Reporter: Tomas Roos
>Priority: Major
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> I've been spending a day or so running through the example available for the 
> Go SDK in order to see what works and on what runner (direct, dataflow), and 
> what doesn't and here's the results.
> All available examples for the go sdk. For me as a new developer on apache 
> beam and dataflow it would be a tremendous value to have all examples running 
> because many of them have legitimate use-cases behind them. 
> {code:java}
> ├── complete
> │   └── autocomplete
> │   └── autocomplete.go
> ├── contains
> │   └── contains.go
> ├── cookbook
> │   ├── combine
> │   │   └── combine.go
> │   ├── filter
> │   │   └── filter.go
> │   ├── join
> │   │   └── join.go
> │   ├── max
> │   │   └── max.go
> │   └── tornadoes
> │   └── tornadoes.go
> ├── debugging_wordcount
> │   └── debugging_wordcount.go
> ├── forest
> │   └── forest.go
> ├── grades
> │   └── grades.go
> ├── minimal_wordcount
> │   └── minimal_wordcount.go
> ├── multiout
> │   └── multiout.go
> ├── pingpong
> │   └── pingpong.go
> ├── streaming_wordcap
> │   └── wordcap.go
> ├── windowed_wordcount
> │   └── windowed_wordcount.go
> ├── wordcap
> │   └── wordcap.go
> ├── wordcount
> │   └── wordcount.go
> └── yatzy
> └── yatzy.go
> {code}
> All examples that are supposed to be runnable by the direct driver (not 
> depending on gcp platform services) are runnable.
> On the otherhand these are the tests that needs to be updated because its not 
> runnable on the dataflow platform for various reasons.
> I tried to figure them out and all I can do is to pin point at least where it 
> fails since my knowledge so far in the beam / dataflow internals is limited.
> .
> ├── complete
> │   └── autocomplete
> │   └── autocomplete.go
> Runs successfully if swapping the input to one of the shakespear data files 
> from gs://
> But when running this it yields a error from the top.Largest func (discussed 
> in another issue that top.Largest needs to have a serializeable combinator / 
> accumulator)
> ➜  autocomplete git:(master) ✗ ./autocomplete --project fair-app-213019 
> --runner dataflow --staging_location=gs://fair-app-213019/staging-test2 
> --worker_harness_container_image=apache-docker-beam-snapshots-docker.bintray.io/beam/go:20180515
>  
> 2018/09/11 15:35:26 Running autocomplete
> Unable to encode combiner for lifting: failed to encode custom coder: bad 
> underlying type: bad field type: bad element: unencodable type: interface 
> {}2018/09/11 15:35:26 Using running binary as worker binary: './autocomplete'
> 2018/09/11 15:35:26 Staging worker binary: ./autocomplete
> ├── contains
> │   └── contains.go
> Fails when running debug.Head for some mysterious reason, might have to do 
> with the param passing into the x,y iterator. Frankly I dont know and could 
> not figure.
> But removing the debug.Head call everything works as expected and succeeds.
> ├── cookbook
> │   ├── combine
> │   │   └── combine.go
> https://github.com/apache/beam/pull/6474
> │   ├── filter
> │   │   └── filter.go
> Fails go-job-1-1536673624017210012
> 2018-09-11 (15:47:13) Output i0 for step was not found. 
> │   ├── join
> │   │   └── join.go
> Working as expected! Whey!
> │   ├── max
> │   │   └── max.go
> Working!
> │   └── tornadoes
> │   └── tornadoes.go
> Working!
> ├── debugging_wordcount
> │   └── debugging_wordcount.go
> Works fine!
> ├── fo

[jira] [Created] (BEAM-6170) NexmarkLauncher stall warning causes benchmark failure

2018-12-03 Thread Sam Whittle (JIRA)
Sam Whittle created BEAM-6170:
-

 Summary: NexmarkLauncher stall warning causes benchmark failure
 Key: BEAM-6170
 URL: https://issues.apache.org/jira/browse/BEAM-6170
 Project: Beam
  Issue Type: Bug
  Components: testing
Reporter: Sam Whittle
Assignee: Sam Whittle


The monitoring in the NexmarkLauncher watches the pipeline results and source 
for events.  If no events are consumed from the source or published to the 
output for over
STUCK_WARNING_DELAY it prints an warning message:

 "WARNING: streaming query appears to have been stuck for %d min.",

However it also adds this to the errors list, which causes the pipeline to 
terminate by throwing an exception.  There is a separate STUCK_TERMINATE_DELAY 
which is used to terminate the pipeline.  It seems inconsistent to cause the 
pipeline to fail with the warning timeout.  It is also not 100% accurrate 
because the pipeline may be busy processing stages other than the input or 
output.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5514) BigQueryIO doesn't handle quotaExceeded errors properly

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5514?focusedWorklogId=171743&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171743
 ]

ASF GitHub Bot logged work on BEAM-5514:


Author: ASF GitHub Bot
Created on: 03/Dec/18 22:51
Start Date: 03/Dec/18 22:51
Worklog Time Spent: 10m 
  Work Description: ihji opened a new pull request #7189: [BEAM-5514] 
BigQueryIO doesn't handle quotaExceeded errors properly
URL: https://github.com/apache/beam/pull/7189
 
 
 - add a handler for quotaExceeded error
 - add a rate limiter that dynamically adjusts insertAll
 submission rate below a given rate limit quota
   
   
   
   Follow this checklist to help us incorporate your contribution quickly and 
easily:
   
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   It will help us expedite review of your Pull Request if you tag someone 
(e.g. `@username`) to look at it.
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/)
 | --- | --- | --- | --- | --- | ---
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/)
 [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)
 | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)
  [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/)
 | --- | --- | ---
   
   
   
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171743)
Time Spent: 10m
Remaining Estimate: 0h

> BigQueryIO doesn't ha

[jira] [Work logged] (BEAM-2939) Fn API streaming SDF support

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2939?focusedWorklogId=171741&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171741
 ]

ASF GitHub Bot logged work on BEAM-2939:


Author: ASF GitHub Bot
Created on: 03/Dec/18 22:42
Start Date: 03/Dec/18 22:42
Worklog Time Spent: 10m 
  Work Description: lukecwik closed pull request #7177: [BEAM-2939] Add 
support for backlog reporting to byte key and offset restriction trackers.
URL: https://github.com/apache/beam/pull/7177
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/ByteKeyRangeTracker.java
 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/ByteKeyRangeTracker.java
index db83af4c9765..14216c194d56 100644
--- 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/ByteKeyRangeTracker.java
+++ 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/ByteKeyRangeTracker.java
@@ -24,6 +24,7 @@
 import com.google.common.annotations.VisibleForTesting;
 import com.google.common.base.MoreObjects;
 import com.google.common.primitives.Bytes;
+import java.math.BigDecimal;
 import javax.annotation.Nullable;
 import org.apache.beam.sdk.io.range.ByteKey;
 import org.apache.beam.sdk.io.range.ByteKeyRange;
@@ -36,7 +37,8 @@
  * Note, one can complete a range by claiming the {@link ByteKey#EMPTY} 
once one runs out of keys
  * to process.
  */
-public class ByteKeyRangeTracker extends RestrictionTracker {
+public class ByteKeyRangeTracker extends RestrictionTracker
+implements Backlogs.HasBacklog {
   /* An empty range which contains no keys. */
   @VisibleForTesting
   static final ByteKeyRange NO_KEYS = ByteKeyRange.of(ByteKey.EMPTY, 
ByteKey.of(0x00));
@@ -180,4 +182,30 @@ static ByteKey next(ByteKey key) {
   }
 
   private static final byte[] ZERO_BYTE_ARRAY = new byte[] {0};
+
+  @Override
+  public Backlog getBacklog() {
+// Return 0 for the empty range which is implicitly done.
+// This case can occur if the range tracker is checkpointed before any 
keys have been claimed
+// or if the range tracker is checkpointed once the range is done.
+if (NO_KEYS.equals(range)) {
+  return Backlog.of(BigDecimal.ZERO);
+}
+
+// If we are attempting to get the backlog without processing a single 
key, we return 1.0
+if (lastAttemptedKey == null) {
+  return Backlog.of(BigDecimal.ONE);
+}
+
+// Return 0 if the last attempted key was the empty key representing the 
end of range for
+// all ranges or the last attempted key is beyond the end of the range.
+if (lastAttemptedKey.isEmpty()
+|| !(range.getEndKey().isEmpty() || 
range.getEndKey().compareTo(lastAttemptedKey) > 0)) {
+  return Backlog.of(BigDecimal.ZERO);
+}
+
+// TODO: Use the ability of BigDecimal's additional precision to more 
accurately report backlog
+// for keys which are long.
+return 
Backlog.of(BigDecimal.valueOf(range.estimateFractionForKey(lastAttemptedKey)));
+  }
 }
diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/OffsetRangeTracker.java
 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/OffsetRangeTracker.java
index 2d5626b8d7c3..a8287edf54ca 100644
--- 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/OffsetRangeTracker.java
+++ 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/OffsetRangeTracker.java
@@ -22,15 +22,16 @@
 import static com.google.common.base.Preconditions.checkState;
 
 import com.google.common.base.MoreObjects;
+import java.math.BigDecimal;
 import javax.annotation.Nullable;
 import org.apache.beam.sdk.io.range.OffsetRange;
-import org.apache.beam.sdk.transforms.DoFn;
 
 /**
  * A {@link RestrictionTracker} for claiming offsets in an {@link OffsetRange} 
in a monotonically
  * increasing fashion.
  */
-public class OffsetRangeTracker extends RestrictionTracker {
+public class OffsetRangeTracker extends RestrictionTracker
+implements Backlogs.HasBacklog {
   private OffsetRange range;
   @Nullable private Long lastClaimedOffset = null;
   @Nullable private Long lastAttemptedOffset = null;
@@ -79,17 +80,6 @@ public boolean tryClaim(Long i) {
 return true;
   }
 
-  /**
-   * Marks that there are no more offsets to be claimed in the range.
-   *
-   * E.g., a {@link DoFn} reading a file and claiming the offset of each 
record in the file might
-   * call this if it hits EOF - even though the last attempted claim was 
before the end of the
-   * range, there are no more offset

[jira] [Work logged] (BEAM-6165) Send metrics to Flink in portable Flink runner

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6165?focusedWorklogId=171740&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171740
 ]

ASF GitHub Bot logged work on BEAM-6165:


Author: ASF GitHub Bot
Created on: 03/Dec/18 22:40
Start Date: 03/Dec/18 22:40
Worklog Time Spent: 10m 
  Work Description: ryan-williams commented on a change in pull request 
#7183: [BEAM-6165] send metrics to Flink in portable Flink runner
URL: https://github.com/apache/beam/pull/7183#discussion_r238466784
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/metrics/FlinkMetricContainer.java
 ##
 @@ -77,12 +82,72 @@ public FlinkMetricContainer(RuntimeContext runtimeContext) 
{
 this.metricsAccumulator = (MetricsAccumulator) metricsAccumulator;
   }
 
-  MetricsContainer getMetricsContainer(String stepName) {
+  public MetricsContainer getMetricsContainer(String stepName) {
 return metricsAccumulator != null
 ? metricsAccumulator.getLocalValue().getContainer(stepName)
 : null;
   }
 
+  /**
+   * Parse a {@link MetricName} from a {@link
+   * org.apache.beam.model.fnexecution.v1.BeamFnApi.MonitoringInfoUrns.Enum}
+   *
+   * Should be consistent with {@code parse_namespace_and_name} in 
monitoring_infos.py
 
 Review comment:
   Wanted to flag this for @ajamato in particular as the author of [the python 
code I mimicked 
here](https://github.com/apache/beam/blob/6b7cf422733a82aabb8e257e975d9e9c5e785376/sdks/python/apache_beam/metrics/monitoring_infos.py#L228-L236).
   
   How important is it that the python and java SDKs parse metric URNs into a 
{namespace, name} pairs in the same way?
   
   From what I can tell, It may not be critical, though we should still 
probably agree on one semantic interpretation of what is "namespace" vs "name" 
of e.g. 
[`"beam:metric:element_count:v1"`](https://github.com/apache/beam/blob/master/model/fn-execution/src/main/proto/beam_fn_api.proto#L360-L361).
   
   With `monitoring_infos.py` as the reference implementation, the answer is 
{`"beam"`, `"metric:element_count:v1"`}; shall I encode that in a comment in 
`beam_fn_api.proto`, and reference it here?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171740)
Time Spent: 3h 10m  (was: 3h)

> Send metrics to Flink in portable Flink runner
> --
>
> Key: BEAM-6165
> URL: https://issues.apache.org/jira/browse/BEAM-6165
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Affects Versions: 2.8.0
>Reporter: Ryan Williams
>Assignee: Ryan Williams
>Priority: Major
>  Labels: metrics, portability, portability-flink
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Metrics are sent from the fn harness to runner in the Python SDK (and likely 
> Java soon), but the portable Flink runner doesn't pass them on to Flink, 
> which it should, so that users can see them in e.g. the Flink UI or via any 
> Flink metrics reporters.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6160) Simplify service and server creation in dataflow runner harness

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6160?focusedWorklogId=171738&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171738
 ]

ASF GitHub Bot logged work on BEAM-6160:


Author: ASF GitHub Bot
Created on: 03/Dec/18 22:38
Start Date: 03/Dec/18 22:38
Worklog Time Spent: 10m 
  Work Description: lukecwik closed pull request #7168: [BEAM-6160] Use 
service server rather than service
URL: https://github.com/apache/beam/pull/7168
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/BatchDataflowWorker.java
 
b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/BatchDataflowWorker.java
index b103dcef4b2c..2666dea61115 100644
--- 
a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/BatchDataflowWorker.java
+++ 
b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/BatchDataflowWorker.java
@@ -336,9 +336,9 @@ boolean doWork(WorkItem workItem, WorkItemStatusClient 
workItemStatusClient) thr
 worker =
 mapTaskExecutorFactory.create(
 sdkWorkerHarness.getControlClientHandler(),
-sdkWorkerHarness.getDataService(),
+sdkWorkerHarness.getGrpcDataFnServer(),
 sdkHarnessRegistry.beamFnDataApiServiceDescriptor(),
-sdkWorkerHarness.getStateService(),
+sdkWorkerHarness.getGrpcStateFnServer(),
 network,
 options,
 stageName,
diff --git 
a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/BeamFnMapTaskExecutorFactory.java
 
b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/BeamFnMapTaskExecutorFactory.java
index 067b221fabc7..8e191e2654a4 100644
--- 
a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/BeamFnMapTaskExecutorFactory.java
+++ 
b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/BeamFnMapTaskExecutorFactory.java
@@ -55,6 +55,7 @@
 import org.apache.beam.runners.dataflow.worker.counters.NameContext;
 import 
org.apache.beam.runners.dataflow.worker.fn.control.BeamFnMapTaskExecutor;
 import 
org.apache.beam.runners.dataflow.worker.fn.control.RegisterAndProcessBundleOperation;
+import org.apache.beam.runners.dataflow.worker.fn.data.BeamFnDataGrpcService;
 import 
org.apache.beam.runners.dataflow.worker.fn.data.RemoteGrpcPortReadOperation;
 import 
org.apache.beam.runners.dataflow.worker.fn.data.RemoteGrpcPortWriteOperation;
 import org.apache.beam.runners.dataflow.worker.graph.Edges.Edge;
@@ -82,8 +83,10 @@
 import org.apache.beam.runners.dataflow.worker.util.common.worker.Receiver;
 import org.apache.beam.runners.dataflow.worker.util.common.worker.Sink;
 import 
org.apache.beam.runners.dataflow.worker.util.common.worker.WriteOperation;
+import org.apache.beam.runners.fnexecution.GrpcFnServer;
 import org.apache.beam.runners.fnexecution.control.InstructionRequestHandler;
 import org.apache.beam.runners.fnexecution.data.FnDataService;
+import org.apache.beam.runners.fnexecution.state.GrpcStateService;
 import org.apache.beam.runners.fnexecution.state.StateDelegator;
 import org.apache.beam.sdk.coders.Coder;
 import org.apache.beam.sdk.coders.KvCoder;
@@ -115,9 +118,9 @@ private BeamFnMapTaskExecutorFactory() {}
   @Override
   public DataflowMapTaskExecutor create(
   InstructionRequestHandler instructionRequestHandler,
-  FnDataService beamFnDataService,
+  GrpcFnServer grpcDataFnServer,
   Endpoints.ApiServiceDescriptor dataApiServiceDescriptor,
-  StateDelegator beamFnStateDelegator,
+  GrpcFnServer grpcStateFnServer,
   MutableNetwork network,
   PipelineOptions options,
   String stageName,
@@ -143,7 +146,7 @@ public DataflowMapTaskExecutor create(
 createOperationTransformForRegisterFnNodes(
 idGenerator,
 instructionRequestHandler,
-beamFnStateDelegator,
+grpcStateFnServer.getService(),
 stageName,
 executionContext));
 
@@ -153,7 +156,7 @@ public DataflowMapTaskExecutor create(
 network,
 createOperationTransformForGrpcPortNodes(
 network,
-beamFnDataService,
+grpcDataFnServer.getService(),
 // TODO: Set NameContext properly for these operations.
 executionContext.createOperationContext(
 

[jira] [Work logged] (BEAM-6165) Send metrics to Flink in portable Flink runner

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6165?focusedWorklogId=171732&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171732
 ]

ASF GitHub Bot logged work on BEAM-6165:


Author: ASF GitHub Bot
Created on: 03/Dec/18 22:24
Start Date: 03/Dec/18 22:24
Worklog Time Spent: 10m 
  Work Description: ryan-williams commented on a change in pull request 
#7183: [BEAM-6165] send metrics to Flink in portable Flink runner
URL: https://github.com/apache/beam/pull/7183#discussion_r238462220
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/flink_runner_test.py
 ##
 @@ -59,25 +65,63 @@
   environment_type = known_args.environment_type.lower()
   environment_config = (
   known_args.environment_config if known_args.environment_config else None)
+  test = known_args.test
 
   # This is defined here to only be run when we invoke this file explicitly.
   class FlinkRunnerTest(portable_runner_test.PortableRunnerTest):
 _use_grpc = True
 _use_subprocesses = True
 
+conf_dir = None
+
+@classmethod
+def tearDownClass(cls):
+  if cls.conf_dir and exists(cls.conf_dir):
+logging.info("removing conf dir: %s" % cls.conf_dir)
+rmtree(cls.conf_dir)
+  super(FlinkRunnerTest, cls).tearDownClass()
+
+@classmethod
+def _create_conf_dir(cls):
+  """Create (and save a static reference to) a "conf dir", used to provide 
metrics configs and
+   verify metrics output
+
+   It gets cleaned up when the suite is done executing"""
+
+  if hasattr(cls, 'conf_dir'):
+cls.conf_dir = mkdtemp(prefix='flinktest-conf')
+
+# path for a FileReporter to write metrics to
+cls.test_metrics_path = path.join(cls.conf_dir, 'test-metrics.txt')
+
+# path to write Flink configuration to
+conf_path = path.join(cls.conf_dir, 'flink-conf.yaml')
+with open(conf_path, 'w') as f:
+  f.write(linesep.join([
+'metrics.reporters: test',
+'metrics.reporter.test.class: 
org.apache.beam.runners.flink.metrics.FileReporter',
+'metrics.reporter.test.file: %s' % cls.test_metrics_path
+  ]))
+
 @classmethod
 def _subprocess_command(cls, port):
-  tmp_dir = tempfile.mkdtemp(prefix='flinktest')
+  # will be cleaned up at the end of this method, and recreated and used 
by the job server
+  tmp_dir = mkdtemp(prefix='flinktest')
+
+  cls._create_conf_dir()
+
   try:
 return [
 'java',
 '-jar', flink_job_server_jar,
+'--flink-master-url', '[local]',
 
 Review comment:
   hm I think you pasted the wrong link @mxm? afaict you just linked to this PR.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171732)
Time Spent: 3h  (was: 2h 50m)

> Send metrics to Flink in portable Flink runner
> --
>
> Key: BEAM-6165
> URL: https://issues.apache.org/jira/browse/BEAM-6165
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Affects Versions: 2.8.0
>Reporter: Ryan Williams
>Assignee: Ryan Williams
>Priority: Major
>  Labels: metrics, portability, portability-flink
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Metrics are sent from the fn harness to runner in the Python SDK (and likely 
> Java soon), but the portable Flink runner doesn't pass them on to Flink, 
> which it should, so that users can see them in e.g. the Flink UI or via any 
> Flink metrics reporters.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6165) Send metrics to Flink in portable Flink runner

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6165?focusedWorklogId=171731&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171731
 ]

ASF GitHub Bot logged work on BEAM-6165:


Author: ASF GitHub Bot
Created on: 03/Dec/18 22:23
Start Date: 03/Dec/18 22:23
Worklog Time Spent: 10m 
  Work Description: ryan-williams commented on a change in pull request 
#7183: [BEAM-6165] send metrics to Flink in portable Flink runner
URL: https://github.com/apache/beam/pull/7183#discussion_r238461991
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/metrics/FlinkMetricContainer.java
 ##
 @@ -77,12 +82,72 @@ public FlinkMetricContainer(RuntimeContext runtimeContext) 
{
 this.metricsAccumulator = (MetricsAccumulator) metricsAccumulator;
   }
 
-  MetricsContainer getMetricsContainer(String stepName) {
+  public MetricsContainer getMetricsContainer(String stepName) {
 return metricsAccumulator != null
 ? metricsAccumulator.getLocalValue().getContainer(stepName)
 : null;
   }
 
+  /**
+   * Parse a {@link MetricName} from a {@link
+   * org.apache.beam.model.fnexecution.v1.BeamFnApi.MonitoringInfoUrns.Enum}
+   *
+   * Should be consistent with {@code parse_namespace_and_name} in 
monitoring_infos.py
+   *
+   * TODO: not flink-specific; where should it live?
 
 Review comment:
   not sure I follow; how could I put this `String ⇒ MetricName` logic into 
`BeamFnApi.MonitoringInfo`? The latter is codegen'd from proto definitions…


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171731)
Time Spent: 2h 50m  (was: 2h 40m)

> Send metrics to Flink in portable Flink runner
> --
>
> Key: BEAM-6165
> URL: https://issues.apache.org/jira/browse/BEAM-6165
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Affects Versions: 2.8.0
>Reporter: Ryan Williams
>Assignee: Ryan Williams
>Priority: Major
>  Labels: metrics, portability, portability-flink
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Metrics are sent from the fn harness to runner in the Python SDK (and likely 
> Java soon), but the portable Flink runner doesn't pass them on to Flink, 
> which it should, so that users can see them in e.g. the Flink UI or via any 
> Flink metrics reporters.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6165) Send metrics to Flink in portable Flink runner

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6165?focusedWorklogId=171730&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171730
 ]

ASF GitHub Bot logged work on BEAM-6165:


Author: ASF GitHub Bot
Created on: 03/Dec/18 22:22
Start Date: 03/Dec/18 22:22
Worklog Time Spent: 10m 
  Work Description: ryan-williams commented on a change in pull request 
#7183: [BEAM-6165] send metrics to Flink in portable Flink runner
URL: https://github.com/apache/beam/pull/7183#discussion_r238461420
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/metrics/FileReporter.java
 ##
 @@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.flink.metrics;
+
+import java.io.FileNotFoundException;
+import java.io.FileOutputStream;
+import java.io.PrintStream;
+import org.apache.flink.metrics.Metric;
+import org.apache.flink.metrics.MetricConfig;
+import org.apache.flink.metrics.MetricGroup;
+import org.apache.flink.metrics.reporter.AbstractReporter;
+
+/**
+ * Flink {@link org.apache.flink.metrics.reporter.MetricReporter metrics 
reporter} for writing
+ * metrics to a file (specified via the "metrics.reporter.test.file" config 
key).
+ */
+public class FileReporter extends AbstractReporter {
 
 Review comment:
   I originally gave it a name like that, however I talked myself in to that 
this is a potentially more general-purpose abstraction, and it is Flink 
specific, so that this (`main`, not `test`) package is an OK place for it.
   
   If it was `TestFileReporter` then I'd feel obliged to put it in the test 
package, at which point the build plumbing to start a job-server in 
`flink_runner_test.py` that has the :beam-runners-flink` `-tests.jar` on the 
classpath would become more convoluted, so that was an added incentive to view 
this as not a "tests only" thing 😀
   
   lmk what you think!


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171730)
Time Spent: 2h 40m  (was: 2.5h)

> Send metrics to Flink in portable Flink runner
> --
>
> Key: BEAM-6165
> URL: https://issues.apache.org/jira/browse/BEAM-6165
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Affects Versions: 2.8.0
>Reporter: Ryan Williams
>Assignee: Ryan Williams
>Priority: Major
>  Labels: metrics, portability, portability-flink
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Metrics are sent from the fn harness to runner in the Python SDK (and likely 
> Java soon), but the portable Flink runner doesn't pass them on to Flink, 
> which it should, so that users can see them in e.g. the Flink UI or via any 
> Flink metrics reporters.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-6169) Dataflow runner: rateLimit exceeded from BigQuery API in start_bundle

2018-12-03 Thread Thomas Pilewicz (JIRA)
Thomas Pilewicz created BEAM-6169:
-

 Summary: Dataflow runner: rateLimit exceeded from BigQuery API in 
start_bundle
 Key: BEAM-6169
 URL: https://issues.apache.org/jira/browse/BEAM-6169
 Project: Beam
  Issue Type: Bug
  Components: gcp-quota
Affects Versions: 2.8.0
 Environment: Python SDK 2.8, Dataflow runner
Reporter: Thomas Pilewicz
Assignee: Rafael Fernandez


When many machines, or too big machines are used, BigQuery returns rateLimit 
exceeded errors in response to the GET requests from 
BigQueryWriteFn.start_bundle.

Unsure if this causes performance issues, or is just noise. (Produced with a 
pipeline reading strings from Pub/Sub that are ~1000 characters long, and 
writing directly to BigQuery)

{{... File 
"/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/bigquery.py", line 
1087, in get_or_create_table found_table = self._get_table(project_id, 
dataset_id, table_id) File 
"/usr/local/lib/python2.7/dist-packages/apache_beam/utils/retry.py", line 184, 
in wrapper return fun(*args, **kwargs) File 
"/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/bigquery.py", line 
925, in _get_table response = self.client.tables.Get(request) File 
"/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/internal/clients/bigquery/bigquery_v2_client.py",
 line 611, in Get config, request, global_params=global_params) File 
"/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py", line 
722, in _RunMethod return self.ProcessHttpResponse(method_config, 
http_response, request) File 
"/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py", line 
728, in ProcessHttpResponse self.__ProcessHttpResponse(method_config, 
http_response, request)) File 
"/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py", line 
599, in __ProcessHttpResponse http_response, method_config=method_config, 
request=request) HttpForbiddenError: HttpError accessing 
:
 response: <{'status': '403', 'content-length': '577', 'x-xss-protection': '1; 
mode=block', 'x-content-type-options': 'nosniff', 'transfer-encoding': 
'chunked', 'expires': 'Sun, 25 Nov 2018 14:36:24 GMT', 'vary': 'Origin, 
X-Origin', 'server': 'GSE', '-content-encoding': 'gzip', 'cache-control': 
'private, max-age=0', 'date': 'Sun, 25 Nov 2018 14:36:24 GMT', 
'x-frame-options': 'SAMEORIGIN', 'content-type': 'application/json; 
charset=UTF-8'}>, content <{ "error": { "errors": [ { "domain": "global", 
"reason": "rateLimitExceeded", "message": "Exceeded rate limits: Your 
user_method exceeded quota for api requests per user per method. For more 
information, see https://cloud.google.com/bigquery/troubleshooting-errors";, 
"locationType": "other", "location": "helix_api.method_request" } ], "code": 
403, "message": "Exceeded rate limits: Your user_method exceeded quota for api 
requests per user per method. For more information, see 
https://cloud.google.com/bigquery/troubleshooting-errors"}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6165) Send metrics to Flink in portable Flink runner

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6165?focusedWorklogId=171729&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171729
 ]

ASF GitHub Bot logged work on BEAM-6165:


Author: ASF GitHub Bot
Created on: 03/Dec/18 22:19
Start Date: 03/Dec/18 22:19
Worklog Time Spent: 10m 
  Work Description: ryan-williams commented on a change in pull request 
#7183: [BEAM-6165] send metrics to Flink in portable Flink runner
URL: https://github.com/apache/beam/pull/7183#discussion_r238460390
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/flink_runner_test.py
 ##
 @@ -65,19 +69,58 @@ class 
FlinkRunnerTest(portable_runner_test.PortableRunnerTest):
 _use_grpc = True
 _use_subprocesses = True
 
+conf_dir = None
+
+@classmethod
+def tearDownClass(cls):
+  if cls.conf_dir and exists(cls.conf_dir):
+logging.info("removing conf dir: %s" % cls.conf_dir)
+rmtree(cls.conf_dir)
+  super(FlinkRunnerTest, cls).tearDownClass()
+
+@classmethod
+def _create_conf_dir(cls):
+  """Create (and save a static reference to) a "conf dir", used to provide
 
 Review comment:
   good idea, though I am also using this directory as the tempdir that holds 
the output metrics; I think I'd still need to do that and it would look the 
same as this, but perhaps with a different name.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171729)
Time Spent: 2.5h  (was: 2h 20m)

> Send metrics to Flink in portable Flink runner
> --
>
> Key: BEAM-6165
> URL: https://issues.apache.org/jira/browse/BEAM-6165
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Affects Versions: 2.8.0
>Reporter: Ryan Williams
>Assignee: Ryan Williams
>Priority: Major
>  Labels: metrics, portability, portability-flink
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Metrics are sent from the fn harness to runner in the Python SDK (and likely 
> Java soon), but the portable Flink runner doesn't pass them on to Flink, 
> which it should, so that users can see them in e.g. the Flink UI or via any 
> Flink metrics reporters.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5978) portableWordCount gradle task not working

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5978?focusedWorklogId=171726&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171726
 ]

ASF GitHub Bot logged work on BEAM-5978:


Author: ASF GitHub Bot
Created on: 03/Dec/18 22:16
Start Date: 03/Dec/18 22:16
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #7174: [BEAM-5978] Changing 
parallelism for wordcount to 1
URL: https://github.com/apache/beam/pull/7174#issuecomment-443891281
 
 
   Go ahead :) 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171726)
Time Spent: 5h 40m  (was: 5.5h)

> portableWordCount gradle task not working
> -
>
> Key: BEAM-5978
> URL: https://issues.apache.org/jira/browse/BEAM-5978
> Project: Beam
>  Issue Type: Bug
>  Components: examples-python
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
> Fix For: 2.10.0
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> A few issues with portableWordCount gradle task when running with jobserver 
> docker image
>  # Jobserver docker image docker mount fails on linux.
>  # Docker can not write read and write to local file system.
>  # Docker from jobserver docker container requires libltdl7 lib on linux.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6160) Simplify service and server creation in dataflow runner harness

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6160?focusedWorklogId=171723&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171723
 ]

ASF GitHub Bot logged work on BEAM-6160:


Author: ASF GitHub Bot
Created on: 03/Dec/18 22:07
Start Date: 03/Dec/18 22:07
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on issue #7168: [BEAM-6160] Use 
service server rather than service
URL: https://github.com/apache/beam/pull/7168#issuecomment-443888769
 
 
   Run Java PreCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171723)
Time Spent: 2.5h  (was: 2h 20m)

> Simplify service and server creation in dataflow runner harness
> ---
>
> Key: BEAM-6160
> URL: https://issues.apache.org/jira/browse/BEAM-6160
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-1628) Flink runner: logic around --flinkMaster is error-prone

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-1628?focusedWorklogId=171717&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171717
 ]

ASF GitHub Bot logged work on BEAM-1628:


Author: ASF GitHub Bot
Created on: 03/Dec/18 21:34
Start Date: 03/Dec/18 21:34
Worklog Time Spent: 10m 
  Work Description: angoenka commented on a change in pull request #7187: 
[BEAM-1628] Allow empty port for flink master url
URL: https://github.com/apache/beam/pull/7187#discussion_r238445475
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkExecutionEnvironments.java
 ##
 @@ -77,17 +76,23 @@ static ExecutionEnvironment 
createBatchExecutionEnvironment(
   flinkBatchEnv = new CollectionEnvironment();
 } else if ("[auto]".equals(masterUrl)) {
   flinkBatchEnv = ExecutionEnvironment.getExecutionEnvironment();
-} else if (masterUrl.matches(".*:\\d*")) {
-  List parts = Splitter.on(':').splitToList(masterUrl);
+} else {
+  String[] hostAndPort = masterUrl.split(":", 2);
+  final String host = hostAndPort[0];
+  final int port;
+  if (hostAndPort.length > 1) {
+port = Integer.parseInt(hostAndPort[1]);
 
 Review comment:
   Shall we catch the integer parsing exception and log 
   ` LOG.warn("Unrecognized Flink Master URL {}", masterUrl);   `


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171717)
Time Spent: 40m  (was: 0.5h)

> Flink runner: logic around --flinkMaster is error-prone
> ---
>
> Key: BEAM-1628
> URL: https://issues.apache.org/jira/browse/BEAM-1628
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Davor Bonaci
>Assignee: Maximilian Michels
>Priority: Minor
>  Labels: newbie, starter
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The logic for handling {{--flinkMaster}} seems not particularly user-friendly.
> https://github.com/apache/beam/blob/fbcde4cdc7d68de8734bf540c079b2747631a854/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/FlinkPipelineExecutionEnvironment.java#L132
> {code}
> if (masterUrl.equals("[local]")) {
> } else if (masterUrl.equals("[collection]")) {
> } else if (masterUrl.equals("[auto]")) {
> } else if (masterUrl.matches(".*:\\d*")) {
> } else {
>   // use auto.
> }
> {code}
> The options are constructed with "auto" set as default.
> I think we should do the following:
> * I assume there's a default port for the Flink master. We should default to 
> it.
> * We should treat a string without a colon as a host name. (Not default to 
> local execution.)
> This is super easy fix, hopefully someone can pick it up quickly ;-)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5978) portableWordCount gradle task not working

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5978?focusedWorklogId=171718&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171718
 ]

ASF GitHub Bot logged work on BEAM-5978:


Author: ASF GitHub Bot
Created on: 03/Dec/18 21:39
Start Date: 03/Dec/18 21:39
Worklog Time Spent: 10m 
  Work Description: angoenka commented on issue #7174: [BEAM-5978] Changing 
parallelism for wordcount to 1
URL: https://github.com/apache/beam/pull/7174#issuecomment-443880235
 
 
   Shall we merge this PR?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171718)
Time Spent: 5.5h  (was: 5h 20m)

> portableWordCount gradle task not working
> -
>
> Key: BEAM-5978
> URL: https://issues.apache.org/jira/browse/BEAM-5978
> Project: Beam
>  Issue Type: Bug
>  Components: examples-python
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
> Fix For: 2.10.0
>
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> A few issues with portableWordCount gradle task when running with jobserver 
> docker image
>  # Jobserver docker image docker mount fails on linux.
>  # Docker can not write read and write to local file system.
>  # Docker from jobserver docker container requires libltdl7 lib on linux.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6160) Simplify service and server creation in dataflow runner harness

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6160?focusedWorklogId=171712&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171712
 ]

ASF GitHub Bot logged work on BEAM-6160:


Author: ASF GitHub Bot
Created on: 03/Dec/18 21:20
Start Date: 03/Dec/18 21:20
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on issue #7168: [BEAM-6160] Use 
service server rather than service
URL: https://github.com/apache/beam/pull/7168#issuecomment-443874563
 
 
   @lukecwik All comments above have been addressed in commit 
https://github.com/apache/beam/pull/7168/commits/b62cf4d90ad7646ca49348ad2302b248588ff984.
 Could you please take another look?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171712)
Time Spent: 2h 20m  (was: 2h 10m)

> Simplify service and server creation in dataflow runner harness
> ---
>
> Key: BEAM-6160
> URL: https://issues.apache.org/jira/browse/BEAM-6160
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-1628) Flink runner: logic around --flinkMaster is error-prone

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-1628?focusedWorklogId=171714&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171714
 ]

ASF GitHub Bot logged work on BEAM-1628:


Author: ASF GitHub Bot
Created on: 03/Dec/18 21:22
Start Date: 03/Dec/18 21:22
Worklog Time Spent: 10m 
  Work Description: angoenka commented on issue #7187: [BEAM-1628] Allow 
empty port for flink master url
URL: https://github.com/apache/beam/pull/7187#issuecomment-443875054
 
 
   Run Portable_Python PreCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171714)
Time Spent: 0.5h  (was: 20m)

> Flink runner: logic around --flinkMaster is error-prone
> ---
>
> Key: BEAM-1628
> URL: https://issues.apache.org/jira/browse/BEAM-1628
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Davor Bonaci
>Assignee: Maximilian Michels
>Priority: Minor
>  Labels: newbie, starter
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The logic for handling {{--flinkMaster}} seems not particularly user-friendly.
> https://github.com/apache/beam/blob/fbcde4cdc7d68de8734bf540c079b2747631a854/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/FlinkPipelineExecutionEnvironment.java#L132
> {code}
> if (masterUrl.equals("[local]")) {
> } else if (masterUrl.equals("[collection]")) {
> } else if (masterUrl.equals("[auto]")) {
> } else if (masterUrl.matches(".*:\\d*")) {
> } else {
>   // use auto.
> }
> {code}
> The options are constructed with "auto" set as default.
> I think we should do the following:
> * I assume there's a default port for the Flink master. We should default to 
> it.
> * We should treat a string without a colon as a host name. (Not default to 
> local execution.)
> This is super easy fix, hopefully someone can pick it up quickly ;-)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-6168) Allow modification of JSON value before writing to ElasticSearch

2018-12-03 Thread Mark Norkin (JIRA)
Mark Norkin created BEAM-6168:
-

 Summary: Allow modification of JSON value before writing to 
ElasticSearch
 Key: BEAM-6168
 URL: https://issues.apache.org/jira/browse/BEAM-6168
 Project: Beam
  Issue Type: Improvement
  Components: io-java-elasticsearch
Reporter: Mark Norkin
Assignee: Etienne Chauchot


I have an Apache Beam streaming job which reads data from Kafka and writes to 
ElasticSearch using ElasticSearchIO.

The issue I'm having is that messages in Kafka already have _{{key}}_ field, 
and using {{ElasticSearchIO.Write.withIdFn()}} I'm mapping this field to 
document _{{_id}}_ field in ElasticSearch.

Having a big volume of data I don't want the _{{key}}_ field to be also written 
to ElasticSearch as a separate field and as part of _{{_source}}_.

Is there an option/workaround that would allow doing that?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4454) Provide automatic schema registration for AVROs

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4454?focusedWorklogId=171703&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171703
 ]

ASF GitHub Bot logged work on BEAM-4454:


Author: ASF GitHub Bot
Created on: 03/Dec/18 20:46
Start Date: 03/Dec/18 20:46
Worklog Time Spent: 10m 
  Work Description: reuvenlax closed pull request #7181: [BEAM-4454] Add 
more AVRO utilities to convert between Beam and Avro.
URL: https://github.com/apache/beam/pull/7181
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroUtils.java 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroUtils.java
index b0a76976d15f..8b9f182d0848 100644
--- 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroUtils.java
+++ 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroUtils.java
@@ -20,7 +20,9 @@
 import static com.google.common.base.Preconditions.checkArgument;
 import static com.google.common.base.Preconditions.checkNotNull;
 
-import com.google.common.annotations.VisibleForTesting;
+import com.google.common.collect.ImmutableMap;
+import com.google.common.collect.Lists;
+import java.math.BigDecimal;
 import java.nio.ByteBuffer;
 import java.util.ArrayList;
 import java.util.HashMap;
@@ -28,16 +30,67 @@
 import java.util.Map;
 import java.util.stream.Collectors;
 import javax.annotation.Nonnull;
+import javax.annotation.Nullable;
+import org.apache.avro.Conversions;
+import org.apache.avro.LogicalType;
+import org.apache.avro.LogicalTypes;
+import org.apache.avro.Schema.Type;
 import org.apache.avro.generic.GenericEnumSymbol;
 import org.apache.avro.generic.GenericFixed;
 import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.generic.GenericRecordBuilder;
+import org.apache.avro.reflect.ReflectData;
+import org.apache.avro.util.Utf8;
 import org.apache.beam.sdk.annotations.Experimental;
 import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.Schema.Field;
+import org.apache.beam.sdk.schemas.Schema.FieldType;
+import org.apache.beam.sdk.schemas.Schema.TypeName;
 import org.apache.beam.sdk.values.Row;
+import org.joda.time.Instant;
+import org.joda.time.ReadableInstant;
 
 /** Utils to convert AVRO records to Beam rows. */
 @Experimental(Experimental.Kind.SCHEMAS)
 public class AvroUtils {
+  // Unwrap an AVRO schema into the base type an whether it is nullable.
+  static class TypeWithNullability {
+public final org.apache.avro.Schema type;
+public final boolean nullable;
+
+TypeWithNullability(org.apache.avro.Schema avroSchema) {
+  if (avroSchema.getType() == org.apache.avro.Schema.Type.UNION) {
+List types = avroSchema.getTypes();
+
+// optional fields in AVRO have form of:
+// {"name": "foo", "type": ["null", "something"]}
+
+// don't need recursion because nested unions aren't supported in AVRO
+List nonNullTypes =
+types
+.stream()
+.filter(x -> x.getType() != org.apache.avro.Schema.Type.NULL)
+.collect(Collectors.toList());
+
+if (nonNullTypes.size() == types.size() || nonNullTypes.isEmpty()) {
+  // union without `null` or all 'null' union, keep as is.
+  type = avroSchema;
+  nullable = false;
+} else if (nonNullTypes.size() > 1) {
+  type = org.apache.avro.Schema.createUnion(nonNullTypes);
+  nullable = true;
+} else {
+  // One non-null type.
+  type = nonNullTypes.get(0);
+  nullable = true;
+}
+  } else {
+type = avroSchema;
+nullable = false;
+  }
+}
+  }
+
   private AvroUtils() {}
 
   /**
@@ -45,94 +98,332 @@ private AvroUtils() {}
*
* @param schema schema of type RECORD
*/
-  public static Schema toSchema(@Nonnull org.apache.avro.Schema schema) {
+  public static Schema toBeamSchema(org.apache.avro.Schema schema) {
 Schema.Builder builder = Schema.builder();
 
 for (org.apache.avro.Schema.Field field : schema.getFields()) {
-  org.apache.avro.Schema unwrapped = unwrapNullableSchema(field.schema());
-
-  if (!unwrapped.equals(field.schema())) {
-builder.addNullableField(field.name(), toFieldType(unwrapped));
-  } else {
-builder.addField(field.name(), toFieldType(unwrapped));
+  TypeWithNullability nullableType = new 
TypeWithNullability(field.schema());
+  Field beamField = Field.of(field.name(), toFieldType(nullableType));
+  if (field.doc() != null) {
+beamField = beamField.withDescription(field.doc());
 

[jira] [Work logged] (BEAM-6160) Simplify service and server creation in dataflow runner harness

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6160?focusedWorklogId=171698&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171698
 ]

ASF GitHub Bot logged work on BEAM-6160:


Author: ASF GitHub Bot
Created on: 03/Dec/18 20:22
Start Date: 03/Dec/18 20:22
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #7168: 
[BEAM-6160] Use service server rather than service
URL: https://github.com/apache/beam/pull/7168#discussion_r238420120
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/InProcessServerFactory.java
 ##
 @@ -40,21 +40,13 @@ public static InProcessServerFactory create() {
   private InProcessServerFactory() {}
 
   @Override
-  public Server allocatePortAndCreate(BindableService service, 
ApiServiceDescriptor.Builder builder)
-  throws IOException {
+  public Server allocatePortAndCreate(
+  List services, ApiServiceDescriptor.Builder builder) 
throws IOException {
 String name = String.format("InProcessServer_%s", 
serviceNameUniqifier.getAndIncrement());
 builder.setUrl(name);
-return 
InProcessServerBuilder.forName(name).addService(service).build().start();
-  }
-
-  @Override
-  public Server create(BindableService service, ApiServiceDescriptor 
serviceDescriptor)
-  throws IOException {
-return InProcessServerBuilder.forName(serviceDescriptor.getUrl())
-.addService(
-ServerInterceptors.intercept(service, 
GrpcContextHeaderAccessorProvider.interceptor()))
-.build()
-.start();
+InProcessServerBuilder serverBuilder = 
InProcessServerBuilder.forName(name);
+services.stream().forEach(service -> serverBuilder.addService(service));
 
 Review comment:
   You need to add the GrpcContextHeaderAccessorProvider


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171698)
Time Spent: 1h 40m  (was: 1.5h)

> Simplify service and server creation in dataflow runner harness
> ---
>
> Key: BEAM-6160
> URL: https://issues.apache.org/jira/browse/BEAM-6160
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6160) Simplify service and server creation in dataflow runner harness

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6160?focusedWorklogId=171701&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171701
 ]

ASF GitHub Bot logged work on BEAM-6160:


Author: ASF GitHub Bot
Created on: 03/Dec/18 20:22
Start Date: 03/Dec/18 20:22
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #7168: 
[BEAM-6160] Use service server rather than service
URL: https://github.com/apache/beam/pull/7168#discussion_r238419493
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/GrpcFnServer.java
 ##
 @@ -43,12 +44,50 @@
*/
   public static  GrpcFnServer create(
   ServiceT service, ApiServiceDescriptor endpoint, ServerFactory factory) 
throws IOException {
-return new GrpcFnServer<>(factory.create(service, endpoint), service, 
endpoint);
+return new GrpcFnServer<>(
+factory.create(ImmutableList.of(service), endpoint), service, 
endpoint);
   }
 
+  /** @deprecated This create function is used for Dataflow migration purpose 
only. */
+  @Deprecated
   public static  GrpcFnServer create(
   ServiceT service, ApiServiceDescriptor endpoint) {
-return new GrpcFnServer<>(null, service, endpoint);
+Server fakeServer =
 
 Review comment:
   I was thinking you would create an anonymous subclass of GrpcFnServer, not 
of the gRPC Server class where calling close does nothing.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171701)
Time Spent: 2h 10m  (was: 2h)

> Simplify service and server creation in dataflow runner harness
> ---
>
> Key: BEAM-6160
> URL: https://issues.apache.org/jira/browse/BEAM-6160
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6160) Simplify service and server creation in dataflow runner harness

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6160?focusedWorklogId=171699&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171699
 ]

ASF GitHub Bot logged work on BEAM-6160:


Author: ASF GitHub Bot
Created on: 03/Dec/18 20:22
Start Date: 03/Dec/18 20:22
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #7168: 
[BEAM-6160] Use service server rather than service
URL: https://github.com/apache/beam/pull/7168#discussion_r238421400
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/ServerFactory.java
 ##
 @@ -153,22 +153,109 @@ private static Server 
createServer(List services, InetSocketAdd
   service, 
GrpcContextHeaderAccessorProvider.interceptor(;
   return builder.build().start();
 }
+  }
+
+  /**
+   * Creates a {@link Server gRPC Server} using a Unix domain socket. Note 
that this requires http://netty.io/wiki/forked-tomcat-native.html";>Netty TcNative 
available to be able
+   * to provide a {@link EpollServerDomainSocketChannel}.
+   *
+   * The unix domain socket is located at 
${java.io.tmpdir}/fnapi${random[0-1)}.sock
+   */
+  private static class EpollDomainSocket extends ServerFactory {
+private static File getFileForPort(int port) {
+  return new File(System.getProperty("java.io.tmpdir"), 
String.format("fnapi%d.sock", port));
+}
 
-private static Server createServer(BindableService service, 
InetSocketAddress socket)
+@Override
+public Server allocatePortAndCreate(
 
 Review comment:
   Note that "allocatePort" is the wrong terminology here.
   
   Either in this PR or in a follow-up please rename this method to 
allocateAddressAndCreate and update the javadoc comment for the ServerFactory 
along the lines that the allocation is server type dependent, e.g. it could be 
a port for some servers and for others it may be a file path.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171699)
Time Spent: 1h 50m  (was: 1h 40m)

> Simplify service and server creation in dataflow runner harness
> ---
>
> Key: BEAM-6160
> URL: https://issues.apache.org/jira/browse/BEAM-6160
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6160) Simplify service and server creation in dataflow runner harness

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6160?focusedWorklogId=171700&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171700
 ]

ASF GitHub Bot logged work on BEAM-6160:


Author: ASF GitHub Bot
Created on: 03/Dec/18 20:22
Start Date: 03/Dec/18 20:22
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #7168: 
[BEAM-6160] Use service server rather than service
URL: https://github.com/apache/beam/pull/7168#discussion_r238422004
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/ServerFactory.java
 ##
 @@ -153,22 +153,109 @@ private static Server 
createServer(List services, InetSocketAdd
   service, 
GrpcContextHeaderAccessorProvider.interceptor(;
   return builder.build().start();
 }
+  }
+
+  /**
+   * Creates a {@link Server gRPC Server} using a Unix domain socket. Note 
that this requires http://netty.io/wiki/forked-tomcat-native.html";>Netty TcNative 
available to be able
+   * to provide a {@link EpollServerDomainSocketChannel}.
+   *
+   * The unix domain socket is located at 
${java.io.tmpdir}/fnapi${random[0-1)}.sock
+   */
+  private static class EpollDomainSocket extends ServerFactory {
+private static File getFileForPort(int port) {
 
 Review comment:
   rename to `chooseRandomTmpFile`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171700)
Time Spent: 2h  (was: 1h 50m)

> Simplify service and server creation in dataflow runner harness
> ---
>
> Key: BEAM-6160
> URL: https://issues.apache.org/jira/browse/BEAM-6160
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-6167) Create a Class to read content of a file keeping track of the file path (python)

2018-12-03 Thread Lorenzo Caggioni (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lorenzo Caggioni updated BEAM-6167:
---
Summary: Create a Class to read content of a file keeping track of the file 
path (python)  (was: Create a Class to read content of a file kipping track of 
the file path (python))

> Create a Class to read content of a file keeping track of the file path 
> (python)
> 
>
> Key: BEAM-6167
> URL: https://issues.apache.org/jira/browse/BEAM-6167
> Project: Beam
>  Issue Type: Improvement
>  Components: io-ideas
>Affects Versions: 2.8.0
>Reporter: Lorenzo Caggioni
>Assignee: Eugene Kirpichov
>Priority: Minor
> Fix For: Not applicable
>
>
> Add a class to read content of a file keeping track of the file path each 
> element come from.
> This is an improvement of the current python/apache_beam/io/textio.py



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5978) portableWordCount gradle task not working

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5978?focusedWorklogId=171690&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171690
 ]

ASF GitHub Bot logged work on BEAM-5978:


Author: ASF GitHub Bot
Created on: 03/Dec/18 20:00
Start Date: 03/Dec/18 20:00
Worklog Time Spent: 10m 
  Work Description: angoenka commented on issue #7174: [BEAM-5978] Changing 
parallelism for wordcount to 1
URL: https://github.com/apache/beam/pull/7174#issuecomment-443849713
 
 
   @mxm The underlying issue of flink and sdkHarness scheduling is still 
pending and requires more thought and discussion.
   
   @tweise Mac wordcount should be working with 
https://github.com/apache/beam/pull/7178 Let me know if its otherwise.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171690)
Time Spent: 5h 20m  (was: 5h 10m)

> portableWordCount gradle task not working
> -
>
> Key: BEAM-5978
> URL: https://issues.apache.org/jira/browse/BEAM-5978
> Project: Beam
>  Issue Type: Bug
>  Components: examples-python
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
> Fix For: 2.10.0
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> A few issues with portableWordCount gradle task when running with jobserver 
> docker image
>  # Jobserver docker image docker mount fails on linux.
>  # Docker can not write read and write to local file system.
>  # Docker from jobserver docker container requires libltdl7 lib on linux.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4454) Provide automatic schema registration for AVROs

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4454?focusedWorklogId=171667&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171667
 ]

ASF GitHub Bot logged work on BEAM-4454:


Author: ASF GitHub Bot
Created on: 03/Dec/18 18:15
Start Date: 03/Dec/18 18:15
Worklog Time Spent: 10m 
  Work Description: akedin commented on a change in pull request #7181: 
[BEAM-4454] Add more AVRO utilities to convert between Beam and Avro.
URL: https://github.com/apache/beam/pull/7181#discussion_r238375327
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroUtils.java
 ##
 @@ -20,119 +20,412 @@
 import static com.google.common.base.Preconditions.checkArgument;
 import static com.google.common.base.Preconditions.checkNotNull;
 
-import com.google.common.annotations.VisibleForTesting;
+import com.google.common.collect.ImmutableMap;
+import com.google.common.collect.Lists;
+import java.math.BigDecimal;
 import java.nio.ByteBuffer;
 import java.util.ArrayList;
 import java.util.HashMap;
 import java.util.List;
 import java.util.Map;
 import java.util.stream.Collectors;
 import javax.annotation.Nonnull;
+import javax.annotation.Nullable;
+import org.apache.avro.Conversions;
+import org.apache.avro.LogicalType;
+import org.apache.avro.LogicalTypes;
+import org.apache.avro.Schema.Type;
 import org.apache.avro.generic.GenericEnumSymbol;
 import org.apache.avro.generic.GenericFixed;
 import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.generic.GenericRecordBuilder;
+import org.apache.avro.reflect.ReflectData;
+import org.apache.avro.util.Utf8;
 import org.apache.beam.sdk.annotations.Experimental;
 import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.Schema.Field;
+import org.apache.beam.sdk.schemas.Schema.FieldType;
+import org.apache.beam.sdk.schemas.Schema.TypeName;
 import org.apache.beam.sdk.values.Row;
+import org.joda.time.Instant;
+import org.joda.time.ReadableInstant;
 
 /** Utils to convert AVRO records to Beam rows. */
 @Experimental(Experimental.Kind.SCHEMAS)
 public class AvroUtils {
+  // Unwrap an AVRO schema into the base type an whether it is nullable.
+  static class TypeWithNullability {
+public final org.apache.avro.Schema type;
+public final boolean nullable;
+
+TypeWithNullability(org.apache.avro.Schema avroSchema) {
+  if (avroSchema.getType() == org.apache.avro.Schema.Type.UNION) {
+List types = avroSchema.getTypes();
+
+// optional fields in AVRO have form of:
+// {"name": "foo", "type": ["null", "something"]}
+
+// don't need recursion because nested unions aren't supported in AVRO
+List nonNullTypes =
+types
+.stream()
+.filter(x -> x.getType() != org.apache.avro.Schema.Type.NULL)
+.collect(Collectors.toList());
+
+if (nonNullTypes.size() == types.size() || nonNullTypes.isEmpty()) {
+  // union without `null` or all 'null' union, keep as is.
+  type = avroSchema;
+  nullable = false;
+} else if (nonNullTypes.size() > 1) {
+  type = org.apache.avro.Schema.createUnion(nonNullTypes);
+  nullable = true;
+} else {
+  // One non-null type.
+  type = nonNullTypes.get(0);
+  nullable = true;
+}
+  } else {
+type = avroSchema;
+nullable = false;
+  }
+}
+  }
+
   private AvroUtils() {}
 
   /**
* Converts AVRO schema to Beam row schema.
*
* @param schema schema of type RECORD
*/
-  public static Schema toSchema(@Nonnull org.apache.avro.Schema schema) {
+  public static Schema toBeamSchema(org.apache.avro.Schema schema) {
 Schema.Builder builder = Schema.builder();
 
 for (org.apache.avro.Schema.Field field : schema.getFields()) {
-  org.apache.avro.Schema unwrapped = unwrapNullableSchema(field.schema());
-
-  if (!unwrapped.equals(field.schema())) {
-builder.addNullableField(field.name(), toFieldType(unwrapped));
-  } else {
-builder.addField(field.name(), toFieldType(unwrapped));
+  TypeWithNullability nullableType = new 
TypeWithNullability(field.schema());
+  Field beamField = Field.of(field.name(), toFieldType(nullableType));
+  if (field.doc() != null) {
+beamField = beamField.withDescription(field.doc());
   }
+  builder.addField(beamField);
 }
 
 return builder.build();
   }
 
-  /** Converts AVRO schema to Beam field. */
-  public static Schema.FieldType toFieldType(@Nonnull org.apache.avro.Schema 
avroSchema) {
-switch (avroSchema.getType()) {
-  case RECORD:
-return Schema.FieldType.row(toSchema(avroSchema));
+  /** Converts a Beam Schema into an AVRO schema. */
+  public static org.apache.avro.Schema toAvroSchema(Schema beamSchema) {
+List fields =

[jira] [Work logged] (BEAM-6160) Simplify service and server creation in dataflow runner harness

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6160?focusedWorklogId=171686&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171686
 ]

ASF GitHub Bot logged work on BEAM-6160:


Author: ASF GitHub Bot
Created on: 03/Dec/18 19:29
Start Date: 03/Dec/18 19:29
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on issue #7168: [BEAM-6160] Use 
service server rather than service
URL: https://github.com/apache/beam/pull/7168#issuecomment-443838805
 
 
   @lukecwik All comments above have been addressed. Could you please take 
another look?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171686)
Time Spent: 1.5h  (was: 1h 20m)

> Simplify service and server creation in dataflow runner harness
> ---
>
> Key: BEAM-6160
> URL: https://issues.apache.org/jira/browse/BEAM-6160
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=171679&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171679
 ]

ASF GitHub Bot logged work on BEAM-5859:


Author: ASF GitHub Bot
Created on: 03/Dec/18 19:10
Start Date: 03/Dec/18 19:10
Worklog Time Spent: 10m 
  Work Description: tweise closed pull request #7150: [BEAM-5859] Improve 
operator names for portable pipelines
URL: https://github.com/apache/beam/pull/7150
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ExecutableStageTranslation.java
 
b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ExecutableStageTranslation.java
index c9bb4f6ea030..09d1a991c279 100644
--- 
a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ExecutableStageTranslation.java
+++ 
b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ExecutableStageTranslation.java
@@ -39,4 +39,25 @@ public static ExecutableStagePayload 
getExecutableStagePayload(
 checkArgument(ExecutableStage.URN.equals(transform.getSpec().getUrn()));
 return ExecutableStagePayload.parseFrom(transform.getSpec().getPayload());
   }
+
+  public static String generateNameFromStagePayload(ExecutableStagePayload 
stagePayload) {
+StringBuilder sb = new StringBuilder();
+RunnerApi.Components components = stagePayload.getComponents();
+final int transformsCount = stagePayload.getTransformsCount();
+sb.append("[").append(transformsCount).append("]");
+sb.append("{");
+for (int i = 0; i < transformsCount; i++) {
+  String name = 
components.getTransformsOrThrow(stagePayload.getTransforms(i)).getUniqueName();
+  // Python: Remove the 'ref_AppliedPTransform_' prefix which just makes 
the name longer
+  name = name.replaceFirst("^ref_AppliedPTransform_", "");
+  // Java: Remove the 'ParMultiDo(Anonymous)' suffix which just makes the 
name longer
+  name = name.replaceFirst("/ParMultiDo\\(Anonymous\\)$", "");
+  sb.append(name);
+  if (i + 1 < transformsCount) {
+sb.append(", ");
+  }
+}
+sb.append("}");
+return sb.toString();
+  }
 }
diff --git 
a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ExecutableStageTranslationTest.java
 
b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ExecutableStageTranslationTest.java
new file mode 100644
index ..c5b06f76db04
--- /dev/null
+++ 
b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ExecutableStageTranslationTest.java
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.core.construction;
+
+import static org.hamcrest.core.Is.is;
+import static org.junit.Assert.assertThat;
+
+import java.io.Serializable;
+import org.apache.beam.model.pipeline.v1.RunnerApi;
+import org.apache.beam.runners.core.construction.graph.ExecutableStage;
+import org.apache.beam.runners.core.construction.graph.GreedyPipelineFuser;
+import org.apache.beam.sdk.Pipeline;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.Impulse;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.junit.Test;
+
+/** Tests for {@link ExecutableStageTranslation}. */
+public class ExecutableStageTranslationTest implements Serializable {
+
+  @Test
+  /* Test for generating readable operator names during translation. */
+  public void testOperatorNameGeneration() throws Exception {
+Pipeline p = Pipeline.create();
+p.apply(Impulse.create())
+// Anonymous ParDo
+.apply(
+ParDo.of(
+new DoFn() {
+  @ProcessElement
+  public 

[jira] [Work logged] (BEAM-6139) Support for GEOGRAPHY data type in BQ sources

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6139?focusedWorklogId=171673&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171673
 ]

ASF GitHub Bot logged work on BEAM-6139:


Author: ASF GitHub Bot
Created on: 03/Dec/18 18:53
Start Date: 03/Dec/18 18:53
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #7144: [BEAM-6139] Adding 
support for BQ GEOGRAPHY data type
URL: https://github.com/apache/beam/pull/7144#issuecomment-443825298
 
 
   Because the Java SDK performs avro exports, this feature is blocked for Java 
until BQ Avro exports support WKT. I'd like to go forward only with Python for 
now.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171673)
Time Spent: 40m  (was: 0.5h)

> Support for GEOGRAPHY data type in BQ sources
> -
>
> Key: BEAM-6139
> URL: https://issues.apache.org/jira/browse/BEAM-6139
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp, sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5058) Python precommits should run E2E tests

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5058?focusedWorklogId=171671&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171671
 ]

ASF GitHub Bot logged work on BEAM-5058:


Author: ASF GitHub Bot
Created on: 03/Dec/18 18:35
Start Date: 03/Dec/18 18:35
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on issue #7163: [BEAM-5058] Run 
basic ITs in Python Precommit in parallel
URL: https://github.com/apache/beam/pull/7163#issuecomment-443818622
 
 
   PTAL @udim 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171671)
Time Spent: 5.5h  (was: 5h 20m)

> Python precommits should run E2E tests
> --
>
> Key: BEAM-5058
> URL: https://issues.apache.org/jira/browse/BEAM-5058
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, testing
>Reporter: Udi Meiri
>Assignee: Mark Liu
>Priority: Major
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> According to [https://beam.apache.org/contribute/testing/] (which I'm working 
> on), end-to-end tests should be run in precommit on each combination of 
> \{batch, streaming}x\{SDK language}x\{supported runner}.
> At least 2 tests need to be added to Python's precommit: wordcount and 
> wordcount_streaming on Dataflow, and possibly on other supported runners 
> (direct runner and new runners plz).
>  These tests should be configured to run from a Gradle sub-project, so that 
> they're run in parallel to the unit tests.
> Example that parallelizes Java precommit integration tests: 
> [https://github.com/apache/beam/pull/5731]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5920) Add additional OWNERS

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5920?focusedWorklogId=171670&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171670
 ]

ASF GitHub Bot logged work on BEAM-5920:


Author: ASF GitHub Bot
Created on: 03/Dec/18 18:34
Start Date: 03/Dec/18 18:34
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on issue #7186: [BEAM-5920] Add 
additional owners for Community Metrics
URL: https://github.com/apache/beam/pull/7186#issuecomment-443818005
 
 
   LGTM


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171670)
Time Spent: 0.5h  (was: 20m)

> Add additional OWNERS
> -
>
> Key: BEAM-5920
> URL: https://issues.apache.org/jira/browse/BEAM-5920
> Project: Beam
>  Issue Type: Sub-task
>  Components: project-management
>Reporter: Scott Wegner
>Assignee: Scott Wegner
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We should spread knowledge and ownership of this new project. We have two 
> owners [currently 
> documented|https://github.com/apache/beam/blob/master/.test-infra/metrics/OWNERS].
>  I plan to monitor the new infrastructure closely during its initial rollout, 
> and then once it's stabilized add additional owners.
> I'd like to add additional owners in ~1 month



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4454) Provide automatic schema registration for AVROs

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4454?focusedWorklogId=171668&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171668
 ]

ASF GitHub Bot logged work on BEAM-4454:


Author: ASF GitHub Bot
Created on: 03/Dec/18 18:15
Start Date: 03/Dec/18 18:15
Worklog Time Spent: 10m 
  Work Description: akedin commented on a change in pull request #7181: 
[BEAM-4454] Add more AVRO utilities to convert between Beam and Avro.
URL: https://github.com/apache/beam/pull/7181#discussion_r238374414
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroUtils.java
 ##
 @@ -20,119 +20,412 @@
 import static com.google.common.base.Preconditions.checkArgument;
 import static com.google.common.base.Preconditions.checkNotNull;
 
-import com.google.common.annotations.VisibleForTesting;
+import com.google.common.collect.ImmutableMap;
+import com.google.common.collect.Lists;
+import java.math.BigDecimal;
 import java.nio.ByteBuffer;
 import java.util.ArrayList;
 import java.util.HashMap;
 import java.util.List;
 import java.util.Map;
 import java.util.stream.Collectors;
 import javax.annotation.Nonnull;
+import javax.annotation.Nullable;
+import org.apache.avro.Conversions;
+import org.apache.avro.LogicalType;
+import org.apache.avro.LogicalTypes;
+import org.apache.avro.Schema.Type;
 import org.apache.avro.generic.GenericEnumSymbol;
 import org.apache.avro.generic.GenericFixed;
 import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.generic.GenericRecordBuilder;
+import org.apache.avro.reflect.ReflectData;
+import org.apache.avro.util.Utf8;
 import org.apache.beam.sdk.annotations.Experimental;
 import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.Schema.Field;
+import org.apache.beam.sdk.schemas.Schema.FieldType;
+import org.apache.beam.sdk.schemas.Schema.TypeName;
 import org.apache.beam.sdk.values.Row;
+import org.joda.time.Instant;
+import org.joda.time.ReadableInstant;
 
 /** Utils to convert AVRO records to Beam rows. */
 @Experimental(Experimental.Kind.SCHEMAS)
 public class AvroUtils {
+  // Unwrap an AVRO schema into the base type an whether it is nullable.
+  static class TypeWithNullability {
+public final org.apache.avro.Schema type;
+public final boolean nullable;
+
+TypeWithNullability(org.apache.avro.Schema avroSchema) {
+  if (avroSchema.getType() == org.apache.avro.Schema.Type.UNION) {
+List types = avroSchema.getTypes();
+
+// optional fields in AVRO have form of:
+// {"name": "foo", "type": ["null", "something"]}
+
+// don't need recursion because nested unions aren't supported in AVRO
+List nonNullTypes =
+types
+.stream()
+.filter(x -> x.getType() != org.apache.avro.Schema.Type.NULL)
+.collect(Collectors.toList());
+
+if (nonNullTypes.size() == types.size() || nonNullTypes.isEmpty()) {
+  // union without `null` or all 'null' union, keep as is.
+  type = avroSchema;
+  nullable = false;
+} else if (nonNullTypes.size() > 1) {
+  type = org.apache.avro.Schema.createUnion(nonNullTypes);
+  nullable = true;
+} else {
+  // One non-null type.
+  type = nonNullTypes.get(0);
+  nullable = true;
+}
+  } else {
+type = avroSchema;
+nullable = false;
+  }
+}
+  }
+
   private AvroUtils() {}
 
   /**
* Converts AVRO schema to Beam row schema.
*
* @param schema schema of type RECORD
*/
-  public static Schema toSchema(@Nonnull org.apache.avro.Schema schema) {
+  public static Schema toBeamSchema(org.apache.avro.Schema schema) {
 Schema.Builder builder = Schema.builder();
 
 for (org.apache.avro.Schema.Field field : schema.getFields()) {
-  org.apache.avro.Schema unwrapped = unwrapNullableSchema(field.schema());
-
-  if (!unwrapped.equals(field.schema())) {
-builder.addNullableField(field.name(), toFieldType(unwrapped));
-  } else {
-builder.addField(field.name(), toFieldType(unwrapped));
+  TypeWithNullability nullableType = new 
TypeWithNullability(field.schema());
+  Field beamField = Field.of(field.name(), toFieldType(nullableType));
+  if (field.doc() != null) {
+beamField = beamField.withDescription(field.doc());
   }
+  builder.addField(beamField);
 }
 
 return builder.build();
   }
 
-  /** Converts AVRO schema to Beam field. */
-  public static Schema.FieldType toFieldType(@Nonnull org.apache.avro.Schema 
avroSchema) {
-switch (avroSchema.getType()) {
-  case RECORD:
-return Schema.FieldType.row(toSchema(avroSchema));
+  /** Converts a Beam Schema into an AVRO schema. */
+  public static org.apache.avro.Schema toAvroSchema(Schema beamSchema) {
+List fields =

[jira] [Work logged] (BEAM-4454) Provide automatic schema registration for AVROs

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4454?focusedWorklogId=171669&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171669
 ]

ASF GitHub Bot logged work on BEAM-4454:


Author: ASF GitHub Bot
Created on: 03/Dec/18 18:15
Start Date: 03/Dec/18 18:15
Worklog Time Spent: 10m 
  Work Description: akedin commented on a change in pull request #7181: 
[BEAM-4454] Add more AVRO utilities to convert between Beam and Avro.
URL: https://github.com/apache/beam/pull/7181#discussion_r238374164
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroUtils.java
 ##
 @@ -20,119 +20,412 @@
 import static com.google.common.base.Preconditions.checkArgument;
 import static com.google.common.base.Preconditions.checkNotNull;
 
-import com.google.common.annotations.VisibleForTesting;
+import com.google.common.collect.ImmutableMap;
+import com.google.common.collect.Lists;
+import java.math.BigDecimal;
 import java.nio.ByteBuffer;
 import java.util.ArrayList;
 import java.util.HashMap;
 import java.util.List;
 import java.util.Map;
 import java.util.stream.Collectors;
 import javax.annotation.Nonnull;
+import javax.annotation.Nullable;
+import org.apache.avro.Conversions;
+import org.apache.avro.LogicalType;
+import org.apache.avro.LogicalTypes;
+import org.apache.avro.Schema.Type;
 import org.apache.avro.generic.GenericEnumSymbol;
 import org.apache.avro.generic.GenericFixed;
 import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.generic.GenericRecordBuilder;
+import org.apache.avro.reflect.ReflectData;
+import org.apache.avro.util.Utf8;
 import org.apache.beam.sdk.annotations.Experimental;
 import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.Schema.Field;
+import org.apache.beam.sdk.schemas.Schema.FieldType;
+import org.apache.beam.sdk.schemas.Schema.TypeName;
 import org.apache.beam.sdk.values.Row;
+import org.joda.time.Instant;
+import org.joda.time.ReadableInstant;
 
 /** Utils to convert AVRO records to Beam rows. */
 @Experimental(Experimental.Kind.SCHEMAS)
 public class AvroUtils {
+  // Unwrap an AVRO schema into the base type an whether it is nullable.
+  static class TypeWithNullability {
+public final org.apache.avro.Schema type;
+public final boolean nullable;
+
+TypeWithNullability(org.apache.avro.Schema avroSchema) {
+  if (avroSchema.getType() == org.apache.avro.Schema.Type.UNION) {
+List types = avroSchema.getTypes();
+
+// optional fields in AVRO have form of:
+// {"name": "foo", "type": ["null", "something"]}
+
+// don't need recursion because nested unions aren't supported in AVRO
+List nonNullTypes =
+types
+.stream()
+.filter(x -> x.getType() != org.apache.avro.Schema.Type.NULL)
+.collect(Collectors.toList());
+
+if (nonNullTypes.size() == types.size() || nonNullTypes.isEmpty()) {
+  // union without `null` or all 'null' union, keep as is.
+  type = avroSchema;
+  nullable = false;
+} else if (nonNullTypes.size() > 1) {
+  type = org.apache.avro.Schema.createUnion(nonNullTypes);
+  nullable = true;
+} else {
+  // One non-null type.
+  type = nonNullTypes.get(0);
+  nullable = true;
+}
+  } else {
+type = avroSchema;
+nullable = false;
+  }
+}
+  }
+
   private AvroUtils() {}
 
   /**
* Converts AVRO schema to Beam row schema.
*
* @param schema schema of type RECORD
*/
-  public static Schema toSchema(@Nonnull org.apache.avro.Schema schema) {
+  public static Schema toBeamSchema(org.apache.avro.Schema schema) {
 Schema.Builder builder = Schema.builder();
 
 for (org.apache.avro.Schema.Field field : schema.getFields()) {
-  org.apache.avro.Schema unwrapped = unwrapNullableSchema(field.schema());
-
-  if (!unwrapped.equals(field.schema())) {
-builder.addNullableField(field.name(), toFieldType(unwrapped));
-  } else {
-builder.addField(field.name(), toFieldType(unwrapped));
+  TypeWithNullability nullableType = new 
TypeWithNullability(field.schema());
+  Field beamField = Field.of(field.name(), toFieldType(nullableType));
+  if (field.doc() != null) {
+beamField = beamField.withDescription(field.doc());
   }
+  builder.addField(beamField);
 }
 
 return builder.build();
   }
 
-  /** Converts AVRO schema to Beam field. */
-  public static Schema.FieldType toFieldType(@Nonnull org.apache.avro.Schema 
avroSchema) {
-switch (avroSchema.getType()) {
-  case RECORD:
-return Schema.FieldType.row(toSchema(avroSchema));
+  /** Converts a Beam Schema into an AVRO schema. */
+  public static org.apache.avro.Schema toAvroSchema(Schema beamSchema) {
+List fields =

[jira] [Work logged] (BEAM-4454) Provide automatic schema registration for AVROs

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4454?focusedWorklogId=171666&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171666
 ]

ASF GitHub Bot logged work on BEAM-4454:


Author: ASF GitHub Bot
Created on: 03/Dec/18 18:15
Start Date: 03/Dec/18 18:15
Worklog Time Spent: 10m 
  Work Description: kanterov commented on issue #7181: [BEAM-4454] Add more 
AVRO utilities to convert between Beam and Avro.
URL: https://github.com/apache/beam/pull/7181#issuecomment-443811288
 
 
   The code LGTM. 👍 There is remaining findbugs issue.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171666)
Time Spent: 5h 40m  (was: 5.5h)

> Provide automatic schema registration for AVROs
> ---
>
> Key: BEAM-4454
> URL: https://issues.apache.org/jira/browse/BEAM-4454
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: Major
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Need to make sure this is a compatible change



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6167) Create a Class to read content of a file kipping track of the file path (python)

2018-12-03 Thread Lorenzo Caggioni (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707588#comment-16707588
 ] 

Lorenzo Caggioni commented on BEAM-6167:


I created a commit for the change, you can find it on my fork:

 

[https://github.com/lcaggio/beam/commit/01daf88227ca844848a732a6cf97affcf0cd4818]

 

Happy to create a PR for this, if you do not have comment in the 
implementation, otherwise happy to iterate on the code and later submit the PR.

> Create a Class to read content of a file kipping track of the file path 
> (python)
> 
>
> Key: BEAM-6167
> URL: https://issues.apache.org/jira/browse/BEAM-6167
> Project: Beam
>  Issue Type: Improvement
>  Components: io-ideas
>Affects Versions: 2.8.0
>Reporter: Lorenzo Caggioni
>Assignee: Eugene Kirpichov
>Priority: Minor
> Fix For: Not applicable
>
>
> Add a class to read content of a file keeping track of the file path each 
> element come from.
> This is an improvement of the current python/apache_beam/io/textio.py



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5985) Create jenkins jobs to run the load tests for Java SDK

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5985?focusedWorklogId=171664&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171664
 ]

ASF GitHub Bot logged work on BEAM-5985:


Author: ASF GitHub Bot
Created on: 03/Dec/18 17:43
Start Date: 03/Dec/18 17:43
Worklog Time Spent: 10m 
  Work Description: kkucharc commented on issue #7184: [BEAM-5985] Create 
jenkins jobs to run the load tests for Java SDK
URL: https://github.com/apache/beam/pull/7184#issuecomment-443800572
 
 
   Run Seed Job


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171664)
Time Spent: 40m  (was: 0.5h)

> Create jenkins jobs to run the load tests for Java SDK
> --
>
> Key: BEAM-5985
> URL: https://issues.apache.org/jira/browse/BEAM-5985
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Lukasz Gajowy
>Assignee: Kasia Kucharczyk
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> How/how often/in what cases we run those tests is yet to be decided (this is 
> part of the task)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5920) Add additional OWNERS

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5920?focusedWorklogId=171659&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171659
 ]

ASF GitHub Bot logged work on BEAM-5920:


Author: ASF GitHub Bot
Created on: 03/Dec/18 17:25
Start Date: 03/Dec/18 17:25
Worklog Time Spent: 10m 
  Work Description: swegner commented on issue #7186: [BEAM-5920] Add 
additional owners for Community Metrics
URL: https://github.com/apache/beam/pull/7186#issuecomment-443794290
 
 
   R: @alanmyrvold @markflyhigh @yifanzou 
   CC: @Ardagan 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171659)
Time Spent: 20m  (was: 10m)

> Add additional OWNERS
> -
>
> Key: BEAM-5920
> URL: https://issues.apache.org/jira/browse/BEAM-5920
> Project: Beam
>  Issue Type: Sub-task
>  Components: project-management
>Reporter: Scott Wegner
>Assignee: Scott Wegner
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We should spread knowledge and ownership of this new project. We have two 
> owners [currently 
> documented|https://github.com/apache/beam/blob/master/.test-infra/metrics/OWNERS].
>  I plan to monitor the new infrastructure closely during its initial rollout, 
> and then once it's stabilized add additional owners.
> I'd like to add additional owners in ~1 month



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-6167) Create a Class to read content of a file kipping track of the file path (python)

2018-12-03 Thread Lorenzo Caggioni (JIRA)
Lorenzo Caggioni created BEAM-6167:
--

 Summary: Create a Class to read content of a file kipping track of 
the file path (python)
 Key: BEAM-6167
 URL: https://issues.apache.org/jira/browse/BEAM-6167
 Project: Beam
  Issue Type: Improvement
  Components: io-ideas
Affects Versions: 2.8.0
Reporter: Lorenzo Caggioni
Assignee: Eugene Kirpichov
 Fix For: Not applicable


Add a class to read content of a file keeping track of the file path each 
element come from.

This is an improvement of the current python/apache_beam/io/textio.py



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6157) SplittableParDoViaKeyedWorkItems sets another timer when receiving ProcessContinuation#STOP

2018-12-03 Thread Luke Cwik (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707568#comment-16707568
 ] 

Luke Cwik commented on BEAM-6157:
-

Not blocking, just good to understand since there is likely a bug here that is 
being worked around.

> SplittableParDoViaKeyedWorkItems sets another timer when receiving 
> ProcessContinuation#STOP
> ---
>
> Key: BEAM-6157
> URL: https://issues.apache.org/jira/browse/BEAM-6157
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core, runner-direct
>Reporter: Luke Cwik
>Assignee: Scott Wegner
>Priority: Major
>
> [HBaseIOTest#testReadingKeyRangeMiddleSDF|https://github.com/apache/beam/blob/b06b8e5df5738e7dc3620f67134da67a4a806758/sdks/java/io/hbase/src/test/java/org/apache/beam/sdk/io/hbase/HBaseIOTest.java#L326]
>  consistently fails if SplittableParDoViaKeyedWorkItems sets the wake-up time 
> to be the resume time when receiving a ProcessContinuation#STOP
> Why are we setting another timer to execute after STOP?
> Also, why is the direct runner failing to make progress with the timer if it 
> is set in the past relative to the current process time?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-1628) Flink runner: logic around --flinkMaster is error-prone

2018-12-03 Thread Maximilian Michels (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707563#comment-16707563
 ] 

Maximilian Michels commented on BEAM-1628:
--

Oh sorry, I just had a look at this and fixed it while I was at it. Could you 
help reviewing the PR? It is here: https://github.com/apache/beam/pull/7187

We will find something else for a code contribution! 

> Flink runner: logic around --flinkMaster is error-prone
> ---
>
> Key: BEAM-1628
> URL: https://issues.apache.org/jira/browse/BEAM-1628
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Davor Bonaci
>Assignee: Maximilian Michels
>Priority: Minor
>  Labels: newbie, starter
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The logic for handling {{--flinkMaster}} seems not particularly user-friendly.
> https://github.com/apache/beam/blob/fbcde4cdc7d68de8734bf540c079b2747631a854/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/FlinkPipelineExecutionEnvironment.java#L132
> {code}
> if (masterUrl.equals("[local]")) {
> } else if (masterUrl.equals("[collection]")) {
> } else if (masterUrl.equals("[auto]")) {
> } else if (masterUrl.matches(".*:\\d*")) {
> } else {
>   // use auto.
> }
> {code}
> The options are constructed with "auto" set as default.
> I think we should do the following:
> * I assume there's a default port for the Flink master. We should default to 
> it.
> * We should treat a string without a colon as a host name. (Not default to 
> local execution.)
> This is super easy fix, hopefully someone can pick it up quickly ;-)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-1628) Flink runner: logic around --flinkMaster is error-prone

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-1628?focusedWorklogId=171660&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171660
 ]

ASF GitHub Bot logged work on BEAM-1628:


Author: ASF GitHub Bot
Created on: 03/Dec/18 17:27
Start Date: 03/Dec/18 17:27
Worklog Time Spent: 10m 
  Work Description: mxm opened a new pull request #7187: [BEAM-1628] Allow 
empty port for flink master url
URL: https://github.com/apache/beam/pull/7187
 
 
   This allows only specifying the host name without a port. It reads the port 
from
   the configuration or uses the default Flink port if non configured. It does 
not
   fall back to the special "[auto]" mode anymore if the port is omitted.
   
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/)
 | --- | --- | --- | --- | --- | ---
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/)
 [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)
 | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)
  [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/)
 | --- | --- | ---
   
   
   
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171660)
Time Spent: 10m
Remaining Estimate: 0h

> Flink runner: logic around --flinkMaster is error-prone
> ---
>
> Key: BEAM-1628
> URL: https://issues.apache.org/jira/browse/BEAM-1628
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Davor Bonaci
>Assignee: Maximilian Michels
>Priority: Minor
>  Labels: newbie, starter
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The logic for handling {{--flinkMaster}} seems not particular

[jira] [Work logged] (BEAM-1628) Flink runner: logic around --flinkMaster is error-prone

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-1628?focusedWorklogId=171663&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171663
 ]

ASF GitHub Bot logged work on BEAM-1628:


Author: ASF GitHub Bot
Created on: 03/Dec/18 17:27
Start Date: 03/Dec/18 17:27
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #7187: [BEAM-1628] Allow empty 
port for flink master url
URL: https://github.com/apache/beam/pull/7187#issuecomment-443795123
 
 
   CC @tweise @angoenka 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171663)
Time Spent: 20m  (was: 10m)

> Flink runner: logic around --flinkMaster is error-prone
> ---
>
> Key: BEAM-1628
> URL: https://issues.apache.org/jira/browse/BEAM-1628
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Davor Bonaci
>Assignee: Maximilian Michels
>Priority: Minor
>  Labels: newbie, starter
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The logic for handling {{--flinkMaster}} seems not particularly user-friendly.
> https://github.com/apache/beam/blob/fbcde4cdc7d68de8734bf540c079b2747631a854/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/FlinkPipelineExecutionEnvironment.java#L132
> {code}
> if (masterUrl.equals("[local]")) {
> } else if (masterUrl.equals("[collection]")) {
> } else if (masterUrl.equals("[auto]")) {
> } else if (masterUrl.matches(".*:\\d*")) {
> } else {
>   // use auto.
> }
> {code}
> The options are constructed with "auto" set as default.
> I think we should do the following:
> * I assume there's a default port for the Flink master. We should default to 
> it.
> * We should treat a string without a colon as a host name. (Not default to 
> local execution.)
> This is super easy fix, hopefully someone can pick it up quickly ;-)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5920) Add additional OWNERS

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5920?focusedWorklogId=171658&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171658
 ]

ASF GitHub Bot logged work on BEAM-5920:


Author: ASF GitHub Bot
Created on: 03/Dec/18 17:25
Start Date: 03/Dec/18 17:25
Worklog Time Spent: 10m 
  Work Description: swegner opened a new pull request #7186: [BEAM-5920] 
Add additional owners for Community Metrics
URL: https://github.com/apache/beam/pull/7186
 
 
   We should spread knowledge and ownership of this new project. We have two 
owners currently documented, and a few more have volunteered. The new 
infrastructure has been live for over a month with no issues so far.
   
   
   
   Follow this checklist to help us incorporate your contribution quickly and 
easily:
   
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   It will help us expedite review of your Pull Request if you tag someone 
(e.g. `@username`) to look at it.
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/)
 | --- | --- | --- | --- | --- | ---
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/)
 [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)
 | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)
  [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/)
 | --- | --- | ---
   
   
   
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171658)
Time Spent: 10m

[jira] [Work logged] (BEAM-4454) Provide automatic schema registration for AVROs

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4454?focusedWorklogId=171650&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171650
 ]

ASF GitHub Bot logged work on BEAM-4454:


Author: ASF GitHub Bot
Created on: 03/Dec/18 16:57
Start Date: 03/Dec/18 16:57
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on a change in pull request #7181: 
[BEAM-4454] Add more AVRO utilities to convert between Beam and Avro.
URL: https://github.com/apache/beam/pull/7181#discussion_r238351907
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroUtils.java
 ##
 @@ -150,9 +439,20 @@ public static Object convertAvroFieldStrict(
   @Nonnull org.apache.avro.Schema avroSchema,
   @Nonnull Schema.FieldType fieldType) {
 
-org.apache.avro.Schema unwrapped = unwrapNullableSchema(avroSchema);
+TypeWithNullability type = new TypeWithNullability(avroSchema);
+LogicalType logicalType = LogicalTypes.fromSchema(type.type);
+if (logicalType != null) {
+  if (logicalType instanceof LogicalTypes.Decimal) {
+BigDecimal bigDecimal =
+new Conversions.DecimalConversion()
+.fromBytes((ByteBuffer) value, type.type, logicalType);
 
 Review comment:
   good catch here!


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171650)
Time Spent: 5h 20m  (was: 5h 10m)

> Provide automatic schema registration for AVROs
> ---
>
> Key: BEAM-4454
> URL: https://issues.apache.org/jira/browse/BEAM-4454
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: Major
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Need to make sure this is a compatible change



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4454) Provide automatic schema registration for AVROs

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4454?focusedWorklogId=171651&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171651
 ]

ASF GitHub Bot logged work on BEAM-4454:


Author: ASF GitHub Bot
Created on: 03/Dec/18 16:57
Start Date: 03/Dec/18 16:57
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on a change in pull request #7181: 
[BEAM-4454] Add more AVRO utilities to convert between Beam and Avro.
URL: https://github.com/apache/beam/pull/7181#discussion_r238351938
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroUtils.java
 ##
 @@ -20,119 +20,408 @@
 import static com.google.common.base.Preconditions.checkArgument;
 import static com.google.common.base.Preconditions.checkNotNull;
 
-import com.google.common.annotations.VisibleForTesting;
+import com.google.common.collect.ImmutableMap;
+import com.google.common.collect.Lists;
+import java.math.BigDecimal;
 import java.nio.ByteBuffer;
 import java.util.ArrayList;
 import java.util.HashMap;
 import java.util.List;
 import java.util.Map;
 import java.util.stream.Collectors;
 import javax.annotation.Nonnull;
+import javax.annotation.Nullable;
+import org.apache.avro.Conversions;
+import org.apache.avro.LogicalType;
+import org.apache.avro.LogicalTypes;
+import org.apache.avro.Schema.Type;
 import org.apache.avro.generic.GenericEnumSymbol;
 import org.apache.avro.generic.GenericFixed;
 import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.generic.GenericRecordBuilder;
+import org.apache.avro.reflect.ReflectData;
 import org.apache.beam.sdk.annotations.Experimental;
 import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.Schema.Field;
+import org.apache.beam.sdk.schemas.Schema.FieldType;
+import org.apache.beam.sdk.schemas.Schema.TypeName;
 import org.apache.beam.sdk.values.Row;
+import org.joda.time.Instant;
+import org.joda.time.ReadableInstant;
 
 /** Utils to convert AVRO records to Beam rows. */
 @Experimental(Experimental.Kind.SCHEMAS)
 public class AvroUtils {
+  // Unwrap an AVRO schema into the base type an whether it is nullable.
+  static class TypeWithNullability {
+public final org.apache.avro.Schema type;
+public final boolean nullable;
+
+TypeWithNullability(org.apache.avro.Schema avroSchema) {
+  if (avroSchema.getType() == org.apache.avro.Schema.Type.UNION) {
+List types = avroSchema.getTypes();
+
+// optional fields in AVRO have form of:
+// {"name": "foo", "type": ["null", "something"]}
+
+// don't need recursion because nested unions aren't supported in AVRO
+List nonNullTypes =
+types
+.stream()
+.filter(x -> x.getType() != org.apache.avro.Schema.Type.NULL)
+.collect(Collectors.toList());
+
+if (nonNullTypes.size() == types.size() || nonNullTypes.isEmpty()) {
+  // union without `null` or all 'null' union, keep as is.
+  type = avroSchema;
+  nullable = false;
+} else if (nonNullTypes.size() > 1) {
+  type = org.apache.avro.Schema.createUnion(nonNullTypes);
+  nullable = true;
+} else {
+  // One non-null type.
+  type = nonNullTypes.get(0);
+  nullable = true;
+}
+  } else {
+type = avroSchema;
+nullable = false;
+  }
+}
+  }
+
   private AvroUtils() {}
 
   /**
* Converts AVRO schema to Beam row schema.
*
* @param schema schema of type RECORD
*/
-  public static Schema toSchema(@Nonnull org.apache.avro.Schema schema) {
+  public static Schema toBeamSchema(org.apache.avro.Schema schema) {
 Schema.Builder builder = Schema.builder();
 
 for (org.apache.avro.Schema.Field field : schema.getFields()) {
-  org.apache.avro.Schema unwrapped = unwrapNullableSchema(field.schema());
-
-  if (!unwrapped.equals(field.schema())) {
-builder.addNullableField(field.name(), toFieldType(unwrapped));
-  } else {
-builder.addField(field.name(), toFieldType(unwrapped));
+  TypeWithNullability nullableType = new 
TypeWithNullability(field.schema());
+  Field beamField = Field.of(field.name(), toFieldType(nullableType));
+  if (field.doc() != null) {
+beamField = beamField.withDescription(field.doc());
   }
+  builder.addField(beamField);
 }
 
 return builder.build();
   }
 
-  /** Converts AVRO schema to Beam field. */
-  public static Schema.FieldType toFieldType(@Nonnull org.apache.avro.Schema 
avroSchema) {
-switch (avroSchema.getType()) {
-  case RECORD:
-return Schema.FieldType.row(toSchema(avroSchema));
+  /** Converts a Beam Schema into an AVRO schema. */
+  public static org.apache.avro.Schema toAvroSchema(Schema beamSchema) {
+List fields = Lists.newArrayList();
+for 

[jira] [Work logged] (BEAM-4454) Provide automatic schema registration for AVROs

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4454?focusedWorklogId=171649&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171649
 ]

ASF GitHub Bot logged work on BEAM-4454:


Author: ASF GitHub Bot
Created on: 03/Dec/18 16:55
Start Date: 03/Dec/18 16:55
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on a change in pull request #7181: 
[BEAM-4454] Add more AVRO utilities to convert between Beam and Avro.
URL: https://github.com/apache/beam/pull/7181#discussion_r238351223
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroUtils.java
 ##
 @@ -198,50 +498,20 @@ public static Object convertAvroFieldStrict(
 throw new IllegalArgumentException("Can't convert 'null' to 
non-nullable field");
 
   default:
-throw new AssertionError("Unexpected AVRO Schema.Type: " + 
unwrapped.getType());
+throw new AssertionError("Unexpected AVRO Schema.Type: " + 
type.type.getType());
 }
   }
 
-  @VisibleForTesting
-  static org.apache.avro.Schema unwrapNullableSchema(org.apache.avro.Schema 
avroSchema) {
-if (avroSchema.getType() == org.apache.avro.Schema.Type.UNION) {
-  List types = avroSchema.getTypes();
-
-  // optional fields in AVRO have form of:
-  // {"name": "foo", "type": ["null", "something"]}
-
-  // don't need recursion because nested unions aren't supported in AVRO
-  List nonNullTypes =
-  types
-  .stream()
-  .filter(x -> x.getType() != org.apache.avro.Schema.Type.NULL)
-  .collect(Collectors.toList());
-
-  if (nonNullTypes.size() == types.size()) {
-// union without `null`, keep as is
-return avroSchema;
-  } else if (nonNullTypes.size() > 1) {
-return org.apache.avro.Schema.createUnion(nonNullTypes);
-  } else if (nonNullTypes.size() == 1) {
-return nonNullTypes.get(0);
-  } else { // nonNullTypes.size() == 0
-return avroSchema;
-  }
-}
-
-return avroSchema;
-  }
-
   private static Object convertRecordStrict(GenericRecord record, 
Schema.FieldType fieldType) {
 checkTypeName(fieldType.getTypeName(), Schema.TypeName.ROW, "record");
-return toRowStrict(record, fieldType.getRowSchema());
+return toBeamRowStrict(record, fieldType.getRowSchema());
   }
 
   private static Object convertBytesStrict(ByteBuffer bb, Schema.FieldType 
fieldType) {
 checkTypeName(fieldType.getTypeName(), Schema.TypeName.BYTES, "bytes");
 
 byte[] bytes = new byte[bb.remaining()];
-bb.get(bytes);
+bb.duplicate().get(bytes);
 
 Review comment:
   Probably not causing issues in BigQueryAvroUtils as I don't think those Avro 
objects are ever reused. However maybe we should remove the code in 
BigQueryAvroUtils and just standardize on this library.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171649)
Time Spent: 5h 10m  (was: 5h)

> Provide automatic schema registration for AVROs
> ---
>
> Key: BEAM-4454
> URL: https://issues.apache.org/jira/browse/BEAM-4454
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: Major
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> Need to make sure this is a compatible change



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5985) Create jenkins jobs to run the load tests for Java SDK

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5985?focusedWorklogId=171645&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171645
 ]

ASF GitHub Bot logged work on BEAM-5985:


Author: ASF GitHub Bot
Created on: 03/Dec/18 16:44
Start Date: 03/Dec/18 16:44
Worklog Time Spent: 10m 
  Work Description: kkucharc commented on issue #7184: BEAM-5985 Create 
jenkins jobs to run the load tests for Java SDK
URL: https://github.com/apache/beam/pull/7184#issuecomment-443779124
 
 
   Run GroupByKey Java Load Test Direct


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171645)
Time Spent: 0.5h  (was: 20m)

> Create jenkins jobs to run the load tests for Java SDK
> --
>
> Key: BEAM-5985
> URL: https://issues.apache.org/jira/browse/BEAM-5985
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Lukasz Gajowy
>Assignee: Kasia Kucharczyk
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> How/how often/in what cases we run those tests is yet to be decided (this is 
> part of the task)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-1628) Flink runner: logic around --flinkMaster is error-prone

2018-12-03 Thread Evgeniy Musin (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707505#comment-16707505
 ] 

Evgeniy Musin commented on BEAM-1628:
-

[~mxm], oh, actually i really want to try implement it, if you allow me.

> Flink runner: logic around --flinkMaster is error-prone
> ---
>
> Key: BEAM-1628
> URL: https://issues.apache.org/jira/browse/BEAM-1628
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Davor Bonaci
>Assignee: Maximilian Michels
>Priority: Minor
>  Labels: newbie, starter
>
> The logic for handling {{--flinkMaster}} seems not particularly user-friendly.
> https://github.com/apache/beam/blob/fbcde4cdc7d68de8734bf540c079b2747631a854/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/FlinkPipelineExecutionEnvironment.java#L132
> {code}
> if (masterUrl.equals("[local]")) {
> } else if (masterUrl.equals("[collection]")) {
> } else if (masterUrl.equals("[auto]")) {
> } else if (masterUrl.matches(".*:\\d*")) {
> } else {
>   // use auto.
> }
> {code}
> The options are constructed with "auto" set as default.
> I think we should do the following:
> * I assume there's a default port for the Flink master. We should default to 
> it.
> * We should treat a string without a colon as a host name. (Not default to 
> local execution.)
> This is super easy fix, hopefully someone can pick it up quickly ;-)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-5984) Publish metrics from load tests to BigQuery database

2018-12-03 Thread Lukasz Gajowy (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy closed BEAM-5984.
---
   Resolution: Fixed
Fix Version/s: 3.0.0

> Publish metrics from load tests to BigQuery database
> 
>
> Key: BEAM-5984
> URL: https://issues.apache.org/jira/browse/BEAM-5984
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Lukasz Gajowy
>Assignee: Lukasz Gajowy
>Priority: Major
> Fix For: 3.0.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Collect metrics in BQ so that those could be read and plotted using 
> PerfKitExplorer tool or any other similar. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-6100) Runtime and total bytes metrics are not collected properly

2018-12-03 Thread Lukasz Gajowy (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy closed BEAM-6100.
---
   Resolution: Fixed
Fix Version/s: Not applicable

> Runtime and total bytes metrics are not collected properly
> --
>
> Key: BEAM-6100
> URL: https://issues.apache.org/jira/browse/BEAM-6100
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Lukasz Gajowy
>Assignee: Lukasz Gajowy
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Currently, we collect time (distribution) and bytes (counter) metrics from 
> one ParDo (called MetricsMonitor) that is put in pipelines in one, 
> arbitrarily chosen place (usually "in the middle" of pipeline's graph. In 
> some cases, invalid time (or total bytes count) is registered. 
> Taking [this|https://github.com/apache/beam/pull/6987#discussion_r231976671] 
> discussion into account, ideally, we'd like to:
>  - collect runtime by recording time at the root and sink(s) of the pipeline
>  - collect total bytes in a separate ParDo that allows deciding what byte 
> amount do we actually want to collect (now it's coupled to the 
> time-collecting Monitor which is inconvenient).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5985) Create jenkins jobs to run the load tests for Java SDK

2018-12-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5985?focusedWorklogId=171642&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-171642
 ]

ASF GitHub Bot logged work on BEAM-5985:


Author: ASF GitHub Bot
Created on: 03/Dec/18 16:29
Start Date: 03/Dec/18 16:29
Worklog Time Spent: 10m 
  Work Description: kkucharc commented on issue #7184: BEAM-5985 Create 
jenkins jobs to run the load tests for Java SDK
URL: https://github.com/apache/beam/pull/7184#issuecomment-443773179
 
 
   Run Seed Job


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 171642)
Time Spent: 20m  (was: 10m)

> Create jenkins jobs to run the load tests for Java SDK
> --
>
> Key: BEAM-5985
> URL: https://issues.apache.org/jira/browse/BEAM-5985
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Lukasz Gajowy
>Assignee: Kasia Kucharczyk
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> How/how often/in what cases we run those tests is yet to be decided (this is 
> part of the task)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-1628) Flink runner: logic around --flinkMaster is error-prone

2018-12-03 Thread Maximilian Michels (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707493#comment-16707493
 ] 

Maximilian Michels commented on BEAM-1628:
--

[~Evgeniy_M] Taking this unless you are eager to fix. 

> Flink runner: logic around --flinkMaster is error-prone
> ---
>
> Key: BEAM-1628
> URL: https://issues.apache.org/jira/browse/BEAM-1628
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Davor Bonaci
>Assignee: Maximilian Michels
>Priority: Minor
>  Labels: newbie, starter
>
> The logic for handling {{--flinkMaster}} seems not particularly user-friendly.
> https://github.com/apache/beam/blob/fbcde4cdc7d68de8734bf540c079b2747631a854/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/FlinkPipelineExecutionEnvironment.java#L132
> {code}
> if (masterUrl.equals("[local]")) {
> } else if (masterUrl.equals("[collection]")) {
> } else if (masterUrl.equals("[auto]")) {
> } else if (masterUrl.matches(".*:\\d*")) {
> } else {
>   // use auto.
> }
> {code}
> The options are constructed with "auto" set as default.
> I think we should do the following:
> * I assume there's a default port for the Flink master. We should default to 
> it.
> * We should treat a string without a colon as a host name. (Not default to 
> local execution.)
> This is super easy fix, hopefully someone can pick it up quickly ;-)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-1628) Flink runner: logic around --flinkMaster is error-prone

2018-12-03 Thread Maximilian Michels (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels reassigned BEAM-1628:


Assignee: Maximilian Michels

> Flink runner: logic around --flinkMaster is error-prone
> ---
>
> Key: BEAM-1628
> URL: https://issues.apache.org/jira/browse/BEAM-1628
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Davor Bonaci
>Assignee: Maximilian Michels
>Priority: Minor
>  Labels: newbie, starter
>
> The logic for handling {{--flinkMaster}} seems not particularly user-friendly.
> https://github.com/apache/beam/blob/fbcde4cdc7d68de8734bf540c079b2747631a854/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/FlinkPipelineExecutionEnvironment.java#L132
> {code}
> if (masterUrl.equals("[local]")) {
> } else if (masterUrl.equals("[collection]")) {
> } else if (masterUrl.equals("[auto]")) {
> } else if (masterUrl.matches(".*:\\d*")) {
> } else {
>   // use auto.
> }
> {code}
> The options are constructed with "auto" set as default.
> I think we should do the following:
> * I assume there's a default port for the Flink master. We should default to 
> it.
> * We should treat a string without a colon as a host name. (Not default to 
> local execution.)
> This is super easy fix, hopefully someone can pick it up quickly ;-)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-1628) Flink runner: logic around --flinkMaster is error-prone

2018-12-03 Thread Maximilian Michels (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707485#comment-16707485
 ] 

Maximilian Michels commented on BEAM-1628:
--

Doesn't appear to be fixed. Code looks much like what is provided in the 
description above.

> Flink runner: logic around --flinkMaster is error-prone
> ---
>
> Key: BEAM-1628
> URL: https://issues.apache.org/jira/browse/BEAM-1628
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Davor Bonaci
>Priority: Minor
>  Labels: newbie, starter
>
> The logic for handling {{--flinkMaster}} seems not particularly user-friendly.
> https://github.com/apache/beam/blob/fbcde4cdc7d68de8734bf540c079b2747631a854/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/FlinkPipelineExecutionEnvironment.java#L132
> {code}
> if (masterUrl.equals("[local]")) {
> } else if (masterUrl.equals("[collection]")) {
> } else if (masterUrl.equals("[auto]")) {
> } else if (masterUrl.matches(".*:\\d*")) {
> } else {
>   // use auto.
> }
> {code}
> The options are constructed with "auto" set as default.
> I think we should do the following:
> * I assume there's a default port for the Flink master. We should default to 
> it.
> * We should treat a string without a colon as a host name. (Not default to 
> local execution.)
> This is super easy fix, hopefully someone can pick it up quickly ;-)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


  1   2   3   >