[jira] [Work logged] (BEAM-7586) Add Integration Test for MongoDbIO in python sdk

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7586?focusedWorklogId=267298&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-267298
 ]

ASF GitHub Bot logged work on BEAM-7586:


Author: ASF GitHub Bot
Created on: 26/Jun/19 06:39
Start Date: 26/Jun/19 06:39
Worklog Time Spent: 10m 
  Work Description: y1chi commented on pull request #8895: [BEAM-7586] Add 
Integration test for python mongodb io
URL: https://github.com/apache/beam/pull/8895#discussion_r297504070
 
 

 ##
 File path: sdks/python/apache_beam/io/mongodbio_pipeline.py
 ##
 @@ -0,0 +1,85 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+import argparse
+import logging
+import time
+
+import apache_beam as beam
+from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.testing.test_pipeline import TestPipeline
+from apache_beam.testing.util import assert_that
+from apache_beam.testing.util import equal_to
+
+
+def run(argv=None):
+  default_db = 'beam_mongodbio_it_db'
+  default_coll = 'integration_test_%d' % time.time()
+  parser = argparse.ArgumentParser()
+  parser.add_argument('--mongo_uri',
+  default='mongodb://localhost:27017',
+  help='mongo uri string for connection')
+  parser.add_argument('--mongo_db',
+  default=default_db,
+  help='mongo uri string for connection')
+  parser.add_argument('--mongo_coll',
+  default=default_coll,
+  help='mongo uri string for connection')
+  parser.add_argument('--num_documents',
+  default=1000,
+  help='The expected number of documents to be generated '
+  'for write or read',
+  type=int)
+  parser.add_argument('--batch_size',
+  default=100,
+  help=('batch size for writing to mongodb'))
+  known_args, pipeline_args = parser.parse_known_args(argv)
+
+  # Test Write to MongoDB
+  with TestPipeline(options=PipelineOptions(pipeline_args)) as p:
+logging.info('Writing %d documents to mongodb' % known_args.num_documents)
+docs = [{'test_document_n': x} for x in range(known_args.num_documents)]
+
+start_time = time.time()
+_ = p | 'Create documents' >> beam.Create(docs) \
+  | 'WriteToMongoDB' >> beam.io.WriteToMongoDB(known_args.mongo_uri,
+   known_args.mongo_db,
+   known_args.mongo_coll,
+   known_args.batch_size)
+logging.info('Writing %d documents to mongodb finished in %.3f seconds' %
+ (known_args.num_documents, time.time() - start_time))
+
+  # Test Read from MongoDB
+  with TestPipeline(options=PipelineOptions(pipeline_args)) as p:
+start_time = time.time()
+logging.info('Reading from mongodb %s:%s' %
+ (known_args.mongo_db, known_args.mongo_coll))
+r = p | 'ReadFromMongoDB' >> beam.io.ReadFromMongoDB(known_args.mongo_uri,
+ known_args.mongo_db,
+ 
known_args.mongo_coll)\
+  | 'Count' >> beam.combiners.Count.Globally()
 
 Review comment:
   will do.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 267298)
Time Spent: 2.5h  (was: 2h 20m)

> Add Integration Test for MongoDbIO in python sdk 
> -
>
> Key: BEAM-7586
> URL: https://issues.apache.org/jira/browse/BEAM-7586
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core

[jira] [Work logged] (BEAM-7586) Add Integration Test for MongoDbIO in python sdk

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7586?focusedWorklogId=267299&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-267299
 ]

ASF GitHub Bot logged work on BEAM-7586:


Author: ASF GitHub Bot
Created on: 26/Jun/19 06:39
Start Date: 26/Jun/19 06:39
Worklog Time Spent: 10m 
  Work Description: y1chi commented on pull request #8895: [BEAM-7586] Add 
Integration test for python mongodb io
URL: https://github.com/apache/beam/pull/8895#discussion_r297504116
 
 

 ##
 File path: sdks/python/apache_beam/io/mongodbio_pipeline.py
 ##
 @@ -0,0 +1,85 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+import argparse
+import logging
+import time
+
+import apache_beam as beam
+from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.testing.test_pipeline import TestPipeline
+from apache_beam.testing.util import assert_that
+from apache_beam.testing.util import equal_to
+
+
+def run(argv=None):
+  default_db = 'beam_mongodbio_it_db'
+  default_coll = 'integration_test_%d' % time.time()
+  parser = argparse.ArgumentParser()
+  parser.add_argument('--mongo_uri',
+  default='mongodb://localhost:27017',
+  help='mongo uri string for connection')
+  parser.add_argument('--mongo_db',
+  default=default_db,
+  help='mongo uri string for connection')
+  parser.add_argument('--mongo_coll',
+  default=default_coll,
+  help='mongo uri string for connection')
+  parser.add_argument('--num_documents',
+  default=1000,
+  help='The expected number of documents to be generated '
+  'for write or read',
+  type=int)
+  parser.add_argument('--batch_size',
+  default=100,
+  help=('batch size for writing to mongodb'))
+  known_args, pipeline_args = parser.parse_known_args(argv)
+
+  # Test Write to MongoDB
+  with TestPipeline(options=PipelineOptions(pipeline_args)) as p:
+logging.info('Writing %d documents to mongodb' % known_args.num_documents)
+docs = [{'test_document_n': x} for x in range(known_args.num_documents)]
 
 Review comment:
   make sense will do.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 267299)
Time Spent: 2h 40m  (was: 2.5h)

> Add Integration Test for MongoDbIO in python sdk 
> -
>
> Key: BEAM-7586
> URL: https://issues.apache.org/jira/browse/BEAM-7586
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7586) Add Integration Test for MongoDbIO in python sdk

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7586?focusedWorklogId=267297&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-267297
 ]

ASF GitHub Bot logged work on BEAM-7586:


Author: ASF GitHub Bot
Created on: 26/Jun/19 06:37
Start Date: 26/Jun/19 06:37
Worklog Time Spent: 10m 
  Work Description: y1chi commented on pull request #8895: [BEAM-7586] Add 
Integration test for python mongodb io
URL: https://github.com/apache/beam/pull/8895#discussion_r297503770
 
 

 ##
 File path: sdks/python/apache_beam/io/mongodbio_it_test.py
 ##
 @@ -0,0 +1,54 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+import logging
+import unittest
+
+from hamcrest import all_of
+from nose.plugins.attrib import attr
+from pymongo import MongoClient
+
+from apache_beam.io import mongodbio_pipeline
+from apache_beam.testing.pipeline_verifiers import PipelineStateMatcher
+from apache_beam.testing.test_pipeline import TestPipeline
+
+
+class MongoDBIOIntegrationTest(unittest.TestCase):
+  @attr('IT')
+  def test_mongodbio_read_write(self):
+default_args = {
+'mongo_uri': 'mongodb://localhost:27017',
+'mongo_db': 'beam_mongodbio_it_test',
+}
+test_pipeline = TestPipeline(is_integration_test=True)
+pipeline_verifiers = [
+PipelineStateMatcher(),
+]
+mongodbio_pipeline.run(
+test_pipeline.get_full_options_as_args(
+on_success_matcher=all_of(*pipeline_verifiers), **default_args))
+
+# clean up
+with MongoClient(default_args['mongo_db']) as client:
+  client.drop_database(default_args['mongo_db'])
 
 Review comment:
   I can see java sdk uses k8s to setup the mongodb and there's a separate 
jenkins job triggers it.
   python's integration tests seem to run all together? 
   
   Is there a recommended way to start a container for testing purpose?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 267297)
Time Spent: 2h 20m  (was: 2h 10m)

> Add Integration Test for MongoDbIO in python sdk 
> -
>
> Key: BEAM-7586
> URL: https://issues.apache.org/jira/browse/BEAM-7586
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7079) Run Chicago Taxi Example on Dataflow

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7079?focusedWorklogId=267295&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-267295
 ]

ASF GitHub Bot logged work on BEAM-7079:


Author: ASF GitHub Bot
Created on: 26/Jun/19 06:26
Start Date: 26/Jun/19 06:26
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on issue #8939: [BEAM-7079] Add 
Chicago Taxi Example running on Dataflow
URL: https://github.com/apache/beam/pull/8939#issuecomment-505737779
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 267295)
Time Spent: 7h 20m  (was: 7h 10m)

> Run Chicago Taxi Example on Dataflow
> 
>
> Key: BEAM-7079
> URL: https://issues.apache.org/jira/browse/BEAM-7079
> Project: Beam
>  Issue Type: Test
>  Components: testing
>Reporter: Michal Walenia
>Assignee: Michal Walenia
>Priority: Minor
>  Time Spent: 7h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7079) Run Chicago Taxi Example on Dataflow

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7079?focusedWorklogId=267288&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-267288
 ]

ASF GitHub Bot logged work on BEAM-7079:


Author: ASF GitHub Bot
Created on: 26/Jun/19 06:18
Start Date: 26/Jun/19 06:18
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on issue #8939: [BEAM-7079] Add 
Chicago Taxi Example running on Dataflow
URL: https://github.com/apache/beam/pull/8939#issuecomment-505449740
 
 
   Run Chicago Taxi on Dataflow
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 267288)
Time Spent: 6.5h  (was: 6h 20m)

> Run Chicago Taxi Example on Dataflow
> 
>
> Key: BEAM-7079
> URL: https://issues.apache.org/jira/browse/BEAM-7079
> Project: Beam
>  Issue Type: Test
>  Components: testing
>Reporter: Michal Walenia
>Assignee: Michal Walenia
>Priority: Minor
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7079) Run Chicago Taxi Example on Dataflow

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7079?focusedWorklogId=267290&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-267290
 ]

ASF GitHub Bot logged work on BEAM-7079:


Author: ASF GitHub Bot
Created on: 26/Jun/19 06:18
Start Date: 26/Jun/19 06:18
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on issue #8939: [BEAM-7079] Add 
Chicago Taxi Example running on Dataflow
URL: https://github.com/apache/beam/pull/8939#issuecomment-505453355
 
 
   Run Chicago Taxi on Dataflow
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 267290)
Time Spent: 6h 50m  (was: 6h 40m)

> Run Chicago Taxi Example on Dataflow
> 
>
> Key: BEAM-7079
> URL: https://issues.apache.org/jira/browse/BEAM-7079
> Project: Beam
>  Issue Type: Test
>  Components: testing
>Reporter: Michal Walenia
>Assignee: Michal Walenia
>Priority: Minor
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7079) Run Chicago Taxi Example on Dataflow

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7079?focusedWorklogId=267291&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-267291
 ]

ASF GitHub Bot logged work on BEAM-7079:


Author: ASF GitHub Bot
Created on: 26/Jun/19 06:18
Start Date: 26/Jun/19 06:18
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on issue #8939: [BEAM-7079] Add 
Chicago Taxi Example running on Dataflow
URL: https://github.com/apache/beam/pull/8939#issuecomment-505427010
 
 
   Run seed job
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 267291)
Time Spent: 7h  (was: 6h 50m)

> Run Chicago Taxi Example on Dataflow
> 
>
> Key: BEAM-7079
> URL: https://issues.apache.org/jira/browse/BEAM-7079
> Project: Beam
>  Issue Type: Test
>  Components: testing
>Reporter: Michal Walenia
>Assignee: Michal Walenia
>Priority: Minor
>  Time Spent: 7h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7079) Run Chicago Taxi Example on Dataflow

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7079?focusedWorklogId=267287&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-267287
 ]

ASF GitHub Bot logged work on BEAM-7079:


Author: ASF GitHub Bot
Created on: 26/Jun/19 06:18
Start Date: 26/Jun/19 06:18
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on issue #8939: [BEAM-7079] Add 
Chicago Taxi Example running on Dataflow
URL: https://github.com/apache/beam/pull/8939#issuecomment-505735831
 
 
   Run Chicago Taxi on Dataflow
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 267287)
Time Spent: 6h 20m  (was: 6h 10m)

> Run Chicago Taxi Example on Dataflow
> 
>
> Key: BEAM-7079
> URL: https://issues.apache.org/jira/browse/BEAM-7079
> Project: Beam
>  Issue Type: Test
>  Components: testing
>Reporter: Michal Walenia
>Assignee: Michal Walenia
>Priority: Minor
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7079) Run Chicago Taxi Example on Dataflow

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7079?focusedWorklogId=267292&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-267292
 ]

ASF GitHub Bot logged work on BEAM-7079:


Author: ASF GitHub Bot
Created on: 26/Jun/19 06:18
Start Date: 26/Jun/19 06:18
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on issue #8939: [BEAM-7079] Add 
Chicago Taxi Example running on Dataflow
URL: https://github.com/apache/beam/pull/8939#issuecomment-505440299
 
 
   Run Chicago Taxi on Dataflow
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 267292)
Time Spent: 7h 10m  (was: 7h)

> Run Chicago Taxi Example on Dataflow
> 
>
> Key: BEAM-7079
> URL: https://issues.apache.org/jira/browse/BEAM-7079
> Project: Beam
>  Issue Type: Test
>  Components: testing
>Reporter: Michal Walenia
>Assignee: Michal Walenia
>Priority: Minor
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7079) Run Chicago Taxi Example on Dataflow

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7079?focusedWorklogId=267289&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-267289
 ]

ASF GitHub Bot logged work on BEAM-7079:


Author: ASF GitHub Bot
Created on: 26/Jun/19 06:18
Start Date: 26/Jun/19 06:18
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on issue #8939: [BEAM-7079] Add 
Chicago Taxi Example running on Dataflow
URL: https://github.com/apache/beam/pull/8939#issuecomment-505450614
 
 
   Run seed job
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 267289)
Time Spent: 6h 40m  (was: 6.5h)

> Run Chicago Taxi Example on Dataflow
> 
>
> Key: BEAM-7079
> URL: https://issues.apache.org/jira/browse/BEAM-7079
> Project: Beam
>  Issue Type: Test
>  Components: testing
>Reporter: Michal Walenia
>Assignee: Michal Walenia
>Priority: Minor
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7079) Run Chicago Taxi Example on Dataflow

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7079?focusedWorklogId=267286&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-267286
 ]

ASF GitHub Bot logged work on BEAM-7079:


Author: ASF GitHub Bot
Created on: 26/Jun/19 06:05
Start Date: 26/Jun/19 06:05
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on issue #8939: [BEAM-7079] Add 
Chicago Taxi Example running on Dataflow
URL: https://github.com/apache/beam/pull/8939#issuecomment-505732372
 
 
   Run seed job
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 267286)
Time Spent: 6h 10m  (was: 6h)

> Run Chicago Taxi Example on Dataflow
> 
>
> Key: BEAM-7079
> URL: https://issues.apache.org/jira/browse/BEAM-7079
> Project: Beam
>  Issue Type: Test
>  Components: testing
>Reporter: Michal Walenia
>Assignee: Michal Walenia
>Priority: Minor
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7606) Fix JDBC time conversion tests

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7606?focusedWorklogId=267262&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-267262
 ]

ASF GitHub Bot logged work on BEAM-7606:


Author: ASF GitHub Bot
Created on: 26/Jun/19 04:51
Start Date: 26/Jun/19 04:51
Worklog Time Spent: 10m 
  Work Description: jbonofre commented on issue #8938: [release-2.14.0] 
[BEAM-7606] Fix JDBC time conversion tests
URL: https://github.com/apache/beam/pull/8938#issuecomment-505717270
 
 
   Run Java_Examples_Dataflow PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 267262)
Time Spent: 2h 10m  (was: 2h)

> Fix JDBC time conversion tests
> --
>
> Key: BEAM-7606
> URL: https://issues.apache.org/jira/browse/BEAM-7606
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-jdbc
>Reporter: Ismaël Mejía
>Assignee: Charith Ellawala
>Priority: Critical
> Fix For: 2.14.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> It seems that after merging the support for Beam Rows in JdbcIO (BEAM-6674) 
> the _Test is broken, can somebody please take a look.
> CC: [~reuvenlax]
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7586) Add Integration Test for MongoDbIO in python sdk

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7586?focusedWorklogId=267234&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-267234
 ]

ASF GitHub Bot logged work on BEAM-7586:


Author: ASF GitHub Bot
Created on: 26/Jun/19 03:00
Start Date: 26/Jun/19 03:00
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #8895: [BEAM-7586] Add 
Integration test for python mongodb io
URL: https://github.com/apache/beam/pull/8895#issuecomment-505697959
 
 
   Run Python PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 267234)
Time Spent: 2h 10m  (was: 2h)

> Add Integration Test for MongoDbIO in python sdk 
> -
>
> Key: BEAM-7586
> URL: https://issues.apache.org/jira/browse/BEAM-7586
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7586) Add Integration Test for MongoDbIO in python sdk

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7586?focusedWorklogId=267233&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-267233
 ]

ASF GitHub Bot logged work on BEAM-7586:


Author: ASF GitHub Bot
Created on: 26/Jun/19 03:00
Start Date: 26/Jun/19 03:00
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #8895: 
[BEAM-7586] Add Integration test for python mongodb io
URL: https://github.com/apache/beam/pull/8895#discussion_r297467145
 
 

 ##
 File path: sdks/python/apache_beam/io/mongodbio_pipeline.py
 ##
 @@ -0,0 +1,85 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+import argparse
+import logging
+import time
+
+import apache_beam as beam
+from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.testing.test_pipeline import TestPipeline
+from apache_beam.testing.util import assert_that
+from apache_beam.testing.util import equal_to
+
+
+def run(argv=None):
+  default_db = 'beam_mongodbio_it_db'
+  default_coll = 'integration_test_%d' % time.time()
+  parser = argparse.ArgumentParser()
+  parser.add_argument('--mongo_uri',
+  default='mongodb://localhost:27017',
 
 Review comment:
   What if this port is not available when running the test ?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 267233)
Time Spent: 2h  (was: 1h 50m)

> Add Integration Test for MongoDbIO in python sdk 
> -
>
> Key: BEAM-7586
> URL: https://issues.apache.org/jira/browse/BEAM-7586
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7586) Add Integration Test for MongoDbIO in python sdk

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7586?focusedWorklogId=267231&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-267231
 ]

ASF GitHub Bot logged work on BEAM-7586:


Author: ASF GitHub Bot
Created on: 26/Jun/19 03:00
Start Date: 26/Jun/19 03:00
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #8895: 
[BEAM-7586] Add Integration test for python mongodb io
URL: https://github.com/apache/beam/pull/8895#discussion_r297466898
 
 

 ##
 File path: sdks/python/apache_beam/io/mongodbio_pipeline.py
 ##
 @@ -0,0 +1,85 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+import argparse
+import logging
+import time
+
+import apache_beam as beam
+from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.testing.test_pipeline import TestPipeline
+from apache_beam.testing.util import assert_that
+from apache_beam.testing.util import equal_to
+
+
+def run(argv=None):
+  default_db = 'beam_mongodbio_it_db'
+  default_coll = 'integration_test_%d' % time.time()
+  parser = argparse.ArgumentParser()
+  parser.add_argument('--mongo_uri',
+  default='mongodb://localhost:27017',
+  help='mongo uri string for connection')
+  parser.add_argument('--mongo_db',
+  default=default_db,
+  help='mongo uri string for connection')
+  parser.add_argument('--mongo_coll',
+  default=default_coll,
+  help='mongo uri string for connection')
+  parser.add_argument('--num_documents',
+  default=1000,
+  help='The expected number of documents to be generated '
+  'for write or read',
+  type=int)
+  parser.add_argument('--batch_size',
+  default=100,
+  help=('batch size for writing to mongodb'))
+  known_args, pipeline_args = parser.parse_known_args(argv)
+
+  # Test Write to MongoDB
+  with TestPipeline(options=PipelineOptions(pipeline_args)) as p:
+logging.info('Writing %d documents to mongodb' % known_args.num_documents)
+docs = [{'test_document_n': x} for x in range(known_args.num_documents)]
 
 Review comment:
   I'm confused by this statement, is x here the content of the document ? Can 
we add a more representative content that we can later use to query/verify ?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 267231)

> Add Integration Test for MongoDbIO in python sdk 
> -
>
> Key: BEAM-7586
> URL: https://issues.apache.org/jira/browse/BEAM-7586
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7586) Add Integration Test for MongoDbIO in python sdk

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7586?focusedWorklogId=267229&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-267229
 ]

ASF GitHub Bot logged work on BEAM-7586:


Author: ASF GitHub Bot
Created on: 26/Jun/19 03:00
Start Date: 26/Jun/19 03:00
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #8895: 
[BEAM-7586] Add Integration test for python mongodb io
URL: https://github.com/apache/beam/pull/8895#discussion_r297466044
 
 

 ##
 File path: sdks/python/apache_beam/io/mongodbio_it_test.py
 ##
 @@ -0,0 +1,54 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+import logging
+import unittest
+
+from hamcrest import all_of
+from nose.plugins.attrib import attr
+from pymongo import MongoClient
+
+from apache_beam.io import mongodbio_pipeline
+from apache_beam.testing.pipeline_verifiers import PipelineStateMatcher
+from apache_beam.testing.test_pipeline import TestPipeline
+
+
+class MongoDBIOIntegrationTest(unittest.TestCase):
+  @attr('IT')
+  def test_mongodbio_read_write(self):
+default_args = {
+'mongo_uri': 'mongodb://localhost:27017',
+'mongo_db': 'beam_mongodbio_it_test',
+}
+test_pipeline = TestPipeline(is_integration_test=True)
+pipeline_verifiers = [
+PipelineStateMatcher(),
+]
+mongodbio_pipeline.run(
+test_pipeline.get_full_options_as_args(
+on_success_matcher=all_of(*pipeline_verifiers), **default_args))
+
+# clean up
+with MongoClient(default_args['mongo_db']) as client:
+  client.drop_database(default_args['mongo_db'])
 
 Review comment:
   Where do we setup the database and startup the MongoDB instance and the 
database ? Should there be a shell script that starts up a MongoDB instance and 
then runs the pipelines against it ?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 267229)
Time Spent: 1h 40m  (was: 1.5h)

> Add Integration Test for MongoDbIO in python sdk 
> -
>
> Key: BEAM-7586
> URL: https://issues.apache.org/jira/browse/BEAM-7586
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7586) Add Integration Test for MongoDbIO in python sdk

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7586?focusedWorklogId=267230&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-267230
 ]

ASF GitHub Bot logged work on BEAM-7586:


Author: ASF GitHub Bot
Created on: 26/Jun/19 03:00
Start Date: 26/Jun/19 03:00
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #8895: 
[BEAM-7586] Add Integration test for python mongodb io
URL: https://github.com/apache/beam/pull/8895#discussion_r297466816
 
 

 ##
 File path: sdks/python/apache_beam/io/mongodbio_pipeline.py
 ##
 @@ -0,0 +1,85 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+import argparse
+import logging
+import time
+
+import apache_beam as beam
+from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.testing.test_pipeline import TestPipeline
+from apache_beam.testing.util import assert_that
+from apache_beam.testing.util import equal_to
+
+
+def run(argv=None):
+  default_db = 'beam_mongodbio_it_db'
+  default_coll = 'integration_test_%d' % time.time()
+  parser = argparse.ArgumentParser()
+  parser.add_argument('--mongo_uri',
+  default='mongodb://localhost:27017',
+  help='mongo uri string for connection')
+  parser.add_argument('--mongo_db',
+  default=default_db,
+  help='mongo uri string for connection')
+  parser.add_argument('--mongo_coll',
+  default=default_coll,
+  help='mongo uri string for connection')
+  parser.add_argument('--num_documents',
+  default=1000,
+  help='The expected number of documents to be generated '
+  'for write or read',
+  type=int)
+  parser.add_argument('--batch_size',
+  default=100,
+  help=('batch size for writing to mongodb'))
+  known_args, pipeline_args = parser.parse_known_args(argv)
+
+  # Test Write to MongoDB
+  with TestPipeline(options=PipelineOptions(pipeline_args)) as p:
+logging.info('Writing %d documents to mongodb' % known_args.num_documents)
+docs = [{'test_document_n': x} for x in range(known_args.num_documents)]
+
+start_time = time.time()
+_ = p | 'Create documents' >> beam.Create(docs) \
+  | 'WriteToMongoDB' >> beam.io.WriteToMongoDB(known_args.mongo_uri,
+   known_args.mongo_db,
+   known_args.mongo_coll,
+   known_args.batch_size)
+logging.info('Writing %d documents to mongodb finished in %.3f seconds' %
+ (known_args.num_documents, time.time() - start_time))
+
+  # Test Read from MongoDB
+  with TestPipeline(options=PipelineOptions(pipeline_args)) as p:
+start_time = time.time()
+logging.info('Reading from mongodb %s:%s' %
+ (known_args.mongo_db, known_args.mongo_coll))
+r = p | 'ReadFromMongoDB' >> beam.io.ReadFromMongoDB(known_args.mongo_uri,
+ known_args.mongo_db,
+ 
known_args.mongo_coll)\
+  | 'Count' >> beam.combiners.Count.Globally()
 
 Review comment:
   Can we verify the contents of the documents instead of just the count ?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 267230)

> Add Integration Test for MongoDbIO in python sdk 
> -
>
> Key: BEAM-7586
> URL: https://issues.apache.org/jira/browse/BEAM-7586
> Project: Beam
>  Issue Type: Sub-task
> 

[jira] [Work logged] (BEAM-7586) Add Integration Test for MongoDbIO in python sdk

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7586?focusedWorklogId=267232&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-267232
 ]

ASF GitHub Bot logged work on BEAM-7586:


Author: ASF GitHub Bot
Created on: 26/Jun/19 03:00
Start Date: 26/Jun/19 03:00
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #8895: 
[BEAM-7586] Add Integration test for python mongodb io
URL: https://github.com/apache/beam/pull/8895#discussion_r297467106
 
 

 ##
 File path: sdks/python/apache_beam/io/mongodbio_pipeline.py
 ##
 @@ -0,0 +1,85 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+import argparse
+import logging
+import time
+
+import apache_beam as beam
+from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.testing.test_pipeline import TestPipeline
+from apache_beam.testing.util import assert_that
+from apache_beam.testing.util import equal_to
+
+
+def run(argv=None):
+  default_db = 'beam_mongodbio_it_db'
 
 Review comment:
   What if multiple instances of this test run concurrently ? Probably we 
should generate a database ID that is unique to this test instance.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 267232)
Time Spent: 1h 50m  (was: 1h 40m)

> Add Integration Test for MongoDbIO in python sdk 
> -
>
> Key: BEAM-7586
> URL: https://issues.apache.org/jira/browse/BEAM-7586
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7437) Integration Test for BQ streaming inserts for streaming pipelines

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7437?focusedWorklogId=267212&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-267212
 ]

ASF GitHub Bot logged work on BEAM-7437:


Author: ASF GitHub Bot
Created on: 26/Jun/19 01:49
Start Date: 26/Jun/19 01:49
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #8934: [BEAM-7437] Add 
streaming flag to BQ streaming inserts IT test
URL: https://github.com/apache/beam/pull/8934#discussion_r297456280
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/tests/bigquery_matcher.py
 ##
 @@ -165,6 +169,33 @@ def describe_mismatch(self, pipeline_result, 
mismatch_description):
   .append_text(self.actual_data)
 
 
+class BigqueryFullResultStreamingMatcher(BigqueryFullResultMatcher):
+  """
+  Matcher that verifies Bigquery data with given query.
+
+  Fetch Bigquery data with given query, compare to the expected data.
+  This matcher polls BigQuery until the no. of records in BigQuery is
+  equal to the no. of records in expected data.
+  A timeout can be specified.
+  """
+
+  def __init__(self, project, query, data, timeout=MAX_TIMESTAMP):
 
 Review comment:
   Could you make the default timeout to be finite. Would 5/10/15 minutes work? 
(the less the better - having post-commit tests hang until their 2 hour 
deadline is not fun. It easier to debug where the test timed out if it raises a 
TimeoutError and the stacktrace is printed)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 267212)
Time Spent: 3h  (was: 2h 50m)

> Integration Test for BQ streaming inserts for streaming pipelines
> -
>
> Key: BEAM-7437
> URL: https://issues.apache.org/jira/browse/BEAM-7437
> Project: Beam
>  Issue Type: Test
>  Components: io-python-gcp
>Affects Versions: 2.12.0
>Reporter: Tanay Tummalapalli
>Assignee: Tanay Tummalapalli
>Priority: Minor
>  Labels: test
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Integration Test for BigQuery Sink using Streaming Inserts for streaming 
> pipelines.
> Integration tests currently exist for batch pipelines, it can also be added 
> for streaming pipelines using TestStream. This will be a precursor to the 
> failing integration test to be added for [BEAM-6611| 
> https://issues.apache.org/jira/browse/BEAM-6611].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7632) Update Python quickstart guide for Flink and Spark

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7632?focusedWorklogId=267197&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-267197
 ]

ASF GitHub Bot logged work on BEAM-7632:


Author: ASF GitHub Bot
Created on: 26/Jun/19 01:22
Start Date: 26/Jun/19 01:22
Worklog Time Spent: 10m 
  Work Description: melap commented on pull request #8949: [BEAM-7632] 
Update Python quickstart guide for Flink and Spark
URL: https://github.com/apache/beam/pull/8949#discussion_r297452381
 
 

 ##
 File path: website/src/roadmap/portability.md
 ##
 @@ -144,25 +144,34 @@ their respective components.
 
 MVP, and FeatureCompletness nearly done (missing SDF, timers) for
 SDKs, Python ULR, and shared java runners library.
-Flink is the first runner to fully leverage this, with focus moving to
-Performance.
+Currently, the Flink and Spark runners support portable pipeline execution.
 See the
 [Portability support 
table](https://s.apache.org/apache-beam-portability-support-table)
 for details.
 
-### Running Python wordcount on Flink or Spark {#python-on-flink}
+### Running Python wordcount on Flink {#python-on-flink}
 
-Currently, the Flink and Spark runners support portable pipeline execution.
-To run a basic Python wordcount (in batch mode) with embedded Flink or Spark:
+To run a basic Python wordcount (in batch mode) with embedded Flink:
 
 1. Run once to build the SDK harness container: `./gradlew 
:sdks:python:container:docker`
-2. Choose one:
- * Start the Flink portable JobService endpoint: `./gradlew 
:runners:flink:1.5:job-server:runShadow`
- * Or start the Spark portable JobService endpoint: `./gradlew 
:runners:spark:job-server:runShadow`
-3. Submit the wordcount pipeline to above endpoint: `./gradlew 
:sdks:python:portableWordCount -PjobEndpoint=localhost:8099 
-PenvironmentType=LOOPBACK`
+2. Start the Flink portable JobService endpoint: `./gradlew 
:runners:flink:1.5:job-server:runShadow`
+3. In a new terminal, submit the wordcount pipeline to above endpoint: 
`./gradlew :sdks:python:portableWordCount -PjobEndpoint=localhost:8099 
-PenvironmentType=LOOPBACK`
 
-To run the pipeline in streaming mode (currently only supported on Flink): 
`./gradlew :sdks:python:portableWordCount -PjobEndpoint=localhost:8099 
-Pstreaming`
+To run the pipeline in streaming mode: `./gradlew 
:sdks:python:portableWordCount -PjobEndpoint=localhost:8099 -Pstreaming`
 
 Please see the [Flink Runner page]({{ site.baseurl 
}}/documentation/runners/flink/) for more information on
 how to run portable pipelines on top of Flink.
 
+### Running Python wordcount on Spark {#python-on-spark}
+
+To run a basic Python wordcount (in batch mode) with embedded Spark:
+
+1. Run once to build the SDK harness container: `./gradlew 
:sdks:python:container:docker`
+2. Start the Spark portable JobService endpoint: `./gradlew 
:runners:spark:job-server:runShadow`
+3. In a new terminal, submit the wordcount pipeline to above endpoint: 
`./gradlew :sdks:python:portableWordCount -PjobEndpoint=localhost:8099 
-PenvironmentType=LOOPBACK`
+
+Python streaming mode is not yet supported on the Spark.
 
 Review comment:
   perhaps either remove "the", or add "runner" to the end so it is Spark runner
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 267197)
Time Spent: 0.5h  (was: 20m)

> Update Python quickstart guide for Flink and Spark
> --
>
> Key: BEAM-7632
> URL: https://issues.apache.org/jira/browse/BEAM-7632
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently, the documentation says "This runner is not yet available for the 
> Python SDK.", which is out of date. 
> [https://beam.apache.org/get-started/quickstart-py/]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7632) Update Python quickstart guide for Flink and Spark

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7632?focusedWorklogId=267196&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-267196
 ]

ASF GitHub Bot logged work on BEAM-7632:


Author: ASF GitHub Bot
Created on: 26/Jun/19 01:22
Start Date: 26/Jun/19 01:22
Worklog Time Spent: 10m 
  Work Description: melap commented on pull request #8949: [BEAM-7632] 
Update Python quickstart guide for Flink and Spark
URL: https://github.com/apache/beam/pull/8949#discussion_r297452227
 
 

 ##
 File path: website/src/get-started/quickstart-py.md
 ##
 @@ -176,17 +176,20 @@ This runner is not yet available for the Python SDK.
 
 {:.runner-flink-local}
 ```
-This runner is not yet available for the Python SDK.
+Currently, running wordcount.py on Flink requires a full download of the Beam 
source code.
+See https://beam.apache.org/roadmap/portability/#python-on-flink for more.
 
 Review comment:
   for all of these, I would suggest changing "for more." to  "for more 
information."
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 267196)
Time Spent: 20m  (was: 10m)

> Update Python quickstart guide for Flink and Spark
> --
>
> Key: BEAM-7632
> URL: https://issues.apache.org/jira/browse/BEAM-7632
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, the documentation says "This runner is not yet available for the 
> Python SDK.", which is out of date. 
> [https://beam.apache.org/get-started/quickstart-py/]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-6669) revert service_default_cmek_config experiment flag

2019-06-25 Thread Udi Meiri (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udi Meiri resolved BEAM-6669.
-
   Resolution: Fixed
Fix Version/s: Not applicable

> revert service_default_cmek_config experiment flag 
> ---
>
> Key: BEAM-6669
> URL: https://issues.apache.org/jira/browse/BEAM-6669
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp, sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Minor
> Fix For: Not applicable
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> Do this when --dataflowKmsKey is supported on Dataflow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7632) Update Python quickstart guide for Flink and Spark

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7632?focusedWorklogId=267144&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-267144
 ]

ASF GitHub Bot logged work on BEAM-7632:


Author: ASF GitHub Bot
Created on: 26/Jun/19 00:29
Start Date: 26/Jun/19 00:29
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #8949: [BEAM-7632] 
Update Python quickstart guide for Flink and Spark
URL: https://github.com/apache/beam/pull/8949
 
 
   Also split the portability guide into separate Flink and Spark sections to 
reduce potential confusion.
   
   R: @aaltay @markflyhigh 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/)
 | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)
  [![Build 

[jira] [Commented] (BEAM-7424) Retry HTTP 429 errors from GCS w/ exponential backoff when reading data

2019-06-25 Thread Heejong Lee (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872813#comment-16872813
 ] 

Heejong Lee commented on BEAM-7424:
---

Confirming that 429 handler is already there in Python api_tools 
([https://github.com/google/apitools/blob/master/apitools/base/py/http_wrapper.py#L297]).
 Maybe it's just okay or we can increase the default retrying threshold since 
it looks a bit small for quota errors.

> Retry HTTP 429 errors from GCS w/ exponential backoff when reading data
> ---
>
> Key: BEAM-7424
> URL: https://issues.apache.org/jira/browse/BEAM-7424
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp, io-python-gcp, sdk-py-core
>Reporter: Chamikara Jayalath
>Assignee: Heejong Lee
>Priority: Blocker
> Fix For: 2.14.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> This has to be done for both Java and Python SDKs.
> Seems like Java SDK already retries 429 errors w/o backoff (please verify): 
> [https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/RetryHttpRequestInitializer.java#L185]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-7632) Update Python quickstart guide for Flink and Spark

2019-06-25 Thread Kyle Weaver (JIRA)
Kyle Weaver created BEAM-7632:
-

 Summary: Update Python quickstart guide for Flink and Spark
 Key: BEAM-7632
 URL: https://issues.apache.org/jira/browse/BEAM-7632
 Project: Beam
  Issue Type: Improvement
  Components: website
Reporter: Kyle Weaver
Assignee: Kyle Weaver


Currently, the documentation says "This runner is not yet available for the 
Python SDK.", which is out of date. 
[https://beam.apache.org/get-started/quickstart-py/]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7013) A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=267123&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-267123
 ]

ASF GitHub Bot logged work on BEAM-7013:


Author: ASF GitHub Bot
Created on: 25/Jun/19 23:35
Start Date: 25/Jun/19 23:35
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #8947: [BEAM-7013] Add 
HLL doc link to Beam website
URL: https://github.com/apache/beam/pull/8947
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 267123)
Time Spent: 20m  (was: 10m)

> A new count distinct transform based on BigQuery compatible HyperLogLog++ 
> implementation
> 
>
> Key: BEAM-7013
> URL: https://issues.apache.org/jira/browse/BEAM-7013
> Project: Beam
>  Issue Type: New Feature
>  Components: extensions-java-sketching, sdk-java-core
>Reporter: Yueyang Qiu
>Assignee: Yueyang Qiu
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work started] (BEAM-7548) test_approximate_unique_global_by_error is flaky

2019-06-25 Thread Hannah Jiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-7548 started by Hannah Jiang.
--
> test_approximate_unique_global_by_error is flaky
> 
>
> Key: BEAM-7548
> URL: https://issues.apache.org/jira/browse/BEAM-7548
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The error happened on Jenkins in Python 3.5 suite, which currently uses 
> Python 3.5.2 interpreter:
> {noformat}
> 11:57:47 
> ==
> 11:57:47 ERROR: test_approximate_unique_global_by_error 
> (apache_beam.transforms.stats_test.ApproximateUniqueTest)
> 11:57:47 
> --
> 11:57:47 Traceback (most recent call last):
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/transforms/stats_test.py",
>  line 236, in test_approximate_unique_global_by_error
> 11:57:47 pipeline.run()
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/testing/test_pipeline.py",
>  line 107, in run
> 11:57:47 else test_runner_api))
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 406, in run
> 11:57:47 self._options).run(False)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 419, in run
> 11:57:47 return self.runner.run_pipeline(self, self._options)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/direct/direct_runner.py",
>  line 128, in run_pipeline
> 11:57:47 return runner.run_pipeline(pipeline, options)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 289, in run_pipeline
> 11:57:47 default_environment=self._default_environment))
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 293, in run_via_runner_api
> 11:57:47 return self.run_stages(*self.create_stages(pipeline_proto))
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 369, in run_stages
> 11:57:47 stage_context.safe_coders)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 531, in run_stage
> 11:57:47 data_input, data_output)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 1235, in process_bundle
> 11:57:47 result_future = 
> self._controller.control_handler.push(process_bundle)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 851, in push
> 11:57:47 response = self.worker.do_instruction(request)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/worker/sdk_worker.py",
>  line 342, in do_instruction
> 11:57:47 request.instruction_id)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/worker/sdk_worker.py",
>  line 368, in process_bundle
> 11:57:47 bundle_processor.process_bundle(instruction_id))
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/worker/bundle_processor.py",
>  line 595, in process_bundle
> 11:57:47 data.ptransform_id].p

[jira] [Work logged] (BEAM-7548) test_approximate_unique_global_by_error is flaky

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7548?focusedWorklogId=267105&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-267105
 ]

ASF GitHub Bot logged work on BEAM-7548:


Author: ASF GitHub Bot
Created on: 25/Jun/19 23:08
Start Date: 25/Jun/19 23:08
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on pull request #8948: 
[BEAM-7548] fix flaky tests for ApproximateUnique
URL: https://github.com/apache/beam/pull/8948
 
 
   Fix flaky tests for ApproximateUnique by adding retries and with fixed 
inputs instead of generating random inputs every time.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/)
 | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)
  [![Build 
Status](https:/

[jira] [Work logged] (BEAM-7603) Support for ValueProvider-given GCS Location for WriteToBigQuery w File Loads

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7603?focusedWorklogId=267103&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-267103
 ]

ASF GitHub Bot logged work on BEAM-7603:


Author: ASF GitHub Bot
Created on: 25/Jun/19 23:07
Start Date: 25/Jun/19 23:07
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #8908: [BEAM-7603] 
Support for ValueProvider-given GCS Location for WriteToBigQuery w File Loads
URL: https://github.com/apache/beam/pull/8908#issuecomment-505655001
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 267103)
Time Spent: 3h  (was: 2h 50m)

> Support for ValueProvider-given GCS Location for WriteToBigQuery w File Loads
> -
>
> Key: BEAM-7603
> URL: https://issues.apache.org/jira/browse/BEAM-7603
> Project: Beam
>  Issue Type: Improvement
>  Components: io-python-gcp
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Blocker
> Fix For: 2.14.0
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3645) Support multi-process execution on the FnApiRunner

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3645?focusedWorklogId=267070&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-267070
 ]

ASF GitHub Bot logged work on BEAM-3645:


Author: ASF GitHub Bot
Created on: 25/Jun/19 22:00
Start Date: 25/Jun/19 22:00
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on pull request #8872: 
[BEAM-3645] add ParallelBundleManager
URL: https://github.com/apache/beam/pull/8872#discussion_r297411666
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner.py
 ##
 @@ -1343,6 +1381,7 @@ def __init__(
 self._controller = controller
 self._get_buffer = get_buffer
 self._get_input_coder_impl = get_input_coder_impl
+self._num_workers = num_workers
 
 Review comment:
   This is used at L:1540. Default num_workers is got from self._num_workers, 
and it can be overwritten if num_worker is passed to `process_bundle`.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 267070)
Time Spent: 12h 10m  (was: 12h)

> Support multi-process execution on the FnApiRunner
> --
>
> Key: BEAM-3645
> URL: https://issues.apache.org/jira/browse/BEAM-3645
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Affects Versions: 2.2.0, 2.3.0
>Reporter: Charles Chen
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 12h 10m
>  Remaining Estimate: 0h
>
> https://issues.apache.org/jira/browse/BEAM-3644 gave us a 15x performance 
> gain over the previous DirectRunner.  We can do even better in multi-core 
> environments by supporting multi-process execution in the FnApiRunner, to 
> scale past Python GIL limitations.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3645) Support multi-process execution on the FnApiRunner

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3645?focusedWorklogId=267068&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-267068
 ]

ASF GitHub Bot logged work on BEAM-3645:


Author: ASF GitHub Bot
Created on: 25/Jun/19 21:52
Start Date: 25/Jun/19 21:52
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on pull request #8872: 
[BEAM-3645] add ParallelBundleManager
URL: https://github.com/apache/beam/pull/8872#discussion_r297409159
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner.py
 ##
 @@ -139,6 +139,18 @@ def done(self):
   self._state = self.DONE_STATE
 
 
+class _ListBuffer(list):
+  """Used to support parititioning of a list."""
+  def partition(self, n):
+n = min(n, len(self))
 
 Review comment:
   I think we still need it, because if we return n groups with some empty 
groups, we will pass additional {target:[]} to `process_bundle` which is not 
expected. Some tests are hanging without this.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 267068)
Time Spent: 12h  (was: 11h 50m)

> Support multi-process execution on the FnApiRunner
> --
>
> Key: BEAM-3645
> URL: https://issues.apache.org/jira/browse/BEAM-3645
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Affects Versions: 2.2.0, 2.3.0
>Reporter: Charles Chen
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 12h
>  Remaining Estimate: 0h
>
> https://issues.apache.org/jira/browse/BEAM-3644 gave us a 15x performance 
> gain over the previous DirectRunner.  We can do even better in multi-core 
> environments by supporting multi-process execution in the FnApiRunner, to 
> scale past Python GIL limitations.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3645) Support multi-process execution on the FnApiRunner

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3645?focusedWorklogId=267063&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-267063
 ]

ASF GitHub Bot logged work on BEAM-3645:


Author: ASF GitHub Bot
Created on: 25/Jun/19 21:48
Start Date: 25/Jun/19 21:48
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on pull request #8872: 
[BEAM-3645] add ParallelBundleManager
URL: https://github.com/apache/beam/pull/8872#discussion_r297407962
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner.py
 ##
 @@ -1506,6 +1545,52 @@ def process_bundle(self, inputs, expected_outputs):
 return result, split_results
 
 
+class ParallelBundleManager(BundleManager):
+
+  def _check_inputs_split(self, expected_outputs):
+# We skip splitting inputs when timer is set, because operations are not
+# triggered until we sent inputs for timers.
+for _, pcoll_id in expected_outputs.items():
+  kind = split_buffer_id(pcoll_id)[0]
+  if kind in ['timers']:
+return False
+
+return True
+
+  def process_bundle(self, inputs, expected_outputs, num_workers=None):
+num_workers = num_workers or self._num_workers
+param_list = []
+
+if self._check_inputs_split(expected_outputs):
+  for name, input in inputs.items():
+for part in input.partition(num_workers):
+  param_list.append(({name : part}, expected_outputs))
 
 Review comment:
   This worked fantastically!!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 267063)
Time Spent: 11h 50m  (was: 11h 40m)

> Support multi-process execution on the FnApiRunner
> --
>
> Key: BEAM-3645
> URL: https://issues.apache.org/jira/browse/BEAM-3645
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Affects Versions: 2.2.0, 2.3.0
>Reporter: Charles Chen
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 11h 50m
>  Remaining Estimate: 0h
>
> https://issues.apache.org/jira/browse/BEAM-3644 gave us a 15x performance 
> gain over the previous DirectRunner.  We can do even better in multi-core 
> environments by supporting multi-process execution in the FnApiRunner, to 
> scale past Python GIL limitations.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-7424) Retry HTTP 429 errors from GCS w/ exponential backoff when reading data

2019-06-25 Thread Heejong Lee (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872731#comment-16872731
 ] 

Heejong Lee commented on BEAM-7424:
---

update: succeeded in getting 429 errors from Python SDK. Will try to fix and 
submit PR today.

> Retry HTTP 429 errors from GCS w/ exponential backoff when reading data
> ---
>
> Key: BEAM-7424
> URL: https://issues.apache.org/jira/browse/BEAM-7424
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp, io-python-gcp, sdk-py-core
>Reporter: Chamikara Jayalath
>Assignee: Heejong Lee
>Priority: Blocker
> Fix For: 2.14.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> This has to be done for both Java and Python SDKs.
> Seems like Java SDK already retries 429 errors w/o backoff (please verify): 
> [https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/RetryHttpRequestInitializer.java#L185]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-7284) Support Py3 Dataclasses

2019-06-25 Thread Valentyn Tymofieiev (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872717#comment-16872717
 ] 

Valentyn Tymofieiev commented on BEAM-7284:
---

[~yoshiki.obata], This could definitely be a fix, thanks a lot for exploring. I 
don't yet fully understand what else may be impacted by this change, and 
whether there may be unintended side-effects. It would be great if you could 
clarify it here and/or in the PR description.

Also, it would be great to to add a unit test for Dataclasses to see that they 
are indeed supported. Since Dataclasses will be considered invalid syntax on 
Python 2, we may have to temporary wrap incompatible code with eval() 
statements, as was done in 
https://github.com/apache/beam/pull/8893#discussion_r295534911.

> Support Py3 Dataclasses 
> 
>
> Key: BEAM-7284
> URL: https://issues.apache.org/jira/browse/BEAM-7284
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Priority: Major
>
> It looks like dill does not support Dataclasses yet, 
> https://github.com/uqfoundation/dill/issues/312, which very likely means that 
> Beam does not support them either.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7013) A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=267036&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-267036
 ]

ASF GitHub Bot logged work on BEAM-7013:


Author: ASF GitHub Bot
Created on: 25/Jun/19 21:03
Start Date: 25/Jun/19 21:03
Worklog Time Spent: 10m 
  Work Description: robinyqiu commented on pull request #8947: [BEAM-7013] 
Add HLL doc link to Beam website
URL: https://github.com/apache/beam/pull/8947
 
 
   r: @aaltay @iemejia
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/)
 | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)
  [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_VR_Spark/lastCompletedBuild/)
   
   Pre-Commit Tests Status (on master branch)
   

   
   --- |Java | Python | Go | Website
   --- | --- | --- | --- | 

[jira] [Commented] (BEAM-4775) JobService should support returning metrics

2019-06-25 Thread Lukasz Gajowy (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-4775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872705#comment-16872705
 ] 

Lukasz Gajowy commented on BEAM-4775:
-

I'm glad that you asked. :) I got drawn away by other things but before that I 
was able to pass MonitoringInfos to PortableRunner successfuly (as described in 
the comment above). So at least that part should be contributed IMO (not 
converting to MetricsResults yet). I should have a PR for this in a week or two.

> JobService should support returning metrics
> ---
>
> Key: BEAM-4775
> URL: https://issues.apache.org/jira/browse/BEAM-4775
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model
>Reporter: Eugene Kirpichov
>Assignee: Lukasz Gajowy
>Priority: Major
>  Time Spent: 41h 10m
>  Remaining Estimate: 0h
>
> Design doc: [https://s.apache.org/get-metrics-api].
> Further discussion is ongoing on [this 
> doc|https://docs.google.com/document/d/1m83TsFvJbOlcLfXVXprQm1B7vUakhbLZMzuRrOHWnTg/edit?ts=5c826bb4#heading=h.faqan9rjc6dm].
> We want to report job metrics back to the portability harness from the runner 
> harness, for displaying to users.
> h1. Relevant PRs in flight:
> h2. Ready for Review:
>  * [#8022|https://github.com/apache/beam/pull/8022]: correct the Job RPC 
> protos from [#8018|https://github.com/apache/beam/pull/8018].
> h2. Iterating / Discussing:
>  * [#7971|https://github.com/apache/beam/pull/7971]: Flink portable metrics: 
> get ptransform from MonitoringInfo, not stage name
>  ** this is a simpler, Flink-specific PR that is basically duplicated inside 
> each of the following two, so may be worth trying to merge in first
>  * #[7915|https://github.com/apache/beam/pull/7915]: use MonitoringInfo data 
> model in Java SDK metrics
>  * [#7868|https://github.com/apache/beam/pull/7868]: MonitoringInfo URN tweaks
> h2. Merged
>  * [#8018|https://github.com/apache/beam/pull/8018]: add job metrics RPC 
> protos
>  * [#7867|https://github.com/apache/beam/pull/7867]: key MetricResult by a 
> MetricKey
>  * [#7938|https://github.com/apache/beam/pull/7938]: move MonitoringInfo 
> protos to model/pipeline module
>  * [#7883|https://github.com/apache/beam/pull/7883]: Add 
> MetricQueryResults.allMetrics() helper
>  * [#7866|https://github.com/apache/beam/pull/7866]: move function helpers 
> from fn-harness to sdks/java/core
>  * [#7890|https://github.com/apache/beam/pull/7890]: consolidate MetricResult 
> implementations
> h2. Closed
>  * [#7934|https://github.com/apache/beam/pull/7934]: job metrics RPC + SDK 
> support
>  * [#7876|https://github.com/apache/beam/pull/7876]: Clean up metric protos; 
> support integer distributions, gauges



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-4775) JobService should support returning metrics

2019-06-25 Thread Kyle Weaver (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-4775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872698#comment-16872698
 ] 

Kyle Weaver commented on BEAM-4775:
---

Hey [~ŁukaszG], just checking in – have you been able to make further progress 
on this? Any blockers?

> JobService should support returning metrics
> ---
>
> Key: BEAM-4775
> URL: https://issues.apache.org/jira/browse/BEAM-4775
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model
>Reporter: Eugene Kirpichov
>Assignee: Lukasz Gajowy
>Priority: Major
>  Time Spent: 41h 10m
>  Remaining Estimate: 0h
>
> Design doc: [https://s.apache.org/get-metrics-api].
> Further discussion is ongoing on [this 
> doc|https://docs.google.com/document/d/1m83TsFvJbOlcLfXVXprQm1B7vUakhbLZMzuRrOHWnTg/edit?ts=5c826bb4#heading=h.faqan9rjc6dm].
> We want to report job metrics back to the portability harness from the runner 
> harness, for displaying to users.
> h1. Relevant PRs in flight:
> h2. Ready for Review:
>  * [#8022|https://github.com/apache/beam/pull/8022]: correct the Job RPC 
> protos from [#8018|https://github.com/apache/beam/pull/8018].
> h2. Iterating / Discussing:
>  * [#7971|https://github.com/apache/beam/pull/7971]: Flink portable metrics: 
> get ptransform from MonitoringInfo, not stage name
>  ** this is a simpler, Flink-specific PR that is basically duplicated inside 
> each of the following two, so may be worth trying to merge in first
>  * #[7915|https://github.com/apache/beam/pull/7915]: use MonitoringInfo data 
> model in Java SDK metrics
>  * [#7868|https://github.com/apache/beam/pull/7868]: MonitoringInfo URN tweaks
> h2. Merged
>  * [#8018|https://github.com/apache/beam/pull/8018]: add job metrics RPC 
> protos
>  * [#7867|https://github.com/apache/beam/pull/7867]: key MetricResult by a 
> MetricKey
>  * [#7938|https://github.com/apache/beam/pull/7938]: move MonitoringInfo 
> protos to model/pipeline module
>  * [#7883|https://github.com/apache/beam/pull/7883]: Add 
> MetricQueryResults.allMetrics() helper
>  * [#7866|https://github.com/apache/beam/pull/7866]: move function helpers 
> from fn-harness to sdks/java/core
>  * [#7890|https://github.com/apache/beam/pull/7890]: consolidate MetricResult 
> implementations
> h2. Closed
>  * [#7934|https://github.com/apache/beam/pull/7934]: job metrics RPC + SDK 
> support
>  * [#7876|https://github.com/apache/beam/pull/7876]: Clean up metric protos; 
> support integer distributions, gauges



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7574) Spark runner: Combine.perKey performance issues

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7574?focusedWorklogId=267018&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-267018
 ]

ASF GitHub Bot logged work on BEAM-7574:


Author: ASF GitHub Bot
Created on: 25/Jun/19 20:26
Start Date: 25/Jun/19 20:26
Worklog Time Spent: 10m 
  Work Description: je-ik commented on pull request #8946: [BEAM-7574] fix 
Combine performance for SparkRunner
URL: https://github.com/apache/beam/pull/8946
 
 
   Fixes [BEAM-7574]
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/)
 | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)
  [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/be

[jira] [Work logged] (BEAM-7424) Retry HTTP 429 errors from GCS w/ exponential backoff when reading data

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7424?focusedWorklogId=267010&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-267010
 ]

ASF GitHub Bot logged work on BEAM-7424:


Author: ASF GitHub Bot
Created on: 25/Jun/19 20:06
Start Date: 25/Jun/19 20:06
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #8933: [BEAM-7424] Retry HTTP 
429 errors from GCS
URL: https://github.com/apache/beam/pull/8933#issuecomment-505602027
 
 
   run python postcommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 267010)
Time Spent: 1h 20m  (was: 1h 10m)

> Retry HTTP 429 errors from GCS w/ exponential backoff when reading data
> ---
>
> Key: BEAM-7424
> URL: https://issues.apache.org/jira/browse/BEAM-7424
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp, io-python-gcp, sdk-py-core
>Reporter: Chamikara Jayalath
>Assignee: Heejong Lee
>Priority: Blocker
> Fix For: 2.14.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> This has to be done for both Java and Python SDKs.
> Seems like Java SDK already retries 429 errors w/o backoff (please verify): 
> [https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/RetryHttpRequestInitializer.java#L185]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7424) Retry HTTP 429 errors from GCS w/ exponential backoff when reading data

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7424?focusedWorklogId=267011&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-267011
 ]

ASF GitHub Bot logged work on BEAM-7424:


Author: ASF GitHub Bot
Created on: 25/Jun/19 20:06
Start Date: 25/Jun/19 20:06
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #8933: [BEAM-7424] Retry HTTP 
429 errors from GCS
URL: https://github.com/apache/beam/pull/8933#issuecomment-505602027
 
 
   run python postcommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 267011)
Time Spent: 1.5h  (was: 1h 20m)

> Retry HTTP 429 errors from GCS w/ exponential backoff when reading data
> ---
>
> Key: BEAM-7424
> URL: https://issues.apache.org/jira/browse/BEAM-7424
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp, io-python-gcp, sdk-py-core
>Reporter: Chamikara Jayalath
>Assignee: Heejong Lee
>Priority: Blocker
> Fix For: 2.14.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> This has to be done for both Java and Python SDKs.
> Seems like Java SDK already retries 429 errors w/o backoff (please verify): 
> [https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/RetryHttpRequestInitializer.java#L185]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5191) Add support for writing to BigQuery clustered tables

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5191?focusedWorklogId=266995&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-266995
 ]

ASF GitHub Bot logged work on BEAM-5191:


Author: ASF GitHub Bot
Created on: 25/Jun/19 19:36
Start Date: 25/Jun/19 19:36
Worklog Time Spent: 10m 
  Work Description: jklukas commented on pull request #7061: [BEAM-5191] 
Support for BigQuery clustering
URL: https://github.com/apache/beam/pull/7061#discussion_r297359398
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BatchLoads.java
 ##
 @@ -372,7 +372,7 @@ public WriteResult 
expandUntriggered(PCollection> inp
 
 tempTables
 .apply("ReifyRenameInput", new ReifyAsIterable<>())
-.setCoder(IterableCoder.of(KvCoder.of(TableDestinationCoderV2.of(), 
StringUtf8Coder.of(
+.setCoder(IterableCoder.of(KvCoder.of(TableDestinationCoderV3.of(), 
StringUtf8Coder.of(
 
 Review comment:
   Same here. The coded value is only passed to WriteRename which creates a 
copy job and never references the clustering configuration.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 266995)
Time Spent: 7h 40m  (was: 7.5h)

> Add support for writing to BigQuery clustered tables
> 
>
> Key: BEAM-5191
> URL: https://issues.apache.org/jira/browse/BEAM-5191
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Affects Versions: 2.6.0
>Reporter: Robert Sahlin
>Assignee: Wout Scheepers
>Priority: Minor
>  Labels: features, newbie
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> Google recently added support for clustered tables in BigQuery. It would be 
> useful to set clustering columns the same way as for partitioning. It should 
> support multiple fields (4) for clustering.
> For example:
> [BigQueryIO.Write|https://beam.apache.org/documentation/sdks/javadoc/2.6.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.html]<[T|https://beam.apache.org/documentation/sdks/javadoc/2.6.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.html]>
>  .withClustering(new Clustering().setField("productId").setType("STRING"))



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5191) Add support for writing to BigQuery clustered tables

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5191?focusedWorklogId=266992&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-266992
 ]

ASF GitHub Bot logged work on BEAM-5191:


Author: ASF GitHub Bot
Created on: 25/Jun/19 19:35
Start Date: 25/Jun/19 19:35
Worklog Time Spent: 10m 
  Work Description: jklukas commented on pull request #7061: [BEAM-5191] 
Support for BigQuery clustering
URL: https://github.com/apache/beam/pull/7061#discussion_r297358761
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BatchLoads.java
 ##
 @@ -292,7 +292,7 @@ private WriteResult 
expandTriggered(PCollection> inpu
 .apply(WithKeys.of((Void) null))
 .setCoder(
 KvCoder.of(
-VoidCoder.of(), KvCoder.of(TableDestinationCoderV2.of(), 
StringUtf8Coder.of(
+VoidCoder.of(), KvCoder.of(TableDestinationCoderV3.of(), 
StringUtf8Coder.of(
 
 Review comment:
   For this particular instance, I think we can safely continue to use 
TableDestinationCoderV2 since none of the subsequent processing steps in this 
method reference the clustering configuration. The TableDestinations we code 
here are used to configure a copy job, which does not need to know about 
clustering.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 266992)
Time Spent: 7.5h  (was: 7h 20m)

> Add support for writing to BigQuery clustered tables
> 
>
> Key: BEAM-5191
> URL: https://issues.apache.org/jira/browse/BEAM-5191
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Affects Versions: 2.6.0
>Reporter: Robert Sahlin
>Assignee: Wout Scheepers
>Priority: Minor
>  Labels: features, newbie
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> Google recently added support for clustered tables in BigQuery. It would be 
> useful to set clustering columns the same way as for partitioning. It should 
> support multiple fields (4) for clustering.
> For example:
> [BigQueryIO.Write|https://beam.apache.org/documentation/sdks/javadoc/2.6.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.html]<[T|https://beam.apache.org/documentation/sdks/javadoc/2.6.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.html]>
>  .withClustering(new Clustering().setField("productId").setType("STRING"))



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5191) Add support for writing to BigQuery clustered tables

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5191?focusedWorklogId=266977&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-266977
 ]

ASF GitHub Bot logged work on BEAM-5191:


Author: ASF GitHub Bot
Created on: 25/Jun/19 19:23
Start Date: 25/Jun/19 19:23
Worklog Time Spent: 10m 
  Work Description: jklukas commented on pull request #8945: [BEAM-5191] 
Support for BigQuery clustering
URL: https://github.com/apache/beam/pull/8945
 
 
   This takes the commits from #7061, rebases on master, and makes some final 
changes to ensure that we use the updated `TableDestinationCoderV3` only if 
clustering is set. So essentially, users must opt in to the new coder by 
specifying a clustering parameter.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/)
 | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuil

[jira] [Work logged] (BEAM-7495) Add support for dynamic worker re-balancing when reading BigQuery data using Cloud Dataflow

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7495?focusedWorklogId=266932&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-266932
 ]

ASF GitHub Bot logged work on BEAM-7495:


Author: ASF GitHub Bot
Created on: 25/Jun/19 18:33
Start Date: 25/Jun/19 18:33
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #8832: [BEAM-7495] Add 
dynamic worker rebalancing to BigQuery Storage
URL: https://github.com/apache/beam/pull/8832#issuecomment-505567867
 
 
   Thanks Merging.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 266932)
Time Spent: 3h 20m  (was: 3h 10m)
Remaining Estimate: 500h 40m  (was: 500h 50m)

> Add support for dynamic worker re-balancing when reading BigQuery data using 
> Cloud Dataflow
> ---
>
> Key: BEAM-7495
> URL: https://issues.apache.org/jira/browse/BEAM-7495
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp
>Reporter: Aryan Naraghi
>Assignee: Aryan Naraghi
>Priority: Major
>   Original Estimate: 504h
>  Time Spent: 3h 20m
>  Remaining Estimate: 500h 40m
>
> Currently, the BigQuery connector for reading data using the BigQuery Storage 
> API does not support any of the facilities on the source for Dataflow to 
> split streams.
>  
> On the server side, the BigQuery Storage API supports splitting streams at a 
> fraction. By adding support to the connector, we enable Dataflow to split 
> streams, which unlocks dynamic worker re-balancing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7495) Add support for dynamic worker re-balancing when reading BigQuery data using Cloud Dataflow

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7495?focusedWorklogId=266934&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-266934
 ]

ASF GitHub Bot logged work on BEAM-7495:


Author: ASF GitHub Bot
Created on: 25/Jun/19 18:33
Start Date: 25/Jun/19 18:33
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #8832: 
[BEAM-7495] Add dynamic worker rebalancing to BigQuery Storage
URL: https://github.com/apache/beam/pull/8832
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 266934)
Time Spent: 3.5h  (was: 3h 20m)
Remaining Estimate: 500.5h  (was: 500h 40m)

> Add support for dynamic worker re-balancing when reading BigQuery data using 
> Cloud Dataflow
> ---
>
> Key: BEAM-7495
> URL: https://issues.apache.org/jira/browse/BEAM-7495
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp
>Reporter: Aryan Naraghi
>Assignee: Aryan Naraghi
>Priority: Major
>   Original Estimate: 504h
>  Time Spent: 3.5h
>  Remaining Estimate: 500.5h
>
> Currently, the BigQuery connector for reading data using the BigQuery Storage 
> API does not support any of the facilities on the source for Dataflow to 
> split streams.
>  
> On the server side, the BigQuery Storage API supports splitting streams at a 
> fraction. By adding support to the connector, we enable Dataflow to split 
> streams, which unlocks dynamic worker re-balancing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7475) Add Python stateful processing example in blog

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7475?focusedWorklogId=266912&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-266912
 ]

ASF GitHub Bot logged work on BEAM-7475:


Author: ASF GitHub Bot
Created on: 25/Jun/19 18:17
Start Date: 25/Jun/19 18:17
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #8803: [BEAM-7475] 
update wordcount example
URL: https://github.com/apache/beam/pull/8803
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 266912)
Time Spent: 5.5h  (was: 5h 20m)

> Add Python stateful processing example in blog
> --
>
> Key: BEAM-7475
> URL: https://issues.apache.org/jira/browse/BEAM-7475
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rakesh Kumar
>Assignee: Rakesh Kumar
>Priority: Major
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-7631) Remove experimental annotation from stable transforms in Java BigTableIO connector

2019-06-25 Thread Chamikara Jayalath (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872586#comment-16872586
 ] 

Chamikara Jayalath commented on BEAM-7631:
--

cc: [~sduskis] [~igorbernstein] [~altay] [~dhalp...@google.com] [~iemejia]

> Remove experimental annotation from stable transforms in Java BigTableIO 
> connector
> --
>
> Key: BEAM-7631
> URL: https://issues.apache.org/jira/browse/BEAM-7631
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Chamikara Jayalath
>Assignee: Chamikara Jayalath
>Priority: Major
>
> This connector has been around for some time and many Beam users use it.
> We can consider removing experimental tag from following transforms.
> BigTableIO.Read
> BigTableIO.Write
> Removing the experimental tag will guarantee our users that the existing API 
> will stay stable.
>  
> Note that this does not prevent adding new transforms to the API or adding 
> new features to existing transforms.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4046) Decouple gradle project names and maven artifact ids

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4046?focusedWorklogId=266908&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-266908
 ]

ASF GitHub Bot logged work on BEAM-4046:


Author: ASF GitHub Bot
Created on: 25/Jun/19 18:13
Start Date: 25/Jun/19 18:13
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on issue #8919: [BEAM-4046, 
BEAM-7527] Fix benchmark with correct Gradle project
URL: https://github.com/apache/beam/pull/8919#issuecomment-505560414
 
 
   Realized that the changes on PerfkitBenchmarker 
(https://github.com/GoogleCloudPlatform/PerfKitBenchmarker/pull/1934) only pass 
Python tests but not Java. My quick fix 
(https://github.com/GoogleCloudPlatform/PerfKitBenchmarker/pull/1936) is out 
waiting for review.
   
   In order to run Jenkins job based on Perfkit fix, I pointed 
PerfkitBenchmarker git repo to my fix branch in this PR and both Java 
ParquetIOPerformance and Python WordCountIT passed:
   - https://builds.apache.org/job/beam_PerformanceTests_ParquetIOIT/1616/
   - https://builds.apache.org/job/beam_PerformanceTests_WordCountIT_Py27/164/
   
   Given that review on Perfkit may take a day or two, Java benchmarks will be 
blocked during this period, I would prefer to point PerfkitBenchmarker repo to 
my fix branch first and I'll switch back to master once it's merged. Would love 
to create a jira tracking this if make sense to you.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 266908)
Time Spent: 38h 50m  (was: 38h 40m)

> Decouple gradle project names and maven artifact ids
> 
>
> Key: BEAM-4046
> URL: https://issues.apache.org/jira/browse/BEAM-4046
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Kenneth Knowles
>Assignee: Michael Luckey
>Priority: Major
>  Time Spent: 38h 50m
>  Remaining Estimate: 0h
>
> In our first draft, we had gradle projects like {{":beam-sdks-java-core"}}. 
> It is clumsy and requires a hacky settings.gradle that is not idiomatic.
> In our second draft, we changed them to names that work well with Gradle, 
> like {{":sdks:java:core"}}. This caused Maven artifact IDs to be wonky.
> In our third draft, we regressed to the first draft to get the Maven artifact 
> ids right.
> These should be able to be decoupled. It seems there are many StackOverflow 
> questions on the subject.
> Since it is unidiomatic and a poor user experience, if it does turn out to be 
> mandatory then it needs to be documented inline everywhere - the 
> settings.gradle should say why it is so bizarre, and each build.gradle should 
> indicate what its project id is.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-7218) LTS backport: Upper bound for pytz dependency

2019-06-25 Thread Udi Meiri (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udi Meiri resolved BEAM-7218.
-
Resolution: Fixed

> LTS backport: Upper bound for pytz dependency
> -
>
> Key: BEAM-7218
> URL: https://issues.apache.org/jira/browse/BEAM-7218
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Assignee: Udi Meiri
>Priority: Major
> Fix For: 2.7.1
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Do we need an upper bound for the pytz dependency? 
> ([https://github.com/apache/beam/blob/release-2.5.0/sdks/python/setup.py#L108)]
>  We typically have upper bounds, in order to avoid future breakages due to a 
> possibility of breaking/backward incompatible change of that depepdency.
> Good practice is to upper bound either at known version, or next major 
> version. Do we need an exception for pytz because it does not seem to be 
> following semantic versioning?
> cc: [~yifanzou] Is this something dependency notifier can warn on? Dependency 
> without upper version bounds.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7218) LTS backport: Upper bound for pytz dependency

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7218?focusedWorklogId=266903&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-266903
 ]

ASF GitHub Bot logged work on BEAM-7218:


Author: ASF GitHub Bot
Created on: 25/Jun/19 18:08
Start Date: 25/Jun/19 18:08
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #8917: [BEAM-7218] 
Backport of #7487
URL: https://github.com/apache/beam/pull/8917
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 266903)
Time Spent: 1h 20m  (was: 1h 10m)

> LTS backport: Upper bound for pytz dependency
> -
>
> Key: BEAM-7218
> URL: https://issues.apache.org/jira/browse/BEAM-7218
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Assignee: Udi Meiri
>Priority: Major
> Fix For: 2.7.1
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Do we need an upper bound for the pytz dependency? 
> ([https://github.com/apache/beam/blob/release-2.5.0/sdks/python/setup.py#L108)]
>  We typically have upper bounds, in order to avoid future breakages due to a 
> possibility of breaking/backward incompatible change of that depepdency.
> Good practice is to upper bound either at known version, or next major 
> version. Do we need an exception for pytz because it does not seem to be 
> following semantic versioning?
> cc: [~yifanzou] Is this something dependency notifier can warn on? Dependency 
> without upper version bounds.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7608) v1new ReadFromDatastore skips entities

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7608?focusedWorklogId=266901&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-266901
 ]

ASF GitHub Bot logged work on BEAM-7608:


Author: ASF GitHub Bot
Created on: 25/Jun/19 18:07
Start Date: 25/Jun/19 18:07
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #8921: [BEAM-7608] Honor 
DATASTORE_EMULATOR_HOST env var
URL: https://github.com/apache/beam/pull/8921#issuecomment-505558155
 
 
   run python postcommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 266901)
Time Spent: 50m  (was: 40m)

> v1new ReadFromDatastore skips entities
> --
>
> Key: BEAM-7608
> URL: https://issues.apache.org/jira/browse/BEAM-7608
> Project: Beam
>  Issue Type: Bug
>  Components: io-python-gcp
>Affects Versions: 2.13.0
> Environment: MacOS 10.14.5, Python 2.7
>Reporter: Jacob Gur
>Assignee: Udi Meiri
>Priority: Critical
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> A simple map over a datastore kind in local emulator using the new 
> v1new.datastoreio.ReadFromDatastore skip entities.
> The kind has 1516 entities, and when I map over it using the old 
> ReadFromDatastore transform, it maps all of them, i.e., I can map to id and 
> write to text file.
> But the new transform only maps 365 entities. There is no error. The tail of 
> the standard output is:
> {code:java}
> INFO:root:Latest stats timestamp for kind face_apilog is 2019-06-18 
> 08:15:21+00:00
>  INFO:root:Estimated size bytes for query: 116188
>  INFO:root:Splitting the query into 12 splits
>  INFO:root:Running 
> (((GetEntities/Reshuffle/ReshufflePerKey/GroupByKey/Read)(ref_AppliedPTransform_GetEntities/Reshuffle/ReshufflePerKey/FlatMap(restore_timestamps)_14))((ref_AppliedPTransform_GetEntities/Reshuffle/RemoveRandomKeys_15)(ref_AppliedPTransform_GetEntities/Read_16)))((ref_AppliedPTransform_MapToId_17)((ref_AppliedPTransform_WriteToFile/Write/WriteImpl/WriteBundles_24)((ref_AppliedPTransform_WriteToFile/Write/WriteImpl/Pair_25)((ref_AppliedPTransform_WriteToFile/Write/WriteImpl/WindowInto(WindowIntoFn)_26)(WriteToFile/Write/WriteImpl/GroupByKey/Write)
>  INFO:root:Running 
> (WriteToFile/Write/WriteImpl/GroupByKey/Read)((ref_AppliedPTransform_WriteToFile/Write/WriteImpl/Extract_31)(ref_PCollection_PCollection_20/Write))
>  INFO:root:Running 
> (ref_PCollection_PCollection_12/Read)((ref_AppliedPTransform_WriteToFile/Write/WriteImpl/PreFinalize_32)(ref_PCollection_PCollection_21/Write))
>  INFO:root:Running 
> (ref_PCollection_PCollection_12/Read)+(ref_AppliedPTransform_WriteToFile/Write/WriteImpl/FinalizeWrite_33)
>  INFO:root:Starting finalize_write threads with num_shards: 1 (skipped: 0), 
> batches: 1, num_threads: 1
>  INFO:root:Renamed 1 shards in 0.12 seconds.{code}
>  
> The code for the job on the new transform is:
>  
>  
> {code:java}
> from __future__ import absolute_import
> import logging
> import os
> import sys
> import apache_beam as beam
> from apache_beam.io.gcp.datastore.v1new.datastoreio import ReadFromDatastore
> from apache_beam.io.gcp.datastore.v1new.types import Query
> # TODO: should be set outside of python process
> os.environ['DATASTORE_EMULATOR_HOST'] = 'localhost:8085'
> def map_to_id(element):
>  face_log_id = element.to_client_entity().id
>  return face_log_id
> def run(argv=None):
>  p = beam.Pipeline(argv=argv)
>  project = 'dev'
>  (p
>  | 'GetEntities' >> ReadFromDatastore(Query(kind='face_apilog', 
> project=project))
>  | 'MapToId' >> beam.Map(map_to_id)
>  | 'WriteToFile' >> beam.io.WriteToText('result')
>  )
>  p.run().wait_until_finish()
> if __name__ == '__main__':
>  logging.getLogger().setLevel(logging.INFO)
>  run(sys.argv){code}
>  
> For comparison, the code for the job on the old transform is:
>  
> {code:java}
> from __future__ import absolute_import
> import logging
> import os
> import sys
> import apache_beam as beam
> from apache_beam.io.gcp.datastore.v1.datastoreio import ReadFromDatastore
> from google.cloud.proto.datastore.v1 import query_pb2
> # TODO: should be set outside of python process
> os.environ['DATASTORE_EMULATOR_HOST'] = 'localhost:8085'
> def map_to_id(element):
>  face_log_id = element.key.path[-1].id
>  return face_log_id
> def run(argv=None):
>  p = beam.Pipeline(argv=argv)
>  project = 'dev'
>  query = query_pb2.Query()
>  query.kind.add().name = 'face_apilog'
>  (p
>  | 'GetEntities' >> ReadFromDatastore(pro

[jira] [Work logged] (BEAM-4046) Decouple gradle project names and maven artifact ids

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4046?focusedWorklogId=266895&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-266895
 ]

ASF GitHub Bot logged work on BEAM-4046:


Author: ASF GitHub Bot
Created on: 25/Jun/19 18:00
Start Date: 25/Jun/19 18:00
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on issue #8919: [BEAM-4046, 
BEAM-7527] Fix benchmark with correct Gradle project
URL: https://github.com/apache/beam/pull/8919#issuecomment-50649
 
 
   run seed job
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 266895)
Time Spent: 38h 40m  (was: 38.5h)

> Decouple gradle project names and maven artifact ids
> 
>
> Key: BEAM-4046
> URL: https://issues.apache.org/jira/browse/BEAM-4046
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Kenneth Knowles
>Assignee: Michael Luckey
>Priority: Major
>  Time Spent: 38h 40m
>  Remaining Estimate: 0h
>
> In our first draft, we had gradle projects like {{":beam-sdks-java-core"}}. 
> It is clumsy and requires a hacky settings.gradle that is not idiomatic.
> In our second draft, we changed them to names that work well with Gradle, 
> like {{":sdks:java:core"}}. This caused Maven artifact IDs to be wonky.
> In our third draft, we regressed to the first draft to get the Maven artifact 
> ids right.
> These should be able to be decoupled. It seems there are many StackOverflow 
> questions on the subject.
> Since it is unidiomatic and a poor user experience, if it does turn out to be 
> mandatory then it needs to be documented inline everywhere - the 
> settings.gradle should say why it is so bizarre, and each build.gradle should 
> indicate what its project id is.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4046) Decouple gradle project names and maven artifact ids

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4046?focusedWorklogId=266880&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-266880
 ]

ASF GitHub Bot logged work on BEAM-4046:


Author: ASF GitHub Bot
Created on: 25/Jun/19 17:48
Start Date: 25/Jun/19 17:48
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on issue #8919: [BEAM-4046, 
BEAM-7527] Fix benchmark with correct Gradle project
URL: https://github.com/apache/beam/pull/8919#issuecomment-505551180
 
 
   Run Python27 WordCountIT Performance Test
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 266880)
Time Spent: 38.5h  (was: 38h 20m)

> Decouple gradle project names and maven artifact ids
> 
>
> Key: BEAM-4046
> URL: https://issues.apache.org/jira/browse/BEAM-4046
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Kenneth Knowles
>Assignee: Michael Luckey
>Priority: Major
>  Time Spent: 38.5h
>  Remaining Estimate: 0h
>
> In our first draft, we had gradle projects like {{":beam-sdks-java-core"}}. 
> It is clumsy and requires a hacky settings.gradle that is not idiomatic.
> In our second draft, we changed them to names that work well with Gradle, 
> like {{":sdks:java:core"}}. This caused Maven artifact IDs to be wonky.
> In our third draft, we regressed to the first draft to get the Maven artifact 
> ids right.
> These should be able to be decoupled. It seems there are many StackOverflow 
> questions on the subject.
> Since it is unidiomatic and a poor user experience, if it does turn out to be 
> mandatory then it needs to be documented inline everywhere - the 
> settings.gradle should say why it is so bizarre, and each build.gradle should 
> indicate what its project id is.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7535) Create Jenkins jobs for BQ performance tests

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7535?focusedWorklogId=266879&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-266879
 ]

ASF GitHub Bot logged work on BEAM-7535:


Author: ASF GitHub Bot
Created on: 25/Jun/19 17:48
Start Date: 25/Jun/19 17:48
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on issue #8874: [DO NOT 
MERGE][BEAM-7535] Created Jenkins jobs for BQ performance tests
URL: https://github.com/apache/beam/pull/8874#issuecomment-505550910
 
 
   Run BigQueryIO Write Performance Test Python Batch
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 266879)
Time Spent: 6.5h  (was: 6h 20m)

> Create Jenkins jobs for BQ performance tests
> 
>
> Key: BEAM-7535
> URL: https://issues.apache.org/jira/browse/BEAM-7535
> Project: Beam
>  Issue Type: Task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7495) Add support for dynamic worker re-balancing when reading BigQuery data using Cloud Dataflow

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7495?focusedWorklogId=266878&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-266878
 ]

ASF GitHub Bot logged work on BEAM-7495:


Author: ASF GitHub Bot
Created on: 25/Jun/19 17:47
Start Date: 25/Jun/19 17:47
Worklog Time Spent: 10m 
  Work Description: aryann commented on issue #8832: [BEAM-7495] Add 
dynamic worker rebalancing to BigQuery Storage
URL: https://github.com/apache/beam/pull/8832#issuecomment-505550826
 
 
   Run Java Precommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 266878)
Time Spent: 3h 10m  (was: 3h)
Remaining Estimate: 500h 50m  (was: 501h)

> Add support for dynamic worker re-balancing when reading BigQuery data using 
> Cloud Dataflow
> ---
>
> Key: BEAM-7495
> URL: https://issues.apache.org/jira/browse/BEAM-7495
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp
>Reporter: Aryan Naraghi
>Assignee: Aryan Naraghi
>Priority: Major
>   Original Estimate: 504h
>  Time Spent: 3h 10m
>  Remaining Estimate: 500h 50m
>
> Currently, the BigQuery connector for reading data using the BigQuery Storage 
> API does not support any of the facilities on the source for Dataflow to 
> split streams.
>  
> On the server side, the BigQuery Storage API supports splitting streams at a 
> fraction. By adding support to the connector, we enable Dataflow to split 
> streams, which unlocks dynamic worker re-balancing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7535) Create Jenkins jobs for BQ performance tests

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7535?focusedWorklogId=266869&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-266869
 ]

ASF GitHub Bot logged work on BEAM-7535:


Author: ASF GitHub Bot
Created on: 25/Jun/19 17:36
Start Date: 25/Jun/19 17:36
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on issue #8874: [DO NOT 
MERGE][BEAM-7535] Created Jenkins jobs for BQ performance tests
URL: https://github.com/apache/beam/pull/8874#issuecomment-505546454
 
 
   Run Seed Job
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 266869)
Time Spent: 6h 20m  (was: 6h 10m)

> Create Jenkins jobs for BQ performance tests
> 
>
> Key: BEAM-7535
> URL: https://issues.apache.org/jira/browse/BEAM-7535
> Project: Beam
>  Issue Type: Task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7535) Create Jenkins jobs for BQ performance tests

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7535?focusedWorklogId=266868&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-266868
 ]

ASF GitHub Bot logged work on BEAM-7535:


Author: ASF GitHub Bot
Created on: 25/Jun/19 17:36
Start Date: 25/Jun/19 17:36
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on issue #8874: [DO NOT 
MERGE][BEAM-7535] Created Jenkins jobs for BQ performance tests
URL: https://github.com/apache/beam/pull/8874#issuecomment-505544392
 
 
   Run BigQueryIO Write Performance Test Python Batch
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 266868)
Time Spent: 6h 10m  (was: 6h)

> Create Jenkins jobs for BQ performance tests
> 
>
> Key: BEAM-7535
> URL: https://issues.apache.org/jira/browse/BEAM-7535
> Project: Beam
>  Issue Type: Task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7535) Create Jenkins jobs for BQ performance tests

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7535?focusedWorklogId=266863&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-266863
 ]

ASF GitHub Bot logged work on BEAM-7535:


Author: ASF GitHub Bot
Created on: 25/Jun/19 17:31
Start Date: 25/Jun/19 17:31
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on issue #8874: [DO NOT 
MERGE][BEAM-7535] Created Jenkins jobs for BQ performance tests
URL: https://github.com/apache/beam/pull/8874#issuecomment-505544392
 
 
   Run BigQueryIO Write Performance Test Python Batch
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 266863)
Time Spent: 6h  (was: 5h 50m)

> Create Jenkins jobs for BQ performance tests
> 
>
> Key: BEAM-7535
> URL: https://issues.apache.org/jira/browse/BEAM-7535
> Project: Beam
>  Issue Type: Task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 6h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7475) Add Python stateful processing example in blog

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7475?focusedWorklogId=266865&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-266865
 ]

ASF GitHub Bot logged work on BEAM-7475:


Author: ASF GitHub Bot
Created on: 25/Jun/19 17:31
Start Date: 25/Jun/19 17:31
Worklog Time Spent: 10m 
  Work Description: rakeshcusat commented on pull request #8803: 
[BEAM-7475] update wordcount example
URL: https://github.com/apache/beam/pull/8803#discussion_r297306555
 
 

 ##
 File path: website/src/get-started/wordcount-example.md
 ##
 @@ -1362,7 +1379,18 @@ PCollection> wordCounts = 
windowedWords.apply(new WordCount.Cou
 ```
 
 ```py
-# This feature is not yet available in the Beam SDK for Python.
+class CountWordsFn(Dofn):
 
 Review comment:
   Anyway, full python code example link is already provided down below. So we 
don't need to provide the repeated information here.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 266865)
Time Spent: 5h 20m  (was: 5h 10m)

> Add Python stateful processing example in blog
> --
>
> Key: BEAM-7475
> URL: https://issues.apache.org/jira/browse/BEAM-7475
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rakesh Kumar
>Assignee: Rakesh Kumar
>Priority: Major
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7475) Add Python stateful processing example in blog

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7475?focusedWorklogId=266857&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-266857
 ]

ASF GitHub Bot logged work on BEAM-7475:


Author: ASF GitHub Bot
Created on: 25/Jun/19 17:28
Start Date: 25/Jun/19 17:28
Worklog Time Spent: 10m 
  Work Description: rakeshcusat commented on pull request #8803: 
[BEAM-7475] update wordcount example
URL: https://github.com/apache/beam/pull/8803#discussion_r297305232
 
 

 ##
 File path: website/src/get-started/wordcount-example.md
 ##
 @@ -1362,7 +1379,18 @@ PCollection> wordCounts = 
windowedWords.apply(new WordCount.Cou
 ```
 
 ```py
-# This feature is not yet available in the Beam SDK for Python.
+class CountWordsFn(Dofn):
 
 Review comment:
   For consistency shake I will use the single line instead. 
   `word_counts = windowed_words | CountWords()`
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 266857)
Time Spent: 5h 10m  (was: 5h)

> Add Python stateful processing example in blog
> --
>
> Key: BEAM-7475
> URL: https://issues.apache.org/jira/browse/BEAM-7475
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rakesh Kumar
>Assignee: Rakesh Kumar
>Priority: Major
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4046) Decouple gradle project names and maven artifact ids

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4046?focusedWorklogId=266852&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-266852
 ]

ASF GitHub Bot logged work on BEAM-4046:


Author: ASF GitHub Bot
Created on: 25/Jun/19 17:22
Start Date: 25/Jun/19 17:22
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on issue #8919: [BEAM-4046, 
BEAM-7527] Fix benchmark with correct Gradle project
URL: https://github.com/apache/beam/pull/8919#issuecomment-505540835
 
 
   Run Java ParquetIO Performance Test
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 266852)
Time Spent: 38h 20m  (was: 38h 10m)

> Decouple gradle project names and maven artifact ids
> 
>
> Key: BEAM-4046
> URL: https://issues.apache.org/jira/browse/BEAM-4046
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Kenneth Knowles
>Assignee: Michael Luckey
>Priority: Major
>  Time Spent: 38h 20m
>  Remaining Estimate: 0h
>
> In our first draft, we had gradle projects like {{":beam-sdks-java-core"}}. 
> It is clumsy and requires a hacky settings.gradle that is not idiomatic.
> In our second draft, we changed them to names that work well with Gradle, 
> like {{":sdks:java:core"}}. This caused Maven artifact IDs to be wonky.
> In our third draft, we regressed to the first draft to get the Maven artifact 
> ids right.
> These should be able to be decoupled. It seems there are many StackOverflow 
> questions on the subject.
> Since it is unidiomatic and a poor user experience, if it does turn out to be 
> mandatory then it needs to be documented inline everywhere - the 
> settings.gradle should say why it is so bizarre, and each build.gradle should 
> indicate what its project id is.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7475) Add Python stateful processing example in blog

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7475?focusedWorklogId=266849&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-266849
 ]

ASF GitHub Bot logged work on BEAM-7475:


Author: ASF GitHub Bot
Created on: 25/Jun/19 17:16
Start Date: 25/Jun/19 17:16
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #8803: [BEAM-7475] 
update wordcount example
URL: https://github.com/apache/beam/pull/8803#discussion_r297300253
 
 

 ##
 File path: website/src/get-started/wordcount-example.md
 ##
 @@ -1362,7 +1379,18 @@ PCollection> wordCounts = 
windowedWords.apply(new WordCount.Cou
 ```
 
 ```py
-# This feature is not yet available in the Beam SDK for Python.
+class CountWordsFn(Dofn):
 
 Review comment:
   So this is about reusing transforms, right and java version is the single 
line `PCollection> wordCounts = windowedWords.apply(new 
WordCount.CountWords());`
   
   We can probably reduce this too
   
   `word_counts = windowed_words | CountWords()`
   
   And if you would like to add code for what is CountWords, perhaps you can 
link to the pre-existing example transform 
(https://github.com/apache/beam/blob/e65c176a9f34e45d408281e1101a2ae54cef0f6c/sdks/python/apache_beam/examples/wordcount_debugging.py#L91)
   
   What do you think?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 266849)
Time Spent: 5h  (was: 4h 50m)

> Add Python stateful processing example in blog
> --
>
> Key: BEAM-7475
> URL: https://issues.apache.org/jira/browse/BEAM-7475
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rakesh Kumar
>Assignee: Rakesh Kumar
>Priority: Major
>  Time Spent: 5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7475) Add Python stateful processing example in blog

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7475?focusedWorklogId=266847&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-266847
 ]

ASF GitHub Bot logged work on BEAM-7475:


Author: ASF GitHub Bot
Created on: 25/Jun/19 17:08
Start Date: 25/Jun/19 17:08
Worklog Time Spent: 10m 
  Work Description: rakeshcusat commented on pull request #8803: 
[BEAM-7475] update wordcount example
URL: https://github.com/apache/beam/pull/8803#discussion_r297297206
 
 

 ##
 File path: website/src/get-started/wordcount-example.md
 ##
 @@ -1362,7 +1379,18 @@ PCollection> wordCounts = 
windowedWords.apply(new WordCount.Cou
 ```
 
 ```py
-# This feature is not yet available in the Beam SDK for Python.
+class CountWordsFn(Dofn):
 
 Review comment:
   @aaltay thanks for looking into this. I definitely not ran this snippet. I 
will fix these issues and make sure it is executable. 
   
   I have another doubt since we have removed the metrics statement so nothing 
in the code is tracking the number of words. is leaving a comment good enough? 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 266847)
Time Spent: 4h 50m  (was: 4h 40m)

> Add Python stateful processing example in blog
> --
>
> Key: BEAM-7475
> URL: https://issues.apache.org/jira/browse/BEAM-7475
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rakesh Kumar
>Assignee: Rakesh Kumar
>Priority: Major
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4046) Decouple gradle project names and maven artifact ids

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4046?focusedWorklogId=266846&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-266846
 ]

ASF GitHub Bot logged work on BEAM-4046:


Author: ASF GitHub Bot
Created on: 25/Jun/19 17:07
Start Date: 25/Jun/19 17:07
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on issue #8919: [BEAM-4046, 
BEAM-7527] Fix benchmark with correct Gradle project
URL: https://github.com/apache/beam/pull/8919#issuecomment-505535519
 
 
   run seed job
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 266846)
Time Spent: 38h 10m  (was: 38h)

> Decouple gradle project names and maven artifact ids
> 
>
> Key: BEAM-4046
> URL: https://issues.apache.org/jira/browse/BEAM-4046
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Kenneth Knowles
>Assignee: Michael Luckey
>Priority: Major
>  Time Spent: 38h 10m
>  Remaining Estimate: 0h
>
> In our first draft, we had gradle projects like {{":beam-sdks-java-core"}}. 
> It is clumsy and requires a hacky settings.gradle that is not idiomatic.
> In our second draft, we changed them to names that work well with Gradle, 
> like {{":sdks:java:core"}}. This caused Maven artifact IDs to be wonky.
> In our third draft, we regressed to the first draft to get the Maven artifact 
> ids right.
> These should be able to be decoupled. It seems there are many StackOverflow 
> questions on the subject.
> Since it is unidiomatic and a poor user experience, if it does turn out to be 
> mandatory then it needs to be documented inline everywhere - the 
> settings.gradle should say why it is so bizarre, and each build.gradle should 
> indicate what its project id is.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3061) BigtableIO should support emitting a sentinel "done" value when a bundle completes

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3061?focusedWorklogId=266842&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-266842
 ]

ASF GitHub Bot logged work on BEAM-3061:


Author: ASF GitHub Bot
Created on: 25/Jun/19 17:01
Start Date: 25/Jun/19 17:01
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #7805: 
[BEAM-3061] Done notifications for BigtableIO.Write
URL: https://github.com/apache/beam/pull/7805#discussion_r297294547
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIO.java
 ##
 @@ -466,7 +469,7 @@ public String toString() {
   @Experimental(Experimental.Kind.SOURCE_SINK)
   @AutoValue
   public abstract static class Write
-  extends PTransform>>, 
PDone> {
+  extends PTransform>>, 
PCollection> {
 
 Review comment:
   I think we can remove the experimental tag from stable transforms (Read, 
Write). This PR does not break backwards compatibility and I don't think we 
should break backwards compatibility for these transforms in future PRs either. 
This does not prevent adding new features or adding a new "ReadAll" form 
transform when we have SDF.
   
   I agree that this is better done in a separate JIRA/PR. Created 
https://issues.apache.org/jira/browse/BEAM-7631.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 266842)
Time Spent: 12h 40m  (was: 12.5h)

> BigtableIO should support emitting a sentinel "done" value when a bundle 
> completes
> --
>
> Key: BEAM-3061
> URL: https://issues.apache.org/jira/browse/BEAM-3061
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Steve Niemitz
>Assignee: Steve Niemitz
>Priority: Major
>  Time Spent: 12h 40m
>  Remaining Estimate: 0h
>
> There was some discussion of this on the dev@ mailing list [1].  This 
> approach was taken based on discussion there.
> [1] 
> https://lists.apache.org/thread.html/949b33782f722a9000c9bf9e37042739c6fd0927589b99752b78d7bd@%3Cdev.beam.apache.org%3E



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-7631) Remove experimental annotation from stable transforms in Java BigTableIO connector

2019-06-25 Thread Chamikara Jayalath (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chamikara Jayalath reassigned BEAM-7631:


Assignee: Chamikara Jayalath

> Remove experimental annotation from stable transforms in Java BigTableIO 
> connector
> --
>
> Key: BEAM-7631
> URL: https://issues.apache.org/jira/browse/BEAM-7631
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Chamikara Jayalath
>Assignee: Chamikara Jayalath
>Priority: Major
>
> This connector has been around for some time and many Beam users use it.
> We can consider removing experimental tag from following transforms.
> BigTableIO.Read
> BigTableIO.Write
> Removing the experimental tag will guarantee our users that the existing API 
> will stay stable.
>  
> Note that this does not prevent adding new transforms to the API or adding 
> new features to existing transforms.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7475) Add Python stateful processing example in blog

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7475?focusedWorklogId=266844&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-266844
 ]

ASF GitHub Bot logged work on BEAM-7475:


Author: ASF GitHub Bot
Created on: 25/Jun/19 17:01
Start Date: 25/Jun/19 17:01
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #8803: [BEAM-7475] 
update wordcount example
URL: https://github.com/apache/beam/pull/8803#discussion_r297294598
 
 

 ##
 File path: website/src/get-started/wordcount-example.md
 ##
 @@ -1362,7 +1379,18 @@ PCollection> wordCounts = 
windowedWords.apply(new WordCount.Cou
 ```
 
 ```py
-# This feature is not yet available in the Beam SDK for Python.
+class CountWordsFn(Dofn):
 
 Review comment:
   @rakeshcusat have you tested running snipet? I am not sure it will work, 
because:
   - `Dofn` is misspelled. Should be `DoFn`
   - `process` method does not return or yield.
   - `FlatMap` expects a lambda not a DoFn.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 266844)
Time Spent: 4h 40m  (was: 4.5h)

> Add Python stateful processing example in blog
> --
>
> Key: BEAM-7475
> URL: https://issues.apache.org/jira/browse/BEAM-7475
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rakesh Kumar
>Assignee: Rakesh Kumar
>Priority: Major
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-7631) Remove experimental annotation from stable transforms in Java BigTableIO connector

2019-06-25 Thread Chamikara Jayalath (JIRA)
Chamikara Jayalath created BEAM-7631:


 Summary: Remove experimental annotation from stable transforms in 
Java BigTableIO connector
 Key: BEAM-7631
 URL: https://issues.apache.org/jira/browse/BEAM-7631
 Project: Beam
  Issue Type: Improvement
  Components: io-java-gcp
Reporter: Chamikara Jayalath


This connector has been around for some time and many Beam users use it.

We can consider removing experimental tag from following transforms.

BigTableIO.Read

BigTableIO.Write

Removing the experimental tag will guarantee our users that the existing API 
will stay stable.

 

Note that this does not prevent adding new transforms to the API or adding new 
features to existing transforms.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-7284) Support Py3 Dataclasses

2019-06-25 Thread yoshiki obata (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872526#comment-16872526
 ] 

yoshiki obata commented on BEAM-7284:
-

[~tvalentyn]

I found that there seems to be way to support python3 dataclasses without 
update dill, though have not yet be enough tested;
https://github.com/lazylynx/beam/commit/48101f05bdee19e5f6a842682ed5079767a5c165
Is this worth to use?

(Off course I know replacement of dill is better way :D)

> Support Py3 Dataclasses 
> 
>
> Key: BEAM-7284
> URL: https://issues.apache.org/jira/browse/BEAM-7284
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Priority: Major
>
> It looks like dill does not support Dataclasses yet, 
> https://github.com/uqfoundation/dill/issues/312, which very likely means that 
> Beam does not support them either.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4046) Decouple gradle project names and maven artifact ids

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4046?focusedWorklogId=266836&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-266836
 ]

ASF GitHub Bot logged work on BEAM-4046:


Author: ASF GitHub Bot
Created on: 25/Jun/19 16:40
Start Date: 25/Jun/19 16:40
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on issue #8919: [BEAM-4046, 
BEAM-7527] Fix benchmark with correct Gradle project
URL: https://github.com/apache/beam/pull/8919#issuecomment-505525253
 
 
   Run Python27 WordCountIT Performance Test
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 266836)
Time Spent: 38h  (was: 37h 50m)

> Decouple gradle project names and maven artifact ids
> 
>
> Key: BEAM-4046
> URL: https://issues.apache.org/jira/browse/BEAM-4046
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Kenneth Knowles
>Assignee: Michael Luckey
>Priority: Major
>  Time Spent: 38h
>  Remaining Estimate: 0h
>
> In our first draft, we had gradle projects like {{":beam-sdks-java-core"}}. 
> It is clumsy and requires a hacky settings.gradle that is not idiomatic.
> In our second draft, we changed them to names that work well with Gradle, 
> like {{":sdks:java:core"}}. This caused Maven artifact IDs to be wonky.
> In our third draft, we regressed to the first draft to get the Maven artifact 
> ids right.
> These should be able to be decoupled. It seems there are many StackOverflow 
> questions on the subject.
> Since it is unidiomatic and a poor user experience, if it does turn out to be 
> mandatory then it needs to be documented inline everywhere - the 
> settings.gradle should say why it is so bizarre, and each build.gradle should 
> indicate what its project id is.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-7437) Integration Test for BQ streaming inserts for streaming pipelines

2019-06-25 Thread Pablo Estrada (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada updated BEAM-7437:

Fix Version/s: (was: 2.14.0)

> Integration Test for BQ streaming inserts for streaming pipelines
> -
>
> Key: BEAM-7437
> URL: https://issues.apache.org/jira/browse/BEAM-7437
> Project: Beam
>  Issue Type: Test
>  Components: io-python-gcp
>Affects Versions: 2.12.0
>Reporter: Tanay Tummalapalli
>Assignee: Tanay Tummalapalli
>Priority: Minor
>  Labels: test
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Integration Test for BigQuery Sink using Streaming Inserts for streaming 
> pipelines.
> Integration tests currently exist for batch pipelines, it can also be added 
> for streaming pipelines using TestStream. This will be a precursor to the 
> failing integration test to be added for [BEAM-6611| 
> https://issues.apache.org/jira/browse/BEAM-6611].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4046) Decouple gradle project names and maven artifact ids

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4046?focusedWorklogId=266817&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-266817
 ]

ASF GitHub Bot logged work on BEAM-4046:


Author: ASF GitHub Bot
Created on: 25/Jun/19 16:31
Start Date: 25/Jun/19 16:31
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on issue #8919: [BEAM-4046, 
BEAM-7527] Fix benchmark with correct Gradle project
URL: https://github.com/apache/beam/pull/8919#issuecomment-505521913
 
 
   Run Python27 WordCountIT Performance Test
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 266817)
Time Spent: 37h 40m  (was: 37.5h)

> Decouple gradle project names and maven artifact ids
> 
>
> Key: BEAM-4046
> URL: https://issues.apache.org/jira/browse/BEAM-4046
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Kenneth Knowles
>Assignee: Michael Luckey
>Priority: Major
>  Time Spent: 37h 40m
>  Remaining Estimate: 0h
>
> In our first draft, we had gradle projects like {{":beam-sdks-java-core"}}. 
> It is clumsy and requires a hacky settings.gradle that is not idiomatic.
> In our second draft, we changed them to names that work well with Gradle, 
> like {{":sdks:java:core"}}. This caused Maven artifact IDs to be wonky.
> In our third draft, we regressed to the first draft to get the Maven artifact 
> ids right.
> These should be able to be decoupled. It seems there are many StackOverflow 
> questions on the subject.
> Since it is unidiomatic and a poor user experience, if it does turn out to be 
> mandatory then it needs to be documented inline everywhere - the 
> settings.gradle should say why it is so bizarre, and each build.gradle should 
> indicate what its project id is.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4046) Decouple gradle project names and maven artifact ids

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4046?focusedWorklogId=266835&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-266835
 ]

ASF GitHub Bot logged work on BEAM-4046:


Author: ASF GitHub Bot
Created on: 25/Jun/19 16:32
Start Date: 25/Jun/19 16:32
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on issue #8919: [BEAM-4046, 
BEAM-7527] Fix benchmark with correct Gradle project
URL: https://github.com/apache/beam/pull/8919#issuecomment-505522473
 
 
   Run Seed Job
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 266835)
Time Spent: 37h 50m  (was: 37h 40m)

> Decouple gradle project names and maven artifact ids
> 
>
> Key: BEAM-4046
> URL: https://issues.apache.org/jira/browse/BEAM-4046
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Kenneth Knowles
>Assignee: Michael Luckey
>Priority: Major
>  Time Spent: 37h 50m
>  Remaining Estimate: 0h
>
> In our first draft, we had gradle projects like {{":beam-sdks-java-core"}}. 
> It is clumsy and requires a hacky settings.gradle that is not idiomatic.
> In our second draft, we changed them to names that work well with Gradle, 
> like {{":sdks:java:core"}}. This caused Maven artifact IDs to be wonky.
> In our third draft, we regressed to the first draft to get the Maven artifact 
> ids right.
> These should be able to be decoupled. It seems there are many StackOverflow 
> questions on the subject.
> Since it is unidiomatic and a poor user experience, if it does turn out to be 
> mandatory then it needs to be documented inline everywhere - the 
> settings.gradle should say why it is so bizarre, and each build.gradle should 
> indicate what its project id is.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-7437) Integration Test for BQ streaming inserts for streaming pipelines

2019-06-25 Thread Pablo Estrada (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872509#comment-16872509
 ] 

Pablo Estrada commented on BEAM-7437:
-

Does this have to be part of 2.14.0? I don't think so. I'm removing the Fix 
Version, but happy to chat about why this should / should not be on 2.1.4.0

> Integration Test for BQ streaming inserts for streaming pipelines
> -
>
> Key: BEAM-7437
> URL: https://issues.apache.org/jira/browse/BEAM-7437
> Project: Beam
>  Issue Type: Test
>  Components: io-python-gcp
>Affects Versions: 2.12.0
>Reporter: Tanay Tummalapalli
>Assignee: Tanay Tummalapalli
>Priority: Minor
>  Labels: test
> Fix For: 2.14.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Integration Test for BigQuery Sink using Streaming Inserts for streaming 
> pipelines.
> Integration tests currently exist for batch pipelines, it can also be added 
> for streaming pipelines using TestStream. This will be a precursor to the 
> failing integration test to be added for [BEAM-6611| 
> https://issues.apache.org/jira/browse/BEAM-6611].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7590) Convert PipelineOptionsMap to PipelineOption

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7590?focusedWorklogId=266814&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-266814
 ]

ASF GitHub Bot logged work on BEAM-7590:


Author: ASF GitHub Bot
Created on: 25/Jun/19 16:20
Start Date: 25/Jun/19 16:20
Worklog Time Spent: 10m 
  Work Description: riazela commented on pull request #8928: [DO NOT MERGE] 
[BEAM-7590] Converting JDBC Pipeline Options Map to PipelineOptions.
URL: https://github.com/apache/beam/pull/8928#discussion_r297276293
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptionsReflectionSetter.java
 ##
 @@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.options;
+
+import java.beans.IntrospectionException;
+import java.beans.Introspector;
+import java.beans.PropertyDescriptor;
+import java.lang.reflect.InvocationTargetException;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.SortedSet;
+import java.util.TreeSet;
+import org.apache.beam.sdk.util.StringUtils;
+import 
org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ImmutableListMultimap;
+import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.Iterables;
+import 
org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ListMultimap;
+import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.Sets;
+
+/** This is a utility class to set and remove options individually. */
+public class PipelineOptionsReflectionSetter {
+  private static final boolean STRICT_PARSING = true;
+
+  @SuppressWarnings("unchecked")
+  public static Class getPipelineOptionsInterface(
+  PipelineOptions options) {
+if (options.getClass().getInterfaces().length != 1) {
 
 Review comment:
   Do you mean a pipeline option interface that extends multiple interfaces or 
the object is implementing multiple interfaces? I couldn't find any example 
that the pipeline option object is implementing two interfaces. At least if the 
user uses PipelineOptionsFactory.as() or PipelineOptionsFactory.create() it 
will return a proxy object that only implements the given interface. (The 
interface though can be subclass of multiple interfaces itself, and that would 
be OK.)
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 266814)
Time Spent: 3.5h  (was: 3h 20m)

> Convert PipelineOptionsMap to PipelineOption
> 
>
> Key: BEAM-7590
> URL: https://issues.apache.org/jira/browse/BEAM-7590
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Alireza Samadianzakaria
>Assignee: Alireza Samadianzakaria
>Priority: Minor
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Currently, BeamCalciteTable keeps a map version of PipelineOptions and that 
> map version is used in JDBCConnection and RelNodes as well. This map is empty 
> when the pipeline is constructed from SQLTransform and it will have the 
> parameters passed from JDBC Client when the pipeline is started by JDBC path. 
> Since for Row-Count estimation we need to use PipelineOptions (or its 
> sub-classes) and we cannot convert a map that is created from a 
> pipelineOptions Subclasses back to PipelineOptions, it is better to keep 
> PipelineOptions object itself.
> Another thing that will be changed as a result is set command. Currently, if 
> in JDBC we use Set Command for a pipeline option, it will only change that 
> option in the map. This means even if the option is incorrect, it does not 
> throw exception until it creates the actual Pipeline Options. However, if we 
> are keeping the PipelineOptions class itself, then wee need to a

[jira] [Commented] (BEAM-7195) BigQuery - 404 errors for 'table not found' when using dynamic destinations - sometimes, new table fails to get created

2019-06-25 Thread Juta Staes (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872493#comment-16872493
 ] 

Juta Staes commented on BEAM-7195:
--

This could be related to a flaky Python Post commit test

{code:java}
14:03:18 ERROR: test_multiple_destinations_transform 
(apache_beam.io.gcp.bigquery_test.BigQueryStreamingInsertTransformIntegrationTests)
14:03:18 --
14:03:18 Traceback (most recent call last):
14:03:18   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/io/gcp/bigquery_test.py",
 line 620, in test_multiple_destinations_transform
14:03:18 equal_to([(full_output_table_1, bad_record)]))
14:03:18   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/pipeline.py",
 line 426, in __exit__
14:03:18 self.run().wait_until_finish()
14:03:18   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/pipeline.py",
 line 406, in run
14:03:18 self._options).run(False)
14:03:18   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/pipeline.py",
 line 419, in run
14:03:18 return self.runner.run_pipeline(self, self._options)
14:03:18   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py",
 line 70, in run_pipeline
14:03:18 hc_assert_that(self.result, pickler.loads(on_success_matcher))
14:03:18   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/build/gradleenv/-1734967054/lib/python3.5/site-packages/hamcrest/core/assert_that.py",
 line 43, in assert_that
14:03:18 _assert_match(actual=arg1, matcher=arg2, reason=arg3)
14:03:18   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/build/gradleenv/-1734967054/lib/python3.5/site-packages/hamcrest/core/assert_that.py",
 line 49, in _assert_match
14:03:18 if not matcher.matches(actual):
14:03:18   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/build/gradleenv/-1734967054/lib/python3.5/site-packages/hamcrest/core/core/allof.py",
 line 16, in matches
14:03:18 if not matcher.matches(item):
14:03:18   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/build/gradleenv/-1734967054/lib/python3.5/site-packages/hamcrest/core/base_matcher.py",
 line 28, in matches
14:03:18 match_result = self._matches(item)
14:03:18   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/io/gcp/tests/bigquery_matcher.py",
 line 140, in _matches
14:03:18 response = self._query_with_retry(bigquery_client)
14:03:18   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/utils/retry.py",
 line 210, in wrapper
14:03:18 raise_with_traceback(exn, exn_traceback)
14:03:18   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/build/gradleenv/-1734967054/lib/python3.5/site-packages/future/utils/__init__.py",
 line 419, in raise_with_traceback
14:03:18 raise exc.with_traceback(traceback)
14:03:18   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/utils/retry.py",
 line 197, in wrapper
14:03:18 return fun(*args, **kwargs)
14:03:18   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/io/gcp/tests/bigquery_matcher.py",
 line 155, in _query_with_retry
14:03:18 return [row.values() for row in query_job]
14:03:18   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/build/gradleenv/-1734967054/lib/python3.5/site-packages/google/cloud/bigquery/job.py",
 line 2718, in __iter__
14:03:18 return iter(self.result())
14:03:18   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/build/gradleenv/-1734967054/lib/python3.5/site-packages/google/cloud/bigquery/job.py",
 line 2685, in result
14:03:18 super(QueryJob, self).result(timeout=timeout)
14:03:18   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/build/gradleenv/-1734967054/lib/python3.5/site-packages/google/cloud/bigquery/job.py",
 line 697, in result
14:03:18 return super(_AsyncJob, self).result(timeout=timeout)
14:03:18   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/build/gradleenv/-1734967054/lib/python3.5/site-packages/google/api_core/future/polling.py",
 line 127, in result
14:03:18 raise self._exception
14:03:18 google.api_core.exceptions.NotFound: 404 Not found: Table 
apache-beam-testing:python_bq_streaming_i

[jira] [Work logged] (BEAM-7455) Improve Avro IO integration test coverage on Python 3.

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7455?focusedWorklogId=266813&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-266813
 ]

ASF GitHub Bot logged work on BEAM-7455:


Author: ASF GitHub Bot
Created on: 25/Jun/19 16:17
Start Date: 25/Jun/19 16:17
Worklog Time Spent: 10m 
  Work Description: fredo838 commented on issue #8818: [BEAM-7455] Improve 
Avro IO integration test coverage on Python 3.
URL: https://github.com/apache/beam/pull/8818#issuecomment-505516654
 
 
   Run Python PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 266813)
Time Spent: 3h 50m  (was: 3h 40m)

> Improve Avro IO integration test coverage on Python 3.
> --
>
> Key: BEAM-7455
> URL: https://issues.apache.org/jira/browse/BEAM-7455
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-python-avro
>Reporter: Valentyn Tymofieiev
>Assignee: Frederik Bode
>Priority: Major
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> It seems that we don't have an integration test for Avro IO on Python 3:
> fastavro_it_test [1] depends on both avro and fastavro, however avro package 
> currently does not work with Beam on Python 3, so we don't have an 
> integration test that exercises Avro IO on Python 3. 
> We should add an integration test for Avro IO that does not need both 
> libraries at the same time, and instead can run using either library. 
> [~frederik] is this something you could help with? 
> cc: [~chamikara] [~Juta]
> [1] 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/fastavro_it_test.py



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7608) v1new ReadFromDatastore skips entities

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7608?focusedWorklogId=266812&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-266812
 ]

ASF GitHub Bot logged work on BEAM-7608:


Author: ASF GitHub Bot
Created on: 25/Jun/19 16:14
Start Date: 25/Jun/19 16:14
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #8921: [BEAM-7608] Honor 
DATASTORE_EMULATOR_HOST env var
URL: https://github.com/apache/beam/pull/8921#issuecomment-505515569
 
 
   Run Portable_Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 266812)
Time Spent: 40m  (was: 0.5h)

> v1new ReadFromDatastore skips entities
> --
>
> Key: BEAM-7608
> URL: https://issues.apache.org/jira/browse/BEAM-7608
> Project: Beam
>  Issue Type: Bug
>  Components: io-python-gcp
>Affects Versions: 2.13.0
> Environment: MacOS 10.14.5, Python 2.7
>Reporter: Jacob Gur
>Assignee: Udi Meiri
>Priority: Critical
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> A simple map over a datastore kind in local emulator using the new 
> v1new.datastoreio.ReadFromDatastore skip entities.
> The kind has 1516 entities, and when I map over it using the old 
> ReadFromDatastore transform, it maps all of them, i.e., I can map to id and 
> write to text file.
> But the new transform only maps 365 entities. There is no error. The tail of 
> the standard output is:
> {code:java}
> INFO:root:Latest stats timestamp for kind face_apilog is 2019-06-18 
> 08:15:21+00:00
>  INFO:root:Estimated size bytes for query: 116188
>  INFO:root:Splitting the query into 12 splits
>  INFO:root:Running 
> (((GetEntities/Reshuffle/ReshufflePerKey/GroupByKey/Read)(ref_AppliedPTransform_GetEntities/Reshuffle/ReshufflePerKey/FlatMap(restore_timestamps)_14))((ref_AppliedPTransform_GetEntities/Reshuffle/RemoveRandomKeys_15)(ref_AppliedPTransform_GetEntities/Read_16)))((ref_AppliedPTransform_MapToId_17)((ref_AppliedPTransform_WriteToFile/Write/WriteImpl/WriteBundles_24)((ref_AppliedPTransform_WriteToFile/Write/WriteImpl/Pair_25)((ref_AppliedPTransform_WriteToFile/Write/WriteImpl/WindowInto(WindowIntoFn)_26)(WriteToFile/Write/WriteImpl/GroupByKey/Write)
>  INFO:root:Running 
> (WriteToFile/Write/WriteImpl/GroupByKey/Read)((ref_AppliedPTransform_WriteToFile/Write/WriteImpl/Extract_31)(ref_PCollection_PCollection_20/Write))
>  INFO:root:Running 
> (ref_PCollection_PCollection_12/Read)((ref_AppliedPTransform_WriteToFile/Write/WriteImpl/PreFinalize_32)(ref_PCollection_PCollection_21/Write))
>  INFO:root:Running 
> (ref_PCollection_PCollection_12/Read)+(ref_AppliedPTransform_WriteToFile/Write/WriteImpl/FinalizeWrite_33)
>  INFO:root:Starting finalize_write threads with num_shards: 1 (skipped: 0), 
> batches: 1, num_threads: 1
>  INFO:root:Renamed 1 shards in 0.12 seconds.{code}
>  
> The code for the job on the new transform is:
>  
>  
> {code:java}
> from __future__ import absolute_import
> import logging
> import os
> import sys
> import apache_beam as beam
> from apache_beam.io.gcp.datastore.v1new.datastoreio import ReadFromDatastore
> from apache_beam.io.gcp.datastore.v1new.types import Query
> # TODO: should be set outside of python process
> os.environ['DATASTORE_EMULATOR_HOST'] = 'localhost:8085'
> def map_to_id(element):
>  face_log_id = element.to_client_entity().id
>  return face_log_id
> def run(argv=None):
>  p = beam.Pipeline(argv=argv)
>  project = 'dev'
>  (p
>  | 'GetEntities' >> ReadFromDatastore(Query(kind='face_apilog', 
> project=project))
>  | 'MapToId' >> beam.Map(map_to_id)
>  | 'WriteToFile' >> beam.io.WriteToText('result')
>  )
>  p.run().wait_until_finish()
> if __name__ == '__main__':
>  logging.getLogger().setLevel(logging.INFO)
>  run(sys.argv){code}
>  
> For comparison, the code for the job on the old transform is:
>  
> {code:java}
> from __future__ import absolute_import
> import logging
> import os
> import sys
> import apache_beam as beam
> from apache_beam.io.gcp.datastore.v1.datastoreio import ReadFromDatastore
> from google.cloud.proto.datastore.v1 import query_pb2
> # TODO: should be set outside of python process
> os.environ['DATASTORE_EMULATOR_HOST'] = 'localhost:8085'
> def map_to_id(element):
>  face_log_id = element.key.path[-1].id
>  return face_log_id
> def run(argv=None):
>  p = beam.Pipeline(argv=argv)
>  project = 'dev'
>  query = query_pb2.Query()
>  query.kind.add().name = 'face_apilog'
>  (p
>  | 'GetEntities' >> ReadFromData

[jira] [Work logged] (BEAM-7437) Integration Test for BQ streaming inserts for streaming pipelines

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7437?focusedWorklogId=266810&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-266810
 ]

ASF GitHub Bot logged work on BEAM-7437:


Author: ASF GitHub Bot
Created on: 25/Jun/19 16:11
Start Date: 25/Jun/19 16:11
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #8934: [BEAM-7437] Add 
streaming flag to BQ streaming inserts IT test
URL: https://github.com/apache/beam/pull/8934#issuecomment-505514524
 
 
   r: @udim 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 266810)
Time Spent: 2h 50m  (was: 2h 40m)

> Integration Test for BQ streaming inserts for streaming pipelines
> -
>
> Key: BEAM-7437
> URL: https://issues.apache.org/jira/browse/BEAM-7437
> Project: Beam
>  Issue Type: Test
>  Components: io-python-gcp
>Affects Versions: 2.12.0
>Reporter: Tanay Tummalapalli
>Assignee: Tanay Tummalapalli
>Priority: Minor
>  Labels: test
> Fix For: 2.14.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Integration Test for BigQuery Sink using Streaming Inserts for streaming 
> pipelines.
> Integration tests currently exist for batch pipelines, it can also be added 
> for streaming pipelines using TestStream. This will be a precursor to the 
> failing integration test to be added for [BEAM-6611| 
> https://issues.apache.org/jira/browse/BEAM-6611].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4948) Beam Dependency Update Request: com.google.guava

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4948?focusedWorklogId=266806&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-266806
 ]

ASF GitHub Bot logged work on BEAM-4948:


Author: ASF GitHub Bot
Created on: 25/Jun/19 16:05
Start Date: 25/Jun/19 16:05
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #8942: [BEAM-4948, 
BEAM-6267, BEAM-5559, BEAM-7289] Update the version of guava to 26.0-jre for 
all our vendored artifacts containing guava
URL: https://github.com/apache/beam/pull/8942#issuecomment-505511802
 
 
   R: @iemejia 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 266806)
Time Spent: 50m  (was: 40m)

> Beam Dependency Update Request: com.google.guava
> 
>
> Key: BEAM-4948
> URL: https://issues.apache.org/jira/browse/BEAM-4948
> Project: Beam
>  Issue Type: Bug
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> 2018-07-25 20:28:03.628639
> Please review and upgrade the com.google.guava to the latest version 
> None 
>  
> cc: 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7218) LTS backport: Upper bound for pytz dependency

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7218?focusedWorklogId=266804&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-266804
 ]

ASF GitHub Bot logged work on BEAM-7218:


Author: ASF GitHub Bot
Created on: 25/Jun/19 16:00
Start Date: 25/Jun/19 16:00
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #8917: [BEAM-7218] Backport of 
#7487
URL: https://github.com/apache/beam/pull/8917#issuecomment-505509842
 
 
   ping, gradle scan of test run is in PR desc
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 266804)
Time Spent: 1h 10m  (was: 1h)

> LTS backport: Upper bound for pytz dependency
> -
>
> Key: BEAM-7218
> URL: https://issues.apache.org/jira/browse/BEAM-7218
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Assignee: Udi Meiri
>Priority: Major
> Fix For: 2.7.1
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Do we need an upper bound for the pytz dependency? 
> ([https://github.com/apache/beam/blob/release-2.5.0/sdks/python/setup.py#L108)]
>  We typically have upper bounds, in order to avoid future breakages due to a 
> possibility of breaking/backward incompatible change of that depepdency.
> Good practice is to upper bound either at known version, or next major 
> version. Do we need an exception for pytz because it does not seem to be 
> following semantic versioning?
> cc: [~yifanzou] Is this something dependency notifier can warn on? Dependency 
> without upper version bounds.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4948) Beam Dependency Update Request: com.google.guava

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4948?focusedWorklogId=266805&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-266805
 ]

ASF GitHub Bot logged work on BEAM-4948:


Author: ASF GitHub Bot
Created on: 25/Jun/19 16:00
Start Date: 25/Jun/19 16:00
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #8942: [BEAM-4948, 
BEAM-6267, BEAM-5559, BEAM-7289] Update the version of guava to 26.0-jre for 
all our vendored artifacts containing guava
URL: https://github.com/apache/beam/pull/8942
 
 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/)
 | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)
  [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/

[jira] [Work logged] (BEAM-7437) Integration Test for BQ streaming inserts for streaming pipelines

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7437?focusedWorklogId=266796&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-266796
 ]

ASF GitHub Bot logged work on BEAM-7437:


Author: ASF GitHub Bot
Created on: 25/Jun/19 15:53
Start Date: 25/Jun/19 15:53
Worklog Time Spent: 10m 
  Work Description: ttanay commented on issue #8934: [BEAM-7437] Add 
streaming flag to BQ streaming inserts IT test
URL: https://github.com/apache/beam/pull/8934#issuecomment-505507273
 
 
   R: @pabloem 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 266796)
Time Spent: 2h 40m  (was: 2.5h)

> Integration Test for BQ streaming inserts for streaming pipelines
> -
>
> Key: BEAM-7437
> URL: https://issues.apache.org/jira/browse/BEAM-7437
> Project: Beam
>  Issue Type: Test
>  Components: io-python-gcp
>Affects Versions: 2.12.0
>Reporter: Tanay Tummalapalli
>Assignee: Tanay Tummalapalli
>Priority: Minor
>  Labels: test
> Fix For: 2.14.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Integration Test for BigQuery Sink using Streaming Inserts for streaming 
> pipelines.
> Integration tests currently exist for batch pipelines, it can also be added 
> for streaming pipelines using TestStream. This will be a precursor to the 
> failing integration test to be added for [BEAM-6611| 
> https://issues.apache.org/jira/browse/BEAM-6611].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7590) Convert PipelineOptionsMap to PipelineOption

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7590?focusedWorklogId=266789&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-266789
 ]

ASF GitHub Bot logged work on BEAM-7590:


Author: ASF GitHub Bot
Created on: 25/Jun/19 15:39
Start Date: 25/Jun/19 15:39
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #8928: [DO NOT 
MERGE] [BEAM-7590] Converting JDBC Pipeline Options Map to PipelineOptions.
URL: https://github.com/apache/beam/pull/8928#discussion_r297255666
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptionsReflectionSetter.java
 ##
 @@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.options;
+
+import java.beans.IntrospectionException;
+import java.beans.Introspector;
+import java.beans.PropertyDescriptor;
+import java.lang.reflect.InvocationTargetException;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.SortedSet;
+import java.util.TreeSet;
+import org.apache.beam.sdk.util.StringUtils;
+import 
org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ImmutableListMultimap;
+import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.Iterables;
+import 
org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ListMultimap;
+import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.Sets;
+
+/** This is a utility class to set and remove options individually. */
+public class PipelineOptionsReflectionSetter {
+  private static final boolean STRICT_PARSING = true;
+
+  @SuppressWarnings("unchecked")
+  public static Class getPipelineOptionsInterface(
+  PipelineOptions options) {
+if (options.getClass().getInterfaces().length != 1) {
 
 Review comment:
   PipelineOptions interface validation is fairly strict on making sure that 
methods have the same default/parameter type/...
   You should really support multiple interfaces since its quite common in our 
codebase.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 266789)
Time Spent: 3h 20m  (was: 3h 10m)

> Convert PipelineOptionsMap to PipelineOption
> 
>
> Key: BEAM-7590
> URL: https://issues.apache.org/jira/browse/BEAM-7590
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Alireza Samadianzakaria
>Assignee: Alireza Samadianzakaria
>Priority: Minor
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Currently, BeamCalciteTable keeps a map version of PipelineOptions and that 
> map version is used in JDBCConnection and RelNodes as well. This map is empty 
> when the pipeline is constructed from SQLTransform and it will have the 
> parameters passed from JDBC Client when the pipeline is started by JDBC path. 
> Since for Row-Count estimation we need to use PipelineOptions (or its 
> sub-classes) and we cannot convert a map that is created from a 
> pipelineOptions Subclasses back to PipelineOptions, it is better to keep 
> PipelineOptions object itself.
> Another thing that will be changed as a result is set command. Currently, if 
> in JDBC we use Set Command for a pipeline option, it will only change that 
> option in the map. This means even if the option is incorrect, it does not 
> throw exception until it creates the actual Pipeline Options. However, if we 
> are keeping the PipelineOptions class itself, then wee need to actually set 
> the passed parameters (using reflection) which will throw exception at the 
> time of setting them. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7605) Provide a way for user code to read dataflow runner stats

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7605?focusedWorklogId=266779&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-266779
 ]

ASF GitHub Bot logged work on BEAM-7605:


Author: ASF GitHub Bot
Created on: 25/Jun/19 15:16
Start Date: 25/Jun/19 15:16
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #8913: [BEAM-7605] Allow 
user-code to read counters from the dataflow worker
URL: https://github.com/apache/beam/pull/8913#issuecomment-505491076
 
 
   On Tue, Jun 25, 2019 at 7:25 AM Steven Niemitz 
   wrote:
   
   > Thanks for the reply!
   >
   > you're right, for Spark MetricsPusher thread will be instantiated on the
   > Driver machine and Flink MetricsPusher thread will be instantiated on the
   > JobManager machine. Indeed it pushes aggregated user metrics every each x
   > seconds to a configured sink
   >
   > Looking at it, it doesn't seem like it'd be a large amount of work to have
   > it run inside of the worker process, rather than on the submitter.
   >
   > That being said, if what you want are non-aggregated system metrics,
   > MetricsPusher does not do that currently, it would need to be enhanced.
   >
   > Agreed. It doesn't seem like a large amount of work though. Even if it is
   > aggregating, it'll only be aggregating a single worker anyways, so its
   > essentially non-aggregated.
   >
   > But the problem is that Dataflow Beam runner is a client nutshell that
   > just delegates the run of a serialized pipeline to the remote cloud hosted
   > Dataflow engine (see BEAM-3926), so it would require to code on the
   > Dataflow engine side a MetricsPusher-like service.
   >
   > The Dataflow worker code was opensourced and lives here:
   
https://github.com/apache/beam/tree/master/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker
   
   > That is what this PR is actually adding (kind of). CounterUpdateReceiver
   > is called into from the dataflow worker with the counters it has collected
   > over that period. This contains all system and user metrics that were
   > collected, albeit in a different format. There are (probably?) metrics that
   > the windmill service collects (and pushes to the dataflow metrics
   > collection service) that this doesn't capture, but its good enough for now,
   > and anything is better than nothing here imo.
   >
   > Here's what I could see going forward:
   >
   >- "somehow" adapt the dataflow metrics container to a
   >MetricsContainerStepMap. It seems like implementing a
   >MetricsContainerStepMap that wraps the dataflow metrics (
   >pendingDeltaCounters and pendingCumulativeCounters) would be the way
   >to go here.
   >- figure out how to indicate "where" to push metrics from. It looks
   >like dataflow doesn't support MetricsPusher at all, so there's no 
backwards
   >compatibility problem where, but it might be confusing that some runners
   >push aggregated metrics from the submitter, and some non-aggregated from
   >the worker. Maybe another flag that indicates aggregated vs 
non-aggregated?
   >
   > Interested to hear your thoughts!
   >
   > —
   > You are receiving this because you were mentioned.
   > Reply to this email directly, view it on GitHub
   > 
,
   > or mute the thread
   > 

   > .
   >
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 266779)
Time Spent: 1h 40m  (was: 1.5h)

> Provide a way for user code to read dataflow runner stats
> -
>
> Key: BEAM-7605
> URL: https://issues.apache.org/jira/browse/BEAM-7605
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Steve Niemitz
>Assignee: Steve Niemitz
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The dataflow runner collects (and publishes to the dataflow service) a large 
> number of useful stats.  While these can be polled from the dataflow service 
> via its API, there are a few downsides to this:
>  * it requires another process to poll and collect the stats
>  * the stats are aggr

[jira] [Work logged] (BEAM-7629) Improve DoFn method validation in core/graph/fn.go

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7629?focusedWorklogId=266749&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-266749
 ]

ASF GitHub Bot logged work on BEAM-7629:


Author: ASF GitHub Bot
Created on: 25/Jun/19 15:00
Start Date: 25/Jun/19 15:00
Worklog Time Spent: 10m 
  Work Description: lostluck commented on issue #8936: [BEAM-7629] Go SDK 
additional Validation for DoFns (1st impl) (DO NOT MERGE)
URL: https://github.com/apache/beam/pull/8936#issuecomment-505484499
 
 
   > R: @lostluck
   > 
   > Hoping I could get your feedback on what I have so far and help answering 
some questions.
   > 
   > 1. Some of my tests currently fail because the following is expected in 
signatures for all DoFn methods: `side input parameters must follow main input 
parameter`. My understanding however is that when including side inputs in the 
StartBundle and FinishBundle methods (if ProcessElement has some) only the side 
inputs should be included, with no main inputs. So theoretically it should 
work, but right now all methods are treated as if they're ProcessElement. Is 
this correct?
   They shouldn't be treated exactly like ProcessElement, since the main inputs 
can never be populated in StartBundle and FinishBundle. The two should be 
treated the same as each other though.
   
   
https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/graph/fn.go#L134
 is probably the root call where the error is being triggered.
   
   The state machine is probably being a bit overzealous, and needs to be 
relaxed. It should be validating the order only not requiring the presence of 
anything. This PR is adding the necessary presence checking in a more aware 
place. This is my fault, sorry for the inconvenience.
 
https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/funcx/fn.go#L311
   
   > 2. I had trouble finding info for the expected inputs and outputs of the 
DoFn methods, and had to resort to checking other SDKs. Is there a place where 
that's documented for the Go SDK? If not, could you clarify for StartBundle, 
FinishBundle, Setup, and Teardown what parameters and return values they cab 
have (in addition to the side inputs and emits that I already account for)? 
Right now I make an assumption that Setup/Teardown shouldn't have any 
parameters based on the Java SDK, but couldn't find any concrete info for 
return values for any of the methods.
   
   Re: Setup & Teardown: That's right they should only accept context.Context, 
and that's it.
   Re: StartBundle & FinishBundle, should only be returning error I think? They 
almost certainly should *not* have main returns other than error.
   
   > 3. Should I worry about the Go SDK "generics" when it comes to the side 
input and emit validation? Is it even valid for a user to use generics to 
replace emits or side inputs there, and if so should I be checking that they're 
consistent? For this first draft I just ignored the possibility of generics.
   
   That's totally reasonable for this pass. I'm going to assume you mean the 
Universal Types ( beam.T, beam.X, beam.Y ... ). Universal types may only be 
used to replace a single type at present, not composite types (like, KVs, 
CoGBKs, or Iterators in any of their flavours.) When validating, all that 
matters is that they're used consistently (eg. a Beam.T emit maps to a Beam.T 
emit). They may not be substituted within a given DoFn between methods (eg. 
ProcessElement has a func(beam.T), then so too must Start and Finish).
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 266749)
Time Spent: 0.5h  (was: 20m)

> Improve DoFn method validation in core/graph/fn.go
> --
>
> Key: BEAM-7629
> URL: https://issues.apache.org/jira/browse/BEAM-7629
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Daniel Oliveira
>Assignee: Daniel Oliveira
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Various improvements can be made to validating the signatures and type usages 
> in DoFns. Some things that should probably be checked:
>  * Check that StartBundle and FinishBundle contain any emit parameters and 
> side inputs present in ProcessElement
>  * Check that any side inputs/emits have correctly matching types between 
> Start/FinishBundle and ProcessElement
>  * Check that parameters and return values for the various methods are valid 
> (for ex. T

[jira] [Work logged] (BEAM-5519) Spark Streaming Duplicated Encoding/Decoding Effort

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5519?focusedWorklogId=266750&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-266750
 ]

ASF GitHub Bot logged work on BEAM-5519:


Author: ASF GitHub Bot
Created on: 25/Jun/19 15:00
Start Date: 25/Jun/19 15:00
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #6511: [BEAM-5519] Remove 
call to groupByKey in Spark Streaming.
URL: https://github.com/apache/beam/pull/6511#issuecomment-505484677
 
 
   This is slightly staled but because of review discussion and should not be 
closed until we have some consensus.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 266750)
Time Spent: 5.5h  (was: 5h 20m)

> Spark Streaming Duplicated Encoding/Decoding Effort
> ---
>
> Key: BEAM-5519
> URL: https://issues.apache.org/jira/browse/BEAM-5519
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Kyle Winkelman
>Assignee: Kyle Winkelman
>Priority: Major
>  Labels: spark, spark-streaming
> Fix For: 2.15.0
>
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> When using the SparkRunner in streaming mode. There is a call to groupByKey 
> followed by a call to updateStateByKey. BEAM-1815 fixed an issue where this 
> used to cause 2 shuffles but it still causes 2 encode/decode cycles.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7535) Create Jenkins jobs for BQ performance tests

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7535?focusedWorklogId=266748&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-266748
 ]

ASF GitHub Bot logged work on BEAM-7535:


Author: ASF GitHub Bot
Created on: 25/Jun/19 14:58
Start Date: 25/Jun/19 14:58
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on issue #8874: [DO NOT 
MERGE][BEAM-7535] Created Jenkins jobs for BQ performance tests
URL: https://github.com/apache/beam/pull/8874#issuecomment-505483698
 
 
   Run BigQueryIO Read Performance Test Python
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 266748)
Time Spent: 5h 50m  (was: 5h 40m)

> Create Jenkins jobs for BQ performance tests
> 
>
> Key: BEAM-7535
> URL: https://issues.apache.org/jira/browse/BEAM-7535
> Project: Beam
>  Issue Type: Task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7535) Create Jenkins jobs for BQ performance tests

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7535?focusedWorklogId=266747&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-266747
 ]

ASF GitHub Bot logged work on BEAM-7535:


Author: ASF GitHub Bot
Created on: 25/Jun/19 14:58
Start Date: 25/Jun/19 14:58
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on issue #8874: [DO NOT 
MERGE][BEAM-7535] Created Jenkins jobs for BQ performance tests
URL: https://github.com/apache/beam/pull/8874#issuecomment-505483368
 
 
   Run BigQueryIO Write Performance Test Python Batch
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 266747)
Time Spent: 5h 40m  (was: 5.5h)

> Create Jenkins jobs for BQ performance tests
> 
>
> Key: BEAM-7535
> URL: https://issues.apache.org/jira/browse/BEAM-7535
> Project: Beam
>  Issue Type: Task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7535) Create Jenkins jobs for BQ performance tests

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7535?focusedWorklogId=266745&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-266745
 ]

ASF GitHub Bot logged work on BEAM-7535:


Author: ASF GitHub Bot
Created on: 25/Jun/19 14:57
Start Date: 25/Jun/19 14:57
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on issue #8874: [DO NOT 
MERGE][BEAM-7535] Created Jenkins jobs for BQ performance tests
URL: https://github.com/apache/beam/pull/8874#issuecomment-505483368
 
 
   Run BigQueryIO Write Performance Test Python Batch
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 266745)
Time Spent: 5.5h  (was: 5h 20m)

> Create Jenkins jobs for BQ performance tests
> 
>
> Key: BEAM-7535
> URL: https://issues.apache.org/jira/browse/BEAM-7535
> Project: Beam
>  Issue Type: Task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7528) Save correctly Python Load Tests metrics according to it's namespace

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7528?focusedWorklogId=266744&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-266744
 ]

ASF GitHub Bot logged work on BEAM-7528:


Author: ASF GitHub Bot
Created on: 25/Jun/19 14:56
Start Date: 25/Jun/19 14:56
Worklog Time Spent: 10m 
  Work Description: kkucharc commented on issue #8941: [BEAM-7528] Save 
load test metrics according to distribution name
URL: https://github.com/apache/beam/pull/8941#issuecomment-505482767
 
 
   Run Python Load Tests Smoke
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 266744)
Time Spent: 0.5h  (was: 20m)

> Save correctly Python Load Tests metrics according to it's namespace
> 
>
> Key: BEAM-7528
> URL: https://issues.apache.org/jira/browse/BEAM-7528
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Kasia Kucharczyk
>Assignee: Kasia Kucharczyk
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Bug discovered when metrics monitored more than one distribution and saved 
> all as `runtime`.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7437) Integration Test for BQ streaming inserts for streaming pipelines

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7437?focusedWorklogId=266742&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-266742
 ]

ASF GitHub Bot logged work on BEAM-7437:


Author: ASF GitHub Bot
Created on: 25/Jun/19 14:51
Start Date: 25/Jun/19 14:51
Worklog Time Spent: 10m 
  Work Description: ttanay commented on issue #8934: [BEAM-7437] Add 
streaming flag to BQ streaming inserts IT test
URL: https://github.com/apache/beam/pull/8934#issuecomment-505480418
 
 
   Run Python PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 266742)
Time Spent: 2.5h  (was: 2h 20m)

> Integration Test for BQ streaming inserts for streaming pipelines
> -
>
> Key: BEAM-7437
> URL: https://issues.apache.org/jira/browse/BEAM-7437
> Project: Beam
>  Issue Type: Test
>  Components: io-python-gcp
>Affects Versions: 2.12.0
>Reporter: Tanay Tummalapalli
>Assignee: Tanay Tummalapalli
>Priority: Minor
>  Labels: test
> Fix For: 2.14.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Integration Test for BigQuery Sink using Streaming Inserts for streaming 
> pipelines.
> Integration tests currently exist for batch pipelines, it can also be added 
> for streaming pipelines using TestStream. This will be a precursor to the 
> failing integration test to be added for [BEAM-6611| 
> https://issues.apache.org/jira/browse/BEAM-6611].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7605) Provide a way for user code to read dataflow runner stats

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7605?focusedWorklogId=266737&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-266737
 ]

ASF GitHub Bot logged work on BEAM-7605:


Author: ASF GitHub Bot
Created on: 25/Jun/19 14:37
Start Date: 25/Jun/19 14:37
Worklog Time Spent: 10m 
  Work Description: steveniemitz commented on issue #8913: [BEAM-7605] 
Allow user-code to read counters from the dataflow worker
URL: https://github.com/apache/beam/pull/8913#issuecomment-505469178
 
 
   Thanks for the reply!  
   
   > you're right, for Spark MetricsPusher thread will be instantiated on the 
Driver machine and Flink MetricsPusher thread will be instantiated on the 
JobManager machine. Indeed it pushes aggregated user metrics every each x 
seconds to a configured sink
   
   Looking at it, it doesn't seem like it'd be a large amount of work to have 
it run inside of the worker process, rather than on the submitter.  
   
   > That being said, if what you want are non-aggregated system metrics, 
MetricsPusher does not do that currently, it would need to be enhanced.
   
   Agreed.  It doesn't seem like a large amount of work though.  Even if it is 
aggregating, it'll only be aggregating a single worker anyways, so its 
essentially non-aggregated. 
   
   > But the problem is that Dataflow Beam runner is a client nutshell that 
just delegates the run of a serialized pipeline to the remote cloud hosted 
Dataflow engine (see BEAM-3926), so it would require to code on the Dataflow 
engine side a MetricsPusher-like service.
   
   That is what this PR is actually adding (kind of).  `CounterUpdateReceiver` 
is called into from the dataflow worker with the counters it has collected over 
that period.  This contains all system and user metrics that were collected, 
albeit in a different format.  There are (probably?) metrics that the windmill 
service collects (and pushes to the dataflow metrics collection service) that 
this doesn't capture, but its good enough for now, and anything is better than 
nothing here imo.
   
   Here's what I could see going forward:
   - "somehow" adapt the dataflow metrics container to a 
`MetricsContainerStepMap`.  It seems like implementing a 
`MetricsContainerStepMap` that wraps the dataflow metrics 
(`pendingDeltaCounters` and `pendingCumulativeCounters`) would be the way to go 
here.
   - figure out how to indicate "where" to push metrics from.  It looks like 
the dataflow runner doesn't support MetricsPusher at all, so there's no 
backwards compatibility problem there, but it might be confusing that some 
runners push aggregated metrics from the submitter, and some non-aggregated 
from the worker.  Maybe another flag that indicates aggregated vs 
non-aggregated?
   
   Interested to hear your thoughts!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 266737)
Time Spent: 1.5h  (was: 1h 20m)

> Provide a way for user code to read dataflow runner stats
> -
>
> Key: BEAM-7605
> URL: https://issues.apache.org/jira/browse/BEAM-7605
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Steve Niemitz
>Assignee: Steve Niemitz
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The dataflow runner collects (and publishes to the dataflow service) a large 
> number of useful stats.  While these can be polled from the dataflow service 
> via its API, there are a few downsides to this:
>  * it requires another process to poll and collect the stats
>  * the stats are aggregated across all workers, so per-worker stats are lost
> It would be simple to provide a hook to allow users to receive stats updates 
> as well, and then do whatever they want with them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-7626) ExecutableStage should be able to accept multiple input PCollection

2019-06-25 Thread JIRA


[ 
https://issues.apache.org/jira/browse/BEAM-7626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872399#comment-16872399
 ] 

Ismaël Mejía commented on BEAM-7626:


Aren't other inputs specified as side inputs? 
https://github.com/apache/beam/blob/5ee4cf4e4880782492ec26f2b454a6df9b25f1e2/model/pipeline/src/main/proto/beam_runner_api.proto#L1251

> ExecutableStage should be able to accept multiple input PCollection 
> 
>
> Key: BEAM-7626
> URL: https://issues.apache.org/jira/browse/BEAM-7626
> Project: Beam
>  Issue Type: New Feature
>  Components: java-fn-execution
>Reporter: Boyuan Zhang
>Priority: Major
>
> Current implementation of ExecutableStage only accepts one input PColletion: 
> https://github.com/apache/beam/blob/5ee4cf4e4880782492ec26f2b454a6df9b25f1e2/model/pipeline/src/main/proto/beam_runner_api.proto#L1247.
>  But it's possible that a ExecutableStage has multiple inputs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3061) BigtableIO should support emitting a sentinel "done" value when a bundle completes

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3061?focusedWorklogId=266710&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-266710
 ]

ASF GitHub Bot logged work on BEAM-3061:


Author: ASF GitHub Bot
Created on: 25/Jun/19 14:28
Start Date: 25/Jun/19 14:28
Worklog Time Spent: 10m 
  Work Description: steveniemitz commented on pull request #7805: 
[BEAM-3061] Done notifications for BigtableIO.Write
URL: https://github.com/apache/beam/pull/7805#discussion_r297218273
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIO.java
 ##
 @@ -466,7 +469,7 @@ public String toString() {
   @Experimental(Experimental.Kind.SOURCE_SINK)
   @AutoValue
   public abstract static class Write
-  extends PTransform>>, 
PDone> {
+  extends PTransform>>, 
PCollection> {
 
 Review comment:
   thats fine with me...I just want to land this at this point :P
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 266710)
Time Spent: 12.5h  (was: 12h 20m)

> BigtableIO should support emitting a sentinel "done" value when a bundle 
> completes
> --
>
> Key: BEAM-3061
> URL: https://issues.apache.org/jira/browse/BEAM-3061
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Steve Niemitz
>Assignee: Steve Niemitz
>Priority: Major
>  Time Spent: 12.5h
>  Remaining Estimate: 0h
>
> There was some discussion of this on the dev@ mailing list [1].  This 
> approach was taken based on discussion there.
> [1] 
> https://lists.apache.org/thread.html/949b33782f722a9000c9bf9e37042739c6fd0927589b99752b78d7bd@%3Cdev.beam.apache.org%3E



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7528) Save correctly Python Load Tests metrics according to it's namespace

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7528?focusedWorklogId=266708&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-266708
 ]

ASF GitHub Bot logged work on BEAM-7528:


Author: ASF GitHub Bot
Created on: 25/Jun/19 14:27
Start Date: 25/Jun/19 14:27
Worklog Time Spent: 10m 
  Work Description: kkucharc commented on issue #8941: [BEAM-7528] Save 
load test metrics according to distribution name
URL: https://github.com/apache/beam/pull/8941#issuecomment-505470089
 
 
   Run Python Load Tests Smoke
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 266708)
Time Spent: 20m  (was: 10m)

> Save correctly Python Load Tests metrics according to it's namespace
> 
>
> Key: BEAM-7528
> URL: https://issues.apache.org/jira/browse/BEAM-7528
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Kasia Kucharczyk
>Assignee: Kasia Kucharczyk
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Bug discovered when metrics monitored more than one distribution and saved 
> all as `runtime`.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7463) Bigquery IO ITs are flaky: incorrect checksum

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7463?focusedWorklogId=266709&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-266709
 ]

ASF GitHub Bot logged work on BEAM-7463:


Author: ASF GitHub Bot
Created on: 25/Jun/19 14:27
Start Date: 25/Jun/19 14:27
Worklog Time Spent: 10m 
  Work Description: Juta commented on issue #8940: [DO NOT MERGE] 
[BEAM-7463] fix flaky bigquery io ITs
URL: https://github.com/apache/beam/pull/8940#issuecomment-505470131
 
 
   Run Python PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 266709)
Time Spent: 3h 20m  (was: 3h 10m)

> Bigquery IO ITs are flaky: incorrect checksum
> -
>
> Key: BEAM-7463
> URL: https://issues.apache.org/jira/browse/BEAM-7463
> Project: Beam
>  Issue Type: Bug
>  Components: io-python-gcp
>Reporter: Valentyn Tymofieiev
>Assignee: Juta Staes
>Priority: Major
>  Labels: currently-failing
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> {noformat}
> 15:03:38 FAIL: test_big_query_new_types 
> (apache_beam.io.gcp.big_query_query_to_table_it_test.BigQueryQueryToTableIT)
> 15:03:38 
> --
> 15:03:38 Traceback (most recent call last):
> 15:03:38   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/io/gcp/big_query_query_to_table_it_test.py",
>  line 211, in test_big_query_new_types
> 15:03:38 big_query_query_to_table_pipeline.run_bq_pipeline(options)
> 15:03:38   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/io/gcp/big_query_query_to_table_pipeline.py",
>  line 82, in run_bq_pipeline
> 15:03:38 result = p.run()
> 15:03:38   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/testing/test_pipeline.py",
>  line 107, in run
> 15:03:38 else test_runner_api))
> 15:03:38   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/pipeline.py",
>  line 406, in run
> 15:03:38 self._options).run(False)
> 15:03:38   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/pipeline.py",
>  line 419, in run
> 15:03:38 return self.runner.run_pipeline(self, self._options)
> 15:03:38   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/runners/direct/test_direct_runner.py",
>  line 51, in run_pipeline
> 15:03:38 hc_assert_that(self.result, pickler.loads(on_success_matcher))
> 15:03:38 AssertionError: 
> 15:03:38 Expected: (Test pipeline expected terminated in state: DONE and 
> Expected checksum is 24de460c4d344a4b77ccc4cc1acb7b7ffc11a214)
> 15:03:38  but: Expected checksum is 
> 24de460c4d344a4b77ccc4cc1acb7b7ffc11a214 Actual checksum is 
> da39a3ee5e6b4b0d3255bfef95601890afd80709
> {noformat}
> [~Juta] could this be caused by changes to Bigquery matcher? 
> https://github.com/apache/beam/pull/8621/files#diff-f1ec7e3a3e7e2e5082ddb7043954c108R134
>  
> cc: [~pabloem] [~chamikara] [~apilloud]
> A recent postcommit run has BQ failures in other tests as well: 
> https://builds.apache.org/job/beam_PostCommit_Python3_Verify/1000/consoleFull



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7528) Save correctly Python Load Tests metrics according to it's namespace

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7528?focusedWorklogId=266707&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-266707
 ]

ASF GitHub Bot logged work on BEAM-7528:


Author: ASF GitHub Bot
Created on: 25/Jun/19 14:27
Start Date: 25/Jun/19 14:27
Worklog Time Spent: 10m 
  Work Description: kkucharc commented on pull request #8941: [BEAM-7528] 
Save load test metrics according to distribution name
URL: https://github.com/apache/beam/pull/8941
 
 
   In case there are collected different metrics than the one triggered by load 
tests, the runtime metric was not correct calculated. All other metrics are 
saved as well.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/)
 | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Da

[jira] [Work logged] (BEAM-7605) Provide a way for user code to read dataflow runner stats

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7605?focusedWorklogId=266704&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-266704
 ]

ASF GitHub Bot logged work on BEAM-7605:


Author: ASF GitHub Bot
Created on: 25/Jun/19 14:25
Start Date: 25/Jun/19 14:25
Worklog Time Spent: 10m 
  Work Description: steveniemitz commented on issue #8913: [BEAM-7605] 
Allow user-code to read counters from the dataflow worker
URL: https://github.com/apache/beam/pull/8913#issuecomment-505469178
 
 
   Thanks for the reply!  
   
   > you're right, for Spark MetricsPusher thread will be instantiated on the 
Driver machine and Flink MetricsPusher thread will be instantiated on the 
JobManager machine. Indeed it pushes aggregated user metrics every each x 
seconds to a configured sink
   
   Looking at it, it doesn't seem like it'd be a large amount of work to have 
it run inside of the worker process, rather than on the submitter.  
   
   > That being said, if what you want are non-aggregated system metrics, 
MetricsPusher does not do that currently, it would need to be enhanced.
   
   Agreed.  It doesn't seem like a large amount of work though.  Even if it is 
aggregating, it'll only be aggregating a single worker anyways, so its 
essentially non-aggregated. 
   
   > But the problem is that Dataflow Beam runner is a client nutshell that 
just delegates the run of a serialized pipeline to the remote cloud hosted 
Dataflow engine (see BEAM-3926), so it would require to code on the Dataflow 
engine side a MetricsPusher-like service.
   
   That is what this PR is actually adding (kind of).  `CounterUpdateReceiver` 
is called into from the dataflow worker with the counters it has collected over 
that period.  This contains all system and user metrics that were collected, 
albeit in a different format.  There are (probably?) metrics that the windmill 
service collects (and pushes to the dataflow metrics collection service) that 
this doesn't capture, but its good enough for now, and anything is better than 
nothing here imo.
   
   Here's what I could see going forward:
   - "somehow" adapt the dataflow metrics container to a 
`MetricsContainerStepMap`.  It seems like implementing a 
`MetricsContainerStepMap` that wraps the dataflow metrics 
(`pendingDeltaCounters` and `pendingCumulativeCounters`) would be the way to go 
here.
   - figure out how to indicate "where" to push metrics from.  It looks like 
dataflow doesn't support MetricsPusher at all, so there's no backwards 
compatibility problem where, but it might be confusing that some runners push 
aggregated metrics from the submitter, and some non-aggregated from the worker. 
 Maybe another flag that indicates aggregated vs non-aggregated?
   
   Interested to hear your thoughts!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 266704)
Time Spent: 1h 20m  (was: 1h 10m)

> Provide a way for user code to read dataflow runner stats
> -
>
> Key: BEAM-7605
> URL: https://issues.apache.org/jira/browse/BEAM-7605
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Steve Niemitz
>Assignee: Steve Niemitz
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The dataflow runner collects (and publishes to the dataflow service) a large 
> number of useful stats.  While these can be polled from the dataflow service 
> via its API, there are a few downsides to this:
>  * it requires another process to poll and collect the stats
>  * the stats are aggregated across all workers, so per-worker stats are lost
> It would be simple to provide a hook to allow users to receive stats updates 
> as well, and then do whatever they want with them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3061) BigtableIO should support emitting a sentinel "done" value when a bundle completes

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3061?focusedWorklogId=266702&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-266702
 ]

ASF GitHub Bot logged work on BEAM-3061:


Author: ASF GitHub Bot
Created on: 25/Jun/19 14:24
Start Date: 25/Jun/19 14:24
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #7805: [BEAM-3061] 
Done notifications for BigtableIO.Write
URL: https://github.com/apache/beam/pull/7805#discussion_r297215186
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIO.java
 ##
 @@ -466,7 +469,7 @@ public String toString() {
   @Experimental(Experimental.Kind.SOURCE_SINK)
   @AutoValue
   public abstract static class Write
-  extends PTransform>>, 
PDone> {
+  extends PTransform>>, 
PCollection> {
 
 Review comment:
   Also with the ongoing work on 
[BEAM-7615](https://issues.apache.org/jira/browse/BEAM-7615) I am not even sure 
the API will stay stable for the Read part too, so definitely -1 on any API 
stability annotation change at least for this PR.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 266702)
Time Spent: 12h 20m  (was: 12h 10m)

> BigtableIO should support emitting a sentinel "done" value when a bundle 
> completes
> --
>
> Key: BEAM-3061
> URL: https://issues.apache.org/jira/browse/BEAM-3061
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Steve Niemitz
>Assignee: Steve Niemitz
>Priority: Major
>  Time Spent: 12h 20m
>  Remaining Estimate: 0h
>
> There was some discussion of this on the dev@ mailing list [1].  This 
> approach was taken based on discussion there.
> [1] 
> https://lists.apache.org/thread.html/949b33782f722a9000c9bf9e37042739c6fd0927589b99752b78d7bd@%3Cdev.beam.apache.org%3E



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


  1   2   3   >