[ https://issues.apache.org/jira/browse/BEAM-5537?focusedWorklogId=157726&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-157726 ]

ASF GitHub Bot logged work on BEAM-5537:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 23/Oct/18 17:30
            Start Date: 23/Oct/18 17:30
    Worklog Time Spent: 10m 
      Work Description: chamikaramj closed pull request #6699: [BEAM-5537] Upgrade BigQuery client from 0.25.0 to 1.6.0
URL: https://github.com/apache/beam/pull/6699
 
 
   

This is a PR merged from a forked repository. As GitHub hides the
original diff on merge, it is displayed below for the sake of
provenance:
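
For context while reading the diff: the central change is moving from the
0.25.0 client's run_sync_query()/fetch_data() API to the 1.x client.query()
API, which also defaults to standard SQL. A minimal before/after sketch (not
part of the PR; my-project, ds, and tbl are placeholders):

    from google.cloud import bigquery

    client = bigquery.Client(project='my-project')  # placeholder project id

    # google-cloud-bigquery 0.25.0 (legacy SQL by default, manual paging):
    #   query = client.run_sync_query('SELECT x FROM [my-project:ds.tbl]')
    #   query.run()
    #   rows = list(query.fetch_data())

    # google-cloud-bigquery 1.6.0 (standard SQL by default, auto-paging):
    query_job = client.query('SELECT x FROM `my-project.ds.tbl`')
    rows = [row.values() for row in query_job]  # each values() is a tuple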

diff --git a/sdks/python/apache_beam/examples/complete/game/game_stats_it_test.py b/sdks/python/apache_beam/examples/complete/game/game_stats_it_test.py
index 2fc19daa1af..67b80c017bc 100644
--- a/sdks/python/apache_beam/examples/complete/game/game_stats_it_test.py
+++ b/sdks/python/apache_beam/examples/complete/game/game_stats_it_test.py
@@ -83,11 +83,8 @@ def setUp(self):
         self.input_topic.name)
 
     # Set up BigQuery environment
-    from google.cloud import bigquery
-    client = bigquery.Client()
-    unique_dataset_name = self.OUTPUT_DATASET + str(int(time.time()))
-    self.dataset = client.dataset(unique_dataset_name, project=self.project)
-    self.dataset.create()
+    self.dataset_ref = utils.create_bq_dataset(self.project,
+                                               self.OUTPUT_DATASET)
 
     self._test_timestamp = int(time.time() * 1000)
 
@@ -105,17 +102,14 @@ def _cleanup_pubsub(self):
     test_utils.cleanup_subscriptions(self.sub_client, [self.input_sub])
     test_utils.cleanup_topics(self.pub_client, [self.input_topic])
 
-  def _cleanup_dataset(self):
-    self.dataset.delete()
-
   @attr('IT')
   def test_game_stats_it(self):
     state_verifier = PipelineStateMatcher(PipelineState.RUNNING)
 
     success_condition = 'mean_duration=300 LIMIT 1'
-    sessions_query = ('SELECT mean_duration FROM [%s:%s.%s] '
+    sessions_query = ('SELECT mean_duration FROM `%s.%s.%s` '
                       'WHERE %s' % (self.project,
-                                    self.dataset.name,
+                                    self.dataset_ref.dataset_id,
                                     self.OUTPUT_TABLE_SESSIONS,
                                     success_condition))
     bq_sessions_verifier = BigqueryMatcher(self.project,
@@ -125,7 +119,7 @@ def test_game_stats_it(self):
     # TODO(mariagh): Add teams table verifier once game_stats.py is fixed.
 
     extra_opts = {'subscription': self.input_sub.name,
-                  'dataset': self.dataset.name,
+                  'dataset': self.dataset_ref.dataset_id,
                   'topic': self.input_topic.name,
                   'fixed_window_duration': 1,
                   'user_activity_window_duration': 1,
@@ -137,11 +131,7 @@ def test_game_stats_it(self):
     # Register cleanup before pipeline execution.
     # Note that actual execution happens in reverse order.
     self.addCleanup(self._cleanup_pubsub)
-    self.addCleanup(self._cleanup_dataset)
-    self.addCleanup(utils.delete_bq_table, self.project,
-                    self.dataset.name, self.OUTPUT_TABLE_SESSIONS)
-    self.addCleanup(utils.delete_bq_table, self.project,
-                    self.dataset.name, self.OUTPUT_TABLE_TEAMS)
+    self.addCleanup(utils.delete_bq_dataset, self.project, self.dataset_ref)
 
     # Generate input data and inject to PubSub.
     self._inject_pubsub_game_events(self.input_topic, self.DEFAULT_INPUT_COUNT)
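
The setUp/addCleanup pattern above, repeated in the next two test files,
delegates the dataset lifecycle to the new helpers defined in the utils.py
hunk further down. A rough usage sketch, with placeholder names:

    from apache_beam.io.gcp.tests import utils

    # create_bq_dataset appends epoch seconds to the prefix for uniqueness.
    dataset_ref = utils.create_bq_dataset('my-project', 'my_dataset_prefix')
    try:
      print(dataset_ref.dataset_id)
    finally:
      # The helper deletes with delete_contents=True, so any tables the
      # pipeline wrote inside the dataset are removed along with it.
      utils.delete_bq_dataset('my-project', dataset_ref)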
diff --git a/sdks/python/apache_beam/examples/complete/game/hourly_team_score_it_test.py b/sdks/python/apache_beam/examples/complete/game/hourly_team_score_it_test.py
index 584aa055d3d..56851320121 100644
--- a/sdks/python/apache_beam/examples/complete/game/hourly_team_score_it_test.py
+++ b/sdks/python/apache_beam/examples/complete/game/hourly_team_score_it_test.py
@@ -33,7 +33,6 @@
 from __future__ import absolute_import
 
 import logging
-import time
 import unittest
 
 from hamcrest.core.core.allof import all_of
@@ -60,20 +59,14 @@ def setUp(self):
     self.project = self.test_pipeline.get_option('project')
 
     # Set up BigQuery environment
-    from google.cloud import bigquery
-    client = bigquery.Client()
-    unique_dataset_name = self.OUTPUT_DATASET + str(int(time.time()))
-    self.dataset = client.dataset(unique_dataset_name, project=self.project)
-    self.dataset.create()
-
-  def _cleanup_dataset(self):
-    self.dataset.delete()
+    self.dataset_ref = utils.create_bq_dataset(self.project,
+                                               self.OUTPUT_DATASET)
 
   @attr('IT')
   def test_hourly_team_score_it(self):
     state_verifier = PipelineStateMatcher(PipelineState.DONE)
-    query = ('SELECT COUNT(*) FROM [%s:%s.%s]' % (self.project,
-                                                  self.dataset.name,
+    query = ('SELECT COUNT(*) FROM `%s.%s.%s`' % (self.project,
+                                                  self.dataset_ref.dataset_id,
                                                   self.OUTPUT_TABLE))
 
     bigquery_verifier = BigqueryMatcher(self.project,
@@ -81,16 +74,14 @@ def test_hourly_team_score_it(self):
                                         self.DEFAULT_EXPECTED_CHECKSUM)
 
     extra_opts = {'input': self.DEFAULT_INPUT_FILE,
-                  'dataset': self.dataset.name,
+                  'dataset': self.dataset_ref.dataset_id,
                   'window_duration': 1,
                   'on_success_matcher': all_of(state_verifier,
                                                bigquery_verifier)}
 
     # Register clean up before pipeline execution
     # Note that actual execution happens in reverse order.
-    self.addCleanup(self._cleanup_dataset)
-    self.addCleanup(utils.delete_bq_table, self.project,
-                    self.dataset.name, self.OUTPUT_TABLE)
+    self.addCleanup(utils.delete_bq_dataset, self.project, self.dataset_ref)
 
     # Get pipeline options from command argument: --test-pipeline-options,
     # and start pipeline job by calling pipeline main function.
diff --git a/sdks/python/apache_beam/examples/complete/game/leader_board_it_test.py b/sdks/python/apache_beam/examples/complete/game/leader_board_it_test.py
index e0e309b1265..27a170fcb67 100644
--- a/sdks/python/apache_beam/examples/complete/game/leader_board_it_test.py
+++ b/sdks/python/apache_beam/examples/complete/game/leader_board_it_test.py
@@ -85,11 +85,8 @@ def setUp(self):
         self.input_topic.name)
 
     # Set up BigQuery environment
-    from google.cloud import bigquery
-    client = bigquery.Client()
-    unique_dataset_name = self.OUTPUT_DATASET + str(int(time.time()))
-    self.dataset = client.dataset(unique_dataset_name, project=self.project)
-    self.dataset.create()
+    self.dataset_ref = utils.create_bq_dataset(self.project,
+                                               self.OUTPUT_DATASET)
 
     self._test_timestamp = int(time.time() * 1000)
 
@@ -107,26 +104,23 @@ def _cleanup_pubsub(self):
     test_utils.cleanup_subscriptions(self.sub_client, [self.input_sub])
     test_utils.cleanup_topics(self.pub_client, [self.input_topic])
 
-  def _cleanup_dataset(self):
-    self.dataset.delete()
-
   @attr('IT')
   def test_leader_board_it(self):
     state_verifier = PipelineStateMatcher(PipelineState.RUNNING)
 
     success_condition = 'total_score=5000 LIMIT 1'
-    users_query = ('SELECT total_score FROM [%s:%s.%s] '
+    users_query = ('SELECT total_score FROM `%s.%s.%s` '
                    'WHERE %s' % (self.project,
-                                 self.dataset.name,
+                                 self.dataset_ref.dataset_id,
                                  self.OUTPUT_TABLE_USERS,
                                  success_condition))
     bq_users_verifier = BigqueryMatcher(self.project,
                                         users_query,
                                         self.DEFAULT_EXPECTED_CHECKSUM)
 
-    teams_query = ('SELECT total_score FROM [%s:%s.%s] '
+    teams_query = ('SELECT total_score FROM `%s.%s.%s` '
                    'WHERE %s' % (self.project,
-                                 self.dataset.name,
+                                 self.dataset_ref.dataset_id,
                                  self.OUTPUT_TABLE_TEAMS,
                                  success_condition))
     bq_teams_verifier = BigqueryMatcher(self.project,
@@ -134,7 +128,7 @@ def test_leader_board_it(self):
                                         self.DEFAULT_EXPECTED_CHECKSUM)
 
     extra_opts = {'subscription': self.input_sub.name,
-                  'dataset': self.dataset.name,
+                  'dataset': self.dataset_ref.dataset_id,
                   'topic': self.input_topic.name,
                   'team_window_duration': 1,
                   'wait_until_finish_duration':
@@ -146,11 +140,7 @@ def test_leader_board_it(self):
     # Register cleanup before pipeline execution.
     # Note that actual execution happens in reverse order.
     self.addCleanup(self._cleanup_pubsub)
-    self.addCleanup(self._cleanup_dataset)
-    self.addCleanup(utils.delete_bq_table, self.project,
-                    self.dataset.name, self.OUTPUT_TABLE_USERS)
-    self.addCleanup(utils.delete_bq_table, self.project,
-                    self.dataset.name, self.OUTPUT_TABLE_TEAMS)
+    self.addCleanup(utils.delete_bq_dataset, self.project, self.dataset_ref)
 
     # Generate input data and inject to PubSub.
     self._inject_pubsub_game_events(self.input_topic, self.DEFAULT_INPUT_COUNT)
diff --git a/sdks/python/apache_beam/examples/cookbook/bigquery_tornadoes_it_test.py b/sdks/python/apache_beam/examples/cookbook/bigquery_tornadoes_it_test.py
index b7e90839ffa..21e9c48f302 100644
--- a/sdks/python/apache_beam/examples/cookbook/bigquery_tornadoes_it_test.py
+++ b/sdks/python/apache_beam/examples/cookbook/bigquery_tornadoes_it_test.py
@@ -53,7 +53,7 @@ def test_bigquery_tornadoes_it(self):
     dataset = 'BigQueryTornadoesIT'
     table = 'monthly_tornadoes_%s' % int(round(time.time() * 1000))
     output_table = '.'.join([dataset, table])
-    query = 'SELECT month, tornado_count FROM [%s]' % output_table
+    query = 'SELECT month, tornado_count FROM `%s`' % output_table
 
     pipeline_verifiers = [PipelineStateMatcher(),
                           BigqueryMatcher(
@@ -64,6 +64,7 @@ def test_bigquery_tornadoes_it(self):
                   'on_success_matcher': all_of(*pipeline_verifiers)}
 
     # Register cleanup before pipeline execution.
+    # Note that actual execution happens in reverse order.
     self.addCleanup(utils.delete_bq_table, project, dataset, table)
 
     # Get pipeline options from command argument: --test-pipeline-options,
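
The bracket-to-backtick query edits in these test files track the dialect
change: the old run_sync_query() spoke legacy SQL, while the 1.x
client.query() defaults to standard SQL, whose table references are
backticked and dot-separated. Side by side, with placeholder names:

    # Legacy SQL table reference (0.25.0 default):
    legacy_query = (
        'SELECT month, tornado_count FROM [my-project:ds.monthly_tornadoes]')

    # Standard SQL table reference (1.x default):
    standard_query = (
        'SELECT month, tornado_count FROM `my-project.ds.monthly_tornadoes`')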
diff --git a/sdks/python/apache_beam/io/gcp/big_query_query_to_table_it_test.py b/sdks/python/apache_beam/io/gcp/big_query_query_to_table_it_test.py
index c0962c73e65..adf3a0fcac0 100644
--- a/sdks/python/apache_beam/io/gcp/big_query_query_to_table_it_test.py
+++ b/sdks/python/apache_beam/io/gcp/big_query_query_to_table_it_test.py
@@ -47,7 +47,7 @@
 NEW_TYPES_OUTPUT_SCHEMA = (
     '{"fields": [{"name": "bytes","type": "BYTES"},'
     '{"name": "date","type": "DATE"},{"name": "time","type": "TIME"}]}')
-NEW_TYPES_OUTPUT_VERIFY_QUERY = ('SELECT date FROM [%s];')
+NEW_TYPES_OUTPUT_VERIFY_QUERY = ('SELECT date FROM `%s`;')
 # There are problems with query time and bytes with current version of bigquery.
 NEW_TYPES_OUTPUT_EXPECTED = [
     (datetime.date(2000, 1, 1),),
@@ -61,7 +61,7 @@
 NEW_TYPES_QUERY = (
     'SELECT bytes, date, time FROM [%s.%s]')
 DIALECT_OUTPUT_SCHEMA = ('{"fields": [{"name": "fruit","type": "STRING"}]}')
-DIALECT_OUTPUT_VERIFY_QUERY = ('SELECT fruit from [%s];')
+DIALECT_OUTPUT_VERIFY_QUERY = ('SELECT fruit from `%s`;')
 DIALECT_OUTPUT_EXPECTED = [(u'apple',), (u'orange',)]
 
 
diff --git a/sdks/python/apache_beam/io/gcp/big_query_query_to_table_pipeline.py b/sdks/python/apache_beam/io/gcp/big_query_query_to_table_pipeline.py
index 65a4941f59a..1e549c67630 100644
--- a/sdks/python/apache_beam/io/gcp/big_query_query_to_table_pipeline.py
+++ b/sdks/python/apache_beam/io/gcp/big_query_query_to_table_pipeline.py
@@ -62,7 +62,7 @@ def run_bq_pipeline(argv=None):
            known_args.output,
            schema=table_schema,
            create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
-           write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE)))
+           write_disposition=beam.io.BigQueryDisposition.WRITE_EMPTY)))
 
   result = p.run()
   result.wait_until_finish()
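
A behavioral note on the disposition change above: WRITE_TRUNCATE silently
overwrites existing rows, while WRITE_EMPTY fails unless the target table is
empty, so leftovers from an earlier test run now surface as errors instead of
being masked. The option in isolation (placeholder table spec):

    import apache_beam as beam

    write = beam.io.Write(beam.io.BigQuerySink(
        'my-project:my_dataset.my_table',  # placeholder table spec
        create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
        write_disposition=beam.io.BigQueryDisposition.WRITE_EMPTY))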
diff --git a/sdks/python/apache_beam/io/gcp/bigquery_io_read_it_test.py b/sdks/python/apache_beam/io/gcp/bigquery_io_read_it_test.py
index 8851a143971..72e1bb7fc8b 100644
--- a/sdks/python/apache_beam/io/gcp/bigquery_io_read_it_test.py
+++ b/sdks/python/apache_beam/io/gcp/bigquery_io_read_it_test.py
@@ -54,7 +54,7 @@ def run_bigquery_io_read_pipeline(self, input_size):
         **extra_opts))
 
   @attr('IT')
-  def bigquery_read_1M_python(self):
+  def test_bigquery_read_1M_python(self):
     self.run_bigquery_io_read_pipeline('1M')
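
The rename above is itself a bug fix: nose and unittest only collect methods
whose names begin with 'test', so bigquery_read_1M_python was being silently
skipped. A minimal illustration of the discovery rule:

    import unittest

    class Example(unittest.TestCase):

      def bigquery_read_1M_python(self):  # never collected by the runner
        pass

      def test_bigquery_read_1M_python(self):  # collected and run
        pass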
 
 
diff --git a/sdks/python/apache_beam/io/gcp/tests/bigquery_matcher.py b/sdks/python/apache_beam/io/gcp/tests/bigquery_matcher.py
index c33f0db4998..c3069a359cc 100644
--- a/sdks/python/apache_beam/io/gcp/tests/bigquery_matcher.py
+++ b/sdks/python/apache_beam/io/gcp/tests/bigquery_matcher.py
@@ -94,19 +94,8 @@ def _matches(self, _):
       retry_filter=retry_on_http_and_value_error)
   def _query_with_retry(self, bigquery_client):
     """Run Bigquery query with retry if got error http response"""
-    query = bigquery_client.run_sync_query(self.query)
-    query.run()
-
-    # Fetch query data one page at a time.
-    page_token = None
-    results = []
-    while True:
-      for row in query.fetch_data(page_token=page_token):
-        results.append(row)
-      if results:
-        break
-
-    return results
+    query_job = bigquery_client.query(self.query)
+    return [row.values() for row in query_job]
 
   def describe_to(self, description):
     description \
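
The manual page_token loop could be deleted because iterating a 1.x QueryJob
blocks until the job completes and then pages through the results
transparently. A standalone sketch (placeholder project and query):

    from google.cloud import bigquery

    client = bigquery.Client(project='my-project')  # placeholder project id
    job = client.query('SELECT 1 AS x')
    for row in job:          # waits for completion, then auto-pages rows
      print(row.values())    # -> (1,)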
diff --git a/sdks/python/apache_beam/io/gcp/tests/bigquery_matcher_test.py b/sdks/python/apache_beam/io/gcp/tests/bigquery_matcher_test.py
index e6ae9a06dc8..f2714623b23 100644
--- a/sdks/python/apache_beam/io/gcp/tests/bigquery_matcher_test.py
+++ b/sdks/python/apache_beam/io/gcp/tests/bigquery_matcher_test.py
@@ -22,9 +22,8 @@
 import logging
 import unittest
 
+import mock
 from hamcrest import assert_that as hc_assert_that
-from mock import Mock
-from mock import patch
 
 from apache_beam.io.gcp.tests import bigquery_matcher as bq_verifier
 from apache_beam.testing.test_utils import patch_retry
@@ -32,26 +31,30 @@
 # Protect against environments where bigquery library is not available.
 # pylint: disable=wrong-import-order, wrong-import-position
 try:
+  # TODO: fix usage
   from google.cloud import bigquery
   from google.cloud.exceptions import NotFound
 except ImportError:
   bigquery = None
+  NotFound = None
 # pylint: enable=wrong-import-order, wrong-import-position
 
 
 @unittest.skipIf(bigquery is None, 'Bigquery dependencies are not installed.')
+@mock.patch.object(bigquery, 'Client')
 class BigqueryMatcherTest(unittest.TestCase):
 
   def setUp(self):
-    self._mock_result = Mock()
+    self._mock_result = mock.Mock()
     patch_retry(self, bq_verifier)
 
-  @patch.object(bigquery, 'Client')
   def test_bigquery_matcher_success(self, mock_bigquery):
-    mock_query = Mock()
-    mock_client = mock_bigquery.return_value
-    mock_client.run_sync_query.return_value = mock_query
-    mock_query.fetch_data.return_value = ([], None, None)
+    mock_query_result = [mock.Mock(), mock.Mock(), mock.Mock()]
+    mock_query_result[0].values.return_value = []
+    mock_query_result[1].values.return_value = None
+    mock_query_result[2].values.return_value = None
+
+    mock_bigquery.return_value.query.return_value = mock_query_result
 
     matcher = bq_verifier.BigqueryMatcher(
         'mock_project',
@@ -59,51 +62,16 @@ def test_bigquery_matcher_success(self, mock_bigquery):
         '59f9d6bdee30d67ea73b8aded121c3a0280f9cd8')
     hc_assert_that(self._mock_result, matcher)
 
-  @patch.object(bigquery, 'Client')
-  def test_bigquery_matcher_query_run_error(self, mock_bigquery):
-    mock_query = Mock()
-    mock_client = mock_bigquery.return_value
-    mock_client.run_sync_query.return_value = mock_query
-    mock_query.run.side_effect = ValueError('job is already running')
-
-    matcher = bq_verifier.BigqueryMatcher('mock_project',
-                                          'mock_query',
-                                          'mock_checksum')
-    with self.assertRaises(ValueError):
-      hc_assert_that(self._mock_result, matcher)
-    self.assertTrue(mock_query.run.called)
-    self.assertEqual(bq_verifier.MAX_RETRIES + 1, mock_query.run.call_count)
-
-  @patch.object(bigquery, 'Client')
-  def test_bigquery_matcher_fetch_data_error(self, mock_bigquery):
-    mock_query = Mock()
-    mock_client = mock_bigquery.return_value
-    mock_client.run_sync_query.return_value = mock_query
-    mock_query.fetch_data.side_effect = ValueError('query job not executed')
-
-    matcher = bq_verifier.BigqueryMatcher('mock_project',
-                                          'mock_query',
-                                          'mock_checksum')
-    with self.assertRaises(ValueError):
-      hc_assert_that(self._mock_result, matcher)
-    self.assertTrue(mock_query.fetch_data.called)
-    self.assertEqual(bq_verifier.MAX_RETRIES + 1,
-                     mock_query.fetch_data.call_count)
-
-  @patch.object(bigquery, 'Client')
-  def test_bigquery_matcher_query_responds_error_code(self, mock_bigquery):
-    mock_query = Mock()
-    mock_client = mock_bigquery.return_value
-    mock_client.run_sync_query.return_value = mock_query
-    mock_query.run.side_effect = NotFound('table is not found')
+  def test_bigquery_matcher_query_error_retry(self, mock_bigquery):
+    mock_query = mock_bigquery.return_value.query
+    mock_query.side_effect = NotFound('table not found')
 
     matcher = bq_verifier.BigqueryMatcher('mock_project',
                                           'mock_query',
                                           'mock_checksum')
     with self.assertRaises(NotFound):
       hc_assert_that(self._mock_result, matcher)
-    self.assertTrue(mock_query.run.called)
-    self.assertEqual(bq_verifier.MAX_RETRIES + 1, mock_query.run.call_count)
+    self.assertEqual(bq_verifier.MAX_RETRIES + 1, mock_query.call_count)
 
 
 if __name__ == '__main__':
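
On the test refactor above: hoisting @mock.patch.object from each method to
the class applies the patch to every test method, with the mock still passed
as the trailing positional argument. A self-contained sketch of that pattern:

    import unittest

    import mock

    class Thing(object):

      def ping(self):
        return 'real'

    @mock.patch.object(Thing, 'ping')  # patches every test_* method
    class PatchedTest(unittest.TestCase):

      def test_ping(self, mock_ping):  # the mock is the last positional arg
        mock_ping.return_value = 'fake'
        self.assertEqual('fake', Thing().ping())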
diff --git a/sdks/python/apache_beam/io/gcp/tests/utils.py b/sdks/python/apache_beam/io/gcp/tests/utils.py
index 81fc4736c04..60987f1f806 100644
--- a/sdks/python/apache_beam/io/gcp/tests/utils.py
+++ b/sdks/python/apache_beam/io/gcp/tests/utils.py
@@ -21,14 +21,17 @@
 from __future__ import absolute_import
 
 import logging
+import time
 
 from apache_beam.utils import retry
 
 # Protect against environments where bigquery library is not available.
 try:
   from google.cloud import bigquery
+  from google.cloud.exceptions import NotFound
 except ImportError:
   bigquery = None
+  NotFound = None
 
 
 class GcpTestIOError(retry.PermanentException):
@@ -40,26 +43,53 @@ class GcpTestIOError(retry.PermanentException):
 @retry.with_exponential_backoff(
     num_retries=3,
     retry_filter=retry.retry_on_server_errors_filter)
-def delete_bq_table(project, dataset, table):
-  """Delete a Biqquery table.
+def create_bq_dataset(project, dataset_base_name):
+  """Creates an empty BigQuery dataset.
+
+  Args:
+    project: Project to work in.
+    dataset_base_name: Prefix for dataset id.
+
+  Returns:
+    A ``google.cloud.bigquery.dataset.DatasetReference`` object pointing to the
+    new dataset.
+  """
+  client = bigquery.Client(project=project)
+  unique_dataset_name = dataset_base_name + str(int(time.time()))
+  dataset_ref = client.dataset(unique_dataset_name, project=project)
+  dataset = bigquery.Dataset(dataset_ref)
+  client.create_dataset(dataset)
+  return dataset_ref
+
+
+@retry.with_exponential_backoff(
+    num_retries=3,
+    retry_filter=retry.retry_on_server_errors_filter)
+def delete_bq_dataset(project, dataset_ref):
+  """Deletes a BigQuery dataset and its contents.
+
+  Args:
+    project: Project to work in.
+    dataset_ref: A ``google.cloud.bigquery.dataset.DatasetReference`` object
+      pointing to the dataset to delete.
+  """
+  client = bigquery.Client(project=project)
+  client.delete_dataset(dataset_ref, delete_contents=True)
+
+
+def delete_bq_table(project, dataset_id, table_id):
+  """Delete a BiqQuery table.
 
   Args:
     project: Name of the project.
-    dataset: Name of the dataset where table is.
-    table:   Name of the table.
+    dataset_id: Name of the dataset where table is.
+    table_id: Name of the table.
   """
-  logging.info('Clean up a Bigquery table with project: %s, dataset: %s, '
-               'table: %s.', project, dataset, table)
-  bq_dataset = bigquery.Client(project=project).dataset(dataset)
-  if not bq_dataset.exists():
-    raise GcpTestIOError('Failed to cleanup. Bigquery dataset %s doesn\'t '
-                         'exist in project %s.' % (dataset, project))
-  bq_table = bq_dataset.table(table)
-  if not bq_table.exists():
-    raise GcpTestIOError('Failed to cleanup. Bigquery table %s doesn\'t '
-                         'exist in project %s, dataset %s.' %
-                         (table, project, dataset))
-  bq_table.delete()
-  if bq_table.exists():
-    raise RuntimeError('Failed to cleanup. Bigquery table %s still exists '
-                       'after cleanup.' % table)
+  logging.info('Clean up a BigQuery table with project: %s, dataset: %s, '
+               'table: %s.', project, dataset_id, table_id)
+  client = bigquery.Client(project=project)
+  table_ref = client.dataset(dataset_id).table(table_id)
+  try:
+    client.delete_table(table_ref)
+  except NotFound:
+    raise GcpTestIOError('BigQuery table does not exist: %s' % table_ref)
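
The rewritten delete_bq_table swaps the pre-flight exists() checks, which the
1.x API dropped, for delete-then-catch-NotFound. A standalone sketch with
placeholder names:

    from google.cloud import bigquery
    from google.cloud.exceptions import NotFound

    client = bigquery.Client(project='my-project')
    table_ref = client.dataset('my_dataset').table('missing_table')
    try:
      client.delete_table(table_ref)
    except NotFound:
      print('table already gone: %s' % table_ref)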
diff --git a/sdks/python/apache_beam/io/gcp/tests/utils_test.py b/sdks/python/apache_beam/io/gcp/tests/utils_test.py
index 4ea65a9d86b..8af749740b9 100644
--- a/sdks/python/apache_beam/io/gcp/tests/utils_test.py
+++ b/sdks/python/apache_beam/io/gcp/tests/utils_test.py
@@ -22,8 +22,7 @@
 import logging
 import unittest
 
-from mock import Mock
-from mock import patch
+import mock
 
 from apache_beam.io.gcp.tests import utils
 from apache_beam.testing.test_utils import patch_retry
@@ -31,74 +30,47 @@
 # Protect against environments where bigquery library is not available.
 try:
   from google.cloud import bigquery
+  from google.cloud.exceptions import NotFound
 except ImportError:
   bigquery = None
+  NotFound = None
 
 
 @unittest.skipIf(bigquery is None, 'Bigquery dependencies are not installed.')
+@mock.patch.object(bigquery, 'Client')
 class UtilsTest(unittest.TestCase):
 
   def setUp(self):
-    self._mock_result = Mock()
     patch_retry(self, utils)
 
-  @patch.object(bigquery, 'Client')
-  def test_delete_table_succeeds(self, mock_client):
-    mock_dataset = Mock()
-    mock_client.return_value.dataset = mock_dataset
-    mock_dataset.return_value.exists.return_value = True
+  @mock.patch.object(bigquery, 'Dataset')
+  def test_create_bq_dataset(self, mock_dataset, mock_client):
+    mock_client.dataset.return_value = 'dataset_ref'
+    mock_dataset.return_value = 'dataset_obj'
+
+    utils.create_bq_dataset('project', 'dataset_base_name')
+    mock_client.return_value.create_dataset.assert_called_with('dataset_obj')
 
-    mock_table = Mock()
-    mock_dataset.return_value.table = mock_table
-    mock_table.return_value.exists.side_effect = [True, False]
+  def test_delete_bq_dataset(self, mock_client):
+    utils.delete_bq_dataset('project', 'dataset_ref')
+    mock_client.return_value.delete_dataset.assert_called_with(
+        'dataset_ref', delete_contents=mock.ANY)
+
+  def test_delete_table_succeeds(self, mock_client):
+    mock_client.return_value.dataset.return_value.table.return_value = (
+        'table_ref')
 
     utils.delete_bq_table('unused_project',
                           'unused_dataset',
                           'unused_table')
+    mock_client.return_value.delete_table.assert_called_with('table_ref')
 
-  @patch.object(bigquery, 'Client')
-  def test_delete_table_fails_dataset_not_exist(self, mock_client):
-    mock_dataset = Mock()
-    mock_client.return_value.dataset = mock_dataset
-    mock_dataset.return_value.exists.return_value = False
-
-    with self.assertRaisesRegexp(
-        Exception, r'^Failed to cleanup. Bigquery dataset unused_dataset '
-                   r'doesn\'t exist'):
-      utils.delete_bq_table('unused_project',
-                            'unused_dataset',
-                            'unused_table')
-
-  @patch.object(bigquery, 'Client')
-  def test_delete_table_fails_table_not_exist(self, mock_client):
-    mock_dataset = Mock()
-    mock_client.return_value.dataset = mock_dataset
-    mock_dataset.return_value.exists.return_value = True
-
-    mock_table = Mock()
-    mock_dataset.return_value.table = mock_table
-    mock_table.return_value.exists.return_value = False
-
-    with self.assertRaisesRegexp(Exception,
-                                 r'^Failed to cleanup. Bigquery table '
-                                 'unused_table doesn\'t exist'):
-      utils.delete_bq_table('unused_project',
-                            'unused_dataset',
-                            'unused_table')
-
-  @patch.object(bigquery, 'Client')
-  def test_delete_table_fails_service_error(self, mock_client):
-    mock_dataset = Mock()
-    mock_client.return_value.dataset = mock_dataset
-    mock_dataset.return_value.exists.return_value = True
-
-    mock_table = Mock()
-    mock_dataset.return_value.table = mock_table
-    mock_table.return_value.exists.return_value = True
+  def test_delete_table_fails_not_found(self, mock_client):
+    mock_client.return_value.dataset.return_value.table.return_value = (
+        'table_ref')
+    mock_client.return_value.delete_table.side_effect = NotFound('test')
 
-    with self.assertRaisesRegexp(Exception,
-                                 r'^Failed to cleanup. Bigquery table '
-                                 'unused_table still exists'):
+    with self.assertRaisesRegexp(Exception, r'does not exist:.*table_ref'):
       utils.delete_bq_table('unused_project',
                             'unused_dataset',
                             'unused_table')
diff --git a/sdks/python/container/base_image_requirements.txt b/sdks/python/container/base_image_requirements.txt
index 8e352a664df..162fb932784 100644
--- a/sdks/python/container/base_image_requirements.txt
+++ b/sdks/python/container/base_image_requirements.txt
@@ -48,7 +48,7 @@ nose==1.3.7
 google-apitools==0.5.20
 googledatastore==7.0.1
 google-cloud-pubsub==0.35.4
-google-cloud-bigquery==0.25.0
+google-cloud-bigquery==1.6.0
 proto-google-cloud-datastore-v1==0.90.4
 
 # Optional packages
diff --git a/sdks/python/scripts/run_postcommit.sh b/sdks/python/scripts/run_postcommit.sh
index be133769c52..2bd1ca41889 100755
--- a/sdks/python/scripts/run_postcommit.sh
+++ b/sdks/python/scripts/run_postcommit.sh
@@ -105,7 +105,8 @@ apache_beam.io.gcp.pubsub_integration_test:PubSubIntegrationTest"
     TESTS="--tests=\
 apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it,\
 apache_beam.io.gcp.pubsub_integration_test:PubSubIntegrationTest,\
-apache_beam.io.gcp.big_query_query_to_table_it_test:BigQueryQueryToTableIT"
+apache_beam.io.gcp.big_query_query_to_table_it_test:BigQueryQueryToTableIT,\
+apache_beam.io.gcp.bigquery_io_read_it_test"
   fi
 fi
 
diff --git a/sdks/python/setup.py b/sdks/python/setup.py
index a3db7903fca..4d5e227163c 100644
--- a/sdks/python/setup.py
+++ b/sdks/python/setup.py
@@ -143,7 +143,7 @@ def get_version():
     'googledatastore==7.0.1; python_version < "3.0"',
     'google-cloud-pubsub==0.35.4',
     # GCP packages required by tests
-    'google-cloud-bigquery==0.25.0',
+    'google-cloud-bigquery>=1.6.0,<1.7.0',
 ]
 
 if sys.version_info[0] == 2:


 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 157726)
    Time Spent: 1h  (was: 50m)

> Beam Dependency Update Request: google-cloud-bigquery
> -----------------------------------------------------
>
>                 Key: BEAM-5537
>                 URL: https://issues.apache.org/jira/browse/BEAM-5537
>             Project: Beam
>          Issue Type: Bug
>          Components: dependencies
>            Reporter: Beam JIRA Bot
>            Assignee: Udi Meiri
>            Priority: Major
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
>  ------------------------- 2018-10-01 19:15:02.343276 
> -------------------------
>         Please consider upgrading the dependency google-cloud-bigquery. 
>         The current version is 0.25.0. The latest version is 1.5.1 
>         cc: [~markflyhigh], 
>  Please refer to [Beam Dependency Guide|https://beam.apache.org/contribute/dependencies/] for more information. 
> Do Not Modify The Description Above. 
>  ------------------------- 2018-10-08 12:08:29.646271 
> -------------------------
>         Please consider upgrading the dependency google-cloud-bigquery. 
>         The current version is 0.25.0. The latest version is 1.6.0 
>         cc: [~markflyhigh], 
>  Please refer to [Beam Dependency Guide|https://beam.apache.org/contribute/dependencies/] for more information. 
> Do Not Modify The Description Above. 
>  ------------------------- 2018-10-15 12:09:25.995486 
> -------------------------
>         Please consider upgrading the dependency google-cloud-bigquery. 
>         The current version is 0.25.0. The latest version is 1.6.0 
>         cc: [~markflyhigh], 
>  Please refer to [Beam Dependency Guide|https://beam.apache.org/contribute/dependencies/] for more information. 
> Do Not Modify The Description Above. 
>  ------------------------- 2018-10-22 12:09:52.889923 
> -------------------------
>         Please consider upgrading the dependency google-cloud-bigquery. 
>         The current version is 0.25.0. The latest version is 1.6.0 
>         cc: [~markflyhigh], 
>  Please refer to [Beam Dependency Guide|https://beam.apache.org/contribute/dependencies/] for more information. 
> Do Not Modify The Description Above. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
