pabloem commented on a change in pull request #15489:
URL: https://github.com/apache/beam/pull/15489#discussion_r746099477



##########
File path: sdks/python/apache_beam/io/gcp/bigquery.py
##########
@@ -2158,7 +2169,7 @@ def expand(self, pcoll):
           schema=self.schema,
           create_disposition=self.create_disposition,
           write_disposition=self.write_disposition,
-          triggering_frequency=self.triggering_frequency,
+          triggering_frequency=int(self.triggering_frequency),

Review comment:
       In this case, triggering_frequency does not have to be an integer, 
right? We may want to wait less than 1s (e.g. the default is 0.2s, right?).
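
To make the concern concrete, here is a minimal sketch in plain Python (no Beam involved). The 0.2 value is the default suggested in the question above and is an assumption, not something confirmed in this thread. `int()` truncates toward zero, so any sub-second frequency would become 0 after the cast:

```python
# Illustration only: int() truncates toward zero, so a sub-second
# triggering frequency collapses to 0 when cast.
sub_second_frequency = 0.2    # assumed value, taken from the question above
file_loads_frequency = 600.0  # 10 minutes, in line with the docstring's advice

truncated_sub_second = int(sub_second_frequency)
truncated_file_loads = int(file_loads_frequency)

print(truncated_sub_second)  # 0
print(truncated_file_loads)  # 600
```

The cast is harmless for load-job intervals measured in minutes, but it would discard the value entirely for a streaming-inserts batch interval below one second.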

##########
File path: sdks/python/apache_beam/io/gcp/bigquery_test.py
##########
@@ -777,6 +777,21 @@ def test_to_from_runner_api(self):
     self.assertEqual(
         original_side_input_data.view_fn, deserialized_side_input_data.view_fn)
 
+  def test_triggering_frequency_with_streaming_inserts_usage(self):
+    # triggering_frequency with STREAMING_INSERTS can only be
+    # used with with_auto_sharding=True
+    with self.assertRaises(ValueError):
+      beam.io.gcp.bigquery.WriteToBigQuery(
+          "dataset.table",
+          method=WriteToBigQuery.Method.STREAMING_INSERTS,
+          triggering_frequency=0.5,
+          with_auto_sharding=False)
+      beam.io.gcp.bigquery.WriteToBigQuery(
+          "dataset.table",
+          method=WriteToBigQuery.Method.STREAMING_INSERTS,
+          triggering_frequency=0.5,
+          with_auto_sharding=False)

Review comment:
       These lines are repeated? The second WriteToBigQuery call looks 
identical to the first.
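
A small sketch of why the duplicate has no effect, using only the standard `unittest` module. `make_sink` is a hypothetical stand-in for a constructor that rejects its arguments; it is not Beam code. Once the first call inside `assertRaises` raises, the rest of the `with` block is skipped:

```python
import unittest

calls = []


def make_sink(label):
    # Hypothetical stand-in for a constructor that rejects its arguments.
    calls.append(label)
    raise ValueError("invalid configuration")


class AssertRaisesDemo(unittest.TestCase):
    def test_second_call_is_unreachable(self):
        calls.clear()
        with self.assertRaises(ValueError):
            make_sink("first")
            make_sink("second")  # never runs: the first call already raised


# Run the test case programmatically and collect the outcome.
result = unittest.TestResult()
unittest.defaultTestLoader.loadTestsFromTestCase(AssertRaisesDemo).run(result)
```

After the run, `calls` holds only `"first"`, so a second statement in the same block adds no coverage. If two distinct configurations need checking, each would need its own `assertRaises` block.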

##########
File path: sdks/python/apache_beam/io/gcp/bigquery_test.py
##########
@@ -1015,6 +1031,7 @@ def store_callback(table, **kwargs):
               # Set a batch size such that the input elements will be inserted
               # in 2 batches.
               batch_size=2,
+              triggering_frequency=None,

Review comment:
       Can we make sure that users are not forced to define a 
triggering_frequency for their pipelines unless they're using file loads in 
streaming?
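
One way the requested behavior could look, as a sketch only. `validate_triggering_frequency` and its string method names are hypothetical and do not reflect how Beam actually validates this argument; the point is that `None` is accepted everywhere except file loads in streaming:

```python
from typing import Optional


def validate_triggering_frequency(
    method: str,
    is_streaming: bool,
    triggering_frequency: Optional[float],
) -> Optional[float]:
    """Hypothetical helper, not Beam's actual validation logic."""
    needs_frequency = method == "FILE_LOADS" and is_streaming
    if needs_frequency and triggering_frequency is None:
        raise ValueError(
            "triggering_frequency is required for FILE_LOADS in streaming")
    # In every other case the caller may leave the value unset.
    return triggering_frequency
```

With a check shaped like this, the test in the hunk above would not need to pass `triggering_frequency=None` explicitly, since leaving it out would mean the same thing.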

##########
File path: sdks/python/apache_beam/io/gcp/bigquery.py
##########
@@ -2016,14 +2019,21 @@ def __init__(
         passed to the table callable (if one is provided).
       schema_side_inputs: A tuple with ``AsSideInput`` PCollections to be
         passed to the schema callable (if one is provided).
-      triggering_frequency (int): Every triggering_frequency duration, a
-        BigQuery load job will be triggered for all the data written since
-        the last load job. BigQuery has limits on how many load jobs can be
+      triggering_frequency (float):
+        When method is FILE_LOADS:
+        Value will be converted to int. Every triggering_frequency seconds, a
+        BigQuery load job will be triggered for all the data written since the
+        last load job. BigQuery has limits on how many load jobs can be
         triggered per day, so be careful not to set this duration too low, or
         you may exceed daily quota. Often this is set to 5 or 10 minutes to
-        ensure that the project stays well under the BigQuery quota.
-        See https://cloud.google.com/bigquery/quota-policy for more information
+        ensure that the project stays well under the BigQuery quota. See
+        https://cloud.google.com/bigquery/quota-policy for more information
         about BigQuery quotas.
+
+        When method is STREAMING_INSERTS and with_auto_sharding=True:
+        A streaming inserts batch will be submitted at least every
+        triggering_frequency seconds when data is waiting. The batch can be
+        sent earlier if it reaches the maximum batch size set by batch_size.

Review comment:
       Can you mention what the default value of this property is?
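
For readers of the docstring, the streaming-inserts rule it describes can be sketched as a single predicate. `should_flush` and its parameter names are hypothetical and only restate the docstring; this is not the sink's implementation, and no default value is assumed here:

```python
def should_flush(
    buffered_rows: int,
    seconds_since_last_flush: float,
    batch_size: int,
    triggering_frequency: float,
) -> bool:
    """Hypothetical restatement of the docstring, not Beam's code."""
    if buffered_rows == 0:
        # Nothing is waiting, so there is no batch to submit.
        return False
    # Send early when the batch is full, otherwise once the interval elapses.
    return (
        buffered_rows >= batch_size
        or seconds_since_last_flush >= triggering_frequency
    )
```

For example, with `batch_size=500` and `triggering_frequency=0.5`, a buffer of 3 rows would be flushed after 0.5 seconds, while a buffer of 500 rows would be flushed immediately.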



