[GitHub] [beam] ajamato commented on a change in pull request #13017: [BEAM-11018] Use metric for Python BigQuery streaming insert API late…

GitBox Fri, 09 Oct 2020 07:00:36 -0700


ajamato commented on a change in pull request #13017:
URL: https://github.com/apache/beam/pull/13017#discussion_r501954641




##########
File path: sdks/python/apache_beam/io/gcp/bigquery_tools.py
##########
@@ -271,6 +276,11 @@ def __init__(self, client=None):
     # randomized prefix for row IDs.
     self._row_id_prefix = '' if client else uuid.uuid4()
     self._temporary_table_suffix = uuid.uuid4().hex
+    self._latency_histogram_metric = Metrics.histogram(

Review comment:
       We do plan on building a few system metrics for GCP IO API request call 
latency which use histograms. Though, it doesn't need to use the user metric 
framework itself, its a reasonable way to implement it to use a lot of the same 
primitives.
   
   I asked @ihji to use the metric framework for the logger, as I will be 
adding a latency histogram metrics for these BigQuery API request calls.

##########
File path: sdks/python/apache_beam/io/gcp/bigquery_tools.py
##########
@@ -271,6 +276,11 @@ def __init__(self, client=None):
     # randomized prefix for row IDs.
     self._row_id_prefix = '' if client else uuid.uuid4()
     self._temporary_table_suffix = uuid.uuid4().hex
+    self._latency_histogram_metric = Metrics.histogram(

Review comment:
       We do plan on building a few system metrics for GCP IO API request call 
latency which use histograms. Though, it doesn't need to use the user metric 
framework itself, its a reasonable way to implement it to use a lot of the same 
primitives.
   
   I asked @ihji to use the metric framework for the logger, as I will be 
adding a latency histogram metrics for these BigQuery API request calls. The 
previous PR has written basically a side framework to count the API call 
latencies into a histogram and log them. But since I will be introducing a 
metric anyways, I suggested using this approach.

##########
File path: sdks/python/apache_beam/io/gcp/bigquery_tools.py
##########
@@ -271,6 +276,11 @@ def __init__(self, client=None):
     # randomized prefix for row IDs.
     self._row_id_prefix = '' if client else uuid.uuid4()
     self._temporary_table_suffix = uuid.uuid4().hex
+    self._latency_histogram_metric = Metrics.histogram(

Review comment:
       We do plan on building a few system metrics for GCP IO API request call 
latency which use histograms. Though, it doesn't need to use the user metric 
framework itself, its a reasonable way to implement it to use a lot of the same 
primitives.
   
   I asked @ihji to use the metric framework for the logger, as I will be 
adding a latency histogram metrics for these BigQuery API request calls. The 
previous PR has written basically a side framework to count the API call 
latencies into a histogram and log them. But since I will be introducing a 
metric anyways, I suggested using this approach. So we wouldn't essentially 
have two code paths to count the metric for logging, and for the system metric.

##########
File path: sdks/python/apache_beam/io/gcp/bigquery_tools.py
##########
@@ -271,6 +276,11 @@ def __init__(self, client=None):
     # randomized prefix for row IDs.
     self._row_id_prefix = '' if client else uuid.uuid4()
     self._temporary_table_suffix = uuid.uuid4().hex
+    self._latency_histogram_metric = Metrics.histogram(

Review comment:
       We do plan on building a few system metrics for GCP IO API request call 
latency which use histograms. Though, it doesn't need to use the user metric 
framework itself, its a reasonable way to implement it to use a lot of the same 
primitives.
   
   I asked @ihji to use the metric framework for the logger, as I will be 
adding a latency histogram metrics for these BigQuery API request calls. The 
previous PR has written basically a side framework to count the API call 
latencies into a histogram and log them. But since I will be introducing a 
metric anyways, I suggested using this approach. So we wouldn't essentially 
have two code paths to count the latency values for logging, and for the system 
metric.

##########
File path: sdks/python/apache_beam/io/gcp/bigquery_tools.py
##########
@@ -271,6 +276,11 @@ def __init__(self, client=None):
     # randomized prefix for row IDs.
     self._row_id_prefix = '' if client else uuid.uuid4()
     self._temporary_table_suffix = uuid.uuid4().hex
+    self._latency_histogram_metric = Metrics.histogram(

Review comment:
       We do plan on building a few system metrics for GCP IO API request call 
latency which use histograms. Though, it doesn't need to use the user metric 
framework itself, its a reasonable way to implement it to use a lot of the same 
techniques.
   
   I asked @ihji to use the metric framework for the logger, as I will be 
adding a latency histogram metrics for these BigQuery API request calls. The 
previous PR has written basically a side framework to count the API call 
latencies into a histogram and log them. But since I will be introducing a 
metric anyways, I suggested using this approach. So we wouldn't essentially 
have two code paths to count the latency values for logging, and for the system 
metric.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [beam] ajamato commented on a change in pull request #13017: [BEAM-11018] Use metric for Python BigQuery streaming insert API late…

Reply via email to