[ 
https://issues.apache.org/jira/browse/BEAM-6291?focusedWorklogId=190305&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-190305
 ]

ASF GitHub Bot logged work on BEAM-6291:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 26/Jan/19 02:47
            Start Date: 26/Jan/19 02:47
    Worklog Time Spent: 10m 
      Work Description: udim commented on pull request #7614: [BEAM-6291] 
Generic BigQuery schema load tests metrics
URL: https://github.com/apache/beam/pull/7614#discussion_r251167220
 
 

 ##########
 File path: 
sdks/python/apache_beam/testing/load_tests/load_test_metrics_utils.py
 ##########
 @@ -104,25 +145,23 @@ def _get_or_create_table(self, bq_schemas, dataset):
       table = bigquery.Table(table_ref, schema=bq_schemas)
       self._bq_table = self._bq_client.create_table(table)
 
-  def _parse_schema(self, schema_map):
-    return [{'name': SUBMIT_TIMESTAMP_LABEL,
-             'type': 'TIMESTAMP',
-             'mode': 'REQUIRED'}] + schema_map
-
-  def _prepare_schema(self, schemas):
-    return [_get_schema_field(schema) for schema in schemas]
+  def _prepare_schema(self):
+    return [get_schema_field(row) for row in SCHEMA]
 
 Review comment:
   I think this could be simplified if you rename the key `type` to
`field_type` in `SCHEMA`:
   ```py
   SCHEMA = [
       {'name': ID_LABEL,
        'field_type': 'STRING',
        'mode': 'REQUIRED'
       },
   ....
   ```
   
   Then the return statement could be reduced to:
   ```py
   return [SchemaField(**row) for row in SCHEMA]
   ```
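
To illustrate why the rename helps, here is a minimal sketch of the suggestion. It assumes the real `google.cloud.bigquery.SchemaField` accepts `name`, `field_type` and `mode` as keyword arguments, so each dict in `SCHEMA` can be unpacked directly. To stay runnable without the client library, the sketch uses a small stand-in class with the same three keywords; the label values and the second schema row are assumptions made for illustration, not taken from the PR.

```python
from dataclasses import dataclass


# Stand-in for google.cloud.bigquery.SchemaField (assumption: only the
# three keyword arguments used in the review comment are modelled here).
@dataclass(frozen=True)
class SchemaField:
    name: str
    field_type: str
    mode: str = 'NULLABLE'


# Hypothetical label values; the PR defines its own constants.
ID_LABEL = 'test_id'
SUBMIT_TIMESTAMP_LABEL = 'timestamp'

# Keys match the constructor's keyword names, so no helper is needed.
SCHEMA = [
    {'name': ID_LABEL,
     'field_type': 'STRING',
     'mode': 'REQUIRED'},
    {'name': SUBMIT_TIMESTAMP_LABEL,
     'field_type': 'TIMESTAMP',
     'mode': 'REQUIRED'},
]


def prepare_schema():
    # Each row dict is unpacked into keyword arguments.
    return [SchemaField(**row) for row in SCHEMA]


fields = prepare_schema()
```

With the key still named `type`, the unpacking would raise a `TypeError` for an unexpected keyword argument, which is why the rename is part of the suggestion.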
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 190305)
    Time Spent: 40m  (was: 0.5h)

> Make the schema for BQ tables storing metric results more generic (Python)
> --------------------------------------------------------------------------
>
>                 Key: BEAM-6291
>                 URL: https://issues.apache.org/jira/browse/BEAM-6291
>             Project: Beam
>          Issue Type: Sub-task
>          Components: testing
>            Reporter: Lukasz Gajowy
>            Assignee: Kasia Kucharczyk
>            Priority: Major
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently, we keep the metric results in BQ in tables with a schema like
> this:
> timestamp | total_bytes | run_time | (possibly other BQ columns)
> Every time we want to add a new column, the schema has to be extended. This
> is inconvenient because each load test can store different metrics, which in
> turn would lead to multiple BQ tables, each queried differently.
> We can provide a more generic schema, like so:
> test_id | timestamp | metric | value
> With this schema, every metric, whatever its name, can be saved in the
> table as a separate row. This gives more flexibility in storing metrics and
> is still easy to query and plot.
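
The generic schema described in the issue can be sketched as a small conversion from a per-test metrics mapping into one row per metric. This is only an illustration: the function name `to_rows` and the sample values are hypothetical, and only the four column names come from the issue text.

```python
import time


def to_rows(test_id, metrics, timestamp=None):
    """Turn {metric_name: value} into one row per metric (hypothetical helper)."""
    ts = time.time() if timestamp is None else timestamp
    return [
        {'test_id': test_id, 'timestamp': ts, 'metric': name, 'value': value}
        for name, value in metrics.items()
    ]


# Adding a new metric adds a row, not a column, so the schema stays fixed.
rows = to_rows('run-1', {'total_bytes': 1024, 'run_time': 3.5}, timestamp=0)
```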



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
