[GitHub] [beam] urquha opened a new issue, #25531: [Bug]: Write to bigquery python SDK

via GitHub Fri, 17 Feb 2023 07:29:31 -0800


urquha opened a new issue, #25531:
URL: https://github.com/apache/beam/issues/25531


   ### What happened?
   
   I am trying to write to BigQuery using the Python SDK. I passed my table reference to `WriteToBigQuery` as a string of the form `project:dataset.Tablename`.
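   For reference, the `WriteToBigQuery` docstring describes a full table spec as `'DATASET.TABLE'` or `'PROJECT:DATASET.TABLE'`, so the string format above is the expected one. The traceback below shows the failure is raised at an `isinstance` check, not while splitting the string. The sketch that follows only makes the three parts of the spec explicit; `split_table_spec`, `TableSpec` and the regex are hypothetical illustrations, not Beam's own `parse_table_reference`.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern for "[project:]dataset.table". This is NOT Beam's
# parser; it only illustrates how the spec string is structured.
_TABLE_SPEC = re.compile(
    r"^(?:(?P<project>[^:]+):)?(?P<dataset>\w+)\.(?P<table>[\w$-]+)$"
)


class TableSpec(NamedTuple):
    project: Optional[str]  # None when the spec is just "dataset.table"
    dataset: str
    table: str


def split_table_spec(spec: str) -> TableSpec:
    """Split 'project:dataset.table' (project optional) into its parts."""
    match = _TABLE_SPEC.match(spec)
    if match is None:
        raise ValueError(f"expected [project:]dataset.table, got {spec!r}")
    return TableSpec(match["project"], match["dataset"], match["table"])


print(split_table_spec("my_project:dataset1.error_table_for_today"))
```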
   
   I get this error:
   ```
   beam.py 162 <module>
     parsed_results | beam.io.WriteToBigQuery(
   
   bigquery.py 1934 __init__
     self.table_reference = bigquery_tools.parse_table_reference(
   
   bigquery_tools.py 244 parse_table_reference
     if isinstance(table, TableReference):
   
   TypeError: isinstance() arg 2 must be a type, a tuple of types, or a union
   ```
   
   I looked through the SDK and can't find where `TableReference` is defined. Line 87 of [bigquery_tools.py](https://github.com/apache/beam/blob/6adecd438790d8c1b5182043db16232b68ff7a98/sdks/python/apache_beam/io/gcp/bigquery_tools.py) imports it, but when I run that import myself I get an error:
   ```
   from apache_beam.io.gcp.internal.clients.bigquery import TableReference
   *** ImportError: cannot import name 'TableReference' from 'apache_beam.io.gcp.internal.clients.bigquery' (/opt/homebrew/lib/python3.10/site-packages/apache_beam/io/gcp/internal/clients/bigquery/__init__.py)
   ```
   
   I believe this leaves `TableReference` as `None`, so `isinstance(table, TableReference)` receives `None` as its second argument, which breaks my code.
   I have copied code from several different places that supposedly works, and it all gives the same error. I really hope I'm doing something dumb.
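   The suspected mechanism can be reproduced in isolation. This sketch assumes that `bigquery_tools` guards its `TableReference` import with `try`/`except ImportError` and falls back to `None`; the module name `_hypothetical_missing_bq_client` is made up so that the import always fails, standing in for a missing dependency.

```python
# Hypothetical stand-in for a guarded optional import. The module name is
# invented so the import always fails, as if a dependency were missing.
try:
    from _hypothetical_missing_bq_client import TableReference
except ImportError:
    TableReference = None  # the fallback silently hides the missing dependency

caught = None
try:
    # None is not a valid second argument for isinstance(), so this raises.
    isinstance("my_project:dataset1.error_table_for_today", TableReference)
except TypeError as err:
    caught = err
    # On Python 3.10 (the interpreter in this report) the message matches
    # the one in the traceback above.
    print(f"TypeError: {err}")
```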
   
   I even tried copying the [test](https://github.com/apache/beam/blob/6adecd438790d8c1b5182043db16232b68ff7a98/sdks/python/apache_beam/io/gcp/bigquery_tools_test.py) by importing the BigQuery client and using it directly:
   ```
   from apache_beam.io.gcp.internal.clients import bigquery
   bigquery.TableReference()
   *** AttributeError: module 'apache_beam.io.gcp.internal.clients.bigquery' has no attribute 'TableReference'
   ```
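   A guarded import gives a quick yes/no on whether the installed Beam exposes the class. One likely explanation, which is an assumption rather than something confirmed in this report, is that the optional GCP dependencies are not installed: `apache_beam/io/gcp/internal/clients/bigquery/__init__.py` only exposes the generated client classes when those dependencies import cleanly, and they are normally installed with `pip install 'apache-beam[gcp]'`. The helper name `table_reference_importable` is hypothetical.

```python
def table_reference_importable() -> bool:
    """Return True if the installed Beam exposes BigQuery's TableReference."""
    try:
        # The same import quoted above from line 87 of bigquery_tools.py.
        from apache_beam.io.gcp.internal.clients.bigquery import TableReference  # noqa: F401
    except ImportError:
        # Covers both "apache_beam is not installed" and
        # "the BigQuery client classes are not available".
        return False
    return True


if table_reference_importable():
    print("TableReference is available")
else:
    print("TableReference is missing; check that the GCP extras are installed, "
          "e.g. pip install 'apache-beam[gcp]'")
```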
   
   Also, the code from the docs doesn't work either: the `table_names` assignment is missing a closing parenthesis, and it applies `beam.Create` twice with the same default label, which is not allowed and raises an error:
   
   ```python
   with Pipeline() as p:
     elements = (p | beam.Create([
       {'type': 'error', 'timestamp': '12:34:56', 'message': 'bad'},
       {'type': 'user_log', 'timestamp': '12:34:59', 'query': 'flu symptom'},
     ]))
   
     table_names = (p | beam.Create([
       ('error', 'my_project:dataset1.error_table_for_today'),
       ('user_log', 'my_project:dataset1.query_table_for_today'),
     ]) <------
   
     table_names_dict = beam.pvalue.AsDict(table_names)
   
     elements | beam.io.gcp.bigquery.WriteToBigQuery(
       table=lambda row, table_dict: table_dict[row['type']],
       table_side_inputs=(table_names_dict,))
   ```
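   Independently of the syntax problems, the routing in this example reduces to one dictionary lookup per row: `AsDict` exposes the `(type, table)` pairs as a dict-like side input, and `WriteToBigQuery` calls the `table` callable with each row plus the values listed in `table_side_inputs`. The plain-Python sketch below (no Beam involved, and not how Beam evaluates the callable internally) mirrors that lookup to show which table each sample row targets.

```python
from typing import Callable, Dict, List

# Same sample rows and (type, table) pairs as the docs example.
rows: List[dict] = [
    {'type': 'error', 'timestamp': '12:34:56', 'message': 'bad'},
    {'type': 'user_log', 'timestamp': '12:34:59', 'query': 'flu symptom'},
]
table_names = [
    ('error', 'my_project:dataset1.error_table_for_today'),
    ('user_log', 'my_project:dataset1.query_table_for_today'),
]

# Stand-in for beam.pvalue.AsDict: (key, value) pairs become a mapping.
names_dict: Dict[str, str] = dict(table_names)

# The callable passed as table=: it receives the row and the side-input dict.
route: Callable[[dict, Dict[str, str]], str] = (
    lambda row, table_dict: table_dict[row['type']])

destinations = [route(row, names_dict) for row in rows]
print(destinations)
```

   To make the docs snippet itself run, the `table_names` expression needs its closing `)`, and each `beam.Create` needs a distinct label, for example `p | 'CreateElements' >> beam.Create([...])` and `p | 'CreateTableNames' >> beam.Create([...])`.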
   
   
   
   ### Issue Priority
   
   Priority: 1 (data loss / total loss of function)
   
   ### Issue Components
   
   - [X] Component: Python SDK
   - [ ] Component: Java SDK
   - [ ] Component: Go SDK
   - [ ] Component: Typescript SDK
   - [ ] Component: IO connector
   - [ ] Component: Beam examples
   - [ ] Component: Beam playground
   - [ ] Component: Beam katas
   - [ ] Component: Website
   - [ ] Component: Spark Runner
   - [ ] Component: Flink Runner
   - [ ] Component: Samza Runner
   - [ ] Component: Twister2 Runner
   - [ ] Component: Hazelcast Jet Runner
   - [ ] Component: Google Cloud Dataflow Runner


