[ https://issues.apache.org/jira/browse/BEAM-6831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17050880#comment-17050880 ]
Keiji Yoshida commented on BEAM-6831:
-------------------------------------

This happens because in Apache Beam 2.10.0 the bigquery.tables.get API is called every time a bundle of a PCollection is processed ([code|https://github.com/apache/beam/blob/v2.10.0/sdks/python/apache_beam/io/gcp/bigquery.py#L1365-L1367]). In the latest version of Apache Beam (2.19.0), the bigquery.tables.get API is no longer called as long as `create_disposition` is set to `CREATE_NEVER` ([code|https://github.com/apache/beam/blob/v2.19.0/sdks/python/apache_beam/io/gcp/bigquery.py#L989-L1009]). You can therefore avoid the rate limit error by upgrading to Apache Beam 2.19.0 and setting `create_disposition` to `CREATE_NEVER` (a sketch of this configuration follows the quoted issue below).

> python sdk WriteToBigQuery excessive usage of metered API
> ---------------------------------------------------------
>
>                 Key: BEAM-6831
>                 URL: https://issues.apache.org/jira/browse/BEAM-6831
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>    Affects Versions: 2.10.0
>            Reporter: Pesach Weinstock
>            Assignee: Pablo Estrada
>            Priority: Major
>              Labels: bigquery, dataflow, gcp, python
>         Attachments: apache-beam-py-sdk-gcp-bq-api-issue.png
>
>
> Right now, there is a potential issue with the Python SDK where {{beam.io.gcp.bigquery.WriteToBigQuery}} calls the following API more often than needed:
> [https://www.googleapis.com/bigquery/v2/projects/<project-name>/datasets/<dataset-name>/tables/<table-name>?alt=json]
> The above request counts against general BigQuery API quotas, which are separate from the streaming insert quotas. When used in a streaming pipeline, we hit this quota quickly and cannot proceed to write any further data to BigQuery.
> Dispositions being used are:
> * create_disposition: {{beam.io.BigQueryDisposition.CREATE_NEVER}}
> * write_disposition: {{beam.io.BigQueryDisposition.WRITE_APPEND}}
> This is currently blocking us from using BigQueryIO in a streaming pipeline to write to BigQuery, and required us to formally request an API quota increase from Google to temporarily correct the situation.
> Our pipeline uses DataflowRunner. The error seen is below and in the attached screenshot of the Stackdriver trace.
> {code:java}
> "errors": [
>   {
>     "message": "Exceeded rate limits: too many api requests per user per method for this user_method. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors",
>     "domain": "usageLimits",
>     "reason": "rateLimitExceeded"
>   }
> ],
> {code}
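For reference, a minimal sketch of the workaround described above, assuming a streaming pipeline that reads JSON messages from Pub/Sub; the project, topic, and table names are hypothetical placeholders, and the destination table must already exist since `CREATE_NEVER` skips table creation:

{code:python}
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Streaming pipeline options; runner/project flags are omitted here.
options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        # Hypothetical Pub/Sub topic, used only for illustration.
        | "ReadEvents" >> beam.io.ReadFromPubSub(
            topic="projects/my-project/topics/my-topic")
        | "ParseJson" >> beam.Map(json.loads)
        | "WriteToBQ" >> beam.io.WriteToBigQuery(
            # Hypothetical destination table; it must exist ahead of time.
            table="my-project:my_dataset.my_table",
            # On Beam 2.19.0+, CREATE_NEVER skips the per-bundle
            # bigquery.tables.get call that triggers rateLimitExceeded.
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )
{code}

Note that with `CREATE_NEVER` no schema argument is needed, since the writer never attempts to create or validate the table.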