[ https://issues.apache.org/jira/browse/BEAM-6831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pesach Weinstock updated BEAM-6831:
-----------------------------------
    Description: 
Right now, there is a potential issue with the Python SDK where 
{{beam.io.gcp.bigquery.WriteToBigQuery}} calls the following API more often 
than needed:

https://www.googleapis.com/bigquery/v2/projects/<project-name>/datasets/<dataset-name>/tables/<table-name>?alt=json

The request above counts against BigQuery API quotas that are separate from 
the BigQuery streaming insert quotas. When used in a streaming pipeline, we 
hit this quota quickly and can no longer write any further data to BigQuery.

Dispositions being used are:
 * create_disposition: {{beam.io.BigQueryDisposition.CREATE_NEVER}}
 * write_disposition: {{beam.io.BigQueryDisposition.WRITE_APPEND}}

This currently blocks us from using BigQueryIO in a streaming pipeline to 
write to BigQuery, and forced us to formally request an API quota increase 
from Google as a temporary workaround.
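
For context, here is a minimal sketch of how the sink is configured. The 
Pub/Sub source, element shape, and {{<placeholder>}} names are illustrative 
assumptions, not our exact pipeline:
{code:python}
# Minimal streaming sketch of the WriteToBigQuery usage described above.
# The Pub/Sub source, element shape, and <placeholder> names are assumptions.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(streaming=True)  # e.g. run with --runner=DataflowRunner

with beam.Pipeline(options=options) as p:
    (p
     | 'Read' >> beam.io.ReadFromPubSub(
           topic='projects/<project-name>/topics/<topic-name>')
     | 'Parse' >> beam.Map(lambda msg: {'field': msg.decode('utf-8')})
     | 'Write' >> beam.io.gcp.bigquery.WriteToBigQuery(
           table='<table-name>',
           dataset='<dataset-name>',
           project='<project-name>',
           create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
           write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))
{code}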

Our pipeline uses the DataflowRunner. The error seen is below, and in the 
attached screenshot of the Stackdriver trace.
{code:json}
"error": {
  "code": 403,
  "message": "Exceeded rate limits: too many api requests per user per method for this user_method. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors",
  "errors": [
    {
      "message": "Exceeded rate limits: too many api requests per user per method for this user_method. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors",
      "domain": "usageLimits",
      "reason": "rateLimitExceeded"
    }
  ],
  "status": "PERMISSION_DENIED"
}
{code}
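
For reference, the URL above is the BigQuery {{tables.get}} REST method. A 
one-off equivalent request can be made with the google-cloud-bigquery client 
library; this sketch (placeholder names as above) only illustrates which call 
is being metered, and is not part of the pipeline:
{code:python}
# One-off equivalent of the tables.get request quoted above. Each such call
# counts against the API-request quota that the pipeline keeps exhausting.
# The <placeholder> names are hypothetical.
from google.cloud import bigquery

client = bigquery.Client(project='<project-name>')
table_ref = client.dataset('<dataset-name>').table('<table-name>')
table = client.get_table(table_ref)  # GET .../datasets/<dataset-name>/tables/<table-name>
print(table.schema)
{code}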


> Python SDK WriteToBigQuery excessive usage of metered API
> ---------------------------------------------------------
>
>                 Key: BEAM-6831
>                 URL: https://issues.apache.org/jira/browse/BEAM-6831
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>    Affects Versions: 2.10.0
>            Reporter: Pesach Weinstock
>            Priority: Major
>         Attachments: apache-beam-py-sdk-gcp-bq-api-issue.png
>


