[jira] [Commented] (BEAM-6831) python sdk WriteToBigQuery excessive usage of metered API

2020-06-10 Thread Beam JIRA Bot (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-6831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17131411#comment-17131411
 ] 

Beam JIRA Bot commented on BEAM-6831:
-------------------------------------

This issue was marked "stale-assigned" and has not received a public comment in 
7 days. It is now automatically unassigned. If you are still working on it, you 
can assign it to yourself again. Please also give an update about the status of 
the work.

> python sdk WriteToBigQuery excessive usage of metered API
> ----------------------------------------------------------
>
> Key: BEAM-6831
> URL: https://issues.apache.org/jira/browse/BEAM-6831
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.10.0
>Reporter: Pesach Weinstock
>Priority: P2
>  Labels: bigquery, dataflow, gcp, python
> Attachments: apache-beam-py-sdk-gcp-bq-api-issue.png
>
>
> There is currently an issue in the Python SDK where 
> {{beam.io.gcp.bigquery.WriteToBigQuery}} calls the following API more often 
> than needed:
> https://www.googleapis.com/bigquery/v2/projects/<project-name>/datasets/<dataset-name>/tables/<table-name>?alt=json
> The above request counts against BigQuery API quotas that are separate from 
> the quotas for BigQuery streaming inserts. When used in a streaming 
> pipeline, we hit this quota quickly and cannot proceed to write any further 
> data to BigQuery.
> Dispositions being used are:
>  * create_disposition: {{beam.io.BigQueryDisposition.CREATE_NEVER}}
>  * write_disposition: {{beam.io.BigQueryDisposition.WRITE_APPEND}}
> This is currently blocking us from using BigQueryIO in a streaming pipeline 
> to write to BigQuery, and it required us to formally request an API quota 
> increase from Google to temporarily correct the situation.
> Our pipeline uses DataflowRunner. The error seen is below, and in the 
> attached screenshot of the Stackdriver trace. (A minimal sketch of this 
> configuration follows the error snippet.)
> {code:java}
>   "errors": [
> {
>   "message": "Exceeded rate limits: too many api requests per user per 
> method for this user_method. For more information, see 
> https://cloud.google.com/bigquery/troubleshooting-errors;,
>   "domain": "usageLimits",
>   "reason": "rateLimitExceeded"
> }
>   ],
> {code}
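For context, here is a minimal sketch of the streaming configuration described in this report. Only the two dispositions come from the report itself; the Pub/Sub topic, table spec, and parsing step are hypothetical placeholders.

{code:python}
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions

options = PipelineOptions()
options.view_as(StandardOptions).streaming = True  # streaming pipeline, as in the report

with beam.Pipeline(options=options) as p:
    (p
     | 'Read' >> beam.io.ReadFromPubSub(topic='projects/<project>/topics/<topic>')  # hypothetical source
     | 'Parse' >> beam.Map(lambda msg: {'payload': msg.decode('utf-8')})  # hypothetical schema
     | 'Write' >> beam.io.WriteToBigQuery(
           '<project>:<dataset>.<table>',  # placeholder table spec
           create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
           write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))
{code}

Under Beam 2.10.0 this configuration produced the rateLimitExceeded error quoted above.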




[jira] [Commented] (BEAM-6831) python sdk WriteToBigQuery excessive usage of metered API

2020-06-01 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-6831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17122092#comment-17122092
 ] 

Kenneth Knowles commented on BEAM-6831:
---------------------------------------

This issue is assigned but has not received an update in 30 days so it has been 
labeled "stale-assigned". If you are still working on the issue, please give an 
update and remove the label. If you are no longer working on the issue, please 
unassign so someone else may work on it. In 7 days the issue will be 
automatically unassigned.





[jira] [Commented] (BEAM-6831) python sdk WriteToBigQuery excessive usage of metered API

2020-03-03 Thread Keiji Yoshida (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-6831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17050880#comment-17050880
 ] 

Keiji Yoshida commented on BEAM-6831:
-------------------------------------

This happens because, in Apache Beam 2.10.0, the bigquery.tables.get API is 
called every time a bundle of the PCollection is processed 
([code|https://github.com/apache/beam/blob/v2.10.0/sdks/python/apache_beam/io/gcp/bigquery.py#L1365-L1367]), as sketched below.
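To illustrate the failure mode, here is a simplified sketch (not the actual Beam source) of a per-bundle lookup; the {{get_table}} callable is a hypothetical stand-in for the bigquery.tables.get request. Because a streaming job processes an unbounded number of bundles, these requests pile up against the metered quota.

{code:python}
import apache_beam as beam

class PerBundleLookupSketch(beam.DoFn):
    """Simplified sketch of the 2.10.0 pattern: one metered API call per bundle."""

    def __init__(self, get_table):
        self._get_table = get_table  # hypothetical wrapper around bigquery.tables.get

    def start_bundle(self):
        # Runs once for every bundle; in a streaming pipeline the number of
        # bundles is unbounded, so these requests exhaust the quota.
        self._table = self._get_table()

    def process(self, row):
        yield row  # the real sink buffers rows for streaming inserts here
{code}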

In the latest version of Apache Beam (2.19.0), the bigquery.tables.get API is 
not called as long as `create_disposition` is set to `CREATE_NEVER` 
([code|https://github.com/apache/beam/blob/v2.19.0/sdks/python/apache_beam/io/gcp/bigquery.py#L989-L1009]). 
So you can avoid the rate-limit error by using Apache Beam 2.19.0 and setting 
`create_disposition` to `CREATE_NEVER`.
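A simplified sketch of that guard (not the actual Beam source); it shows the shape of the 2.19.0 behavior, where the metered lookup is skipped entirely for {{CREATE_NEVER}}. The {{get_table}} callable is again a hypothetical stand-in.

{code:python}
CREATE_NEVER = 'CREATE_NEVER'  # mirrors beam.io.BigQueryDisposition.CREATE_NEVER

def maybe_fetch_table(get_table, create_disposition):
    """Skip the metered bigquery.tables.get call when the sink will never
    create the table, as in the 2.19.0 code linked above."""
    if create_disposition == CREATE_NEVER:
        return None  # no API request at all, regardless of bundle count
    return get_table()  # one bigquery.tables.get request
{code}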




--
This message was sent by Atlassian Jira
(v8.3.4#803005)