yu-iskw opened a new pull request, #31310:
URL: https://github.com/apache/beam/pull/31310

   Addresses https://github.com/apache/beam/issues/31226
   
   ## Issue Summary
   
   The `BigQueryServicesImpl` of the Apache Beam SDK does not handle "Not 
Found" and "Permission Denied" errors when an insert into BigQuery fails. As a 
result, a Dataflow job retries the insert indefinitely.
   
   ## Detailed Description
   
   ### Problem Statement
   
   - **Error Handling**: The `BigQueryServicesImpl` does not manage "Not Found" 
and "Permission Denied" errors.
   - **Infinite Retries**: If data insertion into BigQuery fails, the Dataflow 
job retries indefinitely.
   
   ### Current Workarounds
   
   - **Fixed Destination Datasets/Tables**: Errors can be resolved by creating 
the dataset or table, or by granting the required permissions to the service 
account of the Dataflow job.
   - **Dynamic Destination Tables**: When destination tables are determined 
dynamically by the input data:
     - A destination table might not exist due to incorrect input data.
     - A destination table might exist, but the Dataflow job should not insert 
data into it due to incorrect input data.
     - In these cases, creating incorrect destination tables or granting 
permissions to insert into them is not advisable.
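
   The dynamic-destination case above can be illustrated with a minimal, 
self-contained sketch. It does not use the Beam API; the class name, the 
`events_` table prefix, the `type` field, and the set of existing tables are 
all hypothetical and chosen only for illustration.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of a destination table derived from input data. */
class DynamicDestinationSketch {
  // Assumption: the tables that actually exist in the target dataset.
  static final Set<String> EXISTING_TABLES = Set.of("events_click", "events_view");

  /** Derives the destination table name from the record's "type" field. */
  static String destinationFor(Map<String, String> record) {
    return "events_" + record.get("type");
  }

  /** Returns true when the derived destination is one of the existing tables. */
  static boolean destinationExists(Map<String, String> record) {
    return EXISTING_TABLES.contains(destinationFor(record));
  }
}
```

   A record whose `type` is misspelled (for example `clik`) maps to a table 
that does not exist. Creating that table just to stop the retries would only 
hide the bad input.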
   
   ### Potential Solutions
   
   - **Custom `BigQueryServices`**: Modify the behavior of 
`BigQueryServicesImpl` by creating a custom `BigQueryServices` within the 
Apache Beam SDK namespace using the `withTestServices` method. However, this 
method is not recommended for production use due to its complexity.
   - **Dead-letter Topic**: Routing failed records to a dead-letter topic in 
Pub/Sub is not recommended.
     - [Dead-letter topics in 
Pub/Sub](https://cloud.google.com/dataflow/docs/concepts/streaming-with-cloud-pubsub#dead-letter-topics)
   - **Retry Policy**: Handling "Not Found" and "Permission Denied" errors in 
the pipeline with a retry policy would be ideal. Currently, 
`BigQueryServicesImpl` can handle only the errors returned in the BigQuery API 
response.
     - [BigQuery API 
Reference](https://cloud.google.com/bigquery/docs/reference/rest/v2/tabledata/insertAll)
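
   The retry-policy idea can be sketched roughly as follows. This is not the 
Beam API and not the code in this pull request: `InsertErrorHandlingSketch`, 
`Inserter`, and `insertWithPolicy` are hypothetical names, and the sketch 
assumes that "Permission Denied" and "Not Found" surface as HTTP 403 and 404. 
All other failures are treated as retryable purely to keep the example short.

```java
import java.util.List;

/** Hypothetical sketch: stop retrying inserts that can never succeed. */
class InsertErrorHandlingSketch {
  static final int HTTP_OK = 200;
  static final int HTTP_FORBIDDEN = 403; // assumed status for "Permission Denied"
  static final int HTTP_NOT_FOUND = 404; // assumed status for "Not Found"

  /** Hypothetical stand-in for an insert call; returns an HTTP status code. */
  interface Inserter {
    int insert(String row);
  }

  /** Returns false for the two errors that a plain retry cannot fix. */
  static boolean isRetryable(int httpStatus) {
    return httpStatus != HTTP_FORBIDDEN && httpStatus != HTTP_NOT_FOUND;
  }

  /**
   * Attempts the insert at most maxAttempts times, retrying only retryable
   * failures. Returns true on success; otherwise records the row in
   * failedRows and returns false, so the caller can handle it separately.
   */
  static boolean insertWithPolicy(
      Inserter inserter, String row, int maxAttempts, List<String> failedRows) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      int status = inserter.insert(row);
      if (status == HTTP_OK) {
        return true;
      }
      if (!isRetryable(status)) {
        break; // retrying a 403 or 404 would loop without making progress
      }
    }
    failedRows.add(row);
    return false;
  }
}
```

   With this shape, a row aimed at a missing or forbidden table is attempted 
once and then handed back to the caller as a failed row instead of being 
retried forever.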
   
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
    - [x] Mention the appropriate issue in your description (for example: 
`addresses #123`), if applicable. This will automatically add a link to the 
pull request in the issue. If you would like the issue to automatically close 
on merging the pull request, comment `fixes #<ISSUE NUMBER>` instead.
    - [x] Update `CHANGES.md` with noteworthy changes.
    - [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make the review process 
smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier).
   
   To check the build health, please visit 
[https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more 
information about GitHub Actions CI or the [workflows 
README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) 
to see a list of phrases to trigger workflows.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
