[ https://issues.apache.org/jira/browse/BEAM-383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15388653#comment-15388653 ]
ASF GitHub Bot commented on BEAM-383:
-------------------------------------

GitHub user ianzhou1 opened a pull request:

    https://github.com/apache/incubator-beam/pull/707

    [BEAM-383] BigQueryIO update sink to shard into multiple write jobs

    Be sure to do all of the following to help us incorporate your contribution quickly and easily:

    - [ ] Make sure the PR title is formatted like: `[BEAM-<Jira issue #>] Description of pull request`
    - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes.)
    - [ ] Replace `<Jira issue #>` in the title with the actual Jira issue number, if there is one.
    - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt).

---

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ianzhou1/incubator-beam BigQueryBranch

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-beam/pull/707.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #707

----
commit 5294e161d38c37a71c54a8925288f488e9982cab
Author: Ian Zhou <ianz...@google.com>
Date:   2016-07-20T22:56:21Z

    Modified BigQueryIO to write based on number of files and file sizes

commit b96184d221de08fd825c9f914b8dc393987c6de9
Author: Ian Zhou <ianz...@google.com>
Date:   2016-07-22T00:04:25Z

    Added unit tests for DoFns used in BigQueryWrite
----

> BigQueryIO: update sink to shard into multiple write jobs
> ---------------------------------------------------------
>
>                 Key: BEAM-383
>                 URL: https://issues.apache.org/jira/browse/BEAM-383
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-gcp
>            Reporter: Daniel Halperin
>            Assignee: Ian Zhou
>
> BigQuery has global limits on both the # files that can be written in a single job and the total bytes in those files.
> We should be able to modify BigQueryIO.Write to chunk into multiple smaller jobs that meet these limits, write to temp tables, and atomically copy into the destination table.
> This functionality will let us safely stay within BQ's load job limits.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
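The chunking step described in the issue can be sketched as a greedy partitioning of the staged files into batches that each respect both per-job limits. This is a minimal illustrative sketch, not the actual implementation from PR #707; the class name, method signature, and limit constants are all assumptions for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the sharding idea: greedily group staged files
// into batches so that each batch stays under both a maximum file count
// and a maximum total byte size per BigQuery load job.
public class LoadJobSharder {

    /**
     * Partitions a list of file sizes (in bytes) into shards, each of which
     * contains at most maxFiles files and at most maxBytes total bytes.
     * Assumes every individual file is itself under maxBytes.
     */
    public static List<List<Long>> shard(List<Long> fileSizes, int maxFiles, long maxBytes) {
        List<List<Long>> shards = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;
        for (long size : fileSizes) {
            // Start a new shard if adding this file would exceed either limit.
            if (!current.isEmpty()
                    && (current.size() >= maxFiles || currentBytes + size > maxBytes)) {
                shards.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(size);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            shards.add(current);
        }
        return shards;
    }
}
```

Per the issue description, each shard would then be loaded into its own temporary table, and once all loads succeed the temp tables would be copied into the destination table so the result appears atomically.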