[ https://issues.apache.org/jira/browse/BEAM-8393?focusedWorklogId=327819&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-327819 ]
ASF GitHub Bot logged work on BEAM-8393:
----------------------------------------
Author: ASF GitHub Bot
Created on: 14/Oct/19 14:02
Start Date: 14/Oct/19 14:02
Worklog Time Spent: 10m
Work Description: jklukas commented on issue #9784: [BEAM-8393] Fix Java
BigQueryIO clustering support for multiple partitions
URL: https://github.com/apache/beam/pull/9784#issuecomment-541696374
I have not added any additional test case here as I feel the bug and fix are
both simple and obvious. Introducing a test case would involve potentially
significant new test code to demonstrate the change; there is an existing case
in `BigQueryIOWriteTest` that checks that a large collection of files is
correctly separated into multiple partitions, but it doesn't actually call
BigQueryIO to load a large number of files.
Demonstrating that this change is working correctly would require actually
running a load job into BigQuery over a large enough number of files to trigger
multiple partitions. The effort and compute resources needed to run such a test
seem disproportionate to the low complexity of the change.
----------------------------------------------------------------
Issue Time Tracking
-------------------
Worklog Id: (was: 327819)
Time Spent: 20m (was: 10m)
> Java BigQueryIO clustering support breaks on multiple partitions
> ----------------------------------------------------------------
>
> Key: BEAM-8393
> URL: https://issues.apache.org/jira/browse/BEAM-8393
> Project: Beam
> Issue Type: Bug
> Components: io-java-gcp
> Affects Versions: 2.15.0, 2.16.0
> Reporter: Jeff Klukas
> Assignee: Jeff Klukas
> Priority: Major
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Support for writing to clustered tables in BigQuery was added in 2.15, which
> involved adding a new optional clustering field to TableDestination.
> Clustering support is working for most cases, but fails with errors about
> incompatible partitioning specifications for any data that is handled by the
> MultiplePartitions branch of BigQueryIO logic.
> There is a case in that code path where we provide a modified
> TableDestination and neglect to copy the clustering definition, so the final
> load job does not include any clustering columns.
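
The failure mode described above can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not the actual Beam code: `Destination` is a hypothetical, simplified stand-in for `TableDestination`, and the temporary-table naming and both helper methods are invented for the example. It only shows how rebuilding a destination without passing the clustering definition through silently drops it, and how copying it fixes that.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; none of these names are Beam's real API.
class ClusteringCopySketch {

  // Simplified stand-in for a destination that carries optional
  // partitioning and clustering settings alongside the table spec.
  static final class Destination {
    final String tableSpec;
    final String partitioningField;      // null when the table is not partitioned
    final List<String> clusteringFields; // empty when the table is not clustered

    Destination(String tableSpec, String partitioningField, List<String> clusteringFields) {
      this.tableSpec = tableSpec;
      this.partitioningField = partitioningField;
      this.clusteringFields = clusteringFields;
    }
  }

  // Buggy variant: builds a modified destination (assumed here to be a
  // per-partition temporary table) and copies the partitioning, but
  // forgets to copy the clustering definition.
  static Destination tempDestinationDroppingClustering(Destination original, int partition) {
    return new Destination(
        original.tableSpec + "_tmp_" + partition,
        original.partitioningField,
        Collections.emptyList());
  }

  // Fixed variant: the clustering definition is carried over as well.
  static Destination tempDestinationKeepingClustering(Destination original, int partition) {
    return new Destination(
        original.tableSpec + "_tmp_" + partition,
        original.partitioningField,
        original.clusteringFields);
  }

  public static void main(String[] args) {
    Destination original =
        new Destination("project:dataset.events", "event_time", Arrays.asList("country", "channel"));

    Destination buggy = tempDestinationDroppingClustering(original, 0);
    Destination fixed = tempDestinationKeepingClustering(original, 0);

    System.out.println("buggy clustering: " + buggy.clusteringFields);
    System.out.println("fixed clustering: " + fixed.clusteringFields);
  }
}
```

In this sketch both variants keep the partitioning field, so only the clustering differs. A destination that loses its clustering no longer matches the table's declared specification, which fits the load-job errors reported in the description.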