[ https://issues.apache.org/jira/browse/BEAM-5191?focusedWorklogId=270433&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-270433 ]
ASF GitHub Bot logged work on BEAM-5191:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 01/Jul/19 19:30
            Start Date: 01/Jul/19 19:30
    Worklog Time Spent: 10m
      Work Description: jklukas commented on pull request #8945: [BEAM-5191] Support for BigQuery clustering
URL: https://github.com/apache/beam/pull/8945#discussion_r299185667

##########
File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/DynamicDestinationsHelpers.java
##########
@@ -167,7 +175,12 @@ public TableDestination getTable(DestinationT destination) {
     @Override
     Coder<DestinationT> getDestinationCoderWithDefault(CoderRegistry registry)
         throws CannotProvideCoderException {
-      return inner.getDestinationCoderWithDefault(registry);
+      Coder<DestinationT> destinationCoder = getDestinationCoder();

Review comment:
`DynamicDestinations#getDestinationCoderWithDefault` is commented as:

```
// Gets the destination coder. If the user does not provide one, try to find one in the coder
// registry. If no coder can be found, throws CannotProvideCoderException.
```

This code is written with potentially multiple layers of delegation, and I think the correct behavior here is to return the first non-delegated implementation of `getDestinationCoder()` that appears as we move down the delegation chain.

I would argue that the existing behavior is incorrect. Currently, if an implementing class defines a custom return value for `getDestinationCoder`, that value is ignored when you call `getDestinationCoderWithDefault`. My expectation is that `getDestinationCoderWithDefault` would always return the same value as `getDestinationCoder` except in the null case, in which case `getDestinationCoderWithDefault` would attempt to look up a coder in the registry. So the change here is intended to fix broken behavior.
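
To make the expected lookup order concrete, here is a minimal sketch. It uses hypothetical stand-in types (`NamedCoder`, `Registry`, `Destinations`, `DelegatingDestinations`, `WithCustomCoder`) rather than Beam's actual classes, and it assumes that a `null` return from `getDestinationCoder()` means "no coder provided". It is an illustration of the behavior argued for above, not the code in the pull request.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-ins for illustration; these are NOT Beam's real classes.
class CoderDelegationSketch {

  /** Stand-in for a coder, identified only by a name. */
  static final class NamedCoder {
    final String name;

    NamedCoder(String name) {
      this.name = name;
    }
  }

  /** Stand-in for a coder registry keyed by destination type. */
  static final class Registry {
    private final Map<Class<?>, NamedCoder> coders = new HashMap<>();

    void register(Class<?> type, NamedCoder coder) {
      coders.put(type, coder);
    }

    NamedCoder lookup(Class<?> type) {
      NamedCoder coder = coders.get(type);
      if (coder == null) {
        throw new IllegalStateException("No coder registered for " + type.getName());
      }
      return coder;
    }
  }

  /** Base behavior: a user-provided coder wins, the registry is the fallback. */
  static class Destinations {
    NamedCoder getDestinationCoder() {
      return null; // assumption: null means "not provided"
    }

    Class<?> destinationType() {
      return String.class;
    }

    NamedCoder getDestinationCoderWithDefault(Registry registry) {
      NamedCoder coder = getDestinationCoder();
      return coder != null ? coder : registry.lookup(destinationType());
    }
  }

  /** Wrapper that forwards to an inner instance unless a subclass overrides a method. */
  static class DelegatingDestinations extends Destinations {
    final Destinations inner;

    DelegatingDestinations(Destinations inner) {
      this.inner = inner;
    }

    @Override
    NamedCoder getDestinationCoder() {
      return inner.getDestinationCoder();
    }

    @Override
    Class<?> destinationType() {
      return inner.destinationType();
    }

    @Override
    NamedCoder getDestinationCoderWithDefault(Registry registry) {
      // Consult this layer's own getDestinationCoder() first, so that an
      // override in a subclass is honored; only then fall through to inner.
      NamedCoder coder = getDestinationCoder();
      return coder != null ? coder : inner.getDestinationCoderWithDefault(registry);
    }
  }

  /** A delegating layer that supplies its own coder. */
  static class WithCustomCoder extends DelegatingDestinations {
    WithCustomCoder(Destinations inner) {
      super(inner);
    }

    @Override
    NamedCoder getDestinationCoder() {
      return new NamedCoder("custom");
    }
  }

  static String resolve(Destinations destinations, Registry registry) {
    return destinations.getDestinationCoderWithDefault(registry).name;
  }

  public static void main(String[] args) {
    Registry registry = new Registry();
    registry.register(String.class, new NamedCoder("registry-default"));

    Destinations plain = new Destinations();
    System.out.println(resolve(plain, registry));
    System.out.println(resolve(new DelegatingDestinations(plain), registry));
    System.out.println(resolve(new WithCustomCoder(plain), registry));
    System.out.println(
        resolve(new DelegatingDestinations(new WithCustomCoder(plain)), registry));
  }
}
```

In this sketch, if `DelegatingDestinations#getDestinationCoderWithDefault` simply returned `inner.getDestinationCoderWithDefault(registry)`, the `WithCustomCoder` layer wrapped directly around `plain` would resolve to the registry coder and its override would be ignored, which is the behavior described above as incorrect.
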

It's possible that a user has written a custom class that extends DelegatingDynamicDestinations and relies on the incorrect behavior, but it feels unlikely to me. For the scope of the coders provided here, I don't believe this change affects behavior (the method was already returning TableDestinationCoderV2 in all cases).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id:     (was: 270433)
    Time Spent: 9h  (was: 8h 50m)

> Add support for writing to BigQuery clustered tables
> ----------------------------------------------------
>
>                 Key: BEAM-5191
>                 URL: https://issues.apache.org/jira/browse/BEAM-5191
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-gcp
>    Affects Versions: 2.6.0
>            Reporter: Robert Sahlin
>            Assignee: Wout Scheepers
>            Priority: Minor
>              Labels: features, newbie
>          Time Spent: 9h
>  Remaining Estimate: 0h
>
> Google recently added support for clustered tables in BigQuery. It would be
> useful to set clustering columns the same way as for partitioning. It should
> support multiple fields (4) for clustering.
> For example:
> [BigQueryIO.Write|https://beam.apache.org/documentation/sdks/javadoc/2.6.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.html]<[T|https://beam.apache.org/documentation/sdks/javadoc/2.6.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.html]>
>  .withClustering(new Clustering().setField("productId").setType("STRING"))

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)