On Wed, Jul 8, 2020 at 3:30 AM <[email protected]> wrote:
> Hi All,
>
> I am posting this to the dev (as opposed to user) channel as I believe it
> will be of interest to those working on either Schemas or BigQuery.
>
> I have a pipeline based on Beam 2.22 that is ingesting data into
> BigQuery. Internally I am using protobuf for my domain model and the
> associated schema support.
>
> My intention is to use the useBeamSchema() method both to auto-generate
> the BigQuery table schema and to provide row conversion on write. (The
> idea is to have true schema-first development, very much in keeping with
> Alex's original ProtoBEAM concept.)
>
> The issue I've hit is around the treatment of google.protobuf.Timestamp
> fields. The schema conversion seems to map these to the correct logical
> type, org.apache.beam.sdk.schemas.logicaltypes.NanosInstant; however, this
> isn't recognised by BigQueryIO.Write. Specifically, the
> BigQueryUtils.toTableSchema() method throws a NullPointerException. This
> seems to be because there is no entry for NanosInstant in the
> BEAM_TO_BIG_QUERY_LOGICAL_MAPPING map.

This does sound like a bug, since the Beam Schema to BigQuery type
conversion [1] indeed does not consider
org.apache.beam.sdk.schemas.logicaltypes.NanosInstant. Will you be able to
file a JIRA, preferably with a test to reproduce this?

[1] https://github.com/apache/beam/blob/6c313eb84af6229f0a8a7a0b5890f18c5a8685e8/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryUtils.java#L202

Thanks,
Cham

> Is this a known issue? Is there a workaround?
>
> I appreciate that google.protobuf.Timestamp supports nanosecond-level
> precision, so it cannot be converted directly to the Beam schema type
> DATETIME without loss of precision. However, I believe use cases for
> nanosecond precision are rare. Would it not be better to convert directly
> to DATETIME according to the *principle of least confusion*?
>
> Are there any plans to extend the range of types, both within protobuf and
> the Beam schema, to match the richer type set within BigQuery (DATE,
> DATETIME, TIMESTAMP)? I would expect the combination of
> protobuf/Beam/BigQuery to be a common one (especially within GCP), and it
> would be nice as a developer to have a greater range of options.
>
> Kind regards,
>
> Rob
>
> *Robert Butcher*
> *Technical Architect | Foundry/SRS | NatWest Markets*
> WeWork, 10 Devonshire Square, London, EC2M 4AE
> Mobile +44 (0) 7414 730866
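A minimal, standalone Java sketch of the precision trade-off raised in the question about converting google.protobuf.Timestamp to DATETIME. It assumes that a protobuf-style timestamp is given as (seconds, nanos) with nanos in [0, 999,999,999], that Beam's DATETIME field type holds millisecond precision (it is backed by a Joda Instant), and that BigQuery's TIMESTAMP type stores microseconds. The sketch uses only java.time and plain arithmetic. The class and method names are hypothetical, and no Beam, protobuf or BigQuery API is involved.

```java
import java.time.Instant;

/**
 * Hypothetical helper: shows how much of a protobuf-style timestamp
 * (seconds + nanos) survives truncation to millisecond precision
 * (assumed Beam DATETIME) versus microsecond precision (assumed BigQuery
 * TIMESTAMP). Uses only the JDK; no Beam, protobuf or BigQuery classes.
 */
public class TimestampPrecision {

  /** Epoch milliseconds; sub-millisecond digits are discarded (floor). */
  static long toEpochMillis(long seconds, int nanos) {
    return Instant.ofEpochSecond(seconds, nanos).toEpochMilli();
  }

  /** Epoch microseconds; sub-microsecond digits are discarded (assumes 0 <= nanos < 1e9). */
  static long toEpochMicros(long seconds, int nanos) {
    return Math.addExact(Math.multiplyExact(seconds, 1_000_000L), nanos / 1_000);
  }

  /** Nanoseconds lost when truncating to milliseconds. */
  static int nanosLostAtMillis(int nanos) {
    return nanos % 1_000_000;
  }

  /** Nanoseconds lost when truncating to microseconds. */
  static int nanosLostAtMicros(int nanos) {
    return nanos % 1_000;
  }

  public static void main(String[] args) {
    long seconds = 1_594_175_400L; // 2020-07-08T02:30:00Z
    int nanos = 123_456_789;       // full nanosecond precision

    System.out.println("original: " + Instant.ofEpochSecond(seconds, nanos));
    System.out.println("millis:   " + toEpochMillis(seconds, nanos)
        + " (lost " + nanosLostAtMillis(nanos) + " ns)");
    System.out.println("micros:   " + toEpochMicros(seconds, nanos)
        + " (lost " + nanosLostAtMicros(nanos) + " ns)");
  }
}
```

With the sample value, millisecond truncation drops 456,789 ns and microsecond truncation drops only 789 ns. This suggests that a mapping onto a BigQuery TIMESTAMP could keep more of the original value than a mapping through Beam's DATETIME.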
