+Chamikara Jayalath <[email protected]>

With the new BigQuery sink, schema autodetection is supported (it's a very simple thing to have). Do you think we should not have it?

Best
-P.

On Mon, Mar 25, 2019 at 11:01 AM Chamikara Jayalath <[email protected]> wrote:
>
> On Mon, Mar 25, 2019 at 2:03 AM Juta Staes <[email protected]> wrote:
>
>> On Mon, 25 Mar 2019 at 06:15, Valentyn Tymofieiev <[email protected]>
>> wrote:
>>
>>> We received feedback on https://issuetracker.google.com/issues/129006689 -
>>> BQ developers say that schema identification is done and they discourage
>>> using schema autodetection in tables using BYTES. In light of this, I think
>>> it may be fair to recommend that Beam users specify BQ schemas as well when
>>> they interact with BQ, and call out that writing binary data to BQ will
>>> likely fail unless a schema is specified. Does that make sense?
>>
>> Given that schema autodetect does not work for bytes, I think it is indeed
>> a good solution to require users to specify BQ schemas as well when they
>> write to BQ.
>>
>> So new summary:
>> 1. Beam will base64-encode raw bytes before passing them to BQ over the
>>    REST API. This will be a change in behavior for Python 2 (for good
>>    reasons).
>> 2. When reading data from BQ, all fields of type BYTES will be
>>    base64-decoded.
>> 3. Beam will send an API call to BigQuery to get the table schema whenever
>>    a schema is not supplied, to work around
>>    https://issuetracker.google.com/issues/129006689. Beam will require
>>    users to specify the schema when writing bytes to BQ.
>
> I'm not sure why we reached this conclusion. We (Beam) do not use the BQ
> schema auto detection feature currently. So why not just send an API
> signal to get the schema when users are writing to existing tables? Also,
> even if we decide to support schema auto detection in the future, we will
> not be able to support this for the BYTES type (due to the restriction by
> BQ).
>
>> Thanks all for your input on this!
>> Juta
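
For reference, points 1 and 2 of the summary quoted above could look roughly like the sketch below. This is not the actual Beam implementation: the helper names encode_row_for_bq and decode_row_from_bq are hypothetical, rows are assumed to be plain dicts, and the set of BYTES field names is assumed to be known already (for example from a user-supplied or fetched table schema).

```python
import base64


def encode_row_for_bq(row):
    """Base64-encode every bytes value so the row can be sent as JSON.

    Hypothetical helper: non-bytes values are passed through unchanged.
    """
    return {
        name: base64.b64encode(value).decode("ascii")
        if isinstance(value, bytes)
        else value
        for name, value in row.items()
    }


def decode_row_from_bq(row, bytes_fields):
    """Base64-decode the fields that the schema declares as BYTES.

    Hypothetical helper: bytes_fields is a set of field names; None values
    (NULLs) are left as they are.
    """
    return {
        name: base64.b64decode(value)
        if name in bytes_fields and value is not None
        else value
        for name, value in row.items()
    }


if __name__ == "__main__":
    original = {"id": 1, "payload": b"\x00\xff"}
    encoded = encode_row_for_bq(original)
    decoded = decode_row_from_bq(encoded, bytes_fields={"payload"})
    print(encoded)
    print(decoded == original)
```

Note that the decode side cannot tell a base64 string from an ordinary STRING value by looking at it, which is why it needs the schema (the bytes_fields argument here) to know which fields to decode.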
