+Chamikara Jayalath <[email protected]> With the new BigQuery sink,
schema autodetection is supported (it's a very simple thing to have). Do
you think we should not have it?
Best
-P.

On Mon, Mar 25, 2019 at 11:01 AM Chamikara Jayalath <[email protected]>
wrote:

>
>
> On Mon, Mar 25, 2019 at 2:03 AM Juta Staes <[email protected]> wrote:
>
>>
>> On Mon, 25 Mar 2019 at 06:15, Valentyn Tymofieiev <[email protected]>
>> wrote:
>>
>>> We received feedback on https://issuetracker.google.com/issues/129006689 -
>>> BQ developers say that schema identification is done and they discourage
>>> using schema autodetection in tables using BYTES. In light of this, I think
>>> it may be fair to recommend that Beam users specify BQ schemas as well when
>>> they interact with BQ, and call out that writing binary data to BQ will
>>> likely fail unless a schema is specified. Does that make sense?
>>>
>>
>> Given that schema autodetect does not work for bytes, I think it is indeed
>> a good solution to require users to specify BQ schemas as well when they
>> write to BQ.
>>
>> So new summary:
>> 1. Beam will base64-encode raw bytes before passing them to BQ over the
>> REST API. This will be a change in behavior for Python 2 (for good reasons).
>> 2. When reading data from BQ, all fields of type BYTES will be
>> base64-decoded.
>> 3. Beam will send an API call to BigQuery to get table schema, whenever
>> schema is not supplied, to work around
>> https://issuetracker.google.com/issues/129006689. Beam will require
>> users to specify the schema when writing bytes to BQ.
>>
>
> I'm not sure why we reached this conclusion. We (Beam) do not currently use
> the BQ schema autodetection feature. So why not just send an API call to get
> the schema when users are writing to existing tables? Also, even if we decide
> to support schema autodetection in the future, we will not be able to support
> it for the BYTES type (due to the restriction by BQ).
>
>
>> Thanks all for your input on this!
>> Juta
>>
>>
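
For reference, points 1 and 2 of the summary above could look roughly like
the sketch below. This is only an illustration, not Beam's actual code: the
helper names and the dict-based schema (field name -> BQ type name) are
hypothetical, and it assumes the schema is already known, which is exactly
why point 3 matters for BYTES fields.

```python
import base64

# Hypothetical schema representation: field name -> BigQuery type name.
SCHEMA = {"id": "INTEGER", "payload": "BYTES"}


def encode_row_for_bq(row, schema):
    """Return a copy of `row` with BYTES fields base64-encoded as text."""
    encoded = {}
    for name, value in row.items():
        if schema.get(name) == "BYTES" and value is not None:
            # JSON cannot carry raw bytes, so send base64 text instead.
            encoded[name] = base64.b64encode(value).decode("ascii")
        else:
            encoded[name] = value
    return encoded


def decode_row_from_bq(row, schema):
    """Return a copy of `row` with BYTES fields base64-decoded to bytes."""
    decoded = {}
    for name, value in row.items():
        if schema.get(name) == "BYTES" and value is not None:
            decoded[name] = base64.b64decode(value)
        else:
            decoded[name] = value
    return decoded


row = {"id": 1, "payload": b"\x00\xffabc"}
wire_row = encode_row_for_bq(row, SCHEMA)
round_tripped = decode_row_from_bq(wire_row, SCHEMA)
```

Without the schema there is no way to tell a base64 string from an ordinary
STRING value, so neither helper can decide which fields to touch.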
