Our experience has actually been that Avro is more efficient than even Parquet, but that might also be skewed by our datasets.
I might take a crack at this. I found https://issues.apache.org/jira/browse/BEAM-2879 tracking it (which coincidentally references my thread from a couple years ago on the read side of this :) ).

On Mon, Sep 16, 2019 at 1:38 PM Reuven Lax <re...@google.com> wrote:

> It's been talked about, but nobody's done anything. There are some
> difficulties related to type conversion (JSON and Avro don't support the
> same types), but if those are overcome then an Avro version would be much
> more efficient. I believe Parquet files would be even more efficient if you
> wanted to go that path, but there might be more code to write (as we
> already have some code in the codebase to convert between TableRows and
> Avro).
>
> Reuven
>
> On Mon, Sep 16, 2019 at 10:33 AM Steve Niemitz <sniem...@apache.org> wrote:
>
>> Has anyone investigated using Avro rather than JSON to load data into
>> BigQuery using BigQueryIO (+ FILE_LOADS)?
>>
>> I'd be interested in enhancing it to support this, but I'm curious if
>> there's any prior work here.
>>
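
For context, here is a minimal sketch (not from the thread) of the JSON-based FILE_LOADS path being discussed: BigQueryIO writes each TableRow out as newline-delimited JSON files and then issues BigQuery load jobs against them. The table spec, schema, and class name below are placeholders. The enhancement tracked by BEAM-2879 would keep this same load-job mechanism but serialize the intermediate files as Avro instead of JSON, which BigQuery load jobs also accept natively.

    import com.google.api.services.bigquery.model.TableFieldSchema;
    import com.google.api.services.bigquery.model.TableRow;
    import com.google.api.services.bigquery.model.TableSchema;
    import java.util.Arrays;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
    import org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.Create;

    public class FileLoadsJsonExample {
      public static void main(String[] args) {
        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        // Placeholder schema for illustration only.
        TableSchema schema = new TableSchema().setFields(Arrays.asList(
            new TableFieldSchema().setName("user").setType("STRING"),
            new TableFieldSchema().setName("score").setType("INTEGER")));

        p.apply(Create.of(new TableRow().set("user", "alice").set("score", 42L))
                .withCoder(TableRowJsonCoder.of()))
            .apply(BigQueryIO.writeTableRows()
                .to("my-project:my_dataset.my_table")  // placeholder table spec
                .withSchema(schema)
                // FILE_LOADS is the code path under discussion: rows are staged
                // as files (JSON today) and loaded via BigQuery load jobs.
                .withMethod(BigQueryIO.Write.Method.FILE_LOADS)
                .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
                .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));

        p.run().waitUntilFinish();
      }
    }

The efficiency argument in the thread comes from the staging format, not the load job itself: Avro files are smaller and cheaper to encode than row-per-line JSON, and they carry their schema, which sidesteps some of the type-conversion ambiguity Reuven mentions.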