Our experience has actually been that Avro is more efficient than even Parquet, but that might also be skewed by our datasets.
I might take a crack at this. I found https://issues.apache.org/jira/browse/BEAM-2879 tracking it (which coincidentally references my thread from a couple years ago on the read side of this :) ).

On Mon, Sep 16, 2019 at 1:38 PM Reuven Lax <re...@google.com> wrote:

> It's been talked about, but nobody's done anything. There are some
> difficulties related to type conversion (JSON and Avro don't support the
> same types), but if those are overcome then an Avro version would be much
> more efficient. I believe Parquet files would be even more efficient if you
> wanted to go that path, but there might be more code to write (as we
> already have some code in the codebase to convert between TableRows and
> Avro).
>
> Reuven
>
> On Mon, Sep 16, 2019 at 10:33 AM Steve Niemitz <sniem...@apache.org> wrote:
>
>> Has anyone investigated using Avro rather than JSON to load data into
>> BigQuery using BigQueryIO (+ FILE_LOADS)?
>>
>> I'd be interested in enhancing it to support this, but I'm curious if
>> there's any prior work here.
>>
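
For context, here is a minimal sketch (not from the thread) of the JSON-based FILE_LOADS path being discussed: BigQueryIO writes each TableRow out as newline-delimited JSON files and then issues BigQuery load jobs against them. The table spec, schema, and class name below are placeholders. The enhancement tracked by BEAM-2879 would keep this same load-job mechanism but serialize the intermediate files as Avro instead of JSON, which BigQuery load jobs also accept natively.

    import com.google.api.services.bigquery.model.TableFieldSchema;
    import com.google.api.services.bigquery.model.TableRow;
    import com.google.api.services.bigquery.model.TableSchema;
    import java.util.Arrays;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
    import org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.Create;

    public class FileLoadsJsonExample {
      public static void main(String[] args) {
        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        // Placeholder schema for illustration only.
        TableSchema schema = new TableSchema().setFields(Arrays.asList(
            new TableFieldSchema().setName("user").setType("STRING"),
            new TableFieldSchema().setName("score").setType("INTEGER")));

        p.apply(Create.of(new TableRow().set("user", "alice").set("score", 42L))
                .withCoder(TableRowJsonCoder.of()))
            .apply(BigQueryIO.writeTableRows()
                .to("my-project:my_dataset.my_table")  // placeholder table spec
                .withSchema(schema)
                // FILE_LOADS is the code path under discussion: rows are staged
                // as files (JSON today) and loaded via BigQuery load jobs.
                .withMethod(BigQueryIO.Write.Method.FILE_LOADS)
                .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
                .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));

        p.run().waitUntilFinish();
      }
    }

The efficiency argument in the thread comes from the staging format, not the load job itself: Avro files are smaller and cheaper to encode than row-per-line JSON, and they carry their schema, which sidesteps some of the type-conversion ambiguity Reuven mentions.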