Thanks Steve! I'll take a look next week. Sorry about the delay so far.

Best
-P.
On Fri, Sep 27, 2019 at 10:37 AM Steve Niemitz <[email protected]> wrote:

> I put up a semi-WIP pull request https://github.com/apache/beam/pull/9665 for
> this. The initial results look good. I'll spend some time soon adding
> unit tests and documentation, but I'd appreciate it if someone could take a
> first pass over it.
>
> On Wed, Sep 18, 2019 at 6:14 PM Pablo Estrada <[email protected]> wrote:
>
>> Thanks for offering to work on this! It would be awesome to have it. I
>> can say that we don't have that for Python ATM.
>>
>> On Mon, Sep 16, 2019 at 10:56 AM Steve Niemitz <[email protected]>
>> wrote:
>>
>>> Our experience has actually been that avro is more efficient than even
>>> parquet, but that might also be skewed from our datasets.
>>>
>>> I might try to take a crack at this, I found
>>> https://issues.apache.org/jira/browse/BEAM-2879 tracking it (which
>>> coincidentally references my thread from a couple years ago on the read
>>> side of this :) ).
>>>
>>> On Mon, Sep 16, 2019 at 1:38 PM Reuven Lax <[email protected]> wrote:
>>>
>>>> It's been talked about, but nobody's done anything. There are some
>>>> difficulties related to type conversion (json and avro don't support the
>>>> same types), but if those are overcome then an avro version would be much
>>>> more efficient. I believe Parquet files would be even more efficient if you
>>>> wanted to go that path, but there might be more code to write (as we
>>>> already have some code in the codebase to convert between TableRows and
>>>> Avro).
>>>>
>>>> Reuven
>>>>
>>>> On Mon, Sep 16, 2019 at 10:33 AM Steve Niemitz <[email protected]>
>>>> wrote:
>>>>
>>>>> Has anyone investigated using avro rather than json to load data into
>>>>> BigQuery using BigQueryIO (+ FILE_LOADS)?
>>>>>
>>>>> I'd be interested in enhancing it to support this, but I'm curious if
>>>>> there's any prior work here.
