Thanks Steve!
I'll take a look next week. Sorry about the delay so far.
Best
-P.

On Fri, Sep 27, 2019 at 10:37 AM Steve Niemitz <[email protected]> wrote:

> I put up a semi-WIP pull request https://github.com/apache/beam/pull/9665 for
> this.  The initial results look good.  I'll spend some time soon adding
> unit tests and documentation, but I'd appreciate it if someone could take a
> first pass over it.
>
> On Wed, Sep 18, 2019 at 6:14 PM Pablo Estrada <[email protected]> wrote:
>
>> Thanks for offering to work on this! It would be awesome to have it. I
>> can say that we don't have that for Python ATM.
>>
>> On Mon, Sep 16, 2019 at 10:56 AM Steve Niemitz <[email protected]>
>> wrote:
>>
>>> Our experience has actually been that avro is more efficient than even
>>> parquet, but that might also be skewed by our datasets.
>>>
>>> I might try to take a crack at this. I found
>>> https://issues.apache.org/jira/browse/BEAM-2879 tracking it (which
>>> coincidentally references my thread from a couple years ago on the read
>>> side of this :) ).
>>>
>>> On Mon, Sep 16, 2019 at 1:38 PM Reuven Lax <[email protected]> wrote:
>>>
>>>> It's been talked about, but nobody's done anything. There are some
>>>> difficulties related to type conversion (json and avro don't support the
>>>> same types), but if those are overcome then an avro version would be much
>>>> more efficient. I believe Parquet files would be even more efficient if you
>>>> wanted to go that path, but there might be more code to write (as we
>>>> already have some code in the codebase to convert between TableRows and
>>>> Avro).
>>>>
>>>> Reuven
>>>>
>>>> On Mon, Sep 16, 2019 at 10:33 AM Steve Niemitz <[email protected]>
>>>> wrote:
>>>>
>>>>> Has anyone investigated using avro rather than json to load data into
>>>>> BigQuery using BigQueryIO (+ FILE_LOADS)?
>>>>>
>>>>> I'd be interested in enhancing it to support this, but I'm curious if
>>>>> there's any prior work here.
>>>>>
>>>>
