Re: Understanding BigQueryIO.Read performance and options

Reuven Lax Sat, 09 Sep 2017 10:38:08 -0700

Hi Steve,

The BigQuery source should always use extract jobs, regardless of
withTemplateCompatibility. What makes you think otherwise?


Reuven


On Sat, Sep 9, 2017 at 9:35 AM, Steve Niemitz <[email protected]> wrote:

> Hello!
>
> Until now I've been using a custom-built alternative to BigQueryIO.Read
> that manually runs a BigQuery extract job (to avro), then uses
> AvroIO.parseGenericRecords() to read the output.
>
> I'm now investigating enhancing BigQueryIO.Read itself to allow
> something similar, since a good amount of the plumbing appears to be
> already in place.  However, I'm confused by some of the
> implementation details.
>
> To start, it seems like there are two different read paths:
> - If "withTemplateCompatibility" is set, a similar method to what I
> described above is used; an extract job is started to export to avro, and
> AvroSource is used to read files and transform them into TableRows.
>
> - However, if it is not set, the BigQueryReader class simply uses the REST
> API to read rows from the tables.  In practice, I've seen this method hit
> some significant performance limitations.
>
> It seems to me that for large tables I'd always want to use the first
> method; however, I'm not sure why the implementation is tied to the oddly
> named "withTemplateCompatibility" option.  Does anyone have insight into
> the implementation details here?
>
> Additionally, would the community in general be receptive to enhancements
> to BigQueryIO that allow the final output to be something other than
> "TableRow" instances, similar to how AvroIO.parseGenericRecords takes a
> parseFn?
>
> Thanks!
>
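A minimal, dependency-free sketch of the parseFn idea raised in the question: instead of always emitting TableRow, the read takes a caller-supplied function from each exported Avro record to any type T, the way AvroIO.parseGenericRecords does. Everything here is hypothetical: ParseFnReadSketch, readAll, WordCount and toWordCount are illustrative names, and a Map<String, Object> stands in for an Avro GenericRecord so the example compiles with only the JDK. It is not Beam's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch of a BigQuery read parameterized by a parse function,
 * in the spirit of AvroIO.parseGenericRecords(parseFn). Each Map stands in
 * for one Avro GenericRecord written by an extract job; no Beam, Avro, or
 * BigQuery classes are used.
 */
final class ParseFnReadSketch {
  private ParseFnReadSketch() {}

  /** Applies parseFn to every exported record and collects the results in order. */
  static <T> List<T> readAll(
      List<Map<String, Object>> exportedRecords,
      Function<Map<String, Object>, T> parseFn) {
    List<T> out = new ArrayList<>(exportedRecords.size());
    for (Map<String, Object> record : exportedRecords) {
      out.add(parseFn.apply(record));
    }
    return out;
  }

  /** Example user-defined output type, used instead of a generic TableRow. */
  static final class WordCount {
    final String word;
    final long count;

    WordCount(String word, long count) {
      this.word = word;
      this.count = count;
    }
  }

  /** Hypothetical parse function: one exported record becomes one WordCount. */
  static WordCount toWordCount(Map<String, Object> record) {
    return new WordCount((String) record.get("word"), (Long) record.get("count"));
  }
}
```

The design point is that the read itself stays unchanged while the output type moves to the caller, so users who need a domain type skip the intermediate TableRow conversion. Later Beam releases added a similar entry point, BigQueryIO.read(SerializableFunction<SchemaAndRecord, T>); check the Javadoc for the version you use.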
