Hi Steve,

The BigQuery source should always use extract jobs, regardless of withTemplateCompatibility. What makes you think otherwise?

Reuven

On Sat, Sep 9, 2017 at 9:35 AM, Steve Niemitz <[email protected]> wrote:

> Hello!
>
> Until now I've been using a custom-built alternative to BigQueryIO.Read
> that manually runs a BigQuery extract job (to Avro), then uses
> AvroIO.parseGenericRecords() to read the output.
>
> I'm investigating instead enhancing the actual BigQueryIO.Read to allow
> something similar, since it appears a good amount of the plumbing is
> already in place to do this. However, I'm confused by some of the
> implementation details.
>
> To start, it seems like there are two different read paths:
> - If "withTemplateCompatibility" is set, a method similar to the one I
> described above is used: an extract job is started to export to Avro, and
> AvroSource is used to read the files and transform them into TableRows.
>
> - However, if it is not set, the BigQueryReader class simply uses the REST
> API to read rows from the tables. In practice, I've seen this method has
> some significant performance limitations.
>
> It seems to me that for large tables I'd always want to use the first
> method; however, I'm not sure why the implementation is tied to the oddly
> named "withTemplateCompatibility" option. Does anyone have insight into
> the implementation details here?
>
> Additionally, would the community in general be open to enhancements
> to BigQueryIO that allow the final output to be something other than
> "TableRow" instances, similar to how AvroIO.parseGenericRecords takes a
> parseFn?
>
> Thanks!
>
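The parseFn idea in the quoted message lets the read emit a caller-chosen type rather than a fixed TableRow, the way AvroIO.parseGenericRecords() takes a parse function. It can be illustrated with a small, library-free Java sketch. This sketch rests on an assumption: each exported row is modeled as a Map<String, Object> standing in for an Avro GenericRecord. The names readAll and Event are hypothetical and are not part of Beam's API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class ParseFnSketch {

    // Hypothetical user-defined output type, produced directly instead of a TableRow.
    static final class Event {
        final String user;
        final long count;

        Event(String user, long count) {
            this.user = user;
            this.count = count;
        }

        @Override
        public String toString() {
            return user + ":" + count;
        }
    }

    // Applies a caller-supplied parse function to every exported record.
    // Each record is modeled as a field-name -> value map, standing in
    // (by assumption) for an Avro GenericRecord read from the extract output.
    static <T> List<T> readAll(List<Map<String, Object>> records,
                               Function<Map<String, Object>, T> parseFn) {
        List<T> out = new ArrayList<>();
        for (Map<String, Object> record : records) {
            out.add(parseFn.apply(record));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("user", "alice");
        row.put("count", 3L);

        // The parse function decides the output type; no TableRow is built.
        List<Event> events = readAll(
                Collections.singletonList(row),
                r -> new Event((String) r.get("user"), (Long) r.get("count")));

        System.out.println(events); // prints [alice:3]
    }
}
```

Beam's actual AvroIO.parseGenericRecords() has the same shape, taking a SerializableFunction<GenericRecord, T>. A BigQueryIO equivalent would presumably also need a way to obtain a Coder for T.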
