Okay, I see. There are about three different meanings of the word "partition" that could have been involved here (BigQuery partitions, runner-specific bundles, and the Partition transform), hence my request for clarification.
If you mean the Partition transform, then I'm confused about what you mean by BigQueryIO "supporting" it. The Partition transform takes a PCollection and produces several PCollections. These are ordinary PCollections, and you can apply any Beam transform to them. BigQueryIO.write() is no exception: you can apply it too.

To answer whether using Partition would improve your performance, I'd need to understand exactly what you're comparing against what. I suppose you're comparing the following:

1) Applying BigQueryIO.write() to a PCollection, writing to a single table.
2) Splitting a PCollection into several smaller PCollections using Partition, and applying BigQueryIO.write() to each of them, writing to different tables. (Or do you want to write to different BigQuery partitions of the same table using a table partition decorator?)

I would expect #2 to perform strictly worse than #1, because it writes the same amount of data but increases the number of BigQuery load jobs involved, which adds per-job overhead and consumes BigQuery quota.

On Tue, Sep 26, 2017 at 11:35 PM Chaim Turkel wrote:
> https://beam.apache.org/documentation/programming-guide/#partition
>
> On Tue, Sep 26, 2017 at 6:42 PM, Eugene Kirpichov wrote:
> > What do you mean by Beam partitions?
> >
> > On Tue, Sep 26, 2017, 6:57 AM Chaim Turkel wrote:
> >
> >> By the way, currently the performance on BigQuery partitions is very bad.
> >> Is there a repository where I can test with 2.2.0?
> >>
> >> chaim
> >>
> >> On Tue, Sep 26, 2017 at 4:52 PM, Reuven Lax wrote:
> >> > Do you mean BigQuery partitions? Yes, however 2.1.0 has a bug if the table
> >> > containing the partitions is not pre-created (fixed in 2.2.0).
> >> >
> >> > On Tue, Sep 26, 2017 at 6:40 AM, Chaim Turkel wrote:
> >> >
> >> >> Hi,
> >> >>
> >> >> Does BigQueryIO support Partitions when writing? Will it improve my
> >> >> performance?
> >> >>
> >> >> chaim
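To make option #2 concrete, here is a small plain-Java sketch under stated assumptions. It is not Beam code. A List stands in for a PCollection, and the hypothetical PartitionFn interface only mirrors the shape of Beam's Partition.PartitionFn.partitionFor(elem, numPartitions). It shows that partitioning moves every element into exactly one output, so the total data written is unchanged, while the number of separate writes grows with the number of outputs. The decoratedTable helper builds a day-partition decorator (table$YYYYMMDD) for the "same table, different BigQuery partitions" variant. The table name my-project:my_dataset.events is hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of option #2 (NOT Beam code): List stands in for PCollection.
public class PartitionModel {

    // Hypothetical stand-in mirroring the shape of Beam's Partition.PartitionFn.
    interface PartitionFn<T> {
        int partitionFor(T elem, int numPartitions);
    }

    // Splits the input into numPartitions output lists. Every element lands in
    // exactly one output, so the total amount of data equals option #1's input.
    static <T> List<List<T>> partition(List<T> input, int numPartitions, PartitionFn<T> fn) {
        List<List<T>> outputs = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            outputs.add(new ArrayList<>());
        }
        for (T elem : input) {
            int index = fn.partitionFor(elem, numPartitions);
            if (index < 0 || index >= numPartitions) {
                throw new IllegalArgumentException("partition index out of range: " + index);
            }
            outputs.get(index).add(elem);
        }
        return outputs;
    }

    // Appends a day-partition decorator, e.g. "proj:ds.tbl" -> "proj:ds.tbl$20170926".
    static String decoratedTable(String tableSpec, LocalDate day) {
        return tableSpec + "$" + day.format(DateTimeFormatter.BASIC_ISO_DATE);
    }

    public static void main(String[] args) {
        LocalDate start = LocalDate.of(2017, 9, 24);
        // Hypothetical input: each event is represented only by its date.
        List<LocalDate> events = List.of(
            start, start.plusDays(1), start.plusDays(2), start, start.plusDays(1));
        int days = 3;
        List<List<LocalDate>> parts = partition(events, days,
            (d, n) -> (int) (d.toEpochDay() - start.toEpochDay()) % n);

        // Option #1: a single write of all rows to one destination.
        System.out.println("option 1: 1 write of " + events.size() + " rows");
        // Option #2: one write per output, each to its own destination
        // (assumption: with file loads, each destination needs its own load job).
        for (int i = 0; i < parts.size(); i++) {
            String dest = decoratedTable("my-project:my_dataset.events", start.plusDays(i));
            System.out.println("option 2: " + dest + " <- " + parts.get(i).size() + " rows");
        }
    }
}
```

In real Beam code, Partition.of(numPartitions, fn) produces a PCollectionList, and each element of that list would get its own BigQueryIO.write(). The sketch only illustrates why that multiplies writes without reducing the data volume.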
