Okay, I see - there are about three different meanings of the word "partition"
that could be involved here (BigQuery partitions, runner-specific
bundles, and the Partition transform), hence my request for clarification.

If you mean the Partition transform, then I'm not sure what you mean by
BigQueryIO "supporting" it. The Partition transform takes a PCollection and
produces a fixed number of PCollections; these are ordinary PCollections, and
you can apply any Beam transform to them. BigQueryIO.write() is no
exception - you can apply it too.
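
For illustration, here is a minimal sketch of applying a BigQuery write to each
output of Partition in the Java SDK. The "user_id" field, the hash-based
routing rule, and the "my-project:my_dataset.events_N" table names are all
hypothetical placeholders, and the tables are assumed to exist already
(CREATE_NEVER); treat it as a sketch, not a recommended setup.

```java
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.transforms.Partition;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionList;

public class PartitionThenWrite {
  // Hypothetical routing rule: map a key to a shard index in [0, numShards).
  static int shardFor(String key, int numShards) {
    return Math.floorMod(key.hashCode(), numShards);
  }

  // Splits `rows` with Partition and applies a separate BigQuery write to each output.
  static void writeSharded(PCollection<TableRow> rows, int numShards) {
    PCollectionList<TableRow> shards = rows.apply(
        Partition.of(numShards, new Partition.PartitionFn<TableRow>() {
          @Override
          public int partitionFor(TableRow row, int numPartitions) {
            // "user_id" is an assumed field name, used only for illustration.
            return shardFor(String.valueOf(row.get("user_id")), numPartitions);
          }
        }));

    // Each element of the list is an ordinary PCollection<TableRow>.
    for (int i = 0; i < shards.size(); i++) {
      shards.get(i).apply("WriteShard" + i,
          BigQueryIO.writeTableRows()
              .to("my-project:my_dataset.events_" + i) // placeholder table spec
              .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
              .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));
    }
  }
}
```

Note that the number of partitions has to be known when the pipeline is
constructed, so the loop above adds a fixed number of write steps to the graph.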

To answer whether using Partition would improve your performance, I'd need
to understand exactly what you're comparing against what. I suppose you're
comparing the following:
1) Applying BigQueryIO.write() to a PCollection, writing to a single table.
2) Splitting a PCollection into several smaller PCollections using
Partition, and applying BigQueryIO.write() to each of them, writing to
different tables, I suppose? (Or do you want to write to different BigQuery
partitions of the same table using a table partition decorator?)

I would expect #2 to perform strictly worse than #1, because it writes the
same amount of data but increases the number of BigQuery load jobs involved
(and thus increases per-job overhead and consumes more BigQuery quota).
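
As a rough illustration of the load-job argument, here is a toy model in plain
Java. It assumes the data is split evenly across destinations, that every
destination needs at least one load job, and that a job holds at most
maxBytesPerJob; that cap and the byte counts are made-up numbers, not real
BigQuery limits, and this is not how BigQueryIO actually plans its jobs.

```java
public class LoadJobCostSketch {
  // Integer division rounded up, for positive operands.
  static long ceilDiv(long a, long b) {
    return (a + b - 1) / b;
  }

  // Toy estimate of load jobs: data is split evenly across `destinations`,
  // and each destination needs at least one job of at most `maxBytesPerJob`.
  static long loadJobs(long totalBytes, int destinations, long maxBytesPerJob) {
    long bytesPerDestination = ceilDiv(totalBytes, destinations);
    long jobsPerDestination = Math.max(1, ceilDiv(bytesPerDestination, maxBytesPerJob));
    return destinations * jobsPerDestination;
  }

  public static void main(String[] args) {
    long totalBytes = 10_000;    // hypothetical amount of data
    long maxBytesPerJob = 4_000; // hypothetical per-job cap
    System.out.println("single write:  " + loadJobs(totalBytes, 1, maxBytesPerJob) + " jobs");
    System.out.println("5 partitions:  " + loadJobs(totalBytes, 5, maxBytesPerJob) + " jobs");
  }
}
```

In this model the same data never needs fewer jobs after splitting, and usually
needs more, which is where the extra per-job overhead and quota usage come from.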

On Tue, Sep 26, 2017 at 11:35 PM Chaim Turkel <[email protected]> wrote:

> https://beam.apache.org/documentation/programming-guide/#partition
>
> On Tue, Sep 26, 2017 at 6:42 PM, Eugene Kirpichov
> <[email protected]> wrote:
> > What do you mean by Beam partitions?
> >
> > On Tue, Sep 26, 2017, 6:57 AM Chaim Turkel <[email protected]> wrote:
> >
> >> by the way currently the performance on bigquery partitions is very bad.
> >> Is there a repository where i can test with 2.2.0?
> >>
> >> chaim
> >>
> >> On Tue, Sep 26, 2017 at 4:52 PM, Reuven Lax <[email protected]>
> >> wrote:
> >> > Do you mean BigQuery partitions? Yes, however 2.1.0 has a bug if the
> >> table
> >> > containing the partitions is not pre created (fixed in 2.2.0).
> >> >
> >> > On Tue, Sep 26, 2017 at 6:40 AM, Chaim Turkel <[email protected]>
> wrote:
> >> >
> >> >> Hi,
> >> >>
> >> >>    Does BigQueryIO support Partitions when writing? will it improve
> my
> >> >> performance?
> >> >>
> >> >>
> >> >> chaim
> >> >>
> >>
>
