Are you using the streaming inserts or the batch loads method for writing?
If it's streaming inserts, BigQueryIO can already return the bad records,
and I believe it won't fail the pipeline, so I'm assuming it's batch loads.
For batch loads, would it be sufficient for your purposes if
BigQueryIO.write() let you configure the configuration.load.maxBadRecords
parameter (see https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs)?
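
To make the streaming-inserts behavior mentioned above concrete, here is a
minimal sketch, assuming a Beam Java SDK version that exposes
WriteResult.getFailedInserts() and InsertRetryPolicy; the table name, schema
and input PCollection ("rows", "tableSchema") are placeholders:

import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.InsertRetryPolicy;
import org.apache.beam.sdk.io.gcp.bigquery.WriteResult;
import org.apache.beam.sdk.values.PCollection;

// rows is a PCollection<TableRow>, tableSchema a TableSchema (placeholders).
WriteResult result = rows.apply(
    BigQueryIO.writeTableRows()
        .to("my-project:my_dataset.my_table")
        .withSchema(tableSchema)
        .withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS)
        .withFailedInsertRetryPolicy(InsertRetryPolicy.retryTransientErrors())
        .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
        .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));

// Rows that BigQuery rejected (and the retry policy gave up on) come back as
// a PCollection that can be logged or written to a dead-letter table, so the
// pipeline itself keeps running.
PCollection<TableRow> failedRows = result.getFailedInserts();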

On Thu, Sep 14, 2017 at 10:29 PM Chaim Turkel <[email protected]> wrote:

> I am using the sink of BigQueryIO, so the example is not the same: the
> example handles bad data when reading, while my problems are when writing.
> There can be multiple errors when writing to BigQuery, and if the write
> fails there is no way to catch the error, so the whole pipeline fails.
>
> chaim
>
> On Thu, Sep 14, 2017 at 5:48 PM, Reuven Lax <[email protected]>
> wrote:
> > What sort of error? You can always put a try/catch inside your DoFns to
> > catch the majority of errors. A common pattern is to save records that
> > caused exceptions out to a separate output so you can debug them. This
> > blog post explains the pattern:
> > https://cloud.google.com/blog/big-data/2016/01/handling-invalid-inputs-in-dataflow
> >
> > Reuven
> >
> > On Thu, Sep 14, 2017 at 1:43 AM, Chaim Turkel <[email protected]> wrote:
> >
> >> Hi,
> >>
> >>   In one pipeline I have multiple PCollections. If I have an error on
> >> one, the whole pipeline is canceled. Is there a way to catch the
> >> error and log it, and let all the other PCollections continue?
> >>
> >>
> >> chaim
> >>
>
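
For reference, the try/catch dead-letter pattern Reuven describes above can
be sketched roughly as follows with the Beam Java SDK; convertToTableRow and
the tag names are made-up placeholders for whatever parsing logic might
throw:

import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionTuple;
import org.apache.beam.sdk.values.TupleTag;
import org.apache.beam.sdk.values.TupleTagList;

final TupleTag<TableRow> parsedTag = new TupleTag<TableRow>() {};
final TupleTag<String> deadLetterTag = new TupleTag<String>() {};

PCollectionTuple outputs = lines.apply("ParseWithDeadLetter",
    ParDo.of(new DoFn<String, TableRow>() {
      @ProcessElement
      public void processElement(ProcessContext c) {
        try {
          // convertToTableRow stands in for any user code that may throw.
          c.output(convertToTableRow(c.element()));
        } catch (Exception e) {
          // Send the bad record to the dead-letter output instead of letting
          // the exception fail the whole pipeline.
          c.output(deadLetterTag, c.element());
        }
      }
    }).withOutputTags(parsedTag, TupleTagList.of(deadLetterTag)));

// The main output continues into the BigQuery sink as before; the
// dead-letter output can be logged or written elsewhere for debugging.
PCollection<TableRow> parsed = outputs.get(parsedTag);
PCollection<String> badRecords = outputs.get(deadLetterTag);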
