There is no way to catch an exception inside a transform unless you wrote the transform yourself and have control over the code of its DoFns. That's why I'm asking whether configuring bad records would be an acceptable workaround.
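(For reference, the knob under discussion is the configuration.load.maxBadRecords field on a BigQuery load job, not anything BigQueryIO exposes today. A minimal sketch of setting it through the BigQuery v2 Java API client; tableRef and sourceUri are hypothetical placeholders:

import java.util.Collections;
import com.google.api.services.bigquery.model.Job;
import com.google.api.services.bigquery.model.JobConfiguration;
import com.google.api.services.bigquery.model.JobConfigurationLoad;

// Sketch only: configure a load job that tolerates a bounded number
// of bad rows instead of failing on the first one.
JobConfigurationLoad load = new JobConfigurationLoad()
    .setDestinationTable(tableRef)                      // hypothetical TableReference
    .setSourceUris(Collections.singletonList(sourceUri)) // hypothetical GCS URI
    .setMaxBadRecords(10);  // allow up to 10 bad records before the job fails
Job job = new Job().setConfiguration(new JobConfiguration().setLoad(load));

The question for the thread is whether letting BigQueryIO pass this value through to its load jobs would be an acceptable workaround.)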
On Sat, Sep 16, 2017, 11:07 AM Chaim Turkel <[email protected]> wrote:
> I am using batch, since streaming cannot be done with partitions with
> old data more than 30 days.
> The question is how can I catch the exception in the pipeline so that
> other collections do not fail.
>
> On Fri, Sep 15, 2017 at 7:37 PM, Eugene Kirpichov
> <[email protected]> wrote:
> > Are you using the streaming inserts or batch loads method for writing?
> > If it's streaming inserts, BigQueryIO already can return the bad records,
> > and I believe it won't fail the pipeline, so I'm assuming it's batch loads.
> > For batch loads, would it be sufficient for your purposes if
> > BigQueryIO.write() let you configure the configuration.load.maxBadRecords
> > parameter (see
> > https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs)?
> >
> > On Thu, Sep 14, 2017 at 10:29 PM Chaim Turkel <[email protected]> wrote:
> >
> >> I am using the sink of BigQueryIO, so the example is not the same. The
> >> example covers bad data from reading; I have problems when writing. There
> >> can be multiple errors when writing to BigQuery, and if it fails there
> >> is no way to catch this error, and the whole pipeline fails.
> >>
> >> chaim
> >>
> >> On Thu, Sep 14, 2017 at 5:48 PM, Reuven Lax <[email protected]>
> >> wrote:
> >> > What sort of error? You can always put a try/catch inside your DoFns
> >> > to catch the majority of errors. A common pattern is to save records
> >> > that caused exceptions out to a separate output so you can debug them.
> >> > This blog post
> >> > <https://cloud.google.com/blog/big-data/2016/01/handling-invalid-inputs-in-dataflow>
> >> > explains the pattern.
> >> >
> >> > Reuven
> >> >
> >> > On Thu, Sep 14, 2017 at 1:43 AM, Chaim Turkel <[email protected]> wrote:
> >> >
> >> >> Hi,
> >> >>
> >> >> In one pipeline I have multiple PCollections. If I have an error on
> >> >> one, then the whole pipeline is canceled. Is there a way to catch the
> >> >> error and log it, and for all other PCollections to continue?
> >> >>
> >> >> chaim
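The dead-letter pattern Reuven describes, sketched against the Beam Java SDK: wrap the risky work in a try/catch and route failing records to a side output via a TupleTag, so one bad element does not fail the bundle and the rest of the pipeline keeps running. In this sketch, input (a PCollection<String> of raw records) and the parseToRow helper are hypothetical placeholders.

import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionTuple;
import org.apache.beam.sdk.values.TupleTag;
import org.apache.beam.sdk.values.TupleTagList;

// Tags for the good rows and for the records that failed to parse.
final TupleTag<TableRow> mainTag = new TupleTag<TableRow>() {};
final TupleTag<String> deadLetterTag = new TupleTag<String>() {};

PCollectionTuple results = input.apply("ParseRows",
    ParDo.of(new DoFn<String, TableRow>() {
          @ProcessElement
          public void processElement(ProcessContext c) {
            try {
              // parseToRow is a hypothetical parser that may throw.
              c.output(parseToRow(c.element()));
            } catch (Exception e) {
              // Send the failing record to the dead-letter output
              // instead of letting the exception fail the pipeline.
              c.output(deadLetterTag, c.element());
            }
          }
        })
        .withOutputTags(mainTag, TupleTagList.of(deadLetterTag)));

PCollection<TableRow> goodRows = results.get(mainTag);        // write these to BigQuery
PCollection<String> badRecords = results.get(deadLetterTag);  // log or store for debugging

Note this only covers code you control: as Eugene points out above, errors raised inside BigQueryIO's own load jobs cannot be caught this way, which is why the maxBadRecords configuration is being discussed.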
