I am using batch loads, since streaming inserts cannot write to partitions
with data older than 30 days.
The question is: how can I catch the exception in the pipeline so that the
other collections do not fail?

On Fri, Sep 15, 2017 at 7:37 PM, Eugene Kirpichov
<[email protected]> wrote:
> Are you using the streaming inserts or the batch loads method for writing?
> If it's streaming inserts, BigQueryIO can already return the bad records,
> and I believe it won't fail the pipeline, so I'm assuming it's batch loads.
> For batch loads, would it be sufficient for your purposes if
> BigQueryIO.write() let you configure the configuration.load.maxBadRecords
> parameter (see
> https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs)?
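>
> For the streaming-inserts path, the rough shape is something like this
> (a sketch from memory, so double-check against your Beam version; "rows"
> is your PCollection<TableRow>, "tableSpec" is your table, and the classes
> live in org.apache.beam.sdk.io.gcp.bigquery):
>
>   WriteResult result = rows.apply(
>       BigQueryIO.writeTableRows()
>           .to(tableSpec)
>           .withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS)
>           .withFailedInsertRetryPolicy(InsertRetryPolicy.retryTransientErrors()));
>
>   // Rows that BigQuery rejects come back as data instead of failing the job.
>   PCollection<TableRow> failedRows = result.getFailedInserts();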
>
> On Thu, Sep 14, 2017 at 10:29 PM Chaim Turkel <[email protected]> wrote:
>
>> I am using the BigQueryIO sink, so the example is not quite the same: the
>> example handles bad data on the read side, while my problems are on the
>> write side. There can be multiple errors when writing to BigQuery, and if
>> the write fails there is no way to catch the error, so the whole pipeline
>> fails.
>>
>> chaim
>>
>> On Thu, Sep 14, 2017 at 5:48 PM, Reuven Lax <[email protected]>
>> wrote:
>> > What sort of error? You can always put a try/catch inside your DoFns to
>> > catch the majority of errors. A common pattern is to save records that
>> > caused exceptions to a separate output so you can debug them. This blog
>> > post
>> > <https://cloud.google.com/blog/big-data/2016/01/handling-invalid-inputs-in-dataflow>
>> > explains the pattern.
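>> >
>> > In the Java SDK the pattern looks roughly like this (just a sketch;
>> > "parseOrThrow", "input" and "tableSpec" are placeholders for your own
>> > conversion logic, input collection and table):
>> >
>> >   final TupleTag<TableRow> goodTag = new TupleTag<TableRow>() {};
>> >   final TupleTag<String> deadLetterTag = new TupleTag<String>() {};
>> >
>> >   PCollectionTuple results = input.apply("Convert",
>> >       ParDo.of(new DoFn<String, TableRow>() {
>> >         @ProcessElement
>> >         public void processElement(ProcessContext c) {
>> >           try {
>> >             c.output(parseOrThrow(c.element()));   // happy path
>> >           } catch (Exception e) {
>> >             c.output(deadLetterTag, c.element());  // keep the bad record
>> >           }
>> >         }
>> >       }).withOutputTags(goodTag, TupleTagList.of(deadLetterTag)));
>> >
>> >   results.get(goodTag).apply(BigQueryIO.writeTableRows().to(tableSpec));
>> >   // results.get(deadLetterTag) can be written somewhere for debugging.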
>> >
>> > Reuven
>> >
>> > On Thu, Sep 14, 2017 at 1:43 AM, Chaim Turkel <[email protected]> wrote:
>> >
>> >> Hi,
>> >>
>> >>   In one pipeline I have multiple PCollections. If one of them hits an
>> >> error, the whole pipeline is canceled. Is there a way to catch the
>> >> error and log it, so that all the other PCollections continue?
>> >>
>> >>
>> >> chaim
>> >>
>>
