On Sun, Sep 10, 2017 at 10:12 PM, Chaim Turkel <[email protected]> wrote:

> I am migrating multiple tables from MongoDB to BigQuery, so I loop over
> each table and create a PCollection for each table.
> I would like to update a status row for each run (time, records...),
> so I want to write the results at the end.
> I would like to apply a ParDo after the sink, but currently
> BigQueryIO does not support this.
>

Makes sense. I wonder if BigQueryIO.Write should return a PCollection of
metadata objects (table name, rows written, etc.) as part of the
WriteResult.
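For illustration, a rough sketch of what that might look like from the
user's side (the getWrittenTables() accessor and the TableWriteMetadata
type are invented here for the sake of the example; today's WriteResult
does not expose them):

  import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
  import org.apache.beam.sdk.io.gcp.bigquery.WriteResult;
  import org.apache.beam.sdk.transforms.DoFn;
  import org.apache.beam.sdk.transforms.ParDo;
  import org.apache.beam.sdk.values.PCollection;

  WriteResult result = rows.apply(
      BigQueryIO.writeTableRows().to(tableSpec).withSchema(schema));

  // HYPOTHETICAL accessor -- BigQueryIO.Write does not return this today.
  PCollection<TableWriteMetadata> written = result.getWrittenTables();

  // With such an output, the status-row update becomes an ordinary ParDo
  // that runs only after the corresponding table has been written.
  written.apply("UpdateStatusRow",
      ParDo.of(new DoFn<TableWriteMetadata, Void>() {
        @ProcessElement
        public void processElement(ProcessContext c) {
          // e.g. record (tableName, rowsWritten, timestamp) in a status table
        }
      }));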


>
> Using metrics is an option, but then the client needs to block and wait
> for the pipeline to finish - and it might lose the information.
>

If using the Dataflow runner, you can always use the job id to get the
metrics even if the main program crashes.
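For example, with the blocking approach the main program can run the
pipeline and query counters afterwards. A rough sketch (the
"migration"/"recordsWritten" counter name is made up, and the
MetricResult accessor names differ slightly across Beam versions):

  import org.apache.beam.sdk.PipelineResult;
  import org.apache.beam.sdk.metrics.Counter;
  import org.apache.beam.sdk.metrics.MetricNameFilter;
  import org.apache.beam.sdk.metrics.MetricQueryResults;
  import org.apache.beam.sdk.metrics.MetricResult;
  import org.apache.beam.sdk.metrics.Metrics;
  import org.apache.beam.sdk.metrics.MetricsFilter;

  // In the DoFn that processes each table's rows:
  private final Counter recordsWritten =
      Metrics.counter("migration", "recordsWritten");
  // ... call recordsWritten.inc() per element ...

  // In the main program:
  PipelineResult result = pipeline.run();
  result.waitUntilFinish();
  MetricQueryResults metrics = result.metrics().queryMetrics(
      MetricsFilter.builder()
          .addNameFilter(
              MetricNameFilter.named("migration", "recordsWritten"))
          .build());
  for (MetricResult<Long> c : metrics.getCounters()) {
    System.out.println(c.getName() + " = " + c.getAttempted());
  }

If the main program crashes before waitUntilFinish() returns, the same
counters stay attached to the Dataflow job and can be fetched later by
job id (e.g. via the projects.jobs.getMetrics REST method or the
Dataflow console), so the information is not lost.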

> Chaim
>
> On Sun, Sep 10, 2017 at 8:05 PM, Reuven Lax <[email protected]>
> wrote:
> > Can you explain what you mean by multiple collections running on the
> > pipeline? What do you need the results for?
> >
> > On Sat, Sep 9, 2017 at 10:45 PM, Chaim Turkel <[email protected]> wrote:
> >
> >> Hi,
> >>   I am having trouble figuring out what to do with the results. I have
> >> multiple collections running on the pipeline, and since the sink does
> >> not give me the option to get the result, I need to wait for the
> >> pipeline to finish and then poll the results.
> >> From what I can see my only option is to use the metrics; is there
> >> another way to pass information from the collections to the results?
> >>
> >> Chaim
> >>
>