On Sun, Sep 10, 2017 at 10:12 PM, Chaim Turkel <[email protected]> wrote:
> I am migrating multiple tables from mongo to bigquery. So I loop over
> each table and create a PCollection for each table.
> I would like to update a status row for each run (time, records...).
> So I want to write the results at the end.
> I would like to write a ParDo method after the sink but currently
> BigQueryIO does not support this.

Makes sense. I wonder if BigQueryIO.Write should return a PCollection of
metadata objects (table name, rows written, etc.) as part of the
WriteResult.

> Using metrics is an option but then the client needs to block and wait
> for the pipeline to finish - and it might lose the information

If using the Dataflow runner, you can always use the job id to get the
metrics even if the main program crashes.

> Chaim
>
> On Sun, Sep 10, 2017 at 8:05 PM, Reuven Lax <[email protected]> wrote:
> > Can you explain what you mean by multiple collections running on the
> > pipeline? What do you need the results for?
> >
> > On Sat, Sep 9, 2017 at 10:45 PM, Chaim Turkel <[email protected]> wrote:
> >
> >> Hi,
> >> I am having trouble figuring out what to do with the results. I have
> >> multiple collections running on the pipeline, and since the sink does
> >> not give me the option to get the result, I need to wait for the
> >> pipeline to finish and then poll the results.
> >> From what I can see my only option is to use the metrics, is there
> >> another way to pass information from the collections to the results?
> >>
> >> chaim
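
For reference, the metrics approach discussed above can be sketched roughly as follows with the Beam Java SDK. This is a sketch only, not the poster's actual code: the namespace "migration", the counter naming scheme, and the per-table DoFn are illustrative assumptions. The idea is to increment a Counter per table inside a ParDo before the sink, then block on waitUntilFinish() and query the counters from the PipelineResult.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.PipelineResult;
import org.apache.beam.sdk.metrics.Counter;
import org.apache.beam.sdk.metrics.MetricNameFilter;
import org.apache.beam.sdk.metrics.MetricQueryResults;
import org.apache.beam.sdk.metrics.MetricResult;
import org.apache.beam.sdk.metrics.Metrics;
import org.apache.beam.sdk.metrics.MetricsFilter;
import org.apache.beam.sdk.transforms.DoFn;

public class MigrationMetricsSketch {

  // Counts rows per table on their way to the BigQuery sink.
  // The table name is baked into the counter name so results can be
  // separated per table when polled later.
  static class CountRowsFn<T> extends DoFn<T, T> {
    private final Counter rowsWritten;

    CountRowsFn(String table) {
      rowsWritten = Metrics.counter("migration", "rowsWritten_" + table);
    }

    @ProcessElement
    public void processElement(ProcessContext c) {
      rowsWritten.inc();
      c.output(c.element());
    }
  }

  // After building the per-table PCollections and attaching the sink,
  // the client blocks for the pipeline and then polls the counters.
  static void runAndReport(Pipeline pipeline) {
    PipelineResult result = pipeline.run();
    result.waitUntilFinish(); // the blocking step mentioned in the thread

    MetricQueryResults metrics = result.metrics().queryMetrics(
        MetricsFilter.builder()
            .addNameFilter(MetricNameFilter.inNamespace("migration"))
            .build());

    for (MetricResult<Long> counter : metrics.getCounters()) {
      // e.g. write a status row (time, records, ...) per table here
      System.out.println(counter.getName() + ": " + counter.getAttempted());
    }
  }
}
```

As noted in the reply, on the Dataflow runner the same counters remain queryable by job id via the Dataflow API even if this main program crashes before waitUntilFinish() returns, which mitigates the "might lose the information" concern.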
