It would be much smarter if BigQueryIO returned at least a PCollection of Void, so you could attach a ParDo after the write.
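For illustration, a metadata object along the lines Reuven suggests below might look like this. To be clear, `TableWriteMetadata` and its fields are invented here as a sketch — nothing like it exists in BigQueryIO today; the idea is that WriteResult could expose a PCollection of these for a downstream ParDo to consume when writing the status row:

```java
// Hypothetical sketch of a per-table write summary that BigQueryIO.Write
// could surface via WriteResult. The class name and fields are invented
// for illustration; they are not part of the Beam SDK.
public class TableWriteMetadata {
    private final String tableName;      // fully qualified destination table
    private final long rowsWritten;      // number of rows written this run
    private final long timestampMillis;  // when the write completed

    public TableWriteMetadata(String tableName, long rowsWritten, long timestampMillis) {
        this.tableName = tableName;
        this.rowsWritten = rowsWritten;
        this.timestampMillis = timestampMillis;
    }

    public String getTableName() { return tableName; }

    public long getRowsWritten() { return rowsWritten; }

    public long getTimestampMillis() { return timestampMillis; }

    @Override
    public String toString() {
        return tableName + ": " + rowsWritten + " rows @ " + timestampMillis;
    }
}
```

A downstream step could then (hypothetically) do something like `writeResult.getMetadata().apply(ParDo.of(new WriteStatusRowFn()))` — again, `getMetadata()` is the proposed addition being discussed, not an existing method.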
chaim

On Mon, Sep 11, 2017 at 8:47 AM, Reuven Lax <re...@google.com.invalid> wrote:
> On Sun, Sep 10, 2017 at 10:12 PM, Chaim Turkel <ch...@behalf.com> wrote:
>
>> I am migrating multiple tables from mongo to bigquery, so I loop over
>> each table and create a PCollection for each table.
>> I would like to update a status row for each run (time, records...),
>> so I want to write the results at the end.
>> I would like to write a ParDo method after the sink, but currently
>> BigQueryIO does not support this.
>
> Makes sense. I wonder if BigQueryIO.Write should return a PCollection of
> metadata objects (table name, rows written, etc.) as part of the
> WriteResult.
>
>> Using metrics is an option, but then the client needs to block and wait
>> for the pipeline to finish - and it might lose the information.
>
> If using the Dataflow runner, you can always use the job id to get the
> metrics even if the main program crashes.
>
>> chaim
>>
>> On Sun, Sep 10, 2017 at 8:05 PM, Reuven Lax <re...@google.com.invalid> wrote:
>> > Can you explain what you mean by multiple collections running on the
>> > pipeline? What do you need the results for?
>> >
>> > On Sat, Sep 9, 2017 at 10:45 PM, Chaim Turkel <ch...@behalf.com> wrote:
>> >
>> >> Hi,
>> >> I am having trouble figuring out what to do with the results. I have
>> >> multiple collections running on the pipeline, and since the sink does
>> >> not give me the option to get the result, I need to wait for the
>> >> pipeline to finish and then poll the results.
>> >> From what I can see my only option is to use the metrics. Is there
>> >> another way to pass information from the collections to the results?
>> >>
>> >> chaim