It would be much smarter if BigQueryIO returned a PCollection<Void>.
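
For illustration, a sketch of what that would enable, assuming a
hypothetical getSuccessfulWrites() on WriteResult that emits one Void per
completed write (no such method exists in the current API; "rows" and
"tableSpec" are placeholders):

    WriteResult writeResult =
        rows.apply(BigQueryIO.writeTableRows().to(tableSpec));

    // Hypothetical: a PCollection<Void> that is only populated once the
    // corresponding rows have actually been written to BigQuery.
    writeResult.getSuccessfulWrites()
        .apply("UpdateStatusRow", ParDo.of(new DoFn<Void, Void>() {
          @ProcessElement
          public void processElement(ProcessContext c) {
            // Write the status row for this run (time, record count, ...).
          }
        }));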


chaim

On Mon, Sep 11, 2017 at 8:47 AM, Reuven Lax <re...@google.com.invalid> wrote:
> On Sun, Sep 10, 2017 at 10:12 PM, Chaim Turkel <ch...@behalf.com> wrote:
>
>> I am migrating multiple tables from Mongo to BigQuery, so I loop over
>> each table and create a PCollection for each table.
>> I would like to update a status row for each run (time, records, ...),
>> so I want to write the results at the end.
>> I would like to apply a ParDo after the sink, but currently
>> BigQueryIO does not support this.
>>
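
(For context, roughly what that loop looks like; the connection settings,
table list, and DocumentToTableRowFn are placeholders, and schema and
dispositions are omitted:)

    Pipeline p = Pipeline.create(options);
    for (String table : tables) {
      p.apply("Read_" + table, MongoDbIO.read()
              .withUri("mongodb://host:27017")
              .withDatabase("mydb")
              .withCollection(table))
          // DocumentToTableRowFn: a DoFn<Document, TableRow> (placeholder).
          .apply("ToTableRow_" + table, ParDo.of(new DocumentToTableRowFn()))
          .apply("Write_" + table, BigQueryIO.writeTableRows()
              .to("my-project:dataset." + table));
      // Nothing to chain here: the write is a sink, so there is no output
      // PCollection to hang an "update status row" ParDo on.
    }
    p.run();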
>
> Makes sense. I wonder if BigQueryIO.Write should return a PCollection of
> metadata objects (table name, rows written, etc.) as part of the
> WriteResult.
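
A sketch of what consuming that might look like, with hypothetical
getWriteMetadata() and WriteMetadata names (neither exists today):

    WriteResult writeResult =
        rows.apply(BigQueryIO.writeTableRows().to(tableSpec));

    // Hypothetical: one WriteMetadata element per table written.
    writeResult.getWriteMetadata()
        .apply("RecordRunStatus", ParDo.of(new DoFn<WriteMetadata, Void>() {
          @ProcessElement
          public void processElement(ProcessContext c) {
            WriteMetadata m = c.element();
            // m.getTableName(), m.getRowsWritten(), ... -> status row.
          }
        }));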
>
>
>>
>> Using metrics is an option, but then the client needs to block and wait
>> for the pipeline to finish, and it might lose the information.
>>
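
(Concretely, the pattern being described; the counter namespace and name
are illustrative:)

    // In a DoFn on the write path:
    private final Counter rows = Metrics.counter("migration", "rowsMigrated");
    // ... rows.inc(); per element ...

    // In the main program; this is the blocking part:
    PipelineResult result = p.run();
    result.waitUntilFinish();
    MetricQueryResults metrics = result.metrics().queryMetrics(
        MetricsFilter.builder()
            .addNameFilter(MetricNameFilter.named("migration", "rowsMigrated"))
            .build());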
>
> If using the Dataflow runner, you can always use the job id to get the
> metrics even if the main program crashes.
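
(For example, the job id can be captured at submission time and persisted;
metrics for that job stay queryable via the Dataflow service even if the
submitting process later crashes:)

    PipelineResult result = p.run();
    if (result instanceof DataflowPipelineJob) {
      // Persist this id; metrics can be fetched later through the
      // Dataflow API/UI by job id.
      String jobId = ((DataflowPipelineJob) result).getJobId();
    }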
>
>> Chaim
>>
>> On Sun, Sep 10, 2017 at 8:05 PM, Reuven Lax <re...@google.com.invalid>
>> wrote:
>> > Can you explain what you mean by multiple collections running on the
>> > pipeline? What do you need the results for?
>> >
>> > On Sat, Sep 9, 2017 at 10:45 PM, Chaim Turkel <ch...@behalf.com> wrote:
>> >
>> >> Hi,
>> >>   I am having trouble figuring out what to do with the results. I have
>> >> multiple collections running on the pipeline, and since the sink does
>> >> not give me the option to get the result, I need to wait for the
>> >> pipeline to finish and then poll the results.
>> >> From what I can see, my only option is to use the metrics. Is there
>> >> another way to pass information from the collections to the results?
>> >>
>> >> chaim
>> >>
>>
