I wonder if it makes sense to start simple and go from there.  For example,
I enhanced BigtableIO.Write to output the number of rows written
in finishBundle(), simply into the global window with the current
timestamp.  This was more than enough to unblock me, but doesn't support
more complicated scenarios with windowing.

However, as I said, it was more than enough to solve the general batch use
case, and I imagine it could be enhanced to support windowing by keeping
track of which windows were written per bundle. (Can there ever be more
than one window per bundle?)
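To make the idea above concrete, here is a hedged sketch of the finishBundle() approach: a DoFn that counts rows as they are written and emits the per-bundle total into the global window at the current processing time. The class name and the wiring into BigtableIO.Write are assumptions for illustration, not the actual patch.

```java
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.windowing.GlobalWindow;
import org.joda.time.Instant;

// Hypothetical DoFn illustrating the idea: count rows in the bundle and
// emit the total from finishBundle() into the global window with the
// current timestamp.
class EmitRowsWrittenFn<T> extends DoFn<T, Long> {
  private long rowsWritten;

  @StartBundle
  public void startBundle() {
    rowsWritten = 0;
  }

  @ProcessElement
  public void processElement(ProcessContext c) {
    // ... the actual Bigtable write for the element would happen here ...
    rowsWritten++;
  }

  @FinishBundle
  public void finishBundle(FinishBundleContext c) {
    // Emit the per-bundle count; downstream transforms can wait on this.
    c.output(rowsWritten, Instant.now(), GlobalWindow.INSTANCE);
  }
}
```

A PCollection produced this way can then feed a step that updates a status table once the counts arrive.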

On Fri, Sep 8, 2017 at 2:32 PM, Eugene Kirpichov <[email protected]> wrote:

> Hi,
> I was going to implement this, but discussed it with +Reuven Lax
> <[email protected]> and it appears to be quite difficult to do properly, or
> even to define what it means at all, especially if you're using the
> streaming inserts write method. So for now there is no workaround except
> programmatically waiting for your whole pipeline to finish
> (pipeline.run().waitUntilFinish()).
>
> On Fri, Sep 8, 2017 at 2:19 AM Chaim Turkel <[email protected]> wrote:
>
> > Is there a way around this for now?
> > How can I get a snapshot version?
> >
> > chaim
> >
> > On Tue, Sep 5, 2017 at 8:48 AM, Eugene Kirpichov
> > <[email protected]> wrote:
> > > Oh I see! Okay, this should be easy to fix. I'll take a look.
> > >
> > > On Mon, Sep 4, 2017 at 10:23 PM Chaim Turkel <[email protected]> wrote:
> > >
> > >> WriteResult does not support apply -> that is the problem
> > >>
> > >> On Tue, Sep 5, 2017 at 4:59 AM, Eugene Kirpichov
> > >> <[email protected]> wrote:
> > >> > Hi,
> > >> >
> > >> > Sorry for the delay. It sounds like you want to do something once
> > >> > writing a window of data to BigQuery is complete.
> > >> > I think this should be possible: the expansion of BigQueryIO.write()
> > >> > returns a WriteResult, and you can apply other transforms to it.
> > >> > Have you tried that?
> > >> >
> > >> > On Sat, Aug 26, 2017 at 1:10 PM Chaim Turkel <[email protected]>
> > wrote:
> > >> >
> > >> >> I have documents in a MongoDB that I need to migrate to BigQuery.
> > >> >> Since it is MongoDB, I do not know the schema ahead of time, so I
> > >> >> have two pipelines: one runs over the documents and updates the
> > >> >> BigQuery schema, then waits a few minutes (it can take a while for
> > >> >> BigQuery to be able to use the new schema), and the other pipeline
> > >> >> copies all the documents.
> > >> >> To know how far I got with the different pipelines, I have a status
> > >> >> table, so that at startup I know where to continue from.
> > >> >> So I need the option to update the status table with the success of
> > >> >> the copy and some timestamp of the last copied document.
> > >> >>
> > >> >>
> > >> >> chaim
> > >> >>
> > >> >> On Fri, Aug 25, 2017 at 6:53 PM, Eugene Kirpichov
> > >> >> <[email protected]> wrote:
> > >> >> > I'd like to know more about both of your use cases; can you
> > >> >> > clarify? I think making sinks output something that can be waited
> > >> >> > on by another pipeline step is a reasonable request, but more
> > >> >> > details would help refine this suggestion.
> > >> >> >
> > >> >> > On Fri, Aug 25, 2017, 8:46 AM Chamikara Jayalath <
> > >> [email protected]>
> > >> >> > wrote:
> > >> >> >
> > >> >> >> Can you do this from the program that runs the Beam job, after
> > >> >> >> the job is complete? (You might have to use a blocking runner or
> > >> >> >> poll for the status of the job.)
> > >> >> >>
> > >> >> >> - Cham
> > >> >> >>
> > >> >> >> On Fri, Aug 25, 2017 at 8:44 AM Steve Niemitz <
> > [email protected]>
> > >> >> wrote:
> > >> >> >>
> > >> >> >> > I also have a similar use case (but with BigTable) that I feel
> > >> >> >> > like I had to hack up to make work. It'd be great to hear if
> > >> >> >> > there is a way to do something like this already, or if there
> > >> >> >> > are plans for the future.
> > >> >> >> >
> > >> >> >> > On Fri, Aug 25, 2017 at 9:46 AM, Chaim Turkel
> > >> >> >> > <[email protected]> wrote:
> > >> >> >> >
> > >> >> >> > > Hi,
> > >> >> >> > >   I have a few pipelines that are an ETL from different
> > >> >> >> > > systems to BigQuery.
> > >> >> >> > > I would like to write the status of the ETL after all
> > >> >> >> > > records have been updated in BigQuery.
> > >> >> >> > > The problem is that writing to BigQuery is a sink, and you
> > >> >> >> > > cannot have any other steps after the sink.
> > >> >> >> > > I tried a side output, but it is called with no correlation
> > >> >> >> > > to the writing to BigQuery, so I don't know whether it
> > >> >> >> > > succeeded or failed.
> > >> >> >> > >
> > >> >> >> > > Any ideas?
> > >> >> >> > > chaim
> > >> >> >> > >
> > >> >> >> >
> > >> >> >>
> > >> >>
> > >>
> >
>
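For reference, the whole-pipeline workaround mentioned in the thread (blocking until the job finishes, then updating the status table) can be sketched as follows. Here updateStatusTable and lastCopiedTimestamp are hypothetical placeholders for the status-table write described above.

```java
import org.apache.beam.sdk.PipelineResult;

// Run the pipeline and block until it terminates, then act on the outcome.
PipelineResult result = pipeline.run();
PipelineResult.State state = result.waitUntilFinish();

if (state == PipelineResult.State.DONE) {
  // Hypothetical helper: record success and the last copied document's
  // timestamp in the status table.
  updateStatusTable(lastCopiedTimestamp, true);
}
```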
