So how does the getFailedInserts method work? (Though from what I saw, it does not work.)

chaim
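For reference, a minimal sketch of how getFailedInserts is wired up, assuming the streaming-inserts write method; the table spec and the logging DoFn here are made up for illustration, not from the thread:

import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.WriteResult;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;

// Assumes rows is a PCollection<TableRow> produced earlier in the pipeline.
WriteResult result = rows.apply("WriteToBQ",
    BigQueryIO.writeTableRows()
        .to("my-project:my_dataset.my_table")  // illustrative table spec
        .withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS));

// Failed inserts are only produced on the streaming-inserts path; with
// batch load jobs there is nothing to populate this collection, which may
// be why it appears not to work.
result.getFailedInserts().apply("LogFailedInserts",
    ParDo.of(new DoFn<TableRow, Void>() {
      @ProcessElement
      public void processElement(ProcessContext c) {
        System.err.println("Failed insert: " + c.element());
      }
    }));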
On Sat, Sep 9, 2017 at 9:49 PM, Reuven Lax <[email protected]> wrote:
> I'm still not sure how this would work (or even make sense) for the
> streaming-write path.
>
> Also, in both paths the actual write to BigQuery is unwindowed.
>
> On Sat, Sep 9, 2017 at 11:44 AM, Eugene Kirpichov <[email protected]> wrote:
>> There'd be 1 Void per pane per window, so I could extract information
>> about whether this is the first pane, the last pane, or something else -
>> there are probably use cases for each of these.
>>
>> On Sat, Sep 9, 2017 at 11:37 AM Reuven Lax <[email protected]> wrote:
>>> How would you know how many Voids to wait for downstream?
>>>
>>> On Sat, Sep 9, 2017 at 10:46 AM, Eugene Kirpichov <[email protected]> wrote:
>>>> Hi Steve,
>>>> Unfortunately for BigQuery it's more complicated than that. Rows aren't
>>>> written to BigQuery one by one (unless you're using streaming inserts,
>>>> which are way more expensive and are usually used only in streaming
>>>> pipelines) - they are written to files, and then a BigQuery import job,
>>>> or several import jobs if there are too many files, picks them up. We
>>>> can declare writing complete when all of the BigQuery import jobs have
>>>> successfully completed.
>>>> However, the method of writing is an implementation detail of BigQuery,
>>>> so we need to create an API that works regardless of the method (import
>>>> jobs vs. streaming inserts).
>>>> Another complication is triggering - windows can fire multiple times.
>>>> This rules out any approaches that sequence using side inputs, because
>>>> side inputs don't have triggering.
>>>>
>>>> I think a common approach could be to return a PCollection<Void>,
>>>> containing a Void in every window and pane that has been successfully
>>>> written. This could be implemented in both modes and could be a general
>>>> design pattern for this sort of thing. It just isn't easy to implement,
>>>> so I didn't have time to take it on. It also could turn out to have
>>>> other complications we haven't thought of yet.
>>>>
>>>> That said, if somebody tried to implement this for some connectors (not
>>>> necessarily BigQuery) and pioneered the approach, it would be a great
>>>> contribution.
>>>>
>>>> On Sat, Sep 9, 2017 at 9:41 AM Steve Niemitz <[email protected]> wrote:
>>>>> I wonder if it makes sense to start simple and go from there. For
>>>>> example, I enhanced BigtableIO.Write to output the number of rows
>>>>> written in finishBundle(), simply into the global window with the
>>>>> current timestamp. This was more than enough to unblock me, but
>>>>> doesn't support more complicated scenarios with windowing.
>>>>>
>>>>> However, as I said, it was more than enough to solve the general batch
>>>>> use case, and I imagine it could be enhanced to support windowing by
>>>>> keeping track of which windows were written per bundle. (Can there
>>>>> even ever be more than one window per bundle?)
>>>>>
>>>>> On Fri, Sep 8, 2017 at 2:32 PM, Eugene Kirpichov <[email protected]> wrote:
>>>>>> Hi,
>>>>>> I was going to implement this, but discussed it with +Reuven Lax
>>>>>> <[email protected]> and it appears to be quite difficult to do
>>>>>> properly, or even to define what it means at all, especially if
>>>>>> you're using the streaming inserts write method. So for now there is
>>>>>> no workaround except programmatically waiting for your whole pipeline
>>>>>> to finish (pipeline.run().waitUntilFinish()).
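A rough sketch of the bundle-counting approach Steve describes above; CountingFn is a made-up name, and it counts elements seen rather than rows confirmed written, so it approximates his BigtableIO.Write change rather than reproducing it:

import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.windowing.GlobalWindow;
import org.joda.time.Instant;

// Counts elements per bundle and emits the count from finishBundle()
// into the global window with the current timestamp.
class CountingFn<T> extends DoFn<T, Long> {
  private long count;

  @StartBundle
  public void startBundle() {
    count = 0;
  }

  @ProcessElement
  public void processElement(ProcessContext c) {
    count++;
  }

  @FinishBundle
  public void finishBundle(FinishBundleContext c) {
    // Emit the per-bundle count, timestamped "now", into the global window.
    c.output(count, Instant.now(), GlobalWindow.INSTANCE);
  }
}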
>>>>>>
>>>>>> On Fri, Sep 8, 2017 at 2:19 AM Chaim Turkel <[email protected]> wrote:
>>>>>>> Is there a way around this for now?
>>>>>>> How can I get a snapshot version?
>>>>>>>
>>>>>>> chaim
>>>>>>>
>>>>>>> On Tue, Sep 5, 2017 at 8:48 AM, Eugene Kirpichov <[email protected]> wrote:
>>>>>>>> Oh I see! Okay, this should be easy to fix. I'll take a look.
>>>>>>>>
>>>>>>>> On Mon, Sep 4, 2017 at 10:23 PM Chaim Turkel <[email protected]> wrote:
>>>>>>>>> WriteResult does not support apply -> that is the problem.
>>>>>>>>>
>>>>>>>>> On Tue, Sep 5, 2017 at 4:59 AM, Eugene Kirpichov <[email protected]> wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Sorry for the delay. So it sounds like you want to do something
>>>>>>>>>> after writing a window of data to BigQuery is complete.
>>>>>>>>>> I think this should be possible: the expansion of
>>>>>>>>>> BigQueryIO.write() returns a WriteResult, and you can apply other
>>>>>>>>>> transforms to it. Have you tried that?
>>>>>>>>>>
>>>>>>>>>> On Sat, Aug 26, 2017 at 1:10 PM Chaim Turkel <[email protected]> wrote:
>>>>>>>>>>> I have documents from a MongoDB that I need to migrate to
>>>>>>>>>>> BigQuery. Since it is MongoDB, I do not know the schema ahead of
>>>>>>>>>>> time, so I have two pipelines: one runs over the documents and
>>>>>>>>>>> updates the BigQuery schema, then waits a few minutes (it can
>>>>>>>>>>> take a while for BigQuery to be able to use the new schema), and
>>>>>>>>>>> then the other pipeline copies all the documents.
>>>>>>>>>>> To keep track of where I got to with the different pipelines, I
>>>>>>>>>>> have a status table, so that at the start I know from where to
>>>>>>>>>>> continue.
>>>>>>>>>>> So I need the option to update the status table with the success
>>>>>>>>>>> of the copy and some time value of the last copied document.
>>>>>>>>>>>
>>>>>>>>>>> chaim
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Aug 25, 2017 at 6:53 PM, Eugene Kirpichov <[email protected]> wrote:
>>>>>>>>>>>> I'd like to know more about both of your use cases - can you
>>>>>>>>>>>> clarify? I think making sinks output something that can be
>>>>>>>>>>>> waited on by another pipeline step is a reasonable request, but
>>>>>>>>>>>> more details would help refine this suggestion.
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Aug 25, 2017, 8:46 AM Chamikara Jayalath <[email protected]> wrote:
>>>>>>>>>>>>> Can you do this from the program that runs the Beam job, after
>>>>>>>>>>>>> the job is complete (you might have to use a blocking runner
>>>>>>>>>>>>> or poll for the status of the job)?
>>>>>>>>>>>>>
>>>>>>>>>>>>> - Cham
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Aug 25, 2017 at 8:44 AM Steve Niemitz <[email protected]> wrote:
>>>>>>>>>>>>>> I also have a similar use case (but with Bigtable) that I feel
>>>>>>>>>>>>>> like I had to hack up to make work.
>>>>>>>>>>>>>> It'd be great to hear if there is a way to do something like
>>>>>>>>>>>>>> this already, or if there are plans for it in the future.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Aug 25, 2017 at 9:46 AM, Chaim Turkel <[email protected]> wrote:
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>> I have a few pipelines that are an ETL from different systems
>>>>>>>>>>>>>>> to BigQuery. I would like to write the status of the ETL
>>>>>>>>>>>>>>> after all records have been written to BigQuery.
>>>>>>>>>>>>>>> The problem is that writing to BigQuery is a sink, and you
>>>>>>>>>>>>>>> cannot have any other steps after the sink.
>>>>>>>>>>>>>>> I tried a side output, but it is called with no correlation
>>>>>>>>>>>>>>> to the writing to BigQuery, so I don't know if the write
>>>>>>>>>>>>>>> succeeded or failed.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Any ideas?
>>>>>>>>>>>>>>> chaim
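For completeness, the driver-side workaround suggested in the thread (block on the pipeline from the program that runs the job, then update the status table) might look roughly like this; updateStatusTable and lastCopiedDocumentTimestamp are hypothetical stand-ins, not Beam APIs:

import org.apache.beam.sdk.PipelineResult;

// Run the pipeline, block until it finishes, and only then record success.
PipelineResult result = pipeline.run();
PipelineResult.State state = result.waitUntilFinish();
if (state == PipelineResult.State.DONE) {
  updateStatusTable(lastCopiedDocumentTimestamp);  // hypothetical helper
}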
