so what you are saying is that windowing is not supported on the bigquery? this does not make sense since i am using it for the table partition, and that works fine?
On Sat, Sep 9, 2017 at 7:40 PM, Steve Niemitz <[email protected]> wrote: > I wonder if it makes sense to start simple and go from there. For example, > I enhanced BigtableIO.Write to output the number of rows written > in finishBundle(), simply into the global window with the current > timestamp. This was more than enough to unblock me, but doesn't support > more complicated scenarios with windowing. > > However, as I said it was more than enough to solve the general batch use > case, and I imagine could be enhanced to support windowing by keeping track > of which windows were written per bundle. (can there even ever be more than > one window per bundle?) > > On Fri, Sep 8, 2017 at 2:32 PM, Eugene Kirpichov < > [email protected]> wrote: > >> Hi, >> I was going to implement this, but discussed it with +Reuven Lax >> <[email protected]> and it appears to be quite difficult to do properly, or >> even to define what it means at all, especially if you're using the >> streaming inserts write method. So for now there is no workaround except >> programmatically waiting for your whole pipeline to finish >> (pipeline.run().waitUntilFinish()). >> >> On Fri, Sep 8, 2017 at 2:19 AM Chaim Turkel <[email protected]> wrote: >> >> > is there a way around this for now? >> > how can i get a snapshot version? >> > >> > chaim >> > >> > On Tue, Sep 5, 2017 at 8:48 AM, Eugene Kirpichov >> > <[email protected]> wrote: >> > > Oh I see! Okay, this should be easy to fix. I'll take a look. >> > > >> > > On Mon, Sep 4, 2017 at 10:23 PM Chaim Turkel <[email protected]> wrote: >> > > >> > >> WriteResult does not support apply -> that is the problem >> > >> >> > >> On Tue, Sep 5, 2017 at 4:59 AM, Eugene Kirpichov >> > >> <[email protected]> wrote: >> > >> > Hi, >> > >> > >> > >> > Sorry for the delay. So sounds like you want to do something after >> > >> writing >> > >> > a window of data to BigQuery is complete. >> > >> > I think this should be possible: expansion of BigQueryIO.write() >> > returns >> > >> a >> > >> > WriteResult and you can apply other transforms to it. Have you tried >> > >> that? >> > >> > >> > >> > On Sat, Aug 26, 2017 at 1:10 PM Chaim Turkel <[email protected]> >> > wrote: >> > >> > >> > >> >> I have documents from a mongo db that i need to migrate to >> bigquery. >> > >> >> Since it is mongodb i do not know they schema ahead of time, so i >> > have >> > >> >> two pipelines, one to run over the documents and update the >> bigquery >> > >> >> schema, then wait a few minutes (i can take for bigquery to be able >> > to >> > >> >> use the new schema) then with the other pipline copy all the >> > >> >> documents. >> > >> >> To know as to where i got with the different piplines i have a >> status >> > >> >> table so that at the start i know from where to continue. >> > >> >> So i need the option to update the status table with the success of >> > >> >> the copy and some time value of the last copied document >> > >> >> >> > >> >> >> > >> >> chaim >> > >> >> >> > >> >> On Fri, Aug 25, 2017 at 6:53 PM, Eugene Kirpichov >> > >> >> <[email protected]> wrote: >> > >> >> > I'd like to know more about your both use cases, can you >> clarify? I >> > >> think >> > >> >> > making sinks output something that can be waited on by another >> > >> pipeline >> > >> >> > step is a reasonable request, but more details would help refine >> > this >> > >> >> > suggestion. >> > >> >> > >> > >> >> > On Fri, Aug 25, 2017, 8:46 AM Chamikara Jayalath < >> > >> [email protected]> >> > >> >> > wrote: >> > >> >> > >> > >> >> >> Can you do this from the program that runs the Beam job, after >> > job is >> > >> >> >> complete (you might have to use a blocking runner or poll for >> the >> > >> >> status of >> > >> >> >> the job) ? >> > >> >> >> >> > >> >> >> - Cham >> > >> >> >> >> > >> >> >> On Fri, Aug 25, 2017 at 8:44 AM Steve Niemitz < >> > [email protected]> >> > >> >> wrote: >> > >> >> >> >> > >> >> >> > I also have a similar use case (but with BigTable) that I feel >> > >> like I >> > >> >> had >> > >> >> >> > to hack up to make work. It'd be great to hear if there is a >> > way >> > >> to >> > >> >> do >> > >> >> >> > something like this already, or if there are plans in the >> > future. >> > >> >> >> > >> > >> >> >> > On Fri, Aug 25, 2017 at 9:46 AM, Chaim Turkel < >> [email protected] >> > > >> > >> >> wrote: >> > >> >> >> > >> > >> >> >> > > Hi, >> > >> >> >> > > I have a few piplines that are an ETL from different >> > systems to >> > >> >> >> > bigquery. >> > >> >> >> > > I would like to write the status of the ETL after all >> records >> > >> have >> > >> >> >> > > been updated to the bigquery. >> > >> >> >> > > The problem is that writing to bigquery is a sink and you >> > cannot >> > >> >> have >> > >> >> >> > > any other steps after the sink. >> > >> >> >> > > I tried a sideoutput, but this is called in no correlation >> to >> > the >> > >> >> >> > > writing to bigquery, so i don't know if it succeeded or >> > failed. >> > >> >> >> > > >> > >> >> >> > > >> > >> >> >> > > any ideas? >> > >> >> >> > > chaim >> > >> >> >> > > >> > >> >> >> > >> > >> >> >> >> > >> >> >> > >> >> > >>
