Re: ExecuteSQL just once
Daniel, One thing you can do (albeit manually) for a "run once" is to schedule ExecuteSQL for some large interval (3600 sec for example), then start and immediately stop the processor. It will run once (stopping the processor doesn't interrupt the current execution, it just prevents it from being scheduled again). The Jira Bryan mentioned is [1], if implemented you could schedule a processor to run N times, and it won't be scheduled after that number of executions has occurred. Back to the custom query, can you provide an example of such a query that you'd like to use? What if we were to add the ability to QueryDatabaseTable and GenerateTableFetch where you can add WHERE clauses, GROUP BYs, ORDER BYs, etc.? We avoided the ability for the user to provide a custom query, because we'd have to parse the arbitrary SQL, insert our own clauses, and offer a way for the user to access state properties (possibly with Expression Language but they'd be "transient attributes", only available to that processor). It gets a little complex, but there is precedence with UpdateAttribute [2]. There are a couple of workarounds. For example you could do the inverse of trying to provide a custom query for the DB processor(s), where you let GenerateTableFetch generate SQL queries, then use ReplaceText to update them inline with your clauses. Alternatively, the Column Names property is used verbatim in the SELECT clause, so you could put a more complex clause in there (maybe a nested select ending with AS so the generated SQL makes it look like a table alias or something). It's definitely a place where we can improve the user experience and/or feature set, I'm interested to hear your thoughts on this and anything else :) Regards, Matt [1] https://issues.apache.org/jira/browse/NIFI-1407 [2] https://issues.apache.org/jira/browse/NIFI-1582 On Sun, Oct 30, 2016 at 8:03 PM, Daniel Einspanjer wrote: > Unfortunately, neither of those allow you to specify a custom query, so if > you are doing anything other than a select *, you have a problem. > > The use case I was working with was a "getting started" kind of thing. > Slurp some data in from a DB and try to work with it in various flows. > Unfortunately, my laptop got rather unhappy when I accidentally left the > schedule for ExecuteSQL set to 0 seconds and ran a large query. The step > spewed out several repetitions of that query which bogged things down and > prevented me from being able to stop other processors and clear queues > because the downstream processors had already started processing the first > of the large flow files. > > I'd say probably something that would make a convenient "run once unless > this flag is cleared" type setting would have been what I was looking for. > I admit it isn't quite as likely to be needed in a normal production > environment though, just development. > > -Daniel > > On Sun, Oct 30, 2016 at 7:55 PM Matt Burgess wrote: >> >> We have QueryDatabaseTable and GenerateTableFetch for such things, >> especially the former for 1.0.0. QueryDatabaseTable allows you to pick >> "thedate" as a max-value column that it keeps track of, and you can specify >> the initial maximum value (not sure what version that was added in as I'm >> AFK). >> >> Sent from my iPhone >> >> On Oct 30, 2016, at 7:11 PM, Joe Witt wrote: >> >> Daniel >> >> We don't have mechanisms provided for one time execution at this point. >> Everything is oriented toward continuous execution. That said this is >> starting to come up more and more so seems like we should offer run once >> scheduling or something like that. >> >> Would that give you what you are looking for? Can you talk about your use >> case a bit more ? >> >> Thanks >> Joe >> >> On Oct 30, 2016 7:07 PM, "Daniel Einspanjer" >> wrote: >> > >> > If I put a SQL statement into an ExecuteSQL processor along the lines >> > of: >> > >> > SELECT * FROM foo WHERE thedate >= '2016-01-01' >> > >> > and I build a flow that does something with that data, I'f found if I >> > leave the ExecuteSQL processor running, it will continually run the query, >> > resulting in lots of duplicate data. >> > >> > What is the proper way to handle something that would be sort of a one >> > time event? >> > >> > Right now, I'm just setting a long run schedule time and making sure I >> > turn the step off after it finishes the first run. >> > >> > -Daniel
Re: ExecuteSQL just once
There is also a JIRA/PR out there to let processors specify default values for things like Run Schedule. I would imagine if we had that, and we went through all of the source processors and provided reasonable defaults, that might have helped here. It still doesn't give you the "run once" capability, but at least you wouldn't have had a ton of repeated executions of the expensive query. On Sunday, October 30, 2016, Daniel Einspanjer < daniel.einspan...@designet.com> wrote: > Unfortunately, neither of those allow you to specify a custom query, so if > you are doing anything other than a select *, you have a problem. > > The use case I was working with was a "getting started" kind of thing. > Slurp some data in from a DB and try to work with it in various flows. > Unfortunately, my laptop got rather unhappy when I accidentally left the > schedule for ExecuteSQL set to 0 seconds and ran a large query. The step > spewed out several repetitions of that query which bogged things down and > prevented me from being able to stop other processors and clear queues > because the downstream processors had already started processing the first > of the large flow files. > > I'd say probably something that would make a convenient "run once unless > this flag is cleared" type setting would have been what I was looking for. > I admit it isn't quite as likely to be needed in a normal production > environment though, just development. > > -Daniel > > On Sun, Oct 30, 2016 at 7:55 PM Matt Burgess > wrote: > >> We have QueryDatabaseTable and GenerateTableFetch for such things, >> especially the former for 1.0.0. QueryDatabaseTable allows you to pick >> "thedate" as a max-value column that it keeps track of, and you can specify >> the initial maximum value (not sure what version that was added in as I'm >> AFK). >> >> Sent from my iPhone >> >> On Oct 30, 2016, at 7:11 PM, Joe Witt > > wrote: >> >> Daniel >> >> We don't have mechanisms provided for one time execution at this point. >> Everything is oriented toward continuous execution. That said this is >> starting to come up more and more so seems like we should offer run once >> scheduling or something like that. >> >> Would that give you what you are looking for? Can you talk about your >> use case a bit more ? >> >> Thanks >> Joe >> >> On Oct 30, 2016 7:07 PM, "Daniel Einspanjer" > com > >> wrote: >> > >> > If I put a SQL statement into an ExecuteSQL processor along the lines >> of: >> > >> > SELECT * FROM foo WHERE thedate >= '2016-01-01' >> > >> > and I build a flow that does something with that data, I'f found if I >> leave the ExecuteSQL processor running, it will continually run the query, >> resulting in lots of duplicate data. >> > >> > What is the proper way to handle something that would be sort of a one >> time event? >> > >> > Right now, I'm just setting a long run schedule time and making sure I >> turn the step off after it finishes the first run. >> > >> > -Daniel >> >> -- Sent from Gmail Mobile
Re: ExecuteSQL just once
Unfortunately, neither of those allow you to specify a custom query, so if you are doing anything other than a select *, you have a problem. The use case I was working with was a "getting started" kind of thing. Slurp some data in from a DB and try to work with it in various flows. Unfortunately, my laptop got rather unhappy when I accidentally left the schedule for ExecuteSQL set to 0 seconds and ran a large query. The step spewed out several repetitions of that query which bogged things down and prevented me from being able to stop other processors and clear queues because the downstream processors had already started processing the first of the large flow files. I'd say probably something that would make a convenient "run once unless this flag is cleared" type setting would have been what I was looking for. I admit it isn't quite as likely to be needed in a normal production environment though, just development. -Daniel On Sun, Oct 30, 2016 at 7:55 PM Matt Burgess wrote: > We have QueryDatabaseTable and GenerateTableFetch for such things, > especially the former for 1.0.0. QueryDatabaseTable allows you to pick > "thedate" as a max-value column that it keeps track of, and you can specify > the initial maximum value (not sure what version that was added in as I'm > AFK). > > Sent from my iPhone > > On Oct 30, 2016, at 7:11 PM, Joe Witt wrote: > > Daniel > > We don't have mechanisms provided for one time execution at this point. > Everything is oriented toward continuous execution. That said this is > starting to come up more and more so seems like we should offer run once > scheduling or something like that. > > Would that give you what you are looking for? Can you talk about your use > case a bit more ? > > Thanks > Joe > > On Oct 30, 2016 7:07 PM, "Daniel Einspanjer" < > daniel.einspan...@designet.com> wrote: > > > > If I put a SQL statement into an ExecuteSQL processor along the lines of: > > > > SELECT * FROM foo WHERE thedate >= '2016-01-01' > > > > and I build a flow that does something with that data, I'f found if I > leave the ExecuteSQL processor running, it will continually run the query, > resulting in lots of duplicate data. > > > > What is the proper way to handle something that would be sort of a one > time event? > > > > Right now, I'm just setting a long run schedule time and making sure I > turn the step off after it finishes the first run. > > > > -Daniel > >
Re: ExecuteSQL just once
We have QueryDatabaseTable and GenerateTableFetch for such things, especially the former for 1.0.0. QueryDatabaseTable allows you to pick "thedate" as a max-value column that it keeps track of, and you can specify the initial maximum value (not sure what version that was added in as I'm AFK). Sent from my iPhone > On Oct 30, 2016, at 7:11 PM, Joe Witt wrote: > > Daniel > > We don't have mechanisms provided for one time execution at this point. > Everything is oriented toward continuous execution. That said this is > starting to come up more and more so seems like we should offer run once > scheduling or something like that. > > Would that give you what you are looking for? Can you talk about your use > case a bit more ? > > Thanks > Joe > > On Oct 30, 2016 7:07 PM, "Daniel Einspanjer" > wrote: > > > > If I put a SQL statement into an ExecuteSQL processor along the lines of: > > > > SELECT * FROM foo WHERE thedate >= '2016-01-01' > > > > and I build a flow that does something with that data, I'f found if I leave > > the ExecuteSQL processor running, it will continually run the query, > > resulting in lots of duplicate data. > > > > What is the proper way to handle something that would be sort of a one time > > event? > > > > Right now, I'm just setting a long run schedule time and making sure I turn > > the step off after it finishes the first run. > > > > -Daniel
Re: ExecuteSQL just once
Daniel We don't have mechanisms provided for one time execution at this point. Everything is oriented toward continuous execution. That said this is starting to come up more and more so seems like we should offer run once scheduling or something like that. Would that give you what you are looking for? Can you talk about your use case a bit more ? Thanks Joe On Oct 30, 2016 7:07 PM, "Daniel Einspanjer" wrote: > > If I put a SQL statement into an ExecuteSQL processor along the lines of: > > SELECT * FROM foo WHERE thedate >= '2016-01-01' > > and I build a flow that does something with that data, I'f found if I leave the ExecuteSQL processor running, it will continually run the query, resulting in lots of duplicate data. > > What is the proper way to handle something that would be sort of a one time event? > > Right now, I'm just setting a long run schedule time and making sure I turn the step off after it finishes the first run. > > -Daniel
ExecuteSQL just once
If I put a SQL statement into an ExecuteSQL processor along the lines of: SELECT * FROM foo WHERE thedate >= '2016-01-01' and I build a flow that does something with that data, I'f found if I leave the ExecuteSQL processor running, it will continually run the query, resulting in lots of duplicate data. What is the proper way to handle something that would be sort of a one time event? Right now, I'm just setting a long run schedule time and making sure I turn the step off after it finishes the first run. -Daniel