Re: schedule execution from relational database?
Hi Ed,

It usually would be days from the moment the new value is captured.
Re: schedule execution from relational database?
My purpose is to use the new epoch-milliseconds value in the flow file to schedule a Spark job at the corresponding date/time; I am asking how that can be done in NiFi.

Thank you,
Victor
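As a rough illustration of the timing arithmetic involved (this is not a NiFi feature; the function name and the fixed timestamps below are made up for the example), turning an epoch-milliseconds attribute into a wait duration is just:

```python
import time

def seconds_until(epoch_ms, now_ms=None):
    """Seconds remaining until the given epoch-milliseconds timestamp;
    zero if that moment has already passed."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return max(0.0, (epoch_ms - now_ms) / 1000.0)

# An event_time attribute 90 seconds ahead of a fixed "now":
delay = seconds_until(1_540_800_090_000, now_ms=1_540_800_000_000)
```

A script along these lines could run inside an ExecuteScript-style processor or an external scheduler, sleeping for `delay` seconds before kicking off the job.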
Re: schedule execution from relational database?
Hey Victor,

If you already pulled the record and know the new value, that won't really help you to determine a change in a schedule. In my opinion, the schedule is determined by the acceptable data latency for a given application, in other words, how soon you want your changed data to be captured. The answer can be anywhere from "real-time" to "on-demand".

For your particular case, you need to decide and then either schedule every X sec/min/hours/days, etc., or at a given time (e.g. at minute 30 of each hour every day except Saturday). If you don't know what your requirements for data availability and latency are, you could start with something like every 5 minutes, and then adjust as needed.

Regards,
Ed.
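For reference, Ed's example schedule ("at minute 30 of each hour every day except Saturday") can be written as a Quartz-style cron expression, which is the format NiFi's CRON-driven scheduling strategy accepts (worth double-checking against the scheduling documentation for your NiFi version):

```
0 30 * ? * SUN-FRI
```

The six fields are seconds, minutes, hours, day-of-month (`?` = unspecified), month, and day-of-week; `SUN-FRI` excludes Saturday.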
Re: schedule execution from relational database?
Not sure I understand what you mean. Are you using the flow file to trigger ExecuteStreamCommand to schedule a cron job? Or do you mean scheduling a processor to run in NiFi? Or something else?
Re: schedule execution from relational database?
QDT works, e.g. it can detect a change in the Maximum Value column, but how can I use it to schedule a cron job? I know it's possible to schedule cron from the UI, but how can I do it based on the value of an attribute?

Thank you again,
V.
Re: schedule execution from relational database?
Hi Matt,

NiFi does handle other parts of it, just in a different process group.

Regards,
Victor
Re: schedule execution from relational database?
Victor,

Yes, both QDT and GTF would generate something like "SELECT * FROM myTable WHERE event_time > X", and QDT will execute it and update X. So if event_time is always increasing, it will continue to pick up the same row(s).

That's a curious use case; maybe NiFi could handle other parts of it so you wouldn't need to update a single row in an external database table?

Regards,
Matt
Re: schedule execution from relational database?
What if I have only one row and update the values in it? Will QDT fetch updates?

Thank you,
Victor
Re: schedule execution from relational database?
You can use QueryDatabaseTable (QDT) for this; you'd set your "event_time" column as the "Maximum Value Column(s)" property in the processor. The first time QDT executes, it will fetch all the rows (since it has not seen event_time before), then it will keep track of the largest value of event_time. As new rows are added (with larger event_time values), QDT will only fetch the rows whose event_time is greater than the largest one it's seen. Then it updates its "largest seen value", and so on.

GenerateTableFetch (GTF) is another option. It works in a similar fashion, except that it does not fetch the rows itself; instead it generates flow files containing SQL statements that you can send downstream, perhaps to ExecuteSQL, in order to actually fetch the rows. GTF is often used in place of QDT if you'll be fetching a large number of rows in each statement, as you can distribute the SQL flow files among the nodes in a cluster to do the fetch in parallel.

Regards,
Matt
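The incremental-fetch behavior described above can be sketched in plain Python against SQLite (a rough illustration of the high-water-mark pattern, not NiFi's actual implementation; the table and column names simply reuse the ones from this thread):

```python
import sqlite3

def fetch_new_rows(conn, last_seen):
    """Mimic QueryDatabaseTable's incremental fetch: return only rows whose
    event_time exceeds the largest value seen so far, then advance that mark."""
    cur = conn.execute(
        "SELECT id, event_time FROM myTable "
        "WHERE event_time > ? ORDER BY event_time",
        (last_seen,),
    )
    rows = cur.fetchall()
    if rows:
        last_seen = rows[-1][1]  # largest event_time in this batch
    return rows, last_seen

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (id INTEGER PRIMARY KEY, event_time INTEGER)")
conn.executemany("INSERT INTO myTable (event_time) VALUES (?)", [(100,), (200,)])

first_batch, mark = fetch_new_rows(conn, -1)     # first run: fetches all rows
conn.execute("INSERT INTO myTable (event_time) VALUES (?)", (300,))
second_batch, mark = fetch_new_rows(conn, mark)  # later run: only the new row
```

This also shows why updating a single row in place defeats the pattern: a row whose event_time is rewritten to a value at or below the stored mark will never be picked up again.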
schedule execution from relational database?
Hi,

I have an "event_time" field in an SQLite database that represents the epoch time for triggering an external event. What processor(s) can I use to implement schedule monitoring/execution based on a change in the "event_time" value?

Thanks,