This was submitted as NIFI-3422 and PR 1458. Thanks,
Naz Irizarry
MITRE Corp.
617-893-0074

> On Jan 31, 2017, at 10:28 AM, Joe Witt <joe.w...@gmail.com> wrote:
>
> Hello
>
> You will first want to create a JIRA describing the work/idea being
> done. Then in the commit log be sure to reference NIFI-XXXX.
>
> Take a look here for a helpful guide on how best to help the community
> land contributions.
>
> https://cwiki.apache.org/confluence/display/NIFI/Contributor+Guide
>
> Thanks
> Joe
>
> On Tue, Jan 31, 2017 at 10:17 AM, Irizarry Jr., Nazario <n...@mitre.org>
> wrote:
>> I am about to submit a PR for an implementation of the run-once scheduling.
>> There is no outstanding JIRA ticket on this, so what kind of NIFI-XXXX or
>> other labeling should I put into the title of the PR?
>>
>> Thanks,
>>
>> Naz Irizarry
>> MITRE Corp.
>> 617-893-0074
>>
>>> On Jan 12, 2017, at 3:55 PM, Irizarry Jr., Nazario <n...@mitre.org> wrote:
>>>
>>> I think it is a matter of the model in one's head. If one thinks of a
>>> continuous activation paradigm, the green arrow versus red square indicate
>>> what you point out. On the other hand, in an ad-hoc run-once paradigm the
>>> green arrow is a nice succinct indicator of what has not run yet. In an
>>> analytics environment, processing can take minutes to hours for some
>>> processors. As processing goes on, the processors with the remaining green
>>> arrows indicate what is left to complete in the “visual script.”
>>>
>>> Consider the following example. Say there are five processors. The
>>> first processor, say A, makes a query and gets data. Depending on what I
>>> know about today’s input to A, the output should be directed to B1, B2, B3,
>>> or B4. The B's are actually variations on a particular analytic algorithm,
>>> and most of the time only one of them needs to be used. On one day (based
>>> on external knowledge) I click on A and B1 and then the Start arrow. On
>>> another day I modify the query, click on A and B2, and then click on the
>>> Start arrow, etc.
>>>
>>> Clearly I could have four flows and I could start/stop
>>> entire flows. But as the number of processing stages increases and the
>>> number of processing alternatives increases at each stage, the combinatorial
>>> growth makes distinct flows painful to manage. Sometimes it is easier to
>>> have one all-encompassing flow and then allow the analyst to shift-click
>>> the portions they want to invoke for the next “run.”
>>>
>>> Naz Irizarry
>>> MITRE Corp.
>>> 617-893-0074
>>>
>>>> On Jan 12, 2017, at 2:14 PM, Joe Witt <joe.w...@gmail.com> wrote:
>>>>
>>>> Naz
>>>>
>>>> The green arrow vs red square says "scheduled to execute" vs "not
>>>> scheduled to execute". For most processors, such as those which take
>>>> input flow files from a connection, even if they're scheduled to run
>>>> they're not going to be executed unless there is work to do (data
>>>> sitting in the queue) and space available (on all destination
>>>> relationships). Because of this I'm suggesting to consider just
>>>> leaving them all scheduled to execute even though they won't actually
>>>> be doing anything most of the time. The stats on each component tell
>>>> you how many times it was actually invoked and how much data it
>>>> processed, etc. So you'll see that they're not doing anything most
>>>> of the time.
>>>>
>>>> You mentioned not wanting to have to do anything manual, yet run once
>>>> would be a manual construct, right?
>>>>
>>>> I don't mean to suggest I'm closed off to the idea of a run once
>>>> concept, I just really want to understand your use case better.
>>>>
>>>> Thanks
>>>> Joe
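
To make the "scheduled to execute" versus "actually executed" distinction from Joe's Jan 12 message concrete, here is a minimal sketch in Python. It is not NiFi code: `Connection`, `Processor`, `should_run`, and `tick` are hypothetical names invented for illustration, and the real framework's scheduling is certainly more involved. The sketch only assumes the two conditions stated in the message: data waiting in the incoming queue, and space available on every destination.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Connection:
    """Hypothetical stand-in for a queue between two processors."""
    capacity: int
    queue: deque = field(default_factory=deque)

    def has_data(self) -> bool:
        return len(self.queue) > 0

    def has_space(self) -> bool:
        return len(self.queue) < self.capacity


@dataclass
class Processor:
    """Hypothetical processor with one input and several destinations."""
    name: str
    incoming: Connection
    outgoing: list
    scheduled: bool = True   # green arrow (True) vs red square (False)
    invocations: int = 0     # the per-component stat Joe refers to

    def should_run(self) -> bool:
        # Scheduled AND work to do AND space on all destinations.
        return (
            self.scheduled
            and self.incoming.has_data()
            and all(conn.has_space() for conn in self.outgoing)
        )

    def tick(self) -> bool:
        """One scheduling pass; returns True only if actually invoked."""
        if not self.should_run():
            return False
        item = self.incoming.queue.popleft()
        for conn in self.outgoing:
            conn.queue.append(item)
        self.invocations += 1
        return True


inbox = Connection(capacity=10)
outbox = Connection(capacity=1)
b1 = Processor("B1", incoming=inbox, outgoing=[outbox])

idle = b1.tick()                   # scheduled, but nothing queued
inbox.queue.extend(["f1", "f2"])
first = b1.tick()                  # data queued and space downstream
blocked = b1.tick()                # outbox is now full
```

In this toy model, a processor left "scheduled" costs nothing while its queue is empty, which is the basis for Joe's suggestion; it says nothing about the run-once use case, where the analyst wants the green arrow to mean "has not run yet."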