Re: [DISCUSS] Run Once scheduling

Irizarry Jr., Nazario Mon, 06 Feb 2017 07:01:29 -0800

This was submitted as NIFI-3422 and PR 1458.

Thanks,


Naz Irizarry
MITRE Corp.
617-893-0074



> On Jan 31, 2017, at 10:28 AM, Joe Witt <joe.w...@gmail.com> wrote:
> 
> Hello
> 
> You will first want to create a JIRA describing the work/idea being
> done.  Then in the commit log be sure to reference NIFI-XXXX.
> 
> Take a look here for a helpful guide on how best to help the community
> land contributions.
> 
> https://cwiki.apache.org/confluence/display/NIFI/Contributor+Guide
> 
> Thanks
> Joe
> 
> On Tue, Jan 31, 2017 at 10:17 AM, Irizarry Jr., Nazario <n...@mitre.org> 
> wrote:
>> I am about to submit a PR for an implementation of the run-once scheduling.  
>> There is no outstanding JIRA ticket on this so what kind of NIFI-XXXX or 
>> other labeling should I put into the title of the PR?
>> 
>> Thanks,
>> 
>> Naz Irizarry
>> MITRE Corp.
>> 617-893-0074
>> 
>> 
>> 
>>> On Jan 12, 2017, at 3:55 PM, Irizarry Jr., Nazario <n...@mitre.org> wrote:
>>> 
>>> I think it is a matter of the model in one's head.  If one thinks of a 
>>> continuous activation paradigm the green arrow versus red square indicate 
>>> what you point out.  On the other hand in an ad-hoc run-once paradigm the 
>>> green arrow is a nice succinct indicator of what has not run yet.  In an 
>>> analytics environment processing can take minutes to hours for some 
>>> processors.  As processing goes on, the processors with the remaining green
>>> arrows indicate what is left to complete in the “visual script.”
>>> 
>>> Consider the following example. Say there are five processors. The
>>> first processor, say A, makes a query and gets data.  Depending on what I 
>>> know about today’s input to A the output should be directed to B1, B2, B3, 
>>> or B4.  The B's are actually variations on a particular analytic algorithm 
>>> and most of the time only one of them needs to be used.  On one day (based 
>>> on external knowledge) I click on A and B1 and then the Start arrow.  On 
>>> another day I modify the query, click on A and B2 and then click on the 
>>> Start arrow, etc.  Clearly I could have four flows and I could start/stop
>>> entire flows.  But, as the number of processing stages increases and the 
>>> number of processing alternatives increases at each stage the combinatorial 
>>> growth makes distinct flows painful to manage.  Sometimes it is easier to 
>>> have one all encompassing flow and then allow the analyst to shift click 
>>> the portions they want to invoke for the next “run.”
>>> 
>>> 
>>> Naz Irizarry
>>> MITRE Corp.
>>> 617-893-0074
>>> 
>>> 
>>> 
>>>> On Jan 12, 2017, at 2:14 PM, Joe Witt <joe.w...@gmail.com> wrote:
>>>> 
>>>> Naz
>>>> 
>>>> The green arrow vs red square says "scheduled to execute" vs "not
>>>> scheduled to execute".  For most processors, such as those which take
>>>> input flow files from a connection, even if they're scheduled to run
>>>> they're not going to be executed unless there is work to do (data
>>>> sitting in the queue) and space available (on all destination
>>>> relationships).  Because of this I'm suggesting to consider just
>>>> leaving them all scheduled to execute even though they won't actually
>>>> be doing anything most of the time.  The stats on each component tell
>>>> you how many times it was actually invoked and how much data it
>>>> processed, etc.  So you'll see that they're not doing anything most
>>>> of the time.
>>>> 
>>>> You mentioned not wanting to have to do anything manual, yet run once
>>>> would be a manual construct, right?
>>>> 
>>>> I don't mean to suggest I'm closed off to the idea of a run once
>>>> concept; I just really want to understand your use case better.
>>>> 
>>>> Thanks
>>>> Joe
