[jira] [Commented] (NIFI-3452) Add Wait processor Wait Mode property
[ https://issues.apache.org/jira/browse/NIFI-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866310#comment-15866310 ] ASF GitHub Bot commented on NIFI-3452: -- Github user asfgit closed the pull request at: https://github.com/apache/nifi/pull/1490 > Add Wait processor Wait Mode property > - > > Key: NIFI-3452 > URL: https://issues.apache.org/jira/browse/NIFI-3452 > Project: Apache NiFi > Issue Type: Improvement > Components: Extensions >Affects Versions: 1.2.0 >Reporter: Koji Kawamura >Assignee: Koji Kawamura > Fix For: 1.2.0 > > Attachments: wait-for-a-par-of-flow-to-finish.png > > > NiFi back pressure is handled per relationship and as long as a relationship > has room to receive more flow files, source processor is scheduled to run. > However, this behavior is not ideal in some cases. For example, when there is > very computationally expensive task and user wants to limit the number of > FlowFiles can be processed at a given time, it's not always possible to limit > the rate by existing RateControl nor back-pressure mechanism. > As a more practical example, in the following flow, it's expected the GetSQS > is triggered only when the previous FlowFile has been processed completely. > Node 1 is parsing a flow file (indicated by the X in the connection between > FetchS3Object and Parse). Both connections have a back-pressure threshold of > 1, but because the object is already fetched, the first connection is empty > and can thus be filled. This means that, if a new item becomes available in > the queue, both of the following cases can happen with equal probability: > {code} > Case 1: > -- - - > Node 1: | GetSQS | -X-> | FetchS3Object | -X-> | Parse | > -- - - > -- - - > Node 2: | GetSQS | ---> | FetchS3Object | ---> | Parse | > -- - - > Case 2: > -- - - > Node 1: | GetSQS | ---> | FetchS3Object | -X-> | Parse | > -- - - > -- - - > Node 2: | GetSQS | -X-> | FetchS3Object | ---> | Parse | > -- - - > {code} > To achieve that, we could improve Wait processor as follows. > NiFi scheduler checks downstream relationship availability, when it's full, > the processor won't be scheduled to run. In case a source processor has > multiple outgoing relationships, and if ANY of those is full, the processor > won't be scheduled. > (This is how processor scheduling works with back-pressure, but can > alter with @TriggerWhenAnyDestinationAvailable annotation. DistributeLoad is > the only processor annotated with this) > We could use this mechanism to keep the source processor waiting to be > scheduled, by following flow: > {code} > GetSQS > -- success --> FetchS3Object --> Parse --> Notify > -- success --> Wait > {code} > To make it work as expected, we need to improve Wait so that user can choose > how waiting FlowFile is handled, from either: > "Route to 'wait' relationship" or "Keep in the Upstream connection". > Currently it has only option to route to 'wait'. > Use "Keep in the Upstream connection" Wait mode with the flow above, > the incoming flow file in GetSQS -> Wait connection stays there until actual > data processing finishes and Notify sends a notification signal. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (NIFI-3452) Add Wait processor Wait Mode property
[ https://issues.apache.org/jira/browse/NIFI-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866304#comment-15866304 ] ASF GitHub Bot commented on NIFI-3452: -- Github user pvillard31 commented on the issue: https://github.com/apache/nifi/pull/1490 Perfect, merging to master, thanks @ijokarumawak > Add Wait processor Wait Mode property > - > > Key: NIFI-3452 > URL: https://issues.apache.org/jira/browse/NIFI-3452 > Project: Apache NiFi > Issue Type: Improvement > Components: Extensions >Affects Versions: 1.2.0 >Reporter: Koji Kawamura >Assignee: Koji Kawamura > Attachments: wait-for-a-par-of-flow-to-finish.png > > > NiFi back pressure is handled per relationship and as long as a relationship > has room to receive more flow files, source processor is scheduled to run. > However, this behavior is not ideal in some cases. For example, when there is > very computationally expensive task and user wants to limit the number of > FlowFiles can be processed at a given time, it's not always possible to limit > the rate by existing RateControl nor back-pressure mechanism. > As a more practical example, in the following flow, it's expected the GetSQS > is triggered only when the previous FlowFile has been processed completely. > Node 1 is parsing a flow file (indicated by the X in the connection between > FetchS3Object and Parse). Both connections have a back-pressure threshold of > 1, but because the object is already fetched, the first connection is empty > and can thus be filled. This means that, if a new item becomes available in > the queue, both of the following cases can happen with equal probability: > {code} > Case 1: > -- - - > Node 1: | GetSQS | -X-> | FetchS3Object | -X-> | Parse | > -- - - > -- - - > Node 2: | GetSQS | ---> | FetchS3Object | ---> | Parse | > -- - - > Case 2: > -- - - > Node 1: | GetSQS | ---> | FetchS3Object | -X-> | Parse | > -- - - > -- - - > Node 2: | GetSQS | -X-> | FetchS3Object | ---> | Parse | > -- - - > {code} > To achieve that, we could improve Wait processor as follows. > NiFi scheduler checks downstream relationship availability, when it's full, > the processor won't be scheduled to run. In case a source processor has > multiple outgoing relationships, and if ANY of those is full, the processor > won't be scheduled. > (This is how processor scheduling works with back-pressure, but can > alter with @TriggerWhenAnyDestinationAvailable annotation. DistributeLoad is > the only processor annotated with this) > We could use this mechanism to keep the source processor waiting to be > scheduled, by following flow: > {code} > GetSQS > -- success --> FetchS3Object --> Parse --> Notify > -- success --> Wait > {code} > To make it work as expected, we need to improve Wait so that user can choose > how waiting FlowFile is handled, from either: > "Route to 'wait' relationship" or "Keep in the Upstream connection". > Currently it has only option to route to 'wait'. > Use "Keep in the Upstream connection" Wait mode with the flow above, > the incoming flow file in GetSQS -> Wait connection stays there until actual > data processing finishes and Notify sends a notification signal. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (NIFI-3452) Add Wait processor Wait Mode property
[ https://issues.apache.org/jira/browse/NIFI-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15863238#comment-15863238 ] ASF GitHub Bot commented on NIFI-3452: -- Github user ijokarumawak commented on the issue: https://github.com/apache/nifi/pull/1490 As a side note, there's Notify processor which is designed to work together with Wait processor. Since there's an existing PR #1466 adding properties to Notify, I didn't touch it by this PR. Please check #1466 for Notify processor update. > Add Wait processor Wait Mode property > - > > Key: NIFI-3452 > URL: https://issues.apache.org/jira/browse/NIFI-3452 > Project: Apache NiFi > Issue Type: Improvement > Components: Extensions >Affects Versions: 1.2.0 >Reporter: Koji Kawamura >Assignee: Koji Kawamura > Attachments: wait-for-a-par-of-flow-to-finish.png > > > NiFi back pressure is handled per relationship and as long as a relationship > has room to receive more flow files, source processor is scheduled to run. > However, this behavior is not ideal in some cases. For example, when there is > very computationally expensive task and user wants to limit the number of > FlowFiles can be processed at a given time, it's not always possible to limit > the rate by existing RateControl nor back-pressure mechanism. > As a more practical example, in the following flow, it's expected the GetSQS > is triggered only when the previous FlowFile has been processed completely. > Node 1 is parsing a flow file (indicated by the X in the connection between > FetchS3Object and Parse). Both connections have a back-pressure threshold of > 1, but because the object is already fetched, the first connection is empty > and can thus be filled. This means that, if a new item becomes available in > the queue, both of the following cases can happen with equal probability: > {code} > Case 1: > -- - - > Node 1: | GetSQS | -X-> | FetchS3Object | -X-> | Parse | > -- - - > -- - - > Node 2: | GetSQS | ---> | FetchS3Object | ---> | Parse | > -- - - > Case 2: > -- - - > Node 1: | GetSQS | ---> | FetchS3Object | -X-> | Parse | > -- - - > -- - - > Node 2: | GetSQS | -X-> | FetchS3Object | ---> | Parse | > -- - - > {code} > To achieve that, we could improve Wait processor as follows. > NiFi scheduler checks downstream relationship availability, when it's full, > the processor won't be scheduled to run. In case a source processor has > multiple outgoing relationships, and if ANY of those is full, the processor > won't be scheduled. > (This is how processor scheduling works with back-pressure, but can > alter with @TriggerWhenAnyDestinationAvailable annotation. DistributeLoad is > the only processor annotated with this) > We could use this mechanism to keep the source processor waiting to be > scheduled, by following flow: > {code} > GetSQS > -- success --> FetchS3Object --> Parse --> Notify > -- success --> Wait > {code} > To make it work as expected, we need to improve Wait so that user can choose > how waiting FlowFile is handled, from either: > "Route to 'wait' relationship" or "Keep in the Upstream connection". > Currently it has only option to route to 'wait'. > Use "Keep in the Upstream connection" Wait mode with the flow above, > the incoming flow file in GetSQS -> Wait connection stays there until actual > data processing finishes and Notify sends a notification signal. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (NIFI-3452) Add Wait processor Wait Mode property
[ https://issues.apache.org/jira/browse/NIFI-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15863215#comment-15863215 ] ASF GitHub Bot commented on NIFI-3452: -- Github user ijokarumawak commented on the issue: https://github.com/apache/nifi/pull/1490 Thanks for reviewing, @pvillard31 . I've updated this PR: - Rebased with the latest master, just in case - Added display name and name - Corrected indent (probably this is the minor styling issue..?) Please let me know if there's anything else. Thanks! > Add Wait processor Wait Mode property > - > > Key: NIFI-3452 > URL: https://issues.apache.org/jira/browse/NIFI-3452 > Project: Apache NiFi > Issue Type: Improvement > Components: Extensions >Affects Versions: 1.2.0 >Reporter: Koji Kawamura >Assignee: Koji Kawamura > Attachments: wait-for-a-par-of-flow-to-finish.png > > > NiFi back pressure is handled per relationship and as long as a relationship > has room to receive more flow files, source processor is scheduled to run. > However, this behavior is not ideal in some cases. For example, when there is > very computationally expensive task and user wants to limit the number of > FlowFiles can be processed at a given time, it's not always possible to limit > the rate by existing RateControl nor back-pressure mechanism. > As a more practical example, in the following flow, it's expected the GetSQS > is triggered only when the previous FlowFile has been processed completely. > Node 1 is parsing a flow file (indicated by the X in the connection between > FetchS3Object and Parse). Both connections have a back-pressure threshold of > 1, but because the object is already fetched, the first connection is empty > and can thus be filled. This means that, if a new item becomes available in > the queue, both of the following cases can happen with equal probability: > {code} > Case 1: > -- - - > Node 1: | GetSQS | -X-> | FetchS3Object | -X-> | Parse | > -- - - > -- - - > Node 2: | GetSQS | ---> | FetchS3Object | ---> | Parse | > -- - - > Case 2: > -- - - > Node 1: | GetSQS | ---> | FetchS3Object | -X-> | Parse | > -- - - > -- - - > Node 2: | GetSQS | -X-> | FetchS3Object | ---> | Parse | > -- - - > {code} > To achieve that, we could improve Wait processor as follows. > NiFi scheduler checks downstream relationship availability, when it's full, > the processor won't be scheduled to run. In case a source processor has > multiple outgoing relationships, and if ANY of those is full, the processor > won't be scheduled. > (This is how processor scheduling works with back-pressure, but can > alter with @TriggerWhenAnyDestinationAvailable annotation. DistributeLoad is > the only processor annotated with this) > We could use this mechanism to keep the source processor waiting to be > scheduled, by following flow: > {code} > GetSQS > -- success --> FetchS3Object --> Parse --> Notify > -- success --> Wait > {code} > To make it work as expected, we need to improve Wait so that user can choose > how waiting FlowFile is handled, from either: > "Route to 'wait' relationship" or "Keep in the Upstream connection". > Currently it has only option to route to 'wait'. > Use "Keep in the Upstream connection" Wait mode with the flow above, > the incoming flow file in GetSQS -> Wait connection stays there until actual > data processing finishes and Notify sends a notification signal. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (NIFI-3452) Add Wait processor Wait Mode property
[ https://issues.apache.org/jira/browse/NIFI-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15862406#comment-15862406 ] ASF GitHub Bot commented on NIFI-3452: -- Github user pvillard31 commented on the issue: https://github.com/apache/nifi/pull/1490 Hi @ijokarumawak, thanks for this improvement, that's really nice. And thanks for the template, it was really useful to review this PR! I just have a minor styling issue and since this processor is not released yet: could you just update the properties to use both ``.name()`` and ``.displayName()``? Otherwise LGTM and I'll merge as soon as you let me know. > Add Wait processor Wait Mode property > - > > Key: NIFI-3452 > URL: https://issues.apache.org/jira/browse/NIFI-3452 > Project: Apache NiFi > Issue Type: Improvement > Components: Extensions >Affects Versions: 1.2.0 >Reporter: Koji Kawamura >Assignee: Koji Kawamura > Attachments: wait-for-a-par-of-flow-to-finish.png > > > NiFi back pressure is handled per relationship and as long as a relationship > has room to receive more flow files, source processor is scheduled to run. > However, this behavior is not ideal in some cases. For example, when there is > very computationally expensive task and user wants to limit the number of > FlowFiles can be processed at a given time, it's not always possible to limit > the rate by existing RateControl nor back-pressure mechanism. > As a more practical example, in the following flow, it's expected the GetSQS > is triggered only when the previous FlowFile has been processed completely. > Node 1 is parsing a flow file (indicated by the X in the connection between > FetchS3Object and Parse). Both connections have a back-pressure threshold of > 1, but because the object is already fetched, the first connection is empty > and can thus be filled. This means that, if a new item becomes available in > the queue, both of the following cases can happen with equal probability: > {code} > Case 1: > -- - - > Node 1: | GetSQS | -X-> | FetchS3Object | -X-> | Parse | > -- - - > -- - - > Node 2: | GetSQS | ---> | FetchS3Object | ---> | Parse | > -- - - > Case 2: > -- - - > Node 1: | GetSQS | ---> | FetchS3Object | -X-> | Parse | > -- - - > -- - - > Node 2: | GetSQS | -X-> | FetchS3Object | ---> | Parse | > -- - - > {code} > To achieve that, we could improve Wait processor as follows. > NiFi scheduler checks downstream relationship availability, when it's full, > the processor won't be scheduled to run. In case a source processor has > multiple outgoing relationships, and if ANY of those is full, the processor > won't be scheduled. > (This is how processor scheduling works with back-pressure, but can > alter with @TriggerWhenAnyDestinationAvailable annotation. DistributeLoad is > the only processor annotated with this) > We could use this mechanism to keep the source processor waiting to be > scheduled, by following flow: > {code} > GetSQS > -- success --> FetchS3Object --> Parse --> Notify > -- success --> Wait > {code} > To make it work as expected, we need to improve Wait so that user can choose > how waiting FlowFile is handled, from either: > "Route to 'wait' relationship" or "Keep in the Upstream connection". > Currently it has only option to route to 'wait'. > Use "Keep in the Upstream connection" Wait mode with the flow above, > the incoming flow file in GetSQS -> Wait connection stays there until actual > data processing finishes and Notify sends a notification signal. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (NIFI-3452) Add Wait processor Wait Mode property
[ https://issues.apache.org/jira/browse/NIFI-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15862098#comment-15862098 ] ASF GitHub Bot commented on NIFI-3452: -- Github user ijokarumawak commented on the issue: https://github.com/apache/nifi/pull/1490 @basvank Thanks for confirming the change, that certainly make the review & merge process easier! > Add Wait processor Wait Mode property > - > > Key: NIFI-3452 > URL: https://issues.apache.org/jira/browse/NIFI-3452 > Project: Apache NiFi > Issue Type: Improvement > Components: Extensions >Affects Versions: 1.2.0 >Reporter: Koji Kawamura >Assignee: Koji Kawamura > Attachments: wait-for-a-par-of-flow-to-finish.png > > > NiFi back pressure is handled per relationship and as long as a relationship > has room to receive more flow files, source processor is scheduled to run. > However, this behavior is not ideal in some cases. For example, when there is > very computationally expensive task and user wants to limit the number of > FlowFiles can be processed at a given time, it's not always possible to limit > the rate by existing RateControl nor back-pressure mechanism. > As a more practical example, in the following flow, it's expected the GetSQS > is triggered only when the previous FlowFile has been processed completely. > Node 1 is parsing a flow file (indicated by the X in the connection between > FetchS3Object and Parse). Both connections have a back-pressure threshold of > 1, but because the object is already fetched, the first connection is empty > and can thus be filled. This means that, if a new item becomes available in > the queue, both of the following cases can happen with equal probability: > {code} > Case 1: > -- - - > Node 1: | GetSQS | -X-> | FetchS3Object | -X-> | Parse | > -- - - > -- - - > Node 2: | GetSQS | ---> | FetchS3Object | ---> | Parse | > -- - - > Case 2: > -- - - > Node 1: | GetSQS | ---> | FetchS3Object | -X-> | Parse | > -- - - > -- - - > Node 2: | GetSQS | -X-> | FetchS3Object | ---> | Parse | > -- - - > {code} > To achieve that, we could improve Wait processor as follows. > NiFi scheduler checks downstream relationship availability, when it's full, > the processor won't be scheduled to run. In case a source processor has > multiple outgoing relationships, and if ANY of those is full, the processor > won't be scheduled. > (This is how processor scheduling works with back-pressure, but can > alter with @TriggerWhenAnyDestinationAvailable annotation. DistributeLoad is > the only processor annotated with this) > We could use this mechanism to keep the source processor waiting to be > scheduled, by following flow: > {code} > GetSQS > -- success --> FetchS3Object --> Parse --> Notify > -- success --> Wait > {code} > To make it work as expected, we need to improve Wait so that user can choose > how waiting FlowFile is handled, from either: > "Route to 'wait' relationship" or "Keep in the Upstream connection". > Currently it has only option to route to 'wait'. > Use "Keep in the Upstream connection" Wait mode with the flow above, > the incoming flow file in GetSQS -> Wait connection stays there until actual > data processing finishes and Notify sends a notification signal. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (NIFI-3452) Add Wait processor Wait Mode property
[ https://issues.apache.org/jira/browse/NIFI-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861159#comment-15861159 ] ASF GitHub Bot commented on NIFI-3452: -- Github user basvank commented on the issue: https://github.com/apache/nifi/pull/1490 @ijokarumawak I have tested your patch on a single node with 3 the same flows and it does exactly what I expected and required as explained in [my mailing list entry](http://apache-nifi-users-list.2361937.n4.nabble.com/Problem-when-using-backpressure-to-distribute-load-over-nodes-in-a-cluster-tp863p903.html). I am not able to test in the cluster setting at this moment but I see no reason why it wouldn't work there. Thanks for your effort! > Add Wait processor Wait Mode property > - > > Key: NIFI-3452 > URL: https://issues.apache.org/jira/browse/NIFI-3452 > Project: Apache NiFi > Issue Type: Improvement > Components: Extensions >Affects Versions: 1.2.0 >Reporter: Koji Kawamura >Assignee: Koji Kawamura > Attachments: wait-for-a-par-of-flow-to-finish.png > > > NiFi back pressure is handled per relationship and as long as a relationship > has room to receive more flow files, source processor is scheduled to run. > However, this behavior is not ideal in some cases. For example, when there is > very computationally expensive task and user wants to limit the number of > FlowFiles can be processed at a given time, it's not always possible to limit > the rate by existing RateControl nor back-pressure mechanism. > As a more practical example, in the following flow, it's expected the GetSQS > is triggered only when the previous FlowFile has been processed completely. > Node 1 is parsing a flow file (indicated by the X in the connection between > FetchS3Object and Parse). Both connections have a back-pressure threshold of > 1, but because the object is already fetched, the first connection is empty > and can thus be filled. This means that, if a new item becomes available in > the queue, both of the following cases can happen with equal probability: > {code} > Case 1: > -- - - > Node 1: | GetSQS | -X-> | FetchS3Object | -X-> | Parse | > -- - - > -- - - > Node 2: | GetSQS | ---> | FetchS3Object | ---> | Parse | > -- - - > Case 2: > -- - - > Node 1: | GetSQS | ---> | FetchS3Object | -X-> | Parse | > -- - - > -- - - > Node 2: | GetSQS | -X-> | FetchS3Object | ---> | Parse | > -- - - > {code} > To achieve that, we could improve Wait processor as follows. > NiFi scheduler checks downstream relationship availability, when it's full, > the processor won't be scheduled to run. In case a source processor has > multiple outgoing relationships, and if ANY of those is full, the processor > won't be scheduled. > (This is how processor scheduling works with back-pressure, but can > alter with @TriggerWhenAnyDestinationAvailable annotation. DistributeLoad is > the only processor annotated with this) > We could use this mechanism to keep the source processor waiting to be > scheduled, by following flow: > {code} > GetSQS > -- success --> FetchS3Object --> Parse --> Notify > -- success --> Wait > {code} > To make it work as expected, we need to improve Wait so that user can choose > how waiting FlowFile is handled, from either: > "Route to 'wait' relationship" or "Keep in the Upstream connection". > Currently it has only option to route to 'wait'. > Use "Keep in the Upstream connection" Wait mode with the flow above, > the incoming flow file in GetSQS -> Wait connection stays there until actual > data processing finishes and Notify sends a notification signal. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (NIFI-3452) Add Wait processor Wait Mode property
[ https://issues.apache.org/jira/browse/NIFI-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15859459#comment-15859459 ] ASF GitHub Bot commented on NIFI-3452: -- Github user ijokarumawak commented on the issue: https://github.com/apache/nifi/pull/1490 Dear reviewer, an example flow template is available here: https://gist.github.com/ijokarumawak/85a3d77297ea94614e9f3f2a9dabca67 > Add Wait processor Wait Mode property > - > > Key: NIFI-3452 > URL: https://issues.apache.org/jira/browse/NIFI-3452 > Project: Apache NiFi > Issue Type: Improvement > Components: Extensions >Affects Versions: 1.2.0 >Reporter: Koji Kawamura >Assignee: Koji Kawamura > Attachments: wait-for-a-par-of-flow-to-finish.png > > > NiFi back pressure is handled per relationship and as long as a relationship > has room to receive more flow files, source processor is scheduled to run. > However, this behavior is not ideal in some cases. For example, when there is > very computationally expensive task and user wants to limit the number of > FlowFiles can be processed at a given time, it's not always possible to limit > the rate by existing RateControl nor back-pressure mechanism. > As a more practical example, in the following flow, it's expected the GetSQS > is triggered only when the previous FlowFile has been processed completely. > Node 1 is parsing a flow file (indicated by the X in the connection between > FetchS3Object and Parse). Both connections have a back-pressure threshold of > 1, but because the object is already fetched, the first connection is empty > and can thus be filled. This means that, if a new item becomes available in > the queue, both of the following cases can happen with equal probability: > {code} > Case 1: > -- - - > Node 1: | GetSQS | -X-> | FetchS3Object | -X-> | Parse | > -- - - > -- - - > Node 2: | GetSQS | ---> | FetchS3Object | ---> | Parse | > -- - - > Case 2: > -- - - > Node 1: | GetSQS | ---> | FetchS3Object | -X-> | Parse | > -- - - > -- - - > Node 2: | GetSQS | -X-> | FetchS3Object | ---> | Parse | > -- - - > {code} > To achieve that, we could improve Wait processor as follows. > NiFi scheduler checks downstream relationship availability, when it's full, > the processor won't be scheduled to run. In case a source processor has > multiple outgoing relationships, and if ANY of those is full, the processor > won't be scheduled. > (This is how processor scheduling works with back-pressure, but can > alter with @TriggerWhenAnyDestinationAvailable annotation. DistributeLoad is > the only processor annotated with this) > We could use this mechanism to keep the source processor waiting to be > scheduled, by following flow: > {code} > GetSQS > -- success --> FetchS3Object --> Parse --> Notify > -- success --> Wait > {code} > To make it work as expected, we need to improve Wait so that user can choose > how waiting FlowFile is handled, from either: > "Route to 'wait' relationship" or "Keep in the Upstream connection". > Currently it has only option to route to 'wait'. > Use "Keep in the Upstream connection" Wait mode with the flow above, > the incoming flow file in GetSQS -> Wait connection stays there until actual > data processing finishes and Notify sends a notification signal. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (NIFI-3452) Add Wait processor Wait Mode property
[ https://issues.apache.org/jira/browse/NIFI-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15859437#comment-15859437 ] Joseph Gresock commented on NIFI-3452: -- Wow, this is a really clever application of Wait/Notify. Nice addition to Wait, as well. > Add Wait processor Wait Mode property > - > > Key: NIFI-3452 > URL: https://issues.apache.org/jira/browse/NIFI-3452 > Project: Apache NiFi > Issue Type: Improvement > Components: Extensions >Affects Versions: 1.2.0 >Reporter: Koji Kawamura >Assignee: Koji Kawamura > Attachments: wait-for-a-par-of-flow-to-finish.png > > > NiFi back pressure is handled per relationship and as long as a relationship > has room to receive more flow files, source processor is scheduled to run. > However, this behavior is not ideal in some cases. For example, when there is > very computationally expensive task and user wants to limit the number of > FlowFiles can be processed at a given time, it's not always possible to limit > the rate by existing RateControl nor back-pressure mechanism. > As a more practical example, in the following flow, it's expected the GetSQS > is triggered only when the previous FlowFile has been processed completely. > Node 1 is parsing a flow file (indicated by the X in the connection between > FetchS3Object and Parse). Both connections have a back-pressure threshold of > 1, but because the object is already fetched, the first connection is empty > and can thus be filled. This means that, if a new item becomes available in > the queue, both of the following cases can happen with equal probability: > {code} > Case 1: > -- - - > Node 1: | GetSQS | -X-> | FetchS3Object | -X-> | Parse | > -- - - > -- - - > Node 2: | GetSQS | ---> | FetchS3Object | ---> | Parse | > -- - - > Case 2: > -- - - > Node 1: | GetSQS | ---> | FetchS3Object | -X-> | Parse | > -- - - > -- - - > Node 2: | GetSQS | -X-> | FetchS3Object | ---> | Parse | > -- - - > {code} > To achieve that, we could improve Wait processor as follows. > NiFi scheduler checks downstream relationship availability, when it's full, > the processor won't be scheduled to run. In case a source processor has > multiple outgoing relationships, and if ANY of those is full, the processor > won't be scheduled. > (This is how processor scheduling works with back-pressure, but can > alter with @TriggerWhenAnyDestinationAvailable annotation. DistributeLoad is > the only processor annotated with this) > We could use this mechanism to keep the source processor waiting to be > scheduled, by following flow: > {code} > GetSQS > -- success --> FetchS3Object --> Parse --> Notify > -- success --> Wait > {code} > To make it work as expected, we need to improve Wait so that user can choose > how waiting FlowFile is handled, from either: > "Route to 'wait' relationship" or "Keep in the Upstream connection". > Currently it has only option to route to 'wait'. > Use "Keep in the Upstream connection" Wait mode with the flow above, > the incoming flow file in GetSQS -> Wait connection stays there until actual > data processing finishes and Notify sends a notification signal. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (NIFI-3452) Add Wait processor Wait Mode property
[ https://issues.apache.org/jira/browse/NIFI-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15859432#comment-15859432 ] ASF GitHub Bot commented on NIFI-3452: -- GitHub user ijokarumawak opened a pull request: https://github.com/apache/nifi/pull/1490 NIFI-3452: Add Wait processor Wait Mode property Ensure back-pressure is active until downstream processing completes. Thank you for submitting a contribution to Apache NiFi. In order to streamline the review of the contribution we ask you to ensure the following steps have been taken: ### For all changes: - [x] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message? - [x] Does your PR title start with NIFI- where is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character. - [x] Has your PR been rebased against the latest commit within the target branch (typically master)? - [x] Is your initial contribution a single, squashed commit? ### For code changes: - [x] Have you ensured that the full suite of tests is executed via mvn -Pcontrib-check clean install at the root nifi folder? - [x] Have you written or updated unit tests to verify your changes? - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? - [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file under nifi-assembly? - [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found under nifi-assembly? - [ ] If adding new Properties, have you added .displayName in addition to .name (programmatic access) for each of the new properties? ### For documentation related changes: - [x] Have you ensured that format looks appropriate for the output in which it is rendered? ### Note: Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ijokarumawak/nifi nifi-3452 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/nifi/pull/1490.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1490 commit 8767b8bf98c83e0e84e68fb53c51255fec4ab084 Author: Koji KawamuraDate: 2017-02-09T12:30:43Z NIFI-3452: Add Wait processor Wait Mode property Ensure back-pressure is active until downstream processing completes. > Add Wait processor Wait Mode property > - > > Key: NIFI-3452 > URL: https://issues.apache.org/jira/browse/NIFI-3452 > Project: Apache NiFi > Issue Type: Improvement > Components: Extensions >Affects Versions: 1.2.0 >Reporter: Koji Kawamura >Assignee: Koji Kawamura > Attachments: wait-for-a-par-of-flow-to-finish.png > > > NiFi back pressure is handled per relationship and as long as a relationship > has room to receive more flow files, source processor is scheduled to run. > However, this behavior is not ideal in some cases. For example, when there is > very computationally expensive task and user wants to limit the number of > FlowFiles can be processed at a given time, it's not always possible to limit > the rate by existing RateControl nor back-pressure mechanism. > As a more practical example, in the following flow, it's expected the GetSQS > is triggered only when the previous FlowFile has been processed completely. > Node 1 is parsing a flow file (indicated by the X in the connection between > FetchS3Object and Parse). Both connections have a back-pressure threshold of > 1, but because the object is already fetched, the first connection is empty > and can thus be filled. This means that, if a new item becomes available in > the queue, both of the following cases can happen with equal probability: > {code} > Case 1: > -- - - > Node 1: | GetSQS | -X-> | FetchS3Object | -X-> | Parse | > -- - - > -- - - > Node 2: | GetSQS | ---> | FetchS3Object | ---> | Parse | > -- - - > Case 2: > -- - - > Node 1: | GetSQS | ---> | FetchS3Object | -X-> | Parse | > -- - - > -- -