[jira] [Commented] (NIFI-3452) Add Wait processor Wait Mode property

2017-02-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866310#comment-15866310
 ] 

ASF GitHub Bot commented on NIFI-3452:
--

Github user asfgit closed the pull request at:

https://github.com/apache/nifi/pull/1490


> Add Wait processor Wait Mode property
> -
>
> Key: NIFI-3452
> URL: https://issues.apache.org/jira/browse/NIFI-3452
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Affects Versions: 1.2.0
>Reporter: Koji Kawamura
>Assignee: Koji Kawamura
> Fix For: 1.2.0
>
> Attachments: wait-for-a-par-of-flow-to-finish.png
>
>
> NiFi back pressure is handled per relationship and as long as a relationship 
> has room to receive more flow files, source processor is scheduled to run.
> However, this behavior is not ideal in some cases. For example, when there is 
> very computationally expensive task and user wants to limit the number of 
> FlowFiles can be processed at a given time, it's not always possible to limit 
> the rate by existing RateControl nor back-pressure mechanism.
> As a more practical example, in the following flow, it's expected the GetSQS 
> is triggered only when the previous FlowFile has been processed completely.
> Node 1 is parsing a flow file (indicated by the X in the connection between 
> FetchS3Object and Parse). Both connections have a back-pressure threshold of 
> 1, but because the object is already fetched, the first connection is empty 
> and can thus be filled. This means that, if a new item becomes available in 
> the queue, both of the following cases can happen with equal probability:
> {code}
> Case 1:
> --  -  -
> Node 1: | GetSQS | -X-> | FetchS3Object | -X-> | Parse |
> --  -  -
> --  -  -
> Node 2: | GetSQS | ---> | FetchS3Object | ---> | Parse |
> --  -  -
> Case 2:
> --  -  -
> Node 1: | GetSQS | ---> | FetchS3Object | -X-> | Parse |
> --  -  -
> --  -  -
> Node 2: | GetSQS | -X-> | FetchS3Object | ---> | Parse |
> --  -  -
> {code}
> To achieve that, we could improve Wait processor as follows.
> NiFi scheduler checks downstream relationship availability, when it's full, 
> the processor won't be scheduled to run. In case a source processor has 
> multiple outgoing relationships, and if ANY of those is full, the processor 
> won't be scheduled.
> (This is how processor scheduling works with back-pressure, but can
> alter with @TriggerWhenAnyDestinationAvailable annotation. DistributeLoad is 
> the only processor annotated with this)
> We could use this mechanism to keep the source processor waiting to be 
> scheduled, by following flow:
> {code}
> GetSQS
>   -- success --> FetchS3Object --> Parse --> Notify
>   -- success --> Wait
> {code}
> To make it work as expected, we need to improve Wait so that user can choose 
> how waiting FlowFile is handled, from either:
> "Route to 'wait' relationship" or "Keep in the Upstream connection".
> Currently it has only option to route to 'wait'.
> Use "Keep in the Upstream connection" Wait mode with the flow above,
> the incoming flow file in GetSQS -> Wait connection stays there until actual 
> data processing finishes and Notify sends a notification signal.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (NIFI-3452) Add Wait processor Wait Mode property

2017-02-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866304#comment-15866304
 ] 

ASF GitHub Bot commented on NIFI-3452:
--

Github user pvillard31 commented on the issue:

https://github.com/apache/nifi/pull/1490
  
Perfect,  merging to master, thanks @ijokarumawak 


> Add Wait processor Wait Mode property
> -
>
> Key: NIFI-3452
> URL: https://issues.apache.org/jira/browse/NIFI-3452
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Affects Versions: 1.2.0
>Reporter: Koji Kawamura
>Assignee: Koji Kawamura
> Attachments: wait-for-a-par-of-flow-to-finish.png
>
>
> NiFi back pressure is handled per relationship and as long as a relationship 
> has room to receive more flow files, source processor is scheduled to run.
> However, this behavior is not ideal in some cases. For example, when there is 
> very computationally expensive task and user wants to limit the number of 
> FlowFiles can be processed at a given time, it's not always possible to limit 
> the rate by existing RateControl nor back-pressure mechanism.
> As a more practical example, in the following flow, it's expected the GetSQS 
> is triggered only when the previous FlowFile has been processed completely.
> Node 1 is parsing a flow file (indicated by the X in the connection between 
> FetchS3Object and Parse). Both connections have a back-pressure threshold of 
> 1, but because the object is already fetched, the first connection is empty 
> and can thus be filled. This means that, if a new item becomes available in 
> the queue, both of the following cases can happen with equal probability:
> {code}
> Case 1:
> --  -  -
> Node 1: | GetSQS | -X-> | FetchS3Object | -X-> | Parse |
> --  -  -
> --  -  -
> Node 2: | GetSQS | ---> | FetchS3Object | ---> | Parse |
> --  -  -
> Case 2:
> --  -  -
> Node 1: | GetSQS | ---> | FetchS3Object | -X-> | Parse |
> --  -  -
> --  -  -
> Node 2: | GetSQS | -X-> | FetchS3Object | ---> | Parse |
> --  -  -
> {code}
> To achieve that, we could improve Wait processor as follows.
> NiFi scheduler checks downstream relationship availability, when it's full, 
> the processor won't be scheduled to run. In case a source processor has 
> multiple outgoing relationships, and if ANY of those is full, the processor 
> won't be scheduled.
> (This is how processor scheduling works with back-pressure, but can
> alter with @TriggerWhenAnyDestinationAvailable annotation. DistributeLoad is 
> the only processor annotated with this)
> We could use this mechanism to keep the source processor waiting to be 
> scheduled, by following flow:
> {code}
> GetSQS
>   -- success --> FetchS3Object --> Parse --> Notify
>   -- success --> Wait
> {code}
> To make it work as expected, we need to improve Wait so that user can choose 
> how waiting FlowFile is handled, from either:
> "Route to 'wait' relationship" or "Keep in the Upstream connection".
> Currently it has only option to route to 'wait'.
> Use "Keep in the Upstream connection" Wait mode with the flow above,
> the incoming flow file in GetSQS -> Wait connection stays there until actual 
> data processing finishes and Notify sends a notification signal.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (NIFI-3452) Add Wait processor Wait Mode property

2017-02-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15863238#comment-15863238
 ] 

ASF GitHub Bot commented on NIFI-3452:
--

Github user ijokarumawak commented on the issue:

https://github.com/apache/nifi/pull/1490
  
As a side note, there's Notify processor which is designed to work together 
with Wait processor. Since there's an existing PR #1466 adding properties to 
Notify, I didn't touch it by this PR. Please check #1466 for Notify processor 
update.


> Add Wait processor Wait Mode property
> -
>
> Key: NIFI-3452
> URL: https://issues.apache.org/jira/browse/NIFI-3452
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Affects Versions: 1.2.0
>Reporter: Koji Kawamura
>Assignee: Koji Kawamura
> Attachments: wait-for-a-par-of-flow-to-finish.png
>
>
> NiFi back pressure is handled per relationship and as long as a relationship 
> has room to receive more flow files, source processor is scheduled to run.
> However, this behavior is not ideal in some cases. For example, when there is 
> very computationally expensive task and user wants to limit the number of 
> FlowFiles can be processed at a given time, it's not always possible to limit 
> the rate by existing RateControl nor back-pressure mechanism.
> As a more practical example, in the following flow, it's expected the GetSQS 
> is triggered only when the previous FlowFile has been processed completely.
> Node 1 is parsing a flow file (indicated by the X in the connection between 
> FetchS3Object and Parse). Both connections have a back-pressure threshold of 
> 1, but because the object is already fetched, the first connection is empty 
> and can thus be filled. This means that, if a new item becomes available in 
> the queue, both of the following cases can happen with equal probability:
> {code}
> Case 1:
> --  -  -
> Node 1: | GetSQS | -X-> | FetchS3Object | -X-> | Parse |
> --  -  -
> --  -  -
> Node 2: | GetSQS | ---> | FetchS3Object | ---> | Parse |
> --  -  -
> Case 2:
> --  -  -
> Node 1: | GetSQS | ---> | FetchS3Object | -X-> | Parse |
> --  -  -
> --  -  -
> Node 2: | GetSQS | -X-> | FetchS3Object | ---> | Parse |
> --  -  -
> {code}
> To achieve that, we could improve Wait processor as follows.
> NiFi scheduler checks downstream relationship availability, when it's full, 
> the processor won't be scheduled to run. In case a source processor has 
> multiple outgoing relationships, and if ANY of those is full, the processor 
> won't be scheduled.
> (This is how processor scheduling works with back-pressure, but can
> alter with @TriggerWhenAnyDestinationAvailable annotation. DistributeLoad is 
> the only processor annotated with this)
> We could use this mechanism to keep the source processor waiting to be 
> scheduled, by following flow:
> {code}
> GetSQS
>   -- success --> FetchS3Object --> Parse --> Notify
>   -- success --> Wait
> {code}
> To make it work as expected, we need to improve Wait so that user can choose 
> how waiting FlowFile is handled, from either:
> "Route to 'wait' relationship" or "Keep in the Upstream connection".
> Currently it has only option to route to 'wait'.
> Use "Keep in the Upstream connection" Wait mode with the flow above,
> the incoming flow file in GetSQS -> Wait connection stays there until actual 
> data processing finishes and Notify sends a notification signal.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (NIFI-3452) Add Wait processor Wait Mode property

2017-02-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15863215#comment-15863215
 ] 

ASF GitHub Bot commented on NIFI-3452:
--

Github user ijokarumawak commented on the issue:

https://github.com/apache/nifi/pull/1490
  
Thanks for reviewing, @pvillard31 .
I've updated this PR:
- Rebased with the latest master, just in case
- Added display name and name
- Corrected indent (probably this is the minor styling issue..?)
Please let me know if there's anything else. Thanks!


> Add Wait processor Wait Mode property
> -
>
> Key: NIFI-3452
> URL: https://issues.apache.org/jira/browse/NIFI-3452
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Affects Versions: 1.2.0
>Reporter: Koji Kawamura
>Assignee: Koji Kawamura
> Attachments: wait-for-a-par-of-flow-to-finish.png
>
>
> NiFi back pressure is handled per relationship and as long as a relationship 
> has room to receive more flow files, source processor is scheduled to run.
> However, this behavior is not ideal in some cases. For example, when there is 
> very computationally expensive task and user wants to limit the number of 
> FlowFiles can be processed at a given time, it's not always possible to limit 
> the rate by existing RateControl nor back-pressure mechanism.
> As a more practical example, in the following flow, it's expected the GetSQS 
> is triggered only when the previous FlowFile has been processed completely.
> Node 1 is parsing a flow file (indicated by the X in the connection between 
> FetchS3Object and Parse). Both connections have a back-pressure threshold of 
> 1, but because the object is already fetched, the first connection is empty 
> and can thus be filled. This means that, if a new item becomes available in 
> the queue, both of the following cases can happen with equal probability:
> {code}
> Case 1:
> --  -  -
> Node 1: | GetSQS | -X-> | FetchS3Object | -X-> | Parse |
> --  -  -
> --  -  -
> Node 2: | GetSQS | ---> | FetchS3Object | ---> | Parse |
> --  -  -
> Case 2:
> --  -  -
> Node 1: | GetSQS | ---> | FetchS3Object | -X-> | Parse |
> --  -  -
> --  -  -
> Node 2: | GetSQS | -X-> | FetchS3Object | ---> | Parse |
> --  -  -
> {code}
> To achieve that, we could improve Wait processor as follows.
> NiFi scheduler checks downstream relationship availability, when it's full, 
> the processor won't be scheduled to run. In case a source processor has 
> multiple outgoing relationships, and if ANY of those is full, the processor 
> won't be scheduled.
> (This is how processor scheduling works with back-pressure, but can
> alter with @TriggerWhenAnyDestinationAvailable annotation. DistributeLoad is 
> the only processor annotated with this)
> We could use this mechanism to keep the source processor waiting to be 
> scheduled, by following flow:
> {code}
> GetSQS
>   -- success --> FetchS3Object --> Parse --> Notify
>   -- success --> Wait
> {code}
> To make it work as expected, we need to improve Wait so that user can choose 
> how waiting FlowFile is handled, from either:
> "Route to 'wait' relationship" or "Keep in the Upstream connection".
> Currently it has only option to route to 'wait'.
> Use "Keep in the Upstream connection" Wait mode with the flow above,
> the incoming flow file in GetSQS -> Wait connection stays there until actual 
> data processing finishes and Notify sends a notification signal.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (NIFI-3452) Add Wait processor Wait Mode property

2017-02-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15862406#comment-15862406
 ] 

ASF GitHub Bot commented on NIFI-3452:
--

Github user pvillard31 commented on the issue:

https://github.com/apache/nifi/pull/1490
  
Hi @ijokarumawak, thanks for this improvement, that's really nice. And 
thanks for the template, it was really useful to review this PR! I just have a 
minor styling issue and since this processor is not released yet: could you 
just update the properties to use both ``.name()`` and ``.displayName()``? 
Otherwise LGTM and I'll merge as soon as you let me know.


> Add Wait processor Wait Mode property
> -
>
> Key: NIFI-3452
> URL: https://issues.apache.org/jira/browse/NIFI-3452
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Affects Versions: 1.2.0
>Reporter: Koji Kawamura
>Assignee: Koji Kawamura
> Attachments: wait-for-a-par-of-flow-to-finish.png
>
>
> NiFi back pressure is handled per relationship and as long as a relationship 
> has room to receive more flow files, source processor is scheduled to run.
> However, this behavior is not ideal in some cases. For example, when there is 
> very computationally expensive task and user wants to limit the number of 
> FlowFiles can be processed at a given time, it's not always possible to limit 
> the rate by existing RateControl nor back-pressure mechanism.
> As a more practical example, in the following flow, it's expected the GetSQS 
> is triggered only when the previous FlowFile has been processed completely.
> Node 1 is parsing a flow file (indicated by the X in the connection between 
> FetchS3Object and Parse). Both connections have a back-pressure threshold of 
> 1, but because the object is already fetched, the first connection is empty 
> and can thus be filled. This means that, if a new item becomes available in 
> the queue, both of the following cases can happen with equal probability:
> {code}
> Case 1:
> --  -  -
> Node 1: | GetSQS | -X-> | FetchS3Object | -X-> | Parse |
> --  -  -
> --  -  -
> Node 2: | GetSQS | ---> | FetchS3Object | ---> | Parse |
> --  -  -
> Case 2:
> --  -  -
> Node 1: | GetSQS | ---> | FetchS3Object | -X-> | Parse |
> --  -  -
> --  -  -
> Node 2: | GetSQS | -X-> | FetchS3Object | ---> | Parse |
> --  -  -
> {code}
> To achieve that, we could improve Wait processor as follows.
> NiFi scheduler checks downstream relationship availability, when it's full, 
> the processor won't be scheduled to run. In case a source processor has 
> multiple outgoing relationships, and if ANY of those is full, the processor 
> won't be scheduled.
> (This is how processor scheduling works with back-pressure, but can
> alter with @TriggerWhenAnyDestinationAvailable annotation. DistributeLoad is 
> the only processor annotated with this)
> We could use this mechanism to keep the source processor waiting to be 
> scheduled, by following flow:
> {code}
> GetSQS
>   -- success --> FetchS3Object --> Parse --> Notify
>   -- success --> Wait
> {code}
> To make it work as expected, we need to improve Wait so that user can choose 
> how waiting FlowFile is handled, from either:
> "Route to 'wait' relationship" or "Keep in the Upstream connection".
> Currently it has only option to route to 'wait'.
> Use "Keep in the Upstream connection" Wait mode with the flow above,
> the incoming flow file in GetSQS -> Wait connection stays there until actual 
> data processing finishes and Notify sends a notification signal.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (NIFI-3452) Add Wait processor Wait Mode property

2017-02-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15862098#comment-15862098
 ] 

ASF GitHub Bot commented on NIFI-3452:
--

Github user ijokarumawak commented on the issue:

https://github.com/apache/nifi/pull/1490
  
@basvank Thanks for confirming the change, that certainly make the review & 
merge process easier!


> Add Wait processor Wait Mode property
> -
>
> Key: NIFI-3452
> URL: https://issues.apache.org/jira/browse/NIFI-3452
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Affects Versions: 1.2.0
>Reporter: Koji Kawamura
>Assignee: Koji Kawamura
> Attachments: wait-for-a-par-of-flow-to-finish.png
>
>
> NiFi back pressure is handled per relationship and as long as a relationship 
> has room to receive more flow files, source processor is scheduled to run.
> However, this behavior is not ideal in some cases. For example, when there is 
> very computationally expensive task and user wants to limit the number of 
> FlowFiles can be processed at a given time, it's not always possible to limit 
> the rate by existing RateControl nor back-pressure mechanism.
> As a more practical example, in the following flow, it's expected the GetSQS 
> is triggered only when the previous FlowFile has been processed completely.
> Node 1 is parsing a flow file (indicated by the X in the connection between 
> FetchS3Object and Parse). Both connections have a back-pressure threshold of 
> 1, but because the object is already fetched, the first connection is empty 
> and can thus be filled. This means that, if a new item becomes available in 
> the queue, both of the following cases can happen with equal probability:
> {code}
> Case 1:
> --  -  -
> Node 1: | GetSQS | -X-> | FetchS3Object | -X-> | Parse |
> --  -  -
> --  -  -
> Node 2: | GetSQS | ---> | FetchS3Object | ---> | Parse |
> --  -  -
> Case 2:
> --  -  -
> Node 1: | GetSQS | ---> | FetchS3Object | -X-> | Parse |
> --  -  -
> --  -  -
> Node 2: | GetSQS | -X-> | FetchS3Object | ---> | Parse |
> --  -  -
> {code}
> To achieve that, we could improve Wait processor as follows.
> NiFi scheduler checks downstream relationship availability, when it's full, 
> the processor won't be scheduled to run. In case a source processor has 
> multiple outgoing relationships, and if ANY of those is full, the processor 
> won't be scheduled.
> (This is how processor scheduling works with back-pressure, but can
> alter with @TriggerWhenAnyDestinationAvailable annotation. DistributeLoad is 
> the only processor annotated with this)
> We could use this mechanism to keep the source processor waiting to be 
> scheduled, by following flow:
> {code}
> GetSQS
>   -- success --> FetchS3Object --> Parse --> Notify
>   -- success --> Wait
> {code}
> To make it work as expected, we need to improve Wait so that user can choose 
> how waiting FlowFile is handled, from either:
> "Route to 'wait' relationship" or "Keep in the Upstream connection".
> Currently it has only option to route to 'wait'.
> Use "Keep in the Upstream connection" Wait mode with the flow above,
> the incoming flow file in GetSQS -> Wait connection stays there until actual 
> data processing finishes and Notify sends a notification signal.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (NIFI-3452) Add Wait processor Wait Mode property

2017-02-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861159#comment-15861159
 ] 

ASF GitHub Bot commented on NIFI-3452:
--

Github user basvank commented on the issue:

https://github.com/apache/nifi/pull/1490
  
@ijokarumawak I have tested your patch on a single node with 3 the same 
flows and it does exactly what I expected and required as explained in [my 
mailing list 
entry](http://apache-nifi-users-list.2361937.n4.nabble.com/Problem-when-using-backpressure-to-distribute-load-over-nodes-in-a-cluster-tp863p903.html).
 I am not able to test in the cluster setting at this moment but I see no 
reason why it wouldn't work there.

Thanks for your effort!


> Add Wait processor Wait Mode property
> -
>
> Key: NIFI-3452
> URL: https://issues.apache.org/jira/browse/NIFI-3452
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Affects Versions: 1.2.0
>Reporter: Koji Kawamura
>Assignee: Koji Kawamura
> Attachments: wait-for-a-par-of-flow-to-finish.png
>
>
> NiFi back pressure is handled per relationship and as long as a relationship 
> has room to receive more flow files, source processor is scheduled to run.
> However, this behavior is not ideal in some cases. For example, when there is 
> very computationally expensive task and user wants to limit the number of 
> FlowFiles can be processed at a given time, it's not always possible to limit 
> the rate by existing RateControl nor back-pressure mechanism.
> As a more practical example, in the following flow, it's expected the GetSQS 
> is triggered only when the previous FlowFile has been processed completely.
> Node 1 is parsing a flow file (indicated by the X in the connection between 
> FetchS3Object and Parse). Both connections have a back-pressure threshold of 
> 1, but because the object is already fetched, the first connection is empty 
> and can thus be filled. This means that, if a new item becomes available in 
> the queue, both of the following cases can happen with equal probability:
> {code}
> Case 1:
> --  -  -
> Node 1: | GetSQS | -X-> | FetchS3Object | -X-> | Parse |
> --  -  -
> --  -  -
> Node 2: | GetSQS | ---> | FetchS3Object | ---> | Parse |
> --  -  -
> Case 2:
> --  -  -
> Node 1: | GetSQS | ---> | FetchS3Object | -X-> | Parse |
> --  -  -
> --  -  -
> Node 2: | GetSQS | -X-> | FetchS3Object | ---> | Parse |
> --  -  -
> {code}
> To achieve that, we could improve Wait processor as follows.
> NiFi scheduler checks downstream relationship availability, when it's full, 
> the processor won't be scheduled to run. In case a source processor has 
> multiple outgoing relationships, and if ANY of those is full, the processor 
> won't be scheduled.
> (This is how processor scheduling works with back-pressure, but can
> alter with @TriggerWhenAnyDestinationAvailable annotation. DistributeLoad is 
> the only processor annotated with this)
> We could use this mechanism to keep the source processor waiting to be 
> scheduled, by following flow:
> {code}
> GetSQS
>   -- success --> FetchS3Object --> Parse --> Notify
>   -- success --> Wait
> {code}
> To make it work as expected, we need to improve Wait so that user can choose 
> how waiting FlowFile is handled, from either:
> "Route to 'wait' relationship" or "Keep in the Upstream connection".
> Currently it has only option to route to 'wait'.
> Use "Keep in the Upstream connection" Wait mode with the flow above,
> the incoming flow file in GetSQS -> Wait connection stays there until actual 
> data processing finishes and Notify sends a notification signal.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (NIFI-3452) Add Wait processor Wait Mode property

2017-02-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15859459#comment-15859459
 ] 

ASF GitHub Bot commented on NIFI-3452:
--

Github user ijokarumawak commented on the issue:

https://github.com/apache/nifi/pull/1490
  
Dear reviewer, 
an example flow template is available here:
https://gist.github.com/ijokarumawak/85a3d77297ea94614e9f3f2a9dabca67


> Add Wait processor Wait Mode property
> -
>
> Key: NIFI-3452
> URL: https://issues.apache.org/jira/browse/NIFI-3452
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Affects Versions: 1.2.0
>Reporter: Koji Kawamura
>Assignee: Koji Kawamura
> Attachments: wait-for-a-par-of-flow-to-finish.png
>
>
> NiFi back pressure is handled per relationship and as long as a relationship 
> has room to receive more flow files, source processor is scheduled to run.
> However, this behavior is not ideal in some cases. For example, when there is 
> very computationally expensive task and user wants to limit the number of 
> FlowFiles can be processed at a given time, it's not always possible to limit 
> the rate by existing RateControl nor back-pressure mechanism.
> As a more practical example, in the following flow, it's expected the GetSQS 
> is triggered only when the previous FlowFile has been processed completely.
> Node 1 is parsing a flow file (indicated by the X in the connection between 
> FetchS3Object and Parse). Both connections have a back-pressure threshold of 
> 1, but because the object is already fetched, the first connection is empty 
> and can thus be filled. This means that, if a new item becomes available in 
> the queue, both of the following cases can happen with equal probability:
> {code}
> Case 1:
> --  -  -
> Node 1: | GetSQS | -X-> | FetchS3Object | -X-> | Parse |
> --  -  -
> --  -  -
> Node 2: | GetSQS | ---> | FetchS3Object | ---> | Parse |
> --  -  -
> Case 2:
> --  -  -
> Node 1: | GetSQS | ---> | FetchS3Object | -X-> | Parse |
> --  -  -
> --  -  -
> Node 2: | GetSQS | -X-> | FetchS3Object | ---> | Parse |
> --  -  -
> {code}
> To achieve that, we could improve Wait processor as follows.
> NiFi scheduler checks downstream relationship availability, when it's full, 
> the processor won't be scheduled to run. In case a source processor has 
> multiple outgoing relationships, and if ANY of those is full, the processor 
> won't be scheduled.
> (This is how processor scheduling works with back-pressure, but can
> alter with @TriggerWhenAnyDestinationAvailable annotation. DistributeLoad is 
> the only processor annotated with this)
> We could use this mechanism to keep the source processor waiting to be 
> scheduled, by following flow:
> {code}
> GetSQS
>   -- success --> FetchS3Object --> Parse --> Notify
>   -- success --> Wait
> {code}
> To make it work as expected, we need to improve Wait so that user can choose 
> how waiting FlowFile is handled, from either:
> "Route to 'wait' relationship" or "Keep in the Upstream connection".
> Currently it has only option to route to 'wait'.
> Use "Keep in the Upstream connection" Wait mode with the flow above,
> the incoming flow file in GetSQS -> Wait connection stays there until actual 
> data processing finishes and Notify sends a notification signal.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (NIFI-3452) Add Wait processor Wait Mode property

2017-02-09 Thread Joseph Gresock (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15859437#comment-15859437
 ] 

Joseph Gresock commented on NIFI-3452:
--

Wow, this is a really clever application of Wait/Notify.  Nice addition to 
Wait, as well.

> Add Wait processor Wait Mode property
> -
>
> Key: NIFI-3452
> URL: https://issues.apache.org/jira/browse/NIFI-3452
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Affects Versions: 1.2.0
>Reporter: Koji Kawamura
>Assignee: Koji Kawamura
> Attachments: wait-for-a-par-of-flow-to-finish.png
>
>
> NiFi back pressure is handled per relationship and as long as a relationship 
> has room to receive more flow files, source processor is scheduled to run.
> However, this behavior is not ideal in some cases. For example, when there is 
> very computationally expensive task and user wants to limit the number of 
> FlowFiles can be processed at a given time, it's not always possible to limit 
> the rate by existing RateControl nor back-pressure mechanism.
> As a more practical example, in the following flow, it's expected the GetSQS 
> is triggered only when the previous FlowFile has been processed completely.
> Node 1 is parsing a flow file (indicated by the X in the connection between 
> FetchS3Object and Parse). Both connections have a back-pressure threshold of 
> 1, but because the object is already fetched, the first connection is empty 
> and can thus be filled. This means that, if a new item becomes available in 
> the queue, both of the following cases can happen with equal probability:
> {code}
> Case 1:
> --  -  -
> Node 1: | GetSQS | -X-> | FetchS3Object | -X-> | Parse |
> --  -  -
> --  -  -
> Node 2: | GetSQS | ---> | FetchS3Object | ---> | Parse |
> --  -  -
> Case 2:
> --  -  -
> Node 1: | GetSQS | ---> | FetchS3Object | -X-> | Parse |
> --  -  -
> --  -  -
> Node 2: | GetSQS | -X-> | FetchS3Object | ---> | Parse |
> --  -  -
> {code}
> To achieve that, we could improve Wait processor as follows.
> NiFi scheduler checks downstream relationship availability, when it's full, 
> the processor won't be scheduled to run. In case a source processor has 
> multiple outgoing relationships, and if ANY of those is full, the processor 
> won't be scheduled.
> (This is how processor scheduling works with back-pressure, but can
> alter with @TriggerWhenAnyDestinationAvailable annotation. DistributeLoad is 
> the only processor annotated with this)
> We could use this mechanism to keep the source processor waiting to be 
> scheduled, by following flow:
> {code}
> GetSQS
>   -- success --> FetchS3Object --> Parse --> Notify
>   -- success --> Wait
> {code}
> To make it work as expected, we need to improve Wait so that user can choose 
> how waiting FlowFile is handled, from either:
> "Route to 'wait' relationship" or "Keep in the Upstream connection".
> Currently it has only option to route to 'wait'.
> Use "Keep in the Upstream connection" Wait mode with the flow above,
> the incoming flow file in GetSQS -> Wait connection stays there until actual 
> data processing finishes and Notify sends a notification signal.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (NIFI-3452) Add Wait processor Wait Mode property

2017-02-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15859432#comment-15859432
 ] 

ASF GitHub Bot commented on NIFI-3452:
--

GitHub user ijokarumawak opened a pull request:

https://github.com/apache/nifi/pull/1490

NIFI-3452: Add Wait processor Wait Mode property

Ensure back-pressure is active until downstream processing completes.

Thank you for submitting a contribution to Apache NiFi.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

### For all changes:
- [x] Is there a JIRA ticket associated with this PR? Is it referenced 
 in the commit message?

- [x] Does your PR title start with NIFI- where  is the JIRA number 
you are trying to resolve? Pay particular attention to the hyphen "-" character.

- [x] Has your PR been rebased against the latest commit within the target 
branch (typically master)?

- [x] Is your initial contribution a single, squashed commit?

### For code changes:
- [x] Have you ensured that the full suite of tests is executed via mvn 
-Pcontrib-check clean install at the root nifi folder?
- [x] Have you written or updated unit tests to verify your changes?
- [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)? 
- [ ] If applicable, have you updated the LICENSE file, including the main 
LICENSE file under nifi-assembly?
- [ ] If applicable, have you updated the NOTICE file, including the main 
NOTICE file found under nifi-assembly?
- [ ] If adding new Properties, have you added .displayName in addition to 
.name (programmatic access) for each of the new properties?

### For documentation related changes:
- [x] Have you ensured that format looks appropriate for the output in 
which it is rendered?

### Note:
Please ensure that once the PR is submitted, you check travis-ci for build 
issues and submit an update to your PR as soon as possible.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ijokarumawak/nifi nifi-3452

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nifi/pull/1490.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1490


commit 8767b8bf98c83e0e84e68fb53c51255fec4ab084
Author: Koji Kawamura 
Date:   2017-02-09T12:30:43Z

NIFI-3452: Add Wait processor Wait Mode property

Ensure back-pressure is active until downstream processing completes.




> Add Wait processor Wait Mode property
> -
>
> Key: NIFI-3452
> URL: https://issues.apache.org/jira/browse/NIFI-3452
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Affects Versions: 1.2.0
>Reporter: Koji Kawamura
>Assignee: Koji Kawamura
> Attachments: wait-for-a-par-of-flow-to-finish.png
>
>
> NiFi back pressure is handled per relationship and as long as a relationship 
> has room to receive more flow files, source processor is scheduled to run.
> However, this behavior is not ideal in some cases. For example, when there is 
> very computationally expensive task and user wants to limit the number of 
> FlowFiles can be processed at a given time, it's not always possible to limit 
> the rate by existing RateControl nor back-pressure mechanism.
> As a more practical example, in the following flow, it's expected the GetSQS 
> is triggered only when the previous FlowFile has been processed completely.
> Node 1 is parsing a flow file (indicated by the X in the connection between 
> FetchS3Object and Parse). Both connections have a back-pressure threshold of 
> 1, but because the object is already fetched, the first connection is empty 
> and can thus be filled. This means that, if a new item becomes available in 
> the queue, both of the following cases can happen with equal probability:
> {code}
> Case 1:
> --  -  -
> Node 1: | GetSQS | -X-> | FetchS3Object | -X-> | Parse |
> --  -  -
> --  -  -
> Node 2: | GetSQS | ---> | FetchS3Object | ---> | Parse |
> --  -  -
> Case 2:
> --  -  -
> Node 1: | GetSQS | ---> | FetchS3Object | -X-> | Parse |
> --  -  -
> --  -