[jira] [Updated] (NIFI-3452) Add Wait processor Wait Mode property

2017-02-14 Thread Pierre Villard (JIRA)

 [ 
https://issues.apache.org/jira/browse/NIFI-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pierre Villard updated NIFI-3452:
-
   Resolution: Fixed
Fix Version/s: 1.2.0
   Status: Resolved  (was: Patch Available)

> Add Wait processor Wait Mode property
> -
>
> Key: NIFI-3452
> URL: https://issues.apache.org/jira/browse/NIFI-3452
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Affects Versions: 1.2.0
>Reporter: Koji Kawamura
>Assignee: Koji Kawamura
> Fix For: 1.2.0
>
> Attachments: wait-for-a-par-of-flow-to-finish.png
>
>
> NiFi back pressure is handled per relationship and as long as a relationship 
> has room to receive more flow files, source processor is scheduled to run.
> However, this behavior is not ideal in some cases. For example, when there is 
> very computationally expensive task and user wants to limit the number of 
> FlowFiles can be processed at a given time, it's not always possible to limit 
> the rate by existing RateControl nor back-pressure mechanism.
> As a more practical example, in the following flow, it's expected the GetSQS 
> is triggered only when the previous FlowFile has been processed completely.
> Node 1 is parsing a flow file (indicated by the X in the connection between 
> FetchS3Object and Parse). Both connections have a back-pressure threshold of 
> 1, but because the object is already fetched, the first connection is empty 
> and can thus be filled. This means that, if a new item becomes available in 
> the queue, both of the following cases can happen with equal probability:
> {code}
> Case 1:
> --  -  -
> Node 1: | GetSQS | -X-> | FetchS3Object | -X-> | Parse |
> --  -  -
> --  -  -
> Node 2: | GetSQS | ---> | FetchS3Object | ---> | Parse |
> --  -  -
> Case 2:
> --  -  -
> Node 1: | GetSQS | ---> | FetchS3Object | -X-> | Parse |
> --  -  -
> --  -  -
> Node 2: | GetSQS | -X-> | FetchS3Object | ---> | Parse |
> --  -  -
> {code}
> To achieve that, we could improve Wait processor as follows.
> NiFi scheduler checks downstream relationship availability, when it's full, 
> the processor won't be scheduled to run. In case a source processor has 
> multiple outgoing relationships, and if ANY of those is full, the processor 
> won't be scheduled.
> (This is how processor scheduling works with back-pressure, but can
> alter with @TriggerWhenAnyDestinationAvailable annotation. DistributeLoad is 
> the only processor annotated with this)
> We could use this mechanism to keep the source processor waiting to be 
> scheduled, by following flow:
> {code}
> GetSQS
>   -- success --> FetchS3Object --> Parse --> Notify
>   -- success --> Wait
> {code}
> To make it work as expected, we need to improve Wait so that user can choose 
> how waiting FlowFile is handled, from either:
> "Route to 'wait' relationship" or "Keep in the Upstream connection".
> Currently it has only option to route to 'wait'.
> Use "Keep in the Upstream connection" Wait mode with the flow above,
> the incoming flow file in GetSQS -> Wait connection stays there until actual 
> data processing finishes and Notify sends a notification signal.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (NIFI-3452) Add Wait processor Wait Mode property

2017-02-09 Thread Koji Kawamura (JIRA)

 [ 
https://issues.apache.org/jira/browse/NIFI-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Kawamura updated NIFI-3452:

Status: Patch Available  (was: In Progress)

> Add Wait processor Wait Mode property
> -
>
> Key: NIFI-3452
> URL: https://issues.apache.org/jira/browse/NIFI-3452
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Affects Versions: 1.2.0
>Reporter: Koji Kawamura
>Assignee: Koji Kawamura
> Attachments: wait-for-a-par-of-flow-to-finish.png
>
>
> NiFi back pressure is handled per relationship and as long as a relationship 
> has room to receive more flow files, source processor is scheduled to run.
> However, this behavior is not ideal in some cases. For example, when there is 
> very computationally expensive task and user wants to limit the number of 
> FlowFiles can be processed at a given time, it's not always possible to limit 
> the rate by existing RateControl nor back-pressure mechanism.
> As a more practical example, in the following flow, it's expected the GetSQS 
> is triggered only when the previous FlowFile has been processed completely.
> Node 1 is parsing a flow file (indicated by the X in the connection between 
> FetchS3Object and Parse). Both connections have a back-pressure threshold of 
> 1, but because the object is already fetched, the first connection is empty 
> and can thus be filled. This means that, if a new item becomes available in 
> the queue, both of the following cases can happen with equal probability:
> {code}
> Case 1:
> --  -  -
> Node 1: | GetSQS | -X-> | FetchS3Object | -X-> | Parse |
> --  -  -
> --  -  -
> Node 2: | GetSQS | ---> | FetchS3Object | ---> | Parse |
> --  -  -
> Case 2:
> --  -  -
> Node 1: | GetSQS | ---> | FetchS3Object | -X-> | Parse |
> --  -  -
> --  -  -
> Node 2: | GetSQS | -X-> | FetchS3Object | ---> | Parse |
> --  -  -
> {code}
> To achieve that, we could improve Wait processor as follows.
> NiFi scheduler checks downstream relationship availability, when it's full, 
> the processor won't be scheduled to run. In case a source processor has 
> multiple outgoing relationships, and if ANY of those is full, the processor 
> won't be scheduled.
> (This is how processor scheduling works with back-pressure, but can
> alter with @TriggerWhenAnyDestinationAvailable annotation. DistributeLoad is 
> the only processor annotated with this)
> We could use this mechanism to keep the source processor waiting to be 
> scheduled, by following flow:
> {code}
> GetSQS
>   -- success --> FetchS3Object --> Parse --> Notify
>   -- success --> Wait
> {code}
> To make it work as expected, we need to improve Wait so that user can choose 
> how waiting FlowFile is handled, from either:
> "Route to 'wait' relationship" or "Keep in the Upstream connection".
> Currently it has only option to route to 'wait'.
> Use "Keep in the Upstream connection" Wait mode with the flow above,
> the incoming flow file in GetSQS -> Wait connection stays there until actual 
> data processing finishes and Notify sends a notification signal.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (NIFI-3452) Add Wait processor Wait Mode property

2017-02-09 Thread Koji Kawamura (JIRA)

 [ 
https://issues.apache.org/jira/browse/NIFI-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Kawamura updated NIFI-3452:

Attachment: wait-for-a-par-of-flow-to-finish.png

> Add Wait processor Wait Mode property
> -
>
> Key: NIFI-3452
> URL: https://issues.apache.org/jira/browse/NIFI-3452
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Affects Versions: 1.2.0
>Reporter: Koji Kawamura
>Assignee: Koji Kawamura
> Attachments: wait-for-a-par-of-flow-to-finish.png
>
>
> NiFi back pressure is handled per relationship and as long as a relationship 
> has room to receive more flow files, source processor is scheduled to run.
> However, this behavior is not ideal in some cases. For example, when there is 
> very computationally expensive task and user wants to limit the number of 
> FlowFiles can be processed at a given time, it's not always possible to limit 
> the rate by existing RateControl nor back-pressure mechanism.
> As a more practical example, in the following flow, it's expected the GetSQS 
> is triggered only when the previous FlowFile has been processed completely.
> Node 1 is parsing a flow file (indicated by the X in the connection between 
> FetchS3Object and Parse). Both connections have a back-pressure threshold of 
> 1, but because the object is already fetched, the first connection is empty 
> and can thus be filled. This means that, if a new item becomes available in 
> the queue, both of the following cases can happen with equal probability:
> {code}
> Case 1:
> --  -  -
> Node 1: | GetSQS | -X-> | FetchS3Object | -X-> | Parse |
> --  -  -
> --  -  -
> Node 2: | GetSQS | ---> | FetchS3Object | ---> | Parse |
> --  -  -
> Case 2:
> --  -  -
> Node 1: | GetSQS | ---> | FetchS3Object | -X-> | Parse |
> --  -  -
> --  -  -
> Node 2: | GetSQS | -X-> | FetchS3Object | ---> | Parse |
> --  -  -
> {code}
> To achieve that, we could improve Wait processor as follows.
> NiFi scheduler checks downstream relationship availability, when it's full, 
> the processor won't be scheduled to run. In case a source processor has 
> multiple outgoing relationships, and if ANY of those is full, the processor 
> won't be scheduled.
> (This is how processor scheduling works with back-pressure, but can
> alter with @TriggerWhenAnyDestinationAvailable annotation. DistributeLoad is 
> the only processor annotated with this)
> We could use this mechanism to keep the source processor waiting to be 
> scheduled, by following flow:
> {code}
> GetSQS
>   -- success --> FetchS3Object --> Parse --> Notify
>   -- success --> Wait
> {code}
> To make it work as expected, we need to improve Wait so that user can choose 
> how waiting FlowFile is handled, from either:
> "Route to 'wait' relationship" or "Keep in the Upstream connection".
> Currently it has only option to route to 'wait'.
> Use "Keep in the Upstream connection" Wait mode with the flow above,
> the incoming flow file in GetSQS -> Wait connection stays there until actual 
> data processing finishes and Notify sends a notification signal.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)