[ https://issues.apache.org/jira/browse/NIFI-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Koji Kawamura updated NIFI-3452: -------------------------------- Attachment: wait-for-a-par-of-flow-to-finish.png > Add Wait processor Wait Mode property > ------------------------------------- > > Key: NIFI-3452 > URL: https://issues.apache.org/jira/browse/NIFI-3452 > Project: Apache NiFi > Issue Type: Improvement > Components: Extensions > Affects Versions: 1.2.0 > Reporter: Koji Kawamura > Assignee: Koji Kawamura > Attachments: wait-for-a-par-of-flow-to-finish.png > > > NiFi back pressure is handled per relationship and as long as a relationship > has room to receive more flow files, source processor is scheduled to run. > However, this behavior is not ideal in some cases. For example, when there is > very computationally expensive task and user wants to limit the number of > FlowFiles can be processed at a given time, it's not always possible to limit > the rate by existing RateControl nor back-pressure mechanism. > As a more practical example, in the following flow, it's expected the GetSQS > is triggered only when the previous FlowFile has been processed completely. > Node 1 is parsing a flow file (indicated by the X in the connection between > FetchS3Object and Parse). Both connections have a back-pressure threshold of > 1, but because the object is already fetched, the first connection is empty > and can thus be filled. This means that, if a new item becomes available in > the queue, both of the following cases can happen with equal probability: > {code} > Case 1: > ---------- ----------------- --------- > Node 1: | GetSQS | -X-> | FetchS3Object | -X-> | Parse | > ---------- ----------------- --------- > ---------- ----------------- --------- > Node 2: | GetSQS | ---> | FetchS3Object | ---> | Parse | > ---------- ----------------- --------- > Case 2: > ---------- ----------------- --------- > Node 1: | GetSQS | ---> | FetchS3Object | -X-> | Parse | > ---------- ----------------- --------- > ---------- ----------------- --------- > Node 2: | GetSQS | -X-> | FetchS3Object | ---> | Parse | > ---------- ----------------- --------- > {code} > To achieve that, we could improve Wait processor as follows. > NiFi scheduler checks downstream relationship availability, when it's full, > the processor won't be scheduled to run. In case a source processor has > multiple outgoing relationships, and if ANY of those is full, the processor > won't be scheduled. > (This is how processor scheduling works with back-pressure, but can > alter with @TriggerWhenAnyDestinationAvailable annotation. DistributeLoad is > the only processor annotated with this) > We could use this mechanism to keep the source processor waiting to be > scheduled, by following flow: > {code} > GetSQS > -- success --> FetchS3Object --> Parse --> Notify > -- success --> Wait > {code} > To make it work as expected, we need to improve Wait so that user can choose > how waiting FlowFile is handled, from either: > "Route to 'wait' relationship" or "Keep in the Upstream connection". > Currently it has only option to route to 'wait'. > Use "Keep in the Upstream connection" Wait mode with the flow above, > the incoming flow file in GetSQS -> Wait connection stays there until actual > data processing finishes and Notify sends a notification signal. -- This message was sent by Atlassian JIRA (v6.3.15#6346)