[ https://issues.apache.org/jira/browse/NIFI-4475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16290071#comment-16290071 ]
ASF GitHub Bot commented on NIFI-4475:
--------------------------------------

GitHub user JPercivall opened a pull request:

    https://github.com/apache/nifi/pull/2337

    NIFI-4475 Changing the get(batchSize) method in StandardProcessSessio…

    …n so that it checks all connections before returning nothing

    Thank you for submitting a contribution to Apache NiFi.

    In order to streamline the review of the contribution we ask you
    to ensure the following steps have been taken:

    ### For all changes:
    - [X] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?
    - [X] Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
    - [X] Has your PR been rebased against the latest commit within the target branch (typically master)?
    - [X] Is your initial contribution a single, squashed commit?

    ### For code changes:
    - [X] Have you ensured that the full suite of tests is executed via mvn -Pcontrib-check clean install at the root nifi folder?
    - [X] Have you written or updated unit tests to verify your changes?
    - [X] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
    - [X] If applicable, have you updated the LICENSE file, including the main LICENSE file under nifi-assembly?
    - [X] If applicable, have you updated the NOTICE file, including the main NOTICE file found under nifi-assembly?
    - [X] If adding new Properties, have you added .displayName in addition to .name (programmatic access) for each of the new properties?

    ### For documentation related changes:
    - [X] Have you ensured that format looks appropriate for the output in which it is rendered?

    ### Note:
    Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JPercivall/nifi NIFI-4475_Making_session_get_x_round_robin_queues

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nifi/pull/2337.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2337

----
commit 5307d406374c5448d1913a3130459b6724104b10
Author: Joe Percivall <jperciv...@apache.org>
Date:   2017-12-13T22:17:05Z

    NIFI-4475 Changing the get(batchSize) method in StandardProcessSession so that it checks all connections before returning nothing

----


> Processors that use session.get(batchsize) will yield if multiple inbound
> connections exist where at least one connection is empty.
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: NIFI-4475
>                 URL: https://issues.apache.org/jira/browse/NIFI-4475
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework
>    Affects Versions: 1.3.0
>            Reporter: Matthew Clarke
>              Labels: nifi
>
> There is a difference between how the NiFi framework handles batches of
> incoming data (session.get(batchsize)) versus 1 FlowFile (session.get()) at
> a time.
> For example, PutSyslog works on batches and PutUDP processes 1 FlowFile at a time.
> With the batch method, a thread is used to poll connection 1 and requests a
> batch of FlowFiles. If it gets at least 1 FlowFile, it sends the
> FlowFile(s) and ends that thread. On the next thread it round-robins to the next
> connection (a looped failure relationship, for example) and requests a batch
> again. If that connection is empty, the framework assumes there is no work
> to do and yields the processor for the configured "yield duration". So
> regardless of run schedule, this processor will not run again for the
> configured yield duration.
> With processors that only work on 1 FlowFile at a time, the thread will
> round-robin all the inbound connections until it finds a FlowFile. If it
> does not find a FlowFile in any connection, the framework will yield the
> processor for the configured yield duration.
> The intent of yield duration is to keep processors with the default run
> schedule of 0 sec from using excessive CPU doing nothing; however, in the
> case of batches it will yield even if FlowFiles exist on another connection.
> This can have a huge impact on the throughput of processors that use
> session.get(batchsize).
> There are two possible work-arounds to this issue:
> 1. Reduce the configured yield duration. You should see improved performance
> when multiple inbound connections exist (where any connection may normally be
> empty). The result is better throughput, but at the expense of more
> CPU usage when all connections are truly empty.
> 2. Only have one inbound connection to processors that work on batches. This
> can be accomplished by using a funnel.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
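
For readers following the issue, the difference between the two polling behaviours described above can be sketched as a toy model in Java. This is only an illustration under stated assumptions, not the code in StandardProcessSession or in the pull request: the class `BatchPollSketch`, its two methods, and the use of plain `Queue<String>` objects in place of NiFi connections and FlowFiles are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

/** Toy model of the polling behaviour described in NIFI-4475; not NiFi's actual code. */
public class BatchPollSketch {

    /** Hypothetical stand-in for inbound connections: each queue holds FlowFile names. */
    private final List<Queue<String>> connections;

    /** Round-robin cursor: index of the connection to poll next. */
    private int next = 0;

    public BatchPollSketch(List<Queue<String>> connections) {
        this.connections = connections;
    }

    /** Behaviour as reported: poll exactly one connection per call. An empty result means "yield". */
    public List<String> getBatchSingleConnection(int batchSize) {
        Queue<String> conn = connections.get(next);
        next = (next + 1) % connections.size();
        return drain(conn, batchSize);
    }

    /** Behaviour named in the PR title: check every connection before returning nothing. */
    public List<String> getBatchAllConnections(int batchSize) {
        for (int i = 0; i < connections.size(); i++) {
            Queue<String> conn = connections.get(next);
            next = (next + 1) % connections.size();
            List<String> batch = drain(conn, batchSize);
            if (!batch.isEmpty()) {
                return batch;
            }
        }
        return new ArrayList<>();
    }

    /** Removes up to batchSize entries from the head of one connection. */
    private static List<String> drain(Queue<String> conn, int batchSize) {
        List<String> batch = new ArrayList<>();
        while (batch.size() < batchSize && !conn.isEmpty()) {
            batch.add(conn.poll());
        }
        return batch;
    }

    public static void main(String[] args) {
        // Connection 0 has work queued; connection 1 models an empty looped failure relationship.
        Queue<String> busy = new ArrayDeque<>(Arrays.asList("a1", "a2", "a3"));
        Queue<String> empty = new ArrayDeque<>();
        BatchPollSketch reported = new BatchPollSketch(Arrays.asList(busy, empty));
        System.out.println("reported, call 1: " + reported.getBatchSingleConnection(2));
        System.out.println("reported, call 2: " + reported.getBatchSingleConnection(2));

        Queue<String> busy2 = new ArrayDeque<>(Arrays.asList("a1", "a2", "a3"));
        Queue<String> empty2 = new ArrayDeque<>();
        BatchPollSketch proposed = new BatchPollSketch(Arrays.asList(busy2, empty2));
        System.out.println("proposed, call 1: " + proposed.getBatchAllConnections(2));
        System.out.println("proposed, call 2: " + proposed.getBatchAllConnections(2));
    }
}
```

In this model the second call to `getBatchSingleConnection` lands on the empty connection and returns nothing even though the first connection still holds a FlowFile, which is the situation that leads to the yield described in the issue. The second call to `getBatchAllConnections` skips the empty connection and returns the remaining FlowFile instead.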