[ 
https://issues.apache.org/jira/browse/NIFI-4475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16290071#comment-16290071
 ] 

ASF GitHub Bot commented on NIFI-4475:
--------------------------------------

GitHub user JPercivall opened a pull request:

    https://github.com/apache/nifi/pull/2337

    NIFI-4475 Changing the get(batchSize) method in StandardProcessSessio…

    …n so that it checks all connections before returning nothing
    
    Thank you for submitting a contribution to Apache NiFi.
    
    In order to streamline the review of the contribution we ask you
    to ensure the following steps have been taken:
    
    ### For all changes:
    - [X] Is there a JIRA ticket associated with this PR? Is it referenced
          in the commit message?
    
    - [X] Does your PR title start with NIFI-XXXX where XXXX is the JIRA number
          you are trying to resolve? Pay particular attention to the hyphen "-" character.
    
    - [X] Has your PR been rebased against the latest commit within the target
          branch (typically master)?
    
    - [X] Is your initial contribution a single, squashed commit?
    
    ### For code changes:
    - [X] Have you ensured that the full suite of tests is executed via
          mvn -Pcontrib-check clean install at the root nifi folder?
    - [X] Have you written or updated unit tests to verify your changes?
    - [X] If adding new dependencies to the code, are these dependencies
          licensed in a way that is compatible for inclusion under
          [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
    - [X] If applicable, have you updated the LICENSE file, including the main
          LICENSE file under nifi-assembly?
    - [X] If applicable, have you updated the NOTICE file, including the main
          NOTICE file found under nifi-assembly?
    - [X] If adding new Properties, have you added .displayName in addition to
          .name (programmatic access) for each of the new properties?
    
    ### For documentation related changes:
    - [X] Have you ensured that format looks appropriate for the output in
          which it is rendered?
    
    ### Note:
    Please ensure that once the PR is submitted, you check travis-ci for build
    issues and submit an update to your PR as soon as possible.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JPercivall/nifi NIFI-4475_Making_session_get_x_round_robin_queues

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nifi/pull/2337.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2337
    
----
commit 5307d406374c5448d1913a3130459b6724104b10
Author: Joe Percivall <jperciv...@apache.org>
Date:   2017-12-13T22:17:05Z

    NIFI-4475 Changing the get(batchSize) method in StandardProcessSession so
    that it checks all connections before returning nothing

----


> Processors that use session.get(batchsize) will yield if multiple inbound 
> connections exist where at least one connection is empty.
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: NIFI-4475
>                 URL: https://issues.apache.org/jira/browse/NIFI-4475
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework
>    Affects Versions: 1.3.0
>            Reporter: Matthew Clarke
>              Labels: nifi
>
> There is a difference between how the NiFi framework handles batches of 
> incoming FlowFiles (session.get(batchsize)) versus one FlowFile at a time 
> (session.get()).
> For example, PutSyslog works on batches, while PutUDP processes one FlowFile 
> at a time.
> With the batch method, a thread polls connection 1 and requests a batch of 
> FlowFiles.  If it gets at least one FlowFile, it sends the FlowFile(s) and 
> the thread ends.  On the next run, the thread round-robins to the next 
> connection (a looped failure relationship, for example) and requests a batch 
> again.  If that connection is empty, the framework assumes there is no work 
> to do and yields the processor for the configured "yield duration".  So, 
> regardless of run schedule, the processor will not run again until the 
> configured yield duration has elapsed.
> With processors that work on only one FlowFile at a time, the thread 
> round-robins through all the inbound connections until it finds a FlowFile.  
> If it does not find a FlowFile in any connection, the framework yields the 
> processor for the configured yield duration.
> The intent of yield duration is to keep processors with the default run 
> schedule of 0 sec from using excessive CPU while doing nothing; however, in 
> the batch case the processor yields even if FlowFiles exist on another 
> connection.  This can have a huge impact on the throughput of processors 
> that use session.get(batchsize).
> There are two possible workarounds for this issue:
> 1. Reduce the configured yield duration. This should improve performance 
> when multiple inbound connections exist (where any connection may normally 
> be empty). The result is better throughput, but at the expense of more CPU 
> usage when all connections are truly empty.
> 2. Have only one inbound connection to processors that work on batches. This 
> can be accomplished by using a funnel.
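
The difference between the two polling strategies described above can be sketched in a small standalone model. This is an illustration only and not the actual StandardProcessSession code: the class BatchPollSketch, its method names, and the use of Deque<String> as a stand-in for an inbound connection queue are all hypothetical. The first method polls only the next connection in round-robin order and may return nothing even though another connection holds FlowFiles; the second checks every connection before returning nothing, which is the behavior the pull request title describes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical model, not NiFi code: each Deque<String> stands in for
// one inbound connection queue, and each String for one FlowFile.
final class BatchPollSketch {
    private final List<Deque<String>> connections;
    private int cursor = 0; // index of the connection to poll next (round-robin)

    BatchPollSketch(List<Deque<String>> connections) {
        this.connections = connections;
    }

    // Behavior described in the issue: only the next connection is polled,
    // so an empty result does not mean that all connections are empty.
    List<String> getFromNextConnectionOnly(int batchSize) {
        if (connections.isEmpty()) {
            return new ArrayList<>();
        }
        return drain(advance(), batchSize);
    }

    // Behavior described by the pull request title: every connection is
    // checked, starting at the cursor, before an empty batch is returned.
    List<String> getCheckingAllConnections(int batchSize) {
        for (int i = 0; i < connections.size(); i++) {
            List<String> batch = drain(advance(), batchSize);
            if (!batch.isEmpty()) {
                return batch;
            }
        }
        return new ArrayList<>();
    }

    // Returns the connection at the cursor and moves the cursor forward.
    private Deque<String> advance() {
        Deque<String> queue = connections.get(cursor);
        cursor = (cursor + 1) % connections.size();
        return queue;
    }

    // Removes up to max items from the head of the queue.
    private static List<String> drain(Deque<String> queue, int max) {
        List<String> batch = new ArrayList<>();
        while (batch.size() < max && !queue.isEmpty()) {
            batch.add(queue.poll());
        }
        return batch;
    }

    public static void main(String[] args) {
        Deque<String> emptyA = new ArrayDeque<>();
        Deque<String> loadedA = new ArrayDeque<>(List.of("ff1", "ff2", "ff3"));
        BatchPollSketch before = new BatchPollSketch(List.of(emptyA, loadedA));
        System.out.println(before.getFromNextConnectionOnly(2)); // prints []

        Deque<String> emptyB = new ArrayDeque<>();
        Deque<String> loadedB = new ArrayDeque<>(List.of("ff1", "ff2", "ff3"));
        BatchPollSketch after = new BatchPollSketch(List.of(emptyB, loadedB));
        System.out.println(after.getCheckingAllConnections(2)); // prints [ff1, ff2]
    }
}
```

In this model an empty result from the first method would trigger a yield while three items are still waiting on the second connection; the second method yields only when every connection is empty.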



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
