[
https://issues.apache.org/jira/browse/NIFI-14864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18014636#comment-18014636
]
Joe Witt edited comment on NIFI-14864 at 8/18/25 3:02 PM:
----------------------------------------------------------
Testing or using run-once at all is not recommended and often will not even
work. It is certainly not an effective performance test.
How large is your Kafka partition lag when you start the tests?
How fast is new data being generated into Kafka?
I strongly recommend using a new consumer group id for each performance run and
setting the offset starting point to 'earliest' instead of 'latest'.
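For example, a fresh group id per run can be generated mechanically. This is a minimal sketch with a hypothetical naming scheme (the `nifi-perf` prefix is illustrative, not a NiFi convention):

```python
import uuid
from datetime import datetime, timezone

def fresh_consumer_group_id(prefix: str = "nifi-perf") -> str:
    """Generate a consumer group id that has never been used before,
    so an 'earliest' offset starting point replays the whole topic
    instead of resuming from previously committed offsets."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S")
    return f"{prefix}-{stamp}-{uuid.uuid4().hex[:8]}"

# e.g. nifi-perf-20250818T150200-a1b2c3d4
```

Paste the generated value into the processor's group id property before each run.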
If you really want to focus on performance testing how fast NiFi can pull from
Kafka:
1. Set up a topic and fill it with enough data that it should ideally take
NiFi 10-20 minutes to process it. Then stop adding new data.
2. Configure NiFi with 'earliest' as the offset starting point and use a
completely new, unused consumer group id.
3. Let ConsumeKafka run uninterrupted until it consumes all the data, then
review the performance information. There are a ton of stats you can see for
this in NiFi to understand read rates, write rates, latencies, etc.
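The sizing for step 1 can be estimated up front. A minimal sketch, assuming a 100 MB/sec consumption rate purely for illustration (substitute a rate you have actually observed on your hardware):

```python
# Rough sizing for the test topic: how much data to preload so that
# NiFi needs roughly 10-20 minutes to drain it.
ASSUMED_RATE_MB_PER_SEC = 100  # assumption for illustration only

def topic_size_gb(target_minutes: float,
                  rate_mb_per_sec: float = ASSUMED_RATE_MB_PER_SEC) -> float:
    """Return how many GB to preload for the desired drain time."""
    return target_minutes * 60 * rate_mb_per_sec / 1024

# At 100 MB/sec, a 10-20 minute window needs roughly 59-117 GB preloaded.
low, high = topic_size_gb(10), topic_size_gb(20)
```

The point is simply that the preloaded backlog, not the arrival rate, should dominate the test; a topic that drains in seconds tells you nothing about sustained throughput.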
The performance of this processor is in general quite good, and it is very
battle-tested, including the absolute latest bits. Rates in the hundreds of
MB/sec per NiFi instance are observed routinely.
How many Kafka partitions?
was (Author: joewitt):
Testing or using run-once at all is not recommended and often will not even
work. It is certainly not an effective performance test.
How large is your Kafka partition lag when you start the tests?
How fast is new data being generated into Kafka?
You keep using 'latest' for the offset starting point, so at best you will
only see data as fast as it arrives, and none of the backlog.
If you really want to focus on performance testing how fast NiFi can pull from
Kafka:
1. Set up a topic and fill it with enough data that it should ideally take
NiFi 10-20 minutes to process it. Then stop adding new data.
2. Configure NiFi with 'earliest' as the offset starting point and use a
completely new, unused consumer group id.
3. Let ConsumeKafka run uninterrupted until it consumes all the data, then
review the performance information. There are a ton of stats you can see for
this in NiFi to understand read rates, write rates, latencies, etc.
The performance of this processor is in general quite good, and it is very
battle-tested, including the absolute latest bits. Rates in the hundreds of
MB/sec per NiFi instance are observed routinely.
How many Kafka partitions?
> ConsumeKafka performance
> ------------------------
>
> Key: NIFI-14864
> URL: https://issues.apache.org/jira/browse/NIFI-14864
> Project: Apache NiFi
> Issue Type: Bug
> Components: Configuration
> Affects Versions: 2.5.0
> Environment: nifi 2.5, kafka server 2.8
> Reporter: Zenkovac
> Priority: Major
>
> Switching from NiFi 1.19 to 2.5 and using ConsumeKafka, I can't get it to
> consume flowfiles with more than ~500 records per flowfile, despite having
> millions of messages available in the Kafka topic.
> This is a performance penalty for me because I consume thousands of
> flowfiles versus a few in NiFi 1.19, which meant less disk I/O usage.
> This is my config:
> *Processing Strategy: RECORD*
> *Max Uncommitted Time* 10 sec
--
This message was sent by Atlassian Jira
(v8.20.10#820010)