[ 
https://issues.apache.org/jira/browse/KAFKA-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15753806#comment-15753806
 ] 

Juan Chorro commented on KAFKA-4474:
------------------------------------

Hi [~enothereska],

We have set up 6 independent physical machines, each running VMware and one 
instance of the Kafka cluster. The HDD, while not very fast, is directly 
attached to the host.

Each instance has the following reserved resources: 8GB of RAM, 30GB of HDD 
(directly attached) and 4 cores.

While other services might run on the same host, their load is negligible.

We have found inconsistent results when running some basic benchmarks of 
Kafka Streams. These odd results don’t appear when running the same test 
directly against the Kafka API.

There is an external (7th system) synthetic Kafka producer. It is able to 
generate around 100K events per second.

The odd part is not the achieved numbers themselves, but that in some cases 
the results show severe underperformance and differ from each other (where 
are those events?).

The performance test results table is attached in the attachments section.

Some remarks on these results:

- In KS tests A, B, C and D there is a big discrepancy between input and 
output values.
- In all pure Kafka tests the numbers are equal, but A and C underperform.
- In KS tests B and D the input rate is way above the generated load!

More details on the KS tests: 
https://docs.google.com/spreadsheets/d/1miMx1XzajYxhWdntdsUCat2Efu5UWV-L5P4mAa234M0/edit?usp=sharing

More details on pure Kafka tests:
https://docs.google.com/spreadsheets/d/1jDvjRQKAZsliOB5RRpkYZINnl_DpYeTL6iYNCVVpIGk/edit?usp=sharing


> Poor kafka-streams throughput
> -----------------------------
>
>                 Key: KAFKA-4474
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4474
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 0.10.1.0
>            Reporter: Juan Chorro
>            Assignee: Eno Thereska
>         Attachments: Performance test results.png, hctop sreenshot.png
>
>
> Hi! 
> I'm writing because I am worried about kafka-streams throughput.
> I have a single kafka-streams application instance that consumes from the 
> 'input' topic, prints to the screen and produces to the 'output' topic. All 
> topics have 4 partitions. As can be observed, the topology is very simple.
> I produce 120K messages/second to the 'input' topic, but when I measure the 
> 'output' topic I see that I'm receiving only ~4K messages/second. I had the 
> following configuration (remaining parameters at their defaults):
> application.id: myApp
> bootstrap.servers: localhost:9092
> zookeeper.connect: localhost:2181
> num.stream.threads: 1
> I ran various tests without success, but when I created a new 'input' topic 
> with 1 partition (keeping the 'output' topic at 4 partitions) I got 120K 
> messages/second in the 'output' topic.
> I have been running some performance tests with the following cases (all 
> topics have 4 partitions in all cases):
> Case A - 1 Instance:
> - With num.stream.threads set to 1 I had ~3785 messages/second
> - With num.stream.threads set to 2 I had ~3938 messages/second
> - With num.stream.threads set to 4 I had ~120K messages/second
> Case B - 2 Instances:
> - With num.stream.threads set to 1 I had ~3930 messages/second for each 
> instance (and a total throughput of ~8K messages/second)
> - With num.stream.threads set to 2 I had ~3945 messages/second for each 
> instance (and more or less the same throughput as with num.stream.threads 
> set to 1)
> Case C - 4 Instances:
> - With num.stream.threads set to 1 I had 3946 messages/second for each 
> instance (and a total throughput of ~17K messages/second)
> As can be observed, I get the best results when num.stream.threads is set 
> to #partitions. So I have the following questions:
> - Why, when a topic has #partitions > 1 and num.stream.threads is set to 1, 
> do I always get ~4K messages/second?
> - In case C, 4 instances with num.stream.threads set to 1 should perform 
> better than 1 instance with num.stream.threads set to 4. Is this assumption 
> correct?
> This is the kafka-streams application that I use: 
> https://gist.github.com/Chorro/5522ec4acd1a005eb8c9663da86f5a18
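
For reference, the four settings listed in the issue description could be assembled roughly as below. This is only a minimal sketch using java.util.Properties with plain string keys; the class name StreamsSettingsSketch and the build helper are hypothetical and are not taken from the linked gist. In a real application the resulting Properties object would presumably be handed to the Kafka Streams client, which is left out here so the snippet runs without the Kafka libraries.

```java
import java.util.Properties;

public class StreamsSettingsSketch {

    // Builds the settings listed in the report; all other parameters stay at their defaults.
    // The helper name and signature are illustrative assumptions.
    static Properties build(int numStreamThreads) {
        Properties props = new Properties();
        props.put("application.id", "myApp");
        props.put("bootstrap.servers", "localhost:9092");
        props.put("zookeeper.connect", "localhost:2181");
        // Stored as a string, as the value would appear in a properties file.
        props.put("num.stream.threads", Integer.toString(numStreamThreads));
        return props;
    }

    public static void main(String[] args) {
        // The report varies this value between 1, 2 and 4 for a 4-partition input topic.
        for (int threads : new int[] {1, 2, 4}) {
            Properties props = build(threads);
            System.out.println("num.stream.threads=" + props.getProperty("num.stream.threads"));
        }
    }
}
```

Only num.stream.threads changes between the runs described in cases A, B and C; the other three values stay fixed.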



