Scott Hendricks created KAFKA-13004:
---------------------------------------
Summary: Trogdor performance decreases sharply with large amounts
of tasks.
Key: KAFKA-13004
URL: https://issues.apache.org/jira/browse/KAFKA-13004
Project: Kafka
Issue Type: Bug
Components: tools
Environment: We run our Trogdor clusters within Kubernetes.
Reporter: Scott Hendricks
Assignee: Scott Hendricks
As part of my performance tests, I am running 3000 workloads within Trogdor.
The clients handle this fine, but after I reset and run the same test again,
Trogdor becomes sluggish.
Steps to reproduce:
# Run 3000 workloads in Trogdor, a mix of Produce and Consume workloads.
# Wait for the workloads to complete.
# Issue the DELETE API calls to destroy all 3000 workloads, resetting for the
next run.
# Confirm via the API that no workloads remain defined in the system.
# Run another 3000 workloads in Trogdor, similar to step 1.
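The reset in step 3 can be scripted and timed so that runs are easy to compare.
The following is only a minimal sketch, not the script used for this report. It
assumes the Coordinator REST API is reachable at http://localhost:8889 and that
a task is destroyed with DELETE /coordinator/tasks?taskId=<id>. The task IDs
and the helper names (destroy_task_url, destroy_all) are hypothetical.

```python
import time
import urllib.parse
import urllib.request

# Assumption: base URL of the Trogdor Coordinator REST API; adjust for your cluster.
COORDINATOR = "http://localhost:8889"


def destroy_task_url(base, task_id):
    """Build the DELETE URL for one task.

    Assumption: tasks are destroyed via DELETE /coordinator/tasks?taskId=<id>.
    """
    query = urllib.parse.urlencode({"taskId": task_id})
    return f"{base}/coordinator/tasks?{query}"


def destroy_all(base, task_ids, opener=urllib.request.urlopen):
    """Send one DELETE per task ID and return the elapsed wall-clock seconds."""
    start = time.monotonic()
    for task_id in task_ids:
        request = urllib.request.Request(
            destroy_task_url(base, task_id), method="DELETE"
        )
        with opener(request) as response:
            response.read()  # drain the body before moving to the next task
    return time.monotonic() - start


# Example usage (hypothetical task IDs; substitute the IDs used at creation):
#   ids = [f"workload-{i}" for i in range(3000)]
#   print(f"reset took {destroy_all(COORDINATOR, ids):.1f}s")
```

Recording this duration, together with the time the Coordinator needs to start
each batch, on the first and second runs would put a number on the slowdown.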
The Coordinator takes a long time to start the second batch of 3000 workloads.
There appears to be a performance issue in the framework that will take a while
to debug. I do not yet know whether it affects only the Coordinator or the
Agents as well. I do not currently have time to look into this, so I am
creating this issue to track it.
My current workaround is to destroy and recreate the Trogdor cluster between
test runs.