Github user revans2 commented on the issue:
https://github.com/apache/storm/pull/2241
@harshach
Reiterating what @HeartSaVioR said about benchmarking. Most benchmarking
is done by pushing a system to its limits to see the maximum throughput it
can sustain. This is far from what a real user wants. It looks good for a
vendor to brag, "I can do X, but that other vendor over there can only do Y,"
but it is close to worthless for what real users want to know.
Real users are trying to balance three things: the cost of the system in $
(CPU time + memory used become this; how many EC2-or-whatever instances do I
need?), the amount of data they can push through the system, and how quickly
they can get results back. Each of these variables is reflected by this test.
In most cases a user has a set load that they know they typically get, and a
reasonable guess at the maximum load that they expect to see. Most users also
have a deadline beyond which the data is no longer any good; if not, they
should be using batch. And they have a budget to spend on this project; if
not, call me, I want to work for you and my salary requirements are very
reasonable.
You need to give users tools to explore all three, and because the three
are intertwined you want to be able to hold one or two of the variables
constant while you measure the others. Storm currently has no way to set a
target SLA (I hope to add one eventually), but you can control the rate at
which messages arrive and the parallelism of the topology (which reflects the
cost). So the goal is to scan through various throughput values and various
parallelisms to see what the latency is and what resources are actually used.
In the real world we would adjust the heap size and parallelism accordingly.
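To make the scan concrete, here is a minimal sketch of the kind of rate x parallelism sweep I mean; the launcher command in the comment is a hypothetical placeholder, not an actual command from this PR:

```shell
#!/bin/sh
# Sketch of the sweep: hold the arrival rate and parallelism constant for
# each run, then measure latency and the CPU/memory actually used.
# Here we only enumerate the grid of runs; a real run would launch the
# topology at each point, e.g. (hypothetical):
#   storm jar loadgen.jar SomeBenchmark --rate "$rate" --parallelism "$parallelism"
runs=0
for rate in 10000 50000 100000; do      # target messages/sec, fixed per run
  for parallelism in 1 2 4; do          # executors per component (reflects cost)
    echo "run: rate=${rate}/s parallelism=${parallelism}"
    runs=$((runs + 1))
  done
done
echo "total runs: ${runs}"
```

Each point in the grid gives one (throughput, cost, latency) sample, which is what lets you hold two variables fixed while reading off the third.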
The complaint about the benchmark creating 51 threads relates to the
parallelism that we want to explore. If that is what I did wrong in the
benchmark, I am happy to adjust and reevaluate. I want to understand how
parallelism impacts this code. What concerns me now is that scaling a
topology appears to be very different now, and I want to understand exactly
how that works.
I cannot easily roll out a change to my customers saying things might get a
lot better or they might get a lot worse. We need to make it easy for a user
with a topology that may not have been ideal (but worked well) to continue to
work well.