Github user revans2 commented on the issue:

    https://github.com/apache/storm/pull/2241
  
    @harshach 
    
    Reiterating what @HeartSaVioR said about benchmarking.  Most benchmarking
is done by pushing a system to its limits to see the maximum throughput it
can sustain.  That is far from what a real user wants.  It looks good for a
vendor to brag that "I can do X, but that other vendor over there can only do
Y," but it is close to worthless for what real users want to know.
    
    Real users are trying to balance the cost of the system in $ (CPU time +
memory used become this: how many EC-whatever instances do I need?), the
amount of data that they can push through the system, and how quickly they
can get results back.  Each of these variables is reflected by this test.  In
most cases a user has a set load that they know they typically get, and a
reasonable guess at the maximum load that they expect to see.  Most users
also have a deadline after which the data is no good any more; if not, they
should be using batch.  And they have a budget to spend on the project; if
not, call me, I want to work for you and my salary requirements are very
reasonable.
    
    You need to give users tools to explore all three, and because the three
are intertwined you want to be able to hold one or two of the variables
constant while you measure the others.  Storm currently has no way to set a
target SLA (I hope to add one eventually), but you can control the rate at
which messages arrive and the parallelism of the topology (which reflects the
cost).  So the goal is to scan through various throughput values and various
parallelisms to see what the latency is and what resources are actually used.
In the real world we would adjust the heap size and parallelism accordingly.
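    The sweep described above could be driven by a small harness along these
lines (a sketch only: `run_topology` and its toy latency model are
hypothetical stand-ins for submitting a real Storm topology at a fixed input
rate and scraping its latency metrics):

```python
import itertools

# Hypothetical stand-in: launch the benchmark topology at a fixed input rate
# and parallelism, and report the observed p99 latency in milliseconds.
# A real harness would submit a Storm topology and read its metrics instead.
def run_topology(rate_per_sec, parallelism):
    # Toy model: latency blows up as the offered rate approaches the
    # capacity this parallelism can sustain (10k tuples/s per executor).
    capacity = parallelism * 10_000
    utilization = min(rate_per_sec / capacity, 0.99)
    return 5.0 / (1.0 - utilization)

def sweep(rates, parallelisms, sla_ms):
    """Hold the SLA constant; for each target rate, find the cheapest
    (lowest) parallelism whose measured latency still meets the SLA."""
    results = {}
    for rate in rates:
        for par in sorted(parallelisms):
            latency = run_topology(rate, par)
            if latency <= sla_ms:
                results[rate] = (par, latency)
                break
    return results

if __name__ == "__main__":
    out = sweep(rates=[10_000, 50_000, 100_000],
                parallelisms=[1, 2, 4, 8, 16],
                sla_ms=50.0)
    for rate, (par, lat) in out.items():
        print(f"rate={rate}/s -> parallelism={par}, p99~{lat:.1f} ms")
```

    The point of the structure is the one made above: one variable (the SLA)
is held constant while the other two (rate, i.e. throughput, and parallelism,
i.e. cost) are scanned, instead of reporting a single peak-throughput number.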
    
    Complaining about a benchmark creating 51 threads relates to the
parallelism that we want to explore.  If that is what I did wrong in the
benchmark, I am happy to adjust and reevaluate.  I want to understand how
parallelism impacts this code.  What concerns me now is that scaling a
topology appears to be very different now, and I want to understand exactly
how that works.
    
    I cannot easily roll out a change to my customers saying things might get
a lot better or they might get a lot worse.  We need to make it easy for a
user with a topology that may not have been ideal (but worked well) to
continue to work well.
    