Hi Mark,

Thanks for the insightful examples! They really clear things up!
From a stress-testing perspective, it is worth running both storage and network 
stress at the same time on a “distributed” system and monitoring/analyzing the 
system behavior.
I think Collectd is a good choice as you suggested.
In addition, as we discussed in the summit breakout, Bottlenecks could do the 
monitoring and act as the load manager. We have also added this to the 
project's E release plan.
https://wiki.opnfv.org/display/bottlenecks/Bottlenecks+Release+Plan
I think it would be best if we implement the advanced stress test cases as 
collaborative work!

One more question about your intern project proposal 
https://wiki.opnfv.org/display/DEV/Intern+Project%3A+Search+for+Optimal+Cinder+Throughput:
what do the different colored lines in the graph represent, and are they 
related to the “run phase matrix”?

Thanks,
Gabriel

From: Beierl, Mark [mailto:mark.bei...@dell.com]
Sent: Friday, June 23, 2017 9:58 PM
To: Yuyang (Gabriel)
Cc: test-wg; opnfv-tech-discuss@lists.opnfv.org; morgan.richo...@orange.com; 
Jose Lausuch; ross.b.bratt...@intel.com; Cooper, Trevor; Gaoliang (kubi)
Subject: Re: Stress Test Cases for E Release


Hello, Gabriel.

The baseline to which you referred at the end of your message is also what I 
mean by recording numbers from VSPERF and StorPerf.  For StorPerf, I can use 
this as an example:

Run a random read/write workload with a block size of 8k and a queue depth of 4,
using twice as many VMs as there are Ceph storage nodes.  Use the throughput, 
bandwidth, and latency metrics for both read and write as the baseline.
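As a rough sketch of what that job submission could look like (the parameter names and node count below are assumptions, not the exact StorPerf ReST API schema — verify against the StorPerf docs before use):

```python
import json

# Hypothetical sketch of the StorPerf job described above: random
# read/write, 8k block size, queue depth 4, with the agent (VM) count
# set to twice the number of Ceph storage nodes.  Field names only
# approximate the real StorPerf ReST API.

CEPH_STORAGE_NODES = 3  # assumption: discover this from your deployment


def storperf_job_payload(ceph_nodes):
    """Build the JSON body for the random read/write baseline job."""
    return {
        "workload": "rw",        # random read/write mix
        "block_sizes": "8192",   # 8k block size
        "queue_depths": "4",
        "agent_count": ceph_nodes * 2,
    }


payload = storperf_job_payload(CEPH_STORAGE_NODES)
print(json.dumps(payload))
# The payload would then be POSTed to the StorPerf jobs endpoint,
# e.g. via curl or python-requests.
```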

For VSPERF, I think I should let Trevor or someone else specify the best 
baseline.  To have maximum impact on the StorPerf test, I think it should 
attempt to saturate the network with data.

If we re-run the tests at the same time and the baseline throughput drops, or 
baseline latency increases for StorPerf, we have identified a network 
bottleneck.
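That comparison can be sketched as a simple check (the 10% threshold here is an arbitrary example, not a StorPerf default):

```python
# Sketch: flag a likely network bottleneck by comparing StorPerf baseline
# metrics against the same metrics measured while VSPERF runs concurrently.

def degradation_pct(baseline, concurrent):
    """Percent change of a metric relative to its baseline value."""
    return (concurrent - baseline) / baseline * 100.0


def network_bottleneck(base_tput, conc_tput, base_lat, conc_lat,
                       threshold=10.0):
    """True if throughput fell or latency rose by more than the threshold."""
    tput_drop = -degradation_pct(base_tput, conc_tput)  # positive = fell
    lat_rise = degradation_pct(base_lat, conc_lat)      # positive = grew
    return tput_drop > threshold or lat_rise > threshold


# Example: throughput drops from 400 MB/s to 250 MB/s under concurrent load.
print(network_bottleneck(400.0, 250.0, 2.0, 2.1))  # True
```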

Of course, we should monitor the hypervisor CPU, Disk and Network stats the 
whole time the tests are running.  Collectd might be a good tool for this.
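A minimal collectd configuration for those three stat families might look like the fragment below (CSV output chosen only for illustration; paths and interval are assumptions):

```
# Sample collectd.conf fragment: hypervisor CPU, disk, and network stats,
# written to CSV for later analysis alongside the test results.
Interval 10

LoadPlugin cpu
LoadPlugin disk
LoadPlugin interface

LoadPlugin csv
<Plugin csv>
  DataDir "/var/lib/collectd/csv"
</Plugin>
```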

Regards,
Mark

Mark Beierl
SW System Sr Principal Engineer
Dell EMC | Office of the CTO
mobile +1 613 314 8106<tel:1-613-314-8106>
mark.bei...@dell.com<mailto:mark.bei...@dell.com>

On Jun 22, 2017, at 22:30, Yuyang (Gabriel) 
<gabriel.yuy...@huawei.com<mailto:gabriel.yuy...@huawei.com>> wrote:

Hi Mark,

Thanks for the detailed explanation! Really appreciate it!!
As to the test cases you mentioned, “Run VSPERF/StorPerf and record 
numbers”: what exactly does the “number” here refer to?

Another thought about “testing the impact of CPU load on VSPERF/Ceph”: maybe we 
could set up a baseline test first to see the CPU usage for an atom test, then 
see whether the CPU usage increases when we execute multiple atom tests.
As a result, we would know the CPU requirements for a specified usage.

Just for your reference, it may not be possible to run a stress test at full 
system capacity with regard to CPU, memory, or network bandwidth.
This is because the first thing to crash during a stress test could be the 
message queue handling mechanism, the database, the app/tools, or many other 
things.
This is another reason we should set up a baseline test first, so that we can 
observe the system behavior by smoothly increasing the load (or the number of 
atom tests), find the breaking point, and locate the root cause.
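The increase-until-breaking idea can be sketched like this (the runner below is a simulated stand-in — in practice it would launch the atom tests, e.g. via Bottlenecks, and check system health):

```python
# Sketch of "smoothly increase the load": start from a baseline of one
# atom test and keep adding tests until a run fails, then report the
# breaking point for root-cause analysis.

def run_atom_tests(n, capacity=8):
    """Simulated runner: succeeds while the load fits the system capacity."""
    return n <= capacity


def find_breaking_point(max_load=100):
    """Return the first load level that breaks the system, or None."""
    for n in range(1, max_load + 1):
        if not run_atom_tests(n):
            return n
    return None


print(find_breaking_point())  # 9 with the simulated capacity of 8
```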

Best,
Gabriel

From: Beierl, Mark [mailto:mark.bei...@dell.com]
Sent: Friday, June 23, 2017 2:58 AM
To: Yuyang (Gabriel)
Cc: test-wg; morgan.richo...@orange.com<mailto:morgan.richo...@orange.com>; 
Jose Lausuch; ross.b.bratt...@intel.com<mailto:ross.b.bratt...@intel.com>; 
Cooper, Trevor; Gaoliang (kubi)
Subject: Re: Stress Test Cases for E Release

From StorPerf, I would like to add the following:


  *   Scale out until maximum throughput is reached.  Automating this in code is 
an outstanding intern project which has not received a lot of interest yet [1].
  *   Test different Nova schedulers.  The default is (I think) to load all VMs 
onto one compute node until it is full, then move to the next.  This is one type 
of stress that is put on a single compute node.  If we change to spreading VMs 
across all compute nodes first, this will stress Cinder more.
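One knob for experimenting with pack-versus-spread placement is the RAM weigher in nova.conf (section and option names vary by OpenStack release — check the nova configuration reference for the deployed version):

```ini
[filter_scheduler]
# Positive values spread VMs across the hosts with the most free RAM
# (stressing Cinder sooner); negative values pack VMs onto one compute
# node until it fills (stressing that single node).
ram_weight_multiplier = 1.0
```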

I've mentioned this before, but I find it interesting, so I will add it here.  
I would like to see two or more stress tests running at the same time.  For 
example:


  *   Run VSPERF and record numbers
  *   Run StorPerf and record numbers
  *   Run both at the same time and compare numbers.  If they are radically 
different, we have an issue where the networking for the stack has not been 
architected properly.

Continuing in that same thought: what is the impact of CPU load on the vSwitch? 
 If I run the QTIP CPU benchmark at the same time as VSPERF, can I see a 
difference?  What about the CPU impact on Ceph (QTIP and StorPerf)?  The 
combinations of these different tests can yield very interesting results.  Is 
this what Bamboo/PNDA is eventually going to be looking at?

Exciting times for stress testing; I am looking forward to what we can 
accomplish!

[1] 
https://wiki.opnfv.org/display/DEV/Intern+Project%3A+Search+for+Optimal+Cinder+Throughput

Regards,
Mark

Mark Beierl
SW System Sr Principal Engineer
Dell EMC | Office of the CTO
mobile +1 613 314 8106<tel:1-613-314-8106>
mark.bei...@dell.com<mailto:mark.bei...@dell.com>

On Jun 22, 2017, at 02:08, Yuyang (Gabriel) 
<gabriel.yuy...@huawei.com<mailto:gabriel.yuy...@huawei.com>> wrote:

Hi Testing Community,

As discussed in the stress test breakout during the Beijing Summit, we should 
start to consider the stress test cases for the E release now.
In Danube, we implemented 2 basic stress test cases (baseline traffic and 
life-cycle for ping). Details can be found at 
https://wiki.opnfv.org/display/bottlenecks/Sress+Testing+over+OPNFV+Platform
In the E release, I am thinking about more advanced test cases.
Based on what I know, we already have 5 test cases planned and discussed:
A. https://etherpad.opnfv.org/p/yardstick_release_e
   1. Scale-out test
   2. Scale-up test
B. https://wiki.opnfv.org/display/bottlenecks/Sress+Testing+over+OPNFV+Platform
   3. Baseline for CPU limit
   4. Life-cycle events for throughput
   5. Life-cycle events for CPU limit

Are there any test cases I have missed, especially for VSPERF and StorPerf?
As Morgan is about to report the plan for the long-duration POD to the TSC, it 
would be better to have a reference test case list before we do that.

Best,
Gabriel

_______________________________________________
opnfv-tech-discuss mailing list
opnfv-tech-discuss@lists.opnfv.org
https://lists.opnfv.org/mailman/listinfo/opnfv-tech-discuss
