Philippe Bossut wrote:
> I think the point made by Bryan and Katie (and I have to agree with
> them) is that there's no way to see at a glance if we reached acceptable
> performance or not. The color code currently used is misleading (e.g.
> why is the Linux perf of importing a 3000 events calendar green? it's
> far from acceptable, if it's just to see that it is better than 0.5,
> it's easy to see at a glance...)
Ok, here's my latest proposal:
    |0.6   |Windows (r 7503)       |OSX (r 7500)          | Linux col
Test|Target|                 |std  |                |std  | here
    |      |time  |d %|d t   |dev  |time |d %|d t   |dev  |
===========================================================
#1  | 10 s |9.94s |-2%|-0.02s|0.01s|18.2s|-1%|-0.02s|0.04s|
#2  | 1 s  |1.14s |0% |0s    |0.00s|2.24s|0% |0s    |0.01s|
...
[Previous results][Help]
* The first column is the short test description. Where should the
link lead?
* The second column is the 0.6 target time.
* Columns 3-6 are for Windows results, the next 4 for Mac, and the last 4
for Linux (omitted here)
** The top row names the platform and the revision these numbers come from
** Rows 2-3 are the actual column headers: time, delta %, delta time,
standard deviation. The deltas are relative to the last measurement on
that platform. The std deviation is there to let you know how likely it
is that the change was just random noise - if the change is less than the
std dev it almost certainly was noise (but if it stays consistently at
the new value then it was a real change).
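To make the delta columns concrete, here is a rough sketch of how one cell group could be computed. The function name `compare_to_previous` and its arguments are made up for illustration, and treating a change equal to the std dev as noise is my assumption; nothing here is existing tinderbox code.

```python
def compare_to_previous(current, previous, std_dev):
    """Return (delta_time, delta_pct, is_noise) for one measurement.

    current, previous: times in seconds for this run and for the last
    run on the same platform. std_dev: standard deviation in seconds.
    All names are hypothetical, chosen only for this sketch.
    """
    if previous <= 0:
        raise ValueError("previous time must be positive")
    delta_time = current - previous
    delta_pct = 100.0 * delta_time / previous
    # Assumption: a change no larger than the std dev counts as noise.
    is_noise = abs(delta_time) <= std_dev
    return delta_time, delta_pct, is_noise
```

For example, `compare_to_previous(2.24, 2.0, 0.01)` reports a slowdown of about 0.24s (about 12%) that is not noise.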
I think it is crucial to report the difference from the previous
measurements, because whenever you check in, you should check if your
checkin made a difference compared to the previous results.
Likewise, since in my opinion the most critical piece of information
here is the trend, the most noticeable coloring should
happen based on the deltas to the previous run. If you made it slower
(by more than the std dev), it should show up as either an orange or
red cell background depending on how bad it was. If it got noticeably
better, it should show a green background. If the change was within std
dev, don't color because we don't know if it is a real change or noise.
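The background rule could look something like the sketch below. The function name `delta_background` is hypothetical, and the `red_pct` cutoff is only a placeholder set to the 10% figure floated in this message; the real thresholds are still up for debate.

```python
def delta_background(delta_time, previous, std_dev, red_pct=10.0):
    """Pick a cell background for a change relative to the previous run.

    Returns "green", "orange", "red", or None for no coloring.
    red_pct is a placeholder threshold, not an agreed value.
    """
    if abs(delta_time) <= std_dev:
        return None  # within the std dev: could be noise, so no color
    if delta_time < 0:
        return "green"  # noticeably faster than the previous run
    slowdown_pct = 100.0 * delta_time / previous
    return "red" if slowdown_pct >= red_pct else "orange"
```

With these placeholder values, a +0.06s change on a 0.926s test with a 0.01s std dev comes out orange, and a change of 0s stays uncolored.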
Now I could see maybe drawing colored text that would indicate how the
actual measured time compares to the target. If the measured time is
below target, green. Slightly above, orange, and if way above, red.
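As a sketch of the text color idea, assuming "way above" means more than some multiple of the target: the name `target_text_color` and the 1.5 factor are invented for illustration, and counting a time exactly on target as green is also my assumption.

```python
def target_text_color(measured, target, way_above_factor=1.5):
    """Pick a text color for a measured time compared to its target.

    way_above_factor is a made-up placeholder for "way above target".
    """
    if measured <= target:
        return "green"  # at or below the target time
    if measured > target * way_above_factor:
        return "red"  # way above the target
    return "orange"  # slightly above the target
```

With the sample numbers from the table, 9.94s against a 10 s target would be green, 1.14s against 1 s orange, and 18.2s against 10 s red.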
The color thresholds would be up for debate. std dev plays a role, but
currently std dev is very small so it shouldn't matter much. I think it
should be orange as soon as it is noticeably (more than std dev) on the
worse side. Red... hmm, I'd like to put that pretty low for deltas at
least, like a 10% change for the worse and you set the tree on fire. Btw,
we would also need to decide what to do if that happens, and in what
situations it is acceptable to make perf worse.
The previous results link would open a new page which would have the
latest table plus the n previous ones stacked as a history on the page.
At a later date it could contain the perf trend graph as well. The Help
link would open some docs on how to read the results, how tests are run etc.
See attachment for HTML mockup.
--
Heikki Toivonen
Title: tbox sample
| Test | 0.6 Target | Windows (r 7503) | | | | OSX (r 7502) | | | | Linux (r 7503) | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | time | Δ % | Δ time | std dev | time | Δ % | Δ time | std dev | time | Δ % | Δ time | std dev |
| #1 startup | 10 s | 9.94s | -2% | -0.02s | 0.01s | 18.2s | -1% | -0.02s | 0.06s | 7.86s | 0% | 0s | 0.00s |
| #2 new event (menu) | 1 s | 1.14s | 0% | 0s | 0.00s | 2.22s | +50% | +1.24s | 0.02s | 0.986s | +5% | +0.06s | 0.01s |
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Open Source Applications Foundation "Dev" mailing list
http://lists.osafoundation.org/mailman/listinfo/dev
