henrikingo commented on issue #137:
URL: https://github.com/apache/otava/issues/137#issuecomment-4008759836

   @ligurio Thanks for using Otava and being active here. Getting more input is 
going to be crucial in the next 6-12 months when, for the first time ever, we 
work on a common upstream and make some UX changes and improvements. If you're 
not already on it, I invite you to [join the mailing 
list](https://otava.apache.org/docs/community), where we occasionally have such 
conversations too...
   
   A couple of comments on your workflow:
   
   In every place I've seen Otava used (and used it myself), it was used within 
some larger web-based UI. Unfortunately none of these were open sourced, nor 
are they even publicly available for you to view, although maybe the Datastax 
way of doing this is to a large extent available in Otava. Namely, at Datastax, 
benchmark results would be submitted to Prometheus and the associated Grafana 
dashboard. Otava then has functionality to read data from Prometheus, compute 
change points, and write them back as Grafana annotations. 
(Over time, other databases have been added.)
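
   For the curious, that loop can be sketched roughly like below. This is a hedged illustration, not Otava's actual code: the Prometheus `query_range` and Grafana `POST /api/annotations` endpoints are the standard public APIs, but the host names and the metric query are placeholders, and the change point detection step itself is omitted.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

PROM = "http://prometheus:9090"   # placeholder host
GRAFANA = "http://grafana:3000"   # placeholder host

def fetch_series(query, start, end, step="1h"):
    """Pull one metric series via Prometheus' query_range API."""
    url = (f"{PROM}/api/v1/query_range?query={quote(query)}"
           f"&start={start}&end={end}&step={step}")
    with urlopen(url) as resp:
        payload = json.load(resp)
    # each value arrives as a [timestamp, "value"] pair
    return [(int(t), float(v))
            for t, v in payload["data"]["result"][0]["values"]]

def annotation_payload(ts_seconds, metric, direction):
    """Build the JSON body for Grafana's POST /api/annotations."""
    return {
        "time": ts_seconds * 1000,  # Grafana expects epoch milliseconds
        "tags": ["otava", "change-point"],
        "text": f"{metric}: {direction} shift detected",
    }

def post_annotation(body, api_key):
    """Write one detected change point back to Grafana as an annotation."""
    req = Request(f"{GRAFANA}/api/annotations",
                  data=json.dumps(body).encode(),
                  headers={"Content-Type": "application/json",
                           "Authorization": f"Bearer {api_key}"})
    with urlopen(req) as resp:
        return resp.status
```

   Annotations tagged this way can then be overlaid on the same Grafana panels the benchmark results are already graphed on.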
   
   At MongoDB we had performance graphs directly in the CI system (very similar 
to what you might have in Jenkins, for example) and also Jira integration, so 
that we could create a Jira issue directly from the change point alert in CI. 
Further, this included a feature using custom Jira fields: when a regression 
was eventually fixed, the commit sha of the fix would be added to the same Jira 
issue, so that the regression and the fix were paired together. The CI graphs 
would pick up this information from Jira, and fixed change points could be 
presented in a less urgent color than the ones not yet addressed. Marking false 
positives worked the same way: another click of a button.
   
   But speaking more generally, I always advocate to store your test results 
and compute change points outside and after the actual test run. There are 
several reasons:
   
   - First of all, Otava isn't designed to find a change point immediately 
after the test. In many cases it might flag a change only after 4-5 more tests 
have run. Only at that point can you be certain that the change really was 
persistent and not just random noise.
   - As always with "big data", you may want to rerun the analytics part on the 
same data you already have. For example, you may want to change the p-value 
or some other parameter. Or fix a bug... You want to be able to do this 
without having to rerun the actual benchmarks.
   - If you analyze the 30 most recent points inside your workflow, and there 
is an actual regression, you will now be alerted of the same regression 30 
times, no? If you only alert / fail the job when the most recent point is the 
change point, then re-read the first point above.
   - And if you have two change points within the 30-day window, how would you 
notice the second one if the job is already failing because of the first change 
point?
   
   Like I said, unfortunately I'm not aware of any Otava-based dashboards that 
are publicly available. Nyrkiö is a commercial SaaS offering that provides this 
same type of graphing, plus integration with GitHub pull requests and issues. 
(I'm unsure about the etiquette here, but it seemed on topic to mention it in 
this case.) Here is a random example of a pull request comment from Nyrkiö 
about one benchmark result being significantly slower than before: 
https://github.com/nyrkio/nyrkio/pull/968#issuecomment-3905270510 And here is 
the same for a push event, in which case an issue is created: 
https://github.com/unodb-dev/unodb/issues/832 The link in the issue is broken; 
it tried to link to 
https://nyrkio.com/public/https%3A%2F%2Fgithub.com%2Funodb-dev%2Funodb/master/UnoDB_Benchmarks__x64_?commit=d2ab269909c64723eb5930201bde3bd8b7cefad3&timestamp=1766116904#full_n4_sequential_insert%3Cunodb::benchmark::olc_db%3E/32768
   
   So with this kind of graphing, the ability to rerun the analytics (for 
example after fine-tuning parameters), and integration with a ticket system, 
triaging results should be much more pleasant. You should get more than 50% 
correct alerts, you should only get them once, and you should be able to mark a 
change as closed either because it was fixed or because it was considered 
invalid. (Nyrkiö doesn't do this last bit.)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.