Matthew Treinish <[email protected]> wrote:
> On Mon, Aug 08, 2016 at 02:40:31PM +0200, Ihar Hrachyshka wrote:
> > Hi,
> >
> > I was looking at grafana today, and spotted another weirdness. See the
> > periodic jobs dashboard:
> >
> > http://grafana.openstack.org/dashboard/db/neutron-failure-rate?panelId=4&fullscreen
> >
> > Currently it shows me a 100% failure rate for the py34/oslo-master job,
> > starting from ~Aug 3.
> >
> > But when I go to openstack-health, I don’t see those runs at all:
> >
> > http://status.openstack.org/openstack-health/#/job/periodic-neutron-py34-with-neutron-lib-master
> >
> > (^ The last run is July 31.)
> >
> > But then when I drill down into files, I can see more recent runs, like:
> >
> > http://logs.openstack.org/periodic/periodic-neutron-py34-with-neutron-lib-master/?C=M;O=A
> > http://logs.openstack.org/periodic/periodic-neutron-py34-with-neutron-lib-master/faa24e0/testr_results.html.gz
> >
> > The last link points to a run from yesterday. And as you can see it is
> > passing.
>
> That run isn't actually from yesterday, it's from July 30th. The directory
> shows a recent date, but the last modified dates for the individual files
> are older:
>
> http://logs.openstack.org/periodic/periodic-neutron-py34-with-neutron-lib-master/faa24e0/
>
> The openstack-health data goes up until the job started failing. This is
> likely because the failures occur early enough in the test run that no
> subunit output is generated for the run.
>
> > So, what’s wrong with the grafana dashboard? And why doesn’t
> > openstack-health show the latest runs?
>
> On the openstack-health side it looks like you're running into an issue
> with using subunit2sql as the primary data source there. If you look at an
> example output from what's not in openstack-health, like:
>
> http://logs.openstack.org/periodic/periodic-neutron-py34-with-neutron-lib-master/37cd5eb/console.html.gz

Nice! I guess you just picked one of those that is not present on the Health dashboard? Or did you do something more elaborate to come up with the link?

> You'll see that the failure is occurring before any subunit output is
> generated (during the discovery phase of testr). If there is no subunit
> file in the log output for the run, then there is nothing to populate the
> subunit2sql DB with.
>
> The grafana/graphite data doesn't share this limitation because it gets
> populated directly by zuul.
>
> This is a known limitation with openstack-health right now, and the plan
> to solve it is to add a zuul sql data store that we can query like
> subunit2sql for job level information, and then use subunit2sql for more
> fine grained details. The work on that currently depends on:
>
> https://review.openstack.org/#/c/223333/
>
> which adds the datastore to zuul. Once that lands we can work on making
> the openstack-health side consume that data in conjunction with
> subunit2sql.
>
> -Matt Treinish

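For readers following along, the gap between the two dashboards can be sketched in a few lines of Python. This is only an illustration of the mechanism described in the quoted reply, not how either tool is implemented: the `Run` class, the run IDs, the helper names, and the subunit file name `testrepository.subunit.gz` are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List

# Assumed suffixes for a subunit artifact; the thread does not name the file.
SUBUNIT_SUFFIXES = (".subunit", ".subunit.gz")


@dataclass
class Run:
    """Hypothetical record of one periodic job run and its uploaded log files."""
    run_id: str
    result: str          # "SUCCESS" or "FAILURE", as reported by the job runner
    files: List[str]


def has_subunit_output(run: Run) -> bool:
    """True if the run uploaded something that looks like a subunit stream."""
    return any(name.endswith(SUBUNIT_SUFFIXES) for name in run.files)


def runs_seen_by_job_stats(runs: List[Run]) -> List[Run]:
    """Job-level stats (the grafana/graphite path) see every run."""
    return list(runs)


def runs_seen_by_subunit_db(runs: List[Run]) -> List[Run]:
    """A subunit-backed view only sees runs that produced subunit output."""
    return [run for run in runs if has_subunit_output(run)]


# Illustrative data only: one passing run with full output, and one run
# that failed during test discovery and therefore has no subunit file.
runs = [
    Run("run-a", "SUCCESS",
        ["console.html.gz", "testr_results.html.gz", "testrepository.subunit.gz"]),
    Run("run-b", "FAILURE", ["console.html.gz"]),
]

print([r.run_id for r in runs_seen_by_job_stats(runs)])
print([r.run_id for r in runs_seen_by_subunit_db(runs)])
```

In this toy model the early failure is counted by the job-level view but never reaches the subunit-backed view, which matches the symptom reported above: a 100% failure rate on one dashboard and no recent runs on the other.
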
Just want to say a huge thank you for the reply. It both pointed me to the immediate problem to solve and gave me a wider perspective on the mechanics that I should be aware of. It’s great to work in a community of individuals who so often go the extra mile for their fellows.

Ihar
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: [email protected]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
