Re: New automated test coverage: openQA tests of critical path updates

Kamil Paral Thu, 02 Mar 2017 01:32:11 -0800

> > There's one important thing we need to do first, though. Bodhi ID
> > doesn't identify the thing tested uniquely, because Bodhi updates are
> > mutable (and the ID is kept). So Bodhi (or any gating tools) can't
> > rely on just retrieving the latest result for a particular Bodhi ID
> > and trust that result. It might be old and no longer reflect the
> > current state. We need to extend bodhi_update results with
> > "timestamp" key in extra data, that will report the "last_modified"
> > time of the Bodhi update tested. And Bodhi (or any other tool) must
> > not only query for item=$bodhi_id&type=bodhi_update, but also for
> > &timestamp=$timestamp. Only with this we can be sure we've really
> > tested particular Bodhi update.
> 
> I'm not so sure it's really necessary, and doing it is actually tricky
> for openQA. Only the openQA job itself knows what packages it actually
> tested, and it doesn't have an easy way to get the associated
> timestamp. The scheduler could easily get the timestamp at the time the
> job was created, or at the time the job completed, but that will never
> be 100% reliable, because the job actually goes and does the download
> somewhere in between those two times.


This problem is not exclusive to openqa; it affects all tasks that test bodhi
updates and download the included rpms (there's always a race condition
window). For openqa, I see two options here:

a) record the timestamp in the scheduler when the job is created and use it.
Either it will be correct, or, if the race condition happens, it will publish a
result based on testing newer packages with an older timestamp. That's slightly
incorrect, but not really a problem, because the update edit event will have
scheduled another openqa run, and that run will publish an up-to-date result.
So there's no harm done.

b) record the timestamp in the scheduler when the job is created, and again
when the job is finished. If they don't match, ignore the result and don't
publish it. The update edit event will have scheduled another openqa run
anyway. Again, no harm done, and we didn't populate resultsdb with an incorrect
result. (This is similar to what we do in certain taskotron tasks - if we
detect that a bodhi update state doesn't match at the time when we publish
results, we print that into the logs and skip publishing.)
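
Option b) could be sketched roughly like this. It is only an illustration: the
function name and the three callables (get_last_modified, run_job, publish) are
hypothetical placeholders, not existing openqa or taskotron APIs, and the
result dict just mirrors the item/type/timestamp keys discussed in this thread.

```python
from typing import Callable, Optional


def run_and_maybe_publish(
    update_id: str,
    get_last_modified: Callable[[str], str],
    run_job: Callable[[str], str],
    publish: Callable[[dict], None],
) -> Optional[dict]:
    """Publish a result only if the update did not change during the job."""
    ts_before = get_last_modified(update_id)  # taken when the job is created
    outcome = run_job(update_id)              # the job downloads rpms and tests
    ts_after = get_last_modified(update_id)   # taken when the job is finished

    if ts_before != ts_after:
        # The update was edited mid-run. The edit event schedules a fresh run,
        # so dropping this result loses nothing.
        print(f"{update_id}: modified during run "
              f"({ts_before} -> {ts_after}), not publishing")
        return None

    result = {
        "item": update_id,
        "type": "bodhi_update",
        "outcome": outcome,
        "timestamp": ts_before,  # extra data key proposed in this thread
    }
    publish(result)
    return result
```

Option a) would be the same code without the second lookup and the comparison:
always publish, tagged with ts_before.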

> 
> The job can - and already does - log the exact packages it actually
> got, but I don't think there's an easy way for it to take the
> 'last_modified' date for the update at the time it does the download.

I don't know how you download the rpms, but a single python call can get that
date (an http get, then parse the json). Again, to prevent race conditions, it
would be good to do the call before and after downloading the rpms and compare
the two timestamps. These race conditions occur surprisingly often once you
start executing hundreds or thousands of tasks a day.
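
A minimal sketch of that before/after check, using only the python standard
library. The URL layout and the JSON field names ("update", "date_modified",
"date_submitted") are my assumptions about the Bodhi API and should be checked
against its documentation; download is a hypothetical placeholder for whatever
actually fetches the rpms.

```python
import json
import urllib.request
from typing import Callable, Tuple

# Assumption: illustrative URL layout, verify against the Bodhi API docs.
BODHI_URL = "https://bodhi.fedoraproject.org/updates/{update_id}"


class UpdateChanged(Exception):
    """The update was edited while its rpms were being downloaded."""


def fetch_last_modified(update_id: str, timeout: float = 30.0) -> str:
    """HTTP GET the update and return its last-modified timestamp."""
    req = urllib.request.Request(
        BODHI_URL.format(update_id=update_id),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        data = json.load(resp)
    update = data.get("update", data)
    # Assumption: an update that was never edited has no modification date,
    # so fall back to the submission date.
    return update.get("date_modified") or update["date_submitted"]


def download_with_check(
    update_id: str,
    download: Callable[[str], list],
    get_ts: Callable[[str], str] = fetch_last_modified,
) -> Tuple[str, list]:
    """Download the rpms and fail if the update changed in the meantime."""
    before = get_ts(update_id)
    rpms = download(update_id)
    after = get_ts(update_id)
    if before != after:
        raise UpdateChanged(f"{update_id}: {before} -> {after}")
    return before, rpms
```

The timestamp returned here is the one that would go into the published result.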

But if this is easier done in the scheduler, I think that's totally fine.

> 
> OTOH, I don't think it's really too bad just to show the 'most recent'
> results. That should usually only be out of date for a few minutes
> after an update is edited. It might be possible to do a 'tests
> running...' spinner when there are jobs scheduled or running for the
> update in question, even.

You're assuming here that the new task will finish successfully. It will often
not. From my experience, the network is the bane of automated testing. Bodhi
will time out, koji will time out, they will return http 5xx errors, etc.
Taskotron tasks are plagued with it (at least dozens of such failures a day).
That's why I try to detect the race condition and either not record the result
at all, or record it with the older timestamp, which is safe - people and tools
looking at the results are not misled. The worst thing to happen here is that a
result is missing for a long time. And people will then complain (and we start
investigating) or they'll use the "request re-testing" button, which we'll have
to provide sooner or later (because all systems are imperfect).

Of course I'm not saying we need to have this *now*. But I think it's necessary 
for gating updates.
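
On the consuming side, the lookup proposed at the top of this thread might look
something like the sketch below. The helper names and the "submit_time" key
used for ordering are hypothetical; only the item, type and timestamp
parameters come from the proposal itself.

```python
from typing import Iterable, Optional
from urllib.parse import urlencode


def build_query(bodhi_id: str, timestamp: str) -> str:
    """Query string asking for results of one specific revision of an update."""
    return urlencode({
        "item": bodhi_id,
        "type": "bodhi_update",
        "timestamp": timestamp,
    })


def pick_valid_result(results: Iterable[dict],
                      current_timestamp: str) -> Optional[dict]:
    """Return the newest result for the current revision, or None.

    None means "no trustworthy result yet", which a gating tool should
    treat as missing rather than fall back to an older revision's result.
    """
    matching = [r for r in results if r.get("timestamp") == current_timestamp]
    if not matching:
        return None
    # Assumption: each result carries a sortable "submit_time" value.
    return max(matching, key=lambda r: r["submit_time"])
```

With this, a result for an older revision of the update is simply never
returned, even if it happens to be the latest one stored for that Bodhi ID.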
_______________________________________________
qa-devel mailing list -- qa-devel@lists.fedoraproject.org
To unsubscribe send an email to qa-devel-le...@lists.fedoraproject.org
