On Thu, 2016-09-01 at 11:31 -0700, Adam Williamson wrote:
> On Fri, 2016-08-26 at 17:33 -0700, Adam Williamson wrote:
> 
> > Hey folks! Just a heads-up to the openQA-interested: I'm working on
> > another update to current upstream git. staging is now running the
> > latest git of both os-autoinst and openQA. There have been some changes
> > upstream which are related to working with Mojolicious 7, but aaannz
> > assures me they're Mojo 6-compatible, and they still have one
> > deployment running on Mojo 6. So far, it seems to be working OK.
> > 
> > We're slightly suspicious about
> > https://github.com/os-autoinst/openQA/commit/f2547e9bcc0a166f993426bceeacd00179116716
> >  ;
> > apparently they're still arguing about whether it's the right thing to
> > do. I'll keep an eye on it, and if any uploads go squiffy, I'll revert
> > it in the package. So far, though, at least one upload test has run and
> > worked.
> 
> So in the end I did run into several issues with upload tests and
> worker shutdowns. I've made several fixes to the packages since and am
> now pretty much happy with how staging is behaving, but upstream has
> decided not to take my fix for one of the problems, instead planning on
> a different approach:
> 
> https://github.com/os-autoinst/openQA/pull/844
> 
> so I'm gonna wait a bit and see if their fix for that shows up soon,
> and if it works, before doing official builds and updating the
> production instance.
> 
> I do recommend using staging if you're going to do any needle editing,
> because it has https://progress.opensuse.org/issues/13456 fixed, which
> makes the needle editing experience much nicer.

Hi, this is AdamW, here is your openQA Nightly News...

First up, I twiddled a bit more with the openQA and os-autoinst
packages. Right now staging is running git snapshots from Sept 2nd with
my patch (still not the upstream one, as coolo says it doesn't handle
cancellation very well yet) for the 'incompletion during upload'
problem. This seems to be working pretty much fine; occasionally a
worker process dies when I do a mass job cancel (like when I force a
re-run of a compose that's currently being tested, which causes all the
running jobs to be obsoleted), but it's not very often, and we rarely
(almost never) do that on prod, so it shouldn't be a big problem. So
I'm planning to probably send that to production next week.

I spent most of this week basically playing Stop openQA Doing
Wacky Stuff; I've got a couple of big patches in for review, one which
does a lot of stuff to try and guard against problems while typing, and
another which deletes a bunch of old needles (which should ease up on
the worker box I/O load some). I also investigated a few tricky long-
standing bugs and got some fixes done to NetworkManager and anaconda;
particularly, from now on, we shouldn't get those annoying 'network
failed to come up' failures. I fixed a couple of long-term broken
tests, too, especially the realmd_join_cockpit one (which also tests
freeipa's web UI).

Finally, I downgraded qemu on staging back to 2.6.0-5 (from 2.6.1)
today, and since then we've had exactly *no* completely weird fails.
Ever since I updated the worker boxes about a week ago, we've been
seeing tests fail in odd ways, like missing characters when typing at
the console (which I don't think I'd *ever* seen before) or just timing
out at odd places. I kinda suspected qemu all along but there was so
much other stuff I was dealing with I didn't want to throw a qemu
downgrade into the mix. Now that all the other fixes are deployed on
staging, it seemed quiet enough to try the qemu downgrade on top, and it does
seem to be helping so far, at least after one and a half composes - the
Rawhide and 25 nightlies for today. So I've now downgraded qemu on the
prod workers too, and I'll keep an eye on things over the weekend and
see how they pan out. If it really does seem like old qemu makes things
better, I'll file a qemu bug next week and talk to the virt devs about
it (they'll *love* that bug report, I'm sure).

So the current state of play is that prod still has fairly old openQA
and os-autoinst and is running current 'develop' of the tests; I've
just now downgraded qemu on it (so 20160910.n.0 onwards will run with
the old qemu). It doesn't have the 'key-fixes' commit so it is still
subject to typing errors, particularly on the anaconda root password /
user creation spokes.

staging has had old qemu since 20160909.n.0 and is running recent git
snapshots of openQA and os-autoinst, and is using the 'key-fixes'
branch of the tests with the needle cleanup commit cherry-picked on
top. I'm hopeful that with this config we will actually get reliably
comparable run-to-run results...

This has been your openQA nightly news!
-- 
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | XMPP: adamw AT happyassassin . net
http://www.happyassassin.net
_______________________________________________
qa-devel mailing list
qa-devel@lists.fedoraproject.org
https://lists.fedoraproject.org/admin/lists/qa-devel@lists.fedoraproject.org
