On Thu, 2016-09-01 at 11:31 -0700, Adam Williamson wrote:
> On Fri, 2016-08-26 at 17:33 -0700, Adam Williamson wrote:
> > Hey folks! Just a heads-up to the openQA-interested: I'm working on
> > another update to current upstream git. staging is now running the
> > latest git of both os-autoinst and openQA. There have been some changes
> > upstream which are related to working with Mojolicious 7, but aaannz
> > assures me they're Mojo 6-compatible, and they still have one
> > deployment running on Mojo 6. So far, it seems to be working OK.
> >
> > We're slightly suspicious about
> > https://github.com/os-autoinst/openQA/commit/f2547e9bcc0a166f993426bceeacd00179116716
> > ;
> > apparently they're still arguing about whether it's the right thing to
> > do. I'll keep an eye on it, and if any uploads go squiffy, I'll revert
> > it in the package. So far, though, at least one upload test has run and
> > worked.
>
> So in the end I did run into several issues with upload tests and
> worker shutdowns. I've made several fixes to the packages since and am
> now pretty much happy with how staging is behaving, but upstream has
> decided not to take my fix for one of the problems, instead planning on
> a different approach:
>
> https://github.com/os-autoinst/openQA/pull/844
>
> so I'm gonna wait a bit and see if their fix for that shows up soon,
> and if it works, before doing official builds and updating the
> production instance.
>
> I do recommend using staging if you're going to do any needle editing,
> because it has https://progress.opensuse.org/issues/13456 fixed, which
> makes the needle editing experience much nicer.

Hi, this is AdamW, here is your openQA Nightly News...

First up, I twiddled a bit more with the openQA and os-autoinst
packages. Right now staging is running git snapshots from Sept 2nd with
my patch (still not the upstream one, as coolo says it doesn't handle
cancellation very well yet) for the 'incompletion during upload'
problem. This seems to be working pretty much fine; occasionally a
worker process dies when I do a mass job cancel (like when I force a
re-run of a compose that's currently being tested, which causes all the
running jobs to be obsoleted), but it's not very often, and we rarely
(almost never) do that on prod, so it shouldn't be a big problem. So
I'm planning to send that to production next week.

I spent most of this week basically playing Stop openQA Doing Wacky
Stuff; I've got a couple of big patches in for review, one which does a
lot of stuff to try and guard against problems while typing, and
another which deletes a bunch of old needles (which should ease the
worker box I/O load some). I also investigated a few tricky
long-standing bugs and got some fixes done to NetworkManager and
anaconda; in particular, from now on, we shouldn't get those annoying
'network failed to come up' failures. I fixed a couple of long-term
broken tests, too, especially the realmd_join_cockpit one (which also
tests freeipa's web UI).

Finally, I downgraded qemu on staging back to 2.6.0-5 (from 2.6.1)
today, and since then we've had exactly *no* completely weird fails.
Ever since I updated the worker boxes about a week ago, we've been
seeing tests fail in odd ways, like missing characters when typing at
the console (which I don't think I'd *ever* seen before) or just timing
out at odd places. I kinda suspected qemu all along, but there was so
much other stuff I was dealing with that I didn't want to throw a qemu
downgrade into the mix.
Now that all the other fixes are deployed on staging, it seemed quiet
enough to try the qemu downgrade on top, and it does seem to be helping
so far, at least after one and a half composes - the Rawhide and 25
nightlies for today. So I've now downgraded qemu on the prod workers
too, and I'll keep an eye on things over the weekend and see how they
pan out. If it really does seem like old qemu makes things better, I'll
file a qemu bug next week and talk to the virt devs about it (they'll
*love* that bug report, I'm sure).

So the current state of play is that prod still has fairly old openQA
and os-autoinst and is running current 'develop' of the tests; I've
just now downgraded qemu on it (so 20160910.n.0 onwards will run with
the old qemu). It doesn't have the 'key-fixes' commit, so it is still
subject to typing errors, particularly on the anaconda root password /
user creation spokes.

staging has had old qemu since 20160909.n.0 and is running recent git
snapshots of openQA and os-autoinst, and is using the 'key-fixes'
branch of the tests with the needle cleanup commit cherry-picked on
top. I'm hopeful that with this config we will actually get reliably
comparable run-to-run results...

This has been your openQA nightly news!
-- 
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | XMPP: adamw AT happyassassin . net
http://www.happyassassin.net
_______________________________________________
qa-devel mailing list
qa-devel@lists.fedoraproject.org
https://lists.fedoraproject.org/admin/lists/qa-devel@lists.fedoraproject.org