Just a couple of points about imagecompare: there are currently 19 tests, which 
run on device after every nightly build (roughly twice per day), and the full 
run takes about two hours. They are either derivatives of the existing Gaia UI 
tests or combinations of a few end-user use cases. Currently they run on device 
only (although they can run on the b2g emulator), and the results are checked 
manually every day, because about half the time the test failures (the image 
not matching the reference image) are caused by either a legitimate UI code 
change or an existing bug. Verifying the results manually takes less than 5 
minutes, so it's not too much of a resource strain on my part (by not having it 
on Treeherder, I can afford to be a bit lenient about its pass/fail status). 
But since it inherently requires visual inspection to confirm a failure, 
imagecompare might not be the right tool to integrate into the try tests. Also, 
since it relies on captured screenshots, it cannot detect visual bugs during 
screen transitions. Once the v3 feature of recording the screen output as video 
is complete, we could do something about that, though.
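
For anyone not familiar with these checks: each test essentially boils down to 
capturing a screenshot on the device and diffing it against a stored reference 
image, flagging the test when the mismatch is too large. A rough Python sketch 
of that idea (Pillow-based; the file names and the threshold are placeholders, 
not values from the actual suite):

    # Illustrative sketch only -- not the actual imagecompare implementation.
    # Assumes Pillow is available; paths and the threshold are placeholders.
    from PIL import Image, ImageChops

    def mismatch_ratio(reference_path, capture_path):
        """Return the fraction of pixels that differ between two screenshots."""
        ref = Image.open(reference_path).convert("RGB")
        cap = Image.open(capture_path).convert("RGB")
        if ref.size != cap.size:
            return 1.0  # different resolutions count as a full mismatch
        diff = ImageChops.difference(ref, cap)
        # Count pixels where any channel differs from the reference.
        changed = sum(1 for px in diff.getdata() if px != (0, 0, 0))
        return changed / float(ref.size[0] * ref.size[1])

    if mismatch_ratio("reference/homescreen.png", "shots/homescreen.png") > 0.01:
        print("Screens differ beyond the threshold -- needs manual inspection")

As described above, a mismatch by itself doesn't tell you whether it's a 
regression or an intentional UI change, which is why each failure still needs a 
human to look at it.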

On the other hand, the desktop b2g automated tests have certain limitations by 
their nature. For example, we can't really test Bluetooth/Wi-Fi or the Camera 
app on them. I think I heard there is a plan to run on-device tests on try (and 
I think that was one of the reasons behind enabling marionette-js on device). 
If this is implemented, we would definitely find these issues early on, which 
would be great. Nevertheless, as Kats says, I think there is still room for 
improvement when it comes to the Gaia integration tests. The B2G automation 
team is currently working on converting the existing Gip tests to Gij (as 
Johan's email says), and while doing so I'm also looking into writing more Gij 
tests that cover common use cases, but since we are also handling day-to-day 
operational tasks, progress is slower than we would like.
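
To give a flavor of what these tests drive, here is a bare-bones Python sketch 
of the kind of Marionette-driven check the Gip suite is built on (the Gij 
equivalents express the same flows through marionette-js). The real Gip tests 
sit on top of the gaiatest harness rather than raw Marionette calls, and module 
names vary by release, so treat this purely as a sketch:

    # Illustrative sketch only -- real Gip tests build on the gaiatest harness.
    from marionette_driver.marionette import Marionette
    from marionette_driver import By, Wait

    client = Marionette(host="localhost", port=2828)  # device port forwarded via adb
    client.start_session()

    # Work in the System app (top-level frame) and wait for the screen element,
    # standing in for the end-user flow a real test would walk through.
    client.switch_to_frame()
    Wait(client, timeout=30).until(
        lambda m: m.find_element(By.CSS_SELECTOR, "#screen").is_displayed())

    client.delete_session()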

One other possibly related thing that comes to mind: if we're moving away from 
pvtbuilds to taskcluster, we might need to think about how that will affect the 
bisection task. If it's basically the same task with a different build source, 
that's great, but if there are some subtle differences (like how long each 
build is stored), it would be nice to plan ahead for them.
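
For context, the bisection itself is just a binary search over the stored 
builds, so what matters most about the build source is how far back the archive 
reaches. A rough sketch (flash_and_smoketest is a placeholder for whatever 
flashes a stored build onto a device and runs the checks, not a real tool):

    # Rough sketch of build bisection; flash_and_smoketest() is a placeholder.
    def find_first_bad(build_ids, flash_and_smoketest):
        """build_ids is ordered oldest to newest; assumes the oldest build is
        good and the newest is bad. Returns the first bad build id."""
        lo, hi = 0, len(build_ids) - 1
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if flash_and_smoketest(build_ids[mid]):  # True means the build is good
                lo = mid
            else:
                hi = mid
        return build_ids[hi]

If taskcluster keeps build artifacts for a shorter time than pvtbuilds does, 
the window this search can cover shrinks accordingly, which is the part worth 
planning for.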

No-Jun



> On Jun 15, 2015, at 6:20 AM, Johan Lorenzo <jlore...@mozilla.com> wrote:
> 
> +qa-b2g
> 
> TL;DR: Graphics smoketests are not well covered because of our tools. Also, 
> automated end-to-end tests are costly to run and flaky. We probably need to 
> focus more on unit tests, and to know where to add these tests, we need 
> coverage reports.
> 
> 
> From the QA standpoint, we used to automate as many smoketests as possible in 
> Python. However, since marionette-js became ready to run on device, our main 
> effort has been to start migrating what we have in Gip to Gij.
> 
> In other words, our team is currently focused on:
> - Making sure marionette-js runs on device the way we need,
> - Migrating the tests that can be run on Treeherder from Gip to Gij,
> - Maintaining the Gip suite while the tests are migrated,
> - Triaging the flaky tests,
> - Performing manual testing on recently found regressions.
> 
> Another issue I'd like to call out regarding the smoke testing of graphics: 
> most of our automation relies on Marionette. As it essentially works through 
> the DOM, it usually doesn't catch graphics regressions. That's how we end up 
> seeing them in manual testing. To prevent that, No-Jun created the 
> imagecompare suite, which runs on a daily basis. I don't know how many tests 
> we have in there.
> 
> On another point, Greg is right about one thing: most of our end-to-end tests 
> (that is to say, on device) are unreliable. As said above, we're triaging 
> these tests; we're on the right track to get at least two suites: one for the 
> tests known to be 100% reliable, and one for the flaky ones.
> 
> In parallel, since an end-to-end test is costly to run (currently 
> 1-2 minutes per test on a Flame) and since it's hard to craft a totally 
> reliable test, the best way to prevent regressions remains writing unit 
> tests [1]. It's currently hard to know which areas are unit tested. One way 
> to start could be to automatically generate coverage reports on each 
> push. I know Sylvestre Ledru started a discussion along these lines on the 
> Firefox release-drivers mailing list. I do think we need a tool like this for 
> Gecko and Gaia. At first glance, and correct me if I'm wrong, coverage 
> analysis should be achievable since we use mocha to run the tests. We could 
> use a tool like istanbul [2] and plug it into coveralls [3]. From there, devs 
> and QA can get an overview of which areas need to be tested (automatically or 
> manually).
> 
> What do you all think?
> 
> Johan
> 
> [1] http://googletesting.blogspot.fr/2015/04/just-say-no-to-more-end-to-end-tests.html
> [2] https://github.com/gotwarlost/istanbul
> [3] http://coveralls.io/
> 
> 
> On Mon, Jun 15, 2015 at 4:17 AM, Greg Weng <snowma...@gmail.com> wrote:
> I think the problem is that our infrastructure or test methods are not 
> powerful enough to run those tests in automation. For example, one smoketest 
> I've broken is about the LockScreen not updating the time correctly if the 
> user pulled the battery before rebooting (Bug 1119097). In such cases, I 
> think the key issue is that our automation can't cover some hardware-related 
> cases like battery or RIL, although I've heard that some work, like allowing 
> simulators to dial each other, is actually ongoing. However, I believe this 
> is tough work for our Gecko and Gonk teams, since the latest news I have is 
> that they are planning to build a new abstraction on top of HAL and make it 
> more decoupled from the real devices. And some teams have no plan for that 
> yet. Maybe we (I mean MoCo/MoFo) should consider this one of the 
> highest-priority issues, since being misled by inaccurate CI results 
> (compared to real devices) and waiting for patches to become stable enough is 
> a constant source of pain for us (the Gaia team). And the idea (a new 
> abstraction for testing and porting purposes) was already being mentioned at 
> least a year ago, as far as I know.
> 
> By the way, for regressions I'm used to bisecting to find the exact broken 
> patch. However, from the information I've got, bisecting Gecko or the whole 
> of B2G is impractical given the build times. I wonder if we could, or need 
> to, ease this pain to make finding the actual broken part easier and more 
> automatic. I believe it would really help.
> 
> On Jun 15, 2015 at 3:55 AM, "Kartikaya Gupta" <kgu...@mozilla.com> wrote:
> Is there any effort under way to make the smoketests automated and to
> run them as part of our regular automation testing? The B2G QA team does a
> great job of identifying regressions and tracking down the regressing
> changeset, but AFAIK this is an entirely manual process that happens
> after the change has landed. Ideally we should catch this on try
> pushes or on landing though, and for that we need to automate the
> smoketests.
> 
> There have been a lot of complaints (and rightly so) about all sorts
> of B2G-breaking changesets landing. I myself have landed quite a few
> of them. I think it's unrealistic to expect every Gecko developer to
> run through all of the smoketests manually for every change they want
> to make (even just for the main devices/configurations we support).
> It's also unrealistic to expect them to reliably identify "high risk"
> changes for explicit pre-landing QA testing, because even small
> changes can break things badly on B2G given the variety of
> configurations we have there.
> 
> I think the only reasonable long-term solution is to automate the
> smoketests, and I would like to know if there's any planned or
> in-progress effort to do that.
> 
> Cheers,
> kats

_______________________________________________
dev-b2g mailing list
dev-b2g@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-b2g
