Re: [dev-servo] Servo testing as part of PhD dissertation

Geoffrey Sneddon Tue, 13 Sep 2016 03:36:59 -0700

On Mon, Sep 12, 2016 at 7:14 AM, Joel Martin
<kanaka.se...@martintribe.org> wrote:
> Thanks for all the feedback. I've collected the responses and replied
> inline.
>
>> James Graham wrote:
>
>> Yes, I imagine specifically font rendering will be a problem, along
>> with antialiasing in general and legitimate-per-CSS variations in
>> properties such as outline.
>
> Do you happen to know of a reference/list of elements/styles that
> cause expected differences in rendering? I expect a couple days of
> reading through ACID2/3 bug lists would give me a pretty good idea,
> but perhaps somebody has already done the work of collating this info.

I don't think there's any great list. I think font rendering,
antialiasing, and differences allowed by spec are pretty much the
three things you need to worry about. (And antialiasing tends to
affect fonts far worse than anything else.)

Differences allowed by spec… one thing you might want to do is look at
CSS 2.1 and find everywhere it uses "may" or "should" or "defined".
That said, some things are defined by later CSS modules.
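
As a rough sketch of that kind of scan, assuming you already have the
spec as plain text: the keyword list comes from the suggestion above,
and the SAMPLE string is made up for illustration, not quoted from
CSS 2.1.

```python
import re

# Words that often signal latitude for implementations. This list is an
# assumption based on the suggestion above; extend it as needed.
KEYWORDS = ("may", "should", "defined")
PATTERN = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b", re.IGNORECASE)


def find_latitude(text):
    """Return (line_number, keyword, line) for every keyword hit in text."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in PATTERN.finditer(line):
            hits.append((number, match.group(1).lower(), line.strip()))
    return hits


# Hypothetical sample text, NOT taken from the real specification.
SAMPLE = """User agents may render outlines differently.
The width is computed from the containing block.
This value is not defined in this specification; UAs should pick one."""

for hit in find_latitude(SAMPLE):
    print(hit)
```

Every hit still needs a human to read it: plenty of uses of "defined"
are perfectly strict, so this only narrows down where to look.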

>
>> However I think you might make progress with some sort of
>> consensus-based approach e.g. take a testcase and render it in
>> gecko/blink/webkit/edge. If the difference by some metric (e.g.
>> number of differing pixels, although more sophisticated approaches
>> are possible) is within some threshold then check whether servo is
>> within the same threshold. If it is consider that a pass otherwise
>> a fail.
>
> That lines up with my intuition as well. I would probably start with
> something really straightforward like: (sum of pixel differences)
> / (number of page elements) > (arbitrary threshold).  After seeing how
> well that works (or likely doesn't work) and getting some real data,
> then I would begin adding more sophistication to the heuristic.

I'd be tempted to just try comparing the x and y coordinates, along
with the width and height, of each element. (Of course, this assumes
the different UAs create the same tree—but HTML parsing is relatively
interoperable nowadays and hence not that interesting.) You can then
have a fuzziness for each element (I'd try just starting out with one
or two!), and work from that. It may or may not work better; what is
workable depends on your exact approach.
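
A minimal sketch of that comparison, assuming the geometry of each
element has already been collected per UA (for instance from
getBoundingClientRect(), though how you gather it is up to you) and
that the fuzziness is measured in CSS pixels. The Box type, the
element keys and the numbers are all hypothetical.

```python
from typing import Dict, List, NamedTuple


class Box(NamedTuple):
    x: float
    y: float
    width: float
    height: float


def mismatched_elements(
    reference: Dict[str, Box], candidate: Dict[str, Box], fuzz: float = 1.0
) -> List[str]:
    """Return keys of elements whose geometry differs by more than fuzz.

    Elements present on only one side are reported too, since the
    comparison assumes both UAs built the same tree.
    """
    bad = []
    for key in sorted(set(reference) | set(candidate)):
        ref, cand = reference.get(key), candidate.get(key)
        if ref is None or cand is None:
            bad.append(key)
        elif any(abs(r - c) > fuzz for r, c in zip(ref, cand)):
            bad.append(key)
    return bad


# Hypothetical measurements, keyed by a path identifying each element.
gecko = {
    "html/body/div[1]": Box(8, 8, 100, 50),
    "html/body/div[2]": Box(8, 58, 100, 20),
}
servo = {
    "html/body/div[1]": Box(8, 8.5, 100, 50),
    "html/body/div[2]": Box(8, 61, 100, 20),
}

print(mismatched_elements(gecko, servo, fuzz=1.0))
```

With these numbers only the second div is reported: the first is off
by half a pixel, which is inside the fuzz, and the second is off by
three.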

>
>> Geoffrey Sneddon wrote:
>
>> FWIW, I was talking with a bunch of people in the Chrome team about
>> such an oracle not that long ago. I think one can almost certainly
>> come up with a useful oracle even though it'll have very real
>> limitations.
>
> I would love to hear more about your conversation with the Chrome
> team about that. Any chance it was in public hard-copy form (seems
> funny to think of IRC, mailing lists, email that way)?

Hmmm. I can't actually remember where it was any more; I think we've
covered everything mentioned there in this thread already so it's
unlikely to be very useful. Pretty sure it was within the Web Platform
Predictability team, which has links to both mailing list and Slack
channel from <https://www.chromium.org/blink/platform-predictability>.
Pretty sure they'd be interested in hearing about anything you're
planning on!

>> There are plenty of rendering bugs that don't involve text, and
>> practically if you're generating arbitrary web pages it's easy to
>> solve all of those problems by simply not including text (though
>> you'll need to give boxes explicit heights!). Even if you allow
>> text, you can probably get a long way by simply getting rid of all
>> text and setting explicit width/height properties on everything such
>> that the layout of the boxes doesn't change even if they're then
>> empty. You can then compare the position of the box across browsers.
>
> I was thinking of replacing text with a sequence of images of words of
> varying lengths (styled to be as much like text as possible). You
> know, "lorem ipsum imagos" :-). Although that was before Boris
> mentioned the Ahem font which I wasn't aware of previously.

Images are somewhat difficult too (though significantly more likely to
render interoperably!); if you want to use intrinsically sized objects
for this I'd suggest instead using variously sized transparent GIFs:
ultimately what you're trying to do is alter the size of the block by
altering its contents. Putting anything in the image doesn't seem like
it actually tests anything interesting.

>> In either case, I expect generating arbitrary cases to actually be
>> not overly interesting, as I expect it'll be sufficiently unlikely
>> to combine features in interesting ways. You may well want some
>> code-coverage based feedback into the generation of instances, along
>> the lines of afl.
>
> One of my inspirations for doing this is the success we've had at my
> company (ViaSat) using QuickCheck (Clojure test.check) to find lots
> of surprising defects in code that already had a fairly comprehensive
> test suite and was considered mature. Guiding the generation of tests
> using something like AFL/afl-cov definitely seems like a fruitful
> avenue of exploration especially if an initial direct approach doesn't
> reveal much of interest.
>
> I would be interested in hearing more about why you think this type
> of testing wouldn't uncover interesting cases especially if it's more
> than intuition (your intuition about this certainly may be better than
> mine).

The vast majority of layout bugs require combining features in pretty
specific ways (to use the Chrome bug I ran into yesterday as an
example: you need a display: flex element with a display: flex child
with a percentage height). Depending on how you generate instances,
you'll likely either run into something like that relatively quickly
or pretty much never, I'd expect.
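
To put a rough number on that, here is a toy sketch, assuming a
generator that picks display and height independently and uniformly
for each element. The value lists are deliberately tiny and entirely
hypothetical, and the pattern check is a simplification of the bug
described above, which may well have needed more conditions.

```python
import random

# Hypothetical, deliberately small value lists. A real generator would
# draw from far more properties and values.
DISPLAYS = ["block", "inline", "inline-block", "flex", "grid", "table"]
HEIGHTS = ["auto", "10px", "50%", "2em"]


def random_chain(rng, depth):
    """A parent-to-descendant chain of (display, height) pairs."""
    return [(rng.choice(DISPLAYS), rng.choice(HEIGHTS)) for _ in range(depth)]


def hits_bug_pattern(chain):
    """True if a flex element has a flex child with a percentage height."""
    return any(
        parent[0] == "flex" and child[0] == "flex" and child[1].endswith("%")
        for parent, child in zip(chain, chain[1:])
    )


def hit_rate(trials, depth, seed=0):
    """Fraction of random chains that contain the pattern."""
    rng = random.Random(seed)
    hits = sum(hits_bug_pattern(random_chain(rng, depth)) for _ in range(trials))
    return hits / trials


print(hit_rate(trials=10000, depth=4))
```

Even with only six display values and four heights, a given
parent/child pair matches with probability 1/6 * 1/6 * 1/4 = 1/144.
Every extra property or value the generator can choose shrinks that
further, unless the generator is biased towards such combinations.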

/gsnedders
_______________________________________________
dev-servo mailing list
dev-servo@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-servo
