Re: [dev-servo] Servo testing as part of PhD dissertation

2016-09-07 Thread Robert O'Callahan
On Thu, Sep 8, 2016 at 4:12 AM, Lars Bergstrom wrote:

> Along those lines, it's also worth looking at the very recent awesome
> work at the University of Washington formalizing layout (upcoming
> paper at OOPSLA):
> http://cassius.uwplse.org/
>
> I've been in contact with them with the hopes of trying it out in the
> context of Servo, as I believe there are both some interesting testing
> applications and some really nifty things that we could do with
> devtools using such a tool, too.
>

Cassius doesn't support any kind of fragmentation, not even line breaking,
and these look difficult to add to the Cassius model. But it does look cool
for the sort of testcases gsnedders was talking about.

Rob
___
dev-servo mailing list
dev-servo@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-servo


Re: [dev-servo] Servo testing as part of PhD dissertation

2016-09-07 Thread Geoffrey Sneddon
On 07/09/16 17:13, Boris Zbarsky wrote:
> On 9/7/16 11:55 AM, Geoffrey Sneddon wrote:
>> There are plenty of rendering bugs that don't involve text, and
>> practically if you're generating arbitrary web pages it's easy to solve
>> all of those problems by simply not including text (though you'll need
>> to give boxes explicit heights!).
> 
> Does using the Ahem font still leave noticeable font rendering
> differences between browsers?

Yes, there are anti-aliasing differences. (See the infamous WebKit
Ahem-only AA-disable hack for Acid3!)

/gsnedders



Re: [dev-servo] Servo testing as part of PhD dissertation

2016-09-07 Thread Boris Zbarsky

On 9/7/16 11:55 AM, Geoffrey Sneddon wrote:
> There are plenty of rendering bugs that don't involve text, and
> practically if you're generating arbitrary web pages it's easy to solve
> all of those problems by simply not including text (though you'll need
> to give boxes explicit heights!).

Does using the Ahem font still leave noticeable font rendering
differences between browsers?


-Boris



Re: [dev-servo] Servo testing as part of PhD dissertation

2016-09-07 Thread Lars Bergstrom
On Wed, Sep 7, 2016 at 10:55 AM, Geoffrey Sneddon wrote:
>>
>> However I think you might make progress with some sort of
>> consensus-based approach e.g. take a testcase and render it in
>> gecko/blink/webkit/edge. If the difference by some metric (e.g. number
>> of differing pixels, although more sophisticated approaches are
>> possible) is within some threshold then check whether servo is within
>> the same threshold. If it is, consider that a pass; otherwise, a fail.
>
> FWIW, I was talking with a bunch of people in the Chrome team about such
> an oracle not that long ago. I think one can almost certainly come up
> with a useful oracle even though it'll have very real limitations.
>
> There are plenty of rendering bugs that don't involve text, and
> practically if you're generating arbitrary web pages it's easy to solve
> all of those problems by simply not including text (though you'll need
> to give boxes explicit heights!). Even if you allow text, you can
> probably get a long way by simply getting rid of all text and setting
> explicit width/height properties on everything such that the layout of
> the boxes doesn't change even if they're then empty. You can then
> compare the position of the box across browsers.

Along those lines, it's also worth looking at the very recent awesome
work at the University of Washington formalizing layout (upcoming
paper at OOPSLA):
http://cassius.uwplse.org/

I've been in contact with them with the hopes of trying it out in the
context of Servo, as I believe there are both some interesting testing
applications and some really nifty things that we could do with
devtools using such a tool, too.
- Lars


Re: [dev-servo] Servo testing as part of PhD dissertation

2016-09-07 Thread Geoffrey Sneddon
On 06/09/16 21:58, James Graham wrote:
> [not sure if this will make it through to the list]
> 
> On 06/09/16 21:35, Jack Moffitt wrote:
>>> I haven't quite settled on my dissertation topic, but my top contender
>>> at the moment involves property-based (i.e. QuickCheck style) generation
>>> of random web pages/stylesheets.
>>
>> A sort of subtask of this which would be extremely useful is taking a
>> known rendering problem and producing a minimal reproduction of it.
>> For example, many issues are discovered in existing pages with perhaps
>> hundreds of kilobytes of extraneous data. It would be nice to reduce
>> the failing example to a minimal size. One issue is how to make an
>> oracle here. It would probably be an improvement to have it be only
>> semi-automated, where it does some shrinking then asks and repeats.
> 
> There is some prior art here e.g. [1]. I wrote a similar tool that was
> specialised to reducing js code whilst at Opera, but that doesn't ever
> seem to have been released. In both cases you either had to write a
> one-off function to determine if the testcase was a pass or fail, or
> have a human judge it. Obviously the latter is impractically slow if
> your input is large.

To be pedantic: I wrote that tool. For some idea of scale, I wrote the
initial version in one slightly long working day (I'd guess around 9
hours). None of the parsing/serialising code was written by me; it was all
handled by third-party libraries.

Issues in JS and DOM are far more amenable to this than anything around
rendering. Most of the time the tool was used to compare behaviour with
JIT disabled/enabled, as most bugs being chased down were JIT bugs,
normally with some constraint along the lines of "throws x exception
with JIT enabled, and doesn't throw that with JIT disabled". Doing any
comparison with a different JS engine was already more complex, because
you could end up in a case where one engine threw a SyntaxError on the
code but the other didn't: then you just end up reducing to some small
fragment that throws that exception when that's probably expected behaviour.
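
The kind of reduction loop described above can be sketched roughly as
follows. This is not the Opera tool (which was never released); it is a
minimal greedy line-based reducer under assumed names. `run` is a
hypothetical callback that executes a source string with the JIT on or off
and returns the name of the exception thrown (or None), and `fake_run` is
a made-up stand-in for a real JS engine. Pinning the predicate to one
specific exception is what keeps the reducer from sliding into an
unrelated failure such as a SyntaxError.

```python
from typing import Callable, List, Optional

Outcome = Optional[str]  # name of the exception thrown, or None if nothing was thrown


def jit_only_failure(run: Callable[[str, bool], Outcome],
                     exception: str) -> Callable[[List[str]], bool]:
    """Predicate: throws `exception` with JIT enabled, and not with JIT disabled."""
    def interesting(lines: List[str]) -> bool:
        source = "\n".join(lines)
        return run(source, True) == exception and run(source, False) != exception
    return interesting


def reduce_lines(lines: List[str],
                 interesting: Callable[[List[str]], bool]) -> List[str]:
    """Greedily drop chunks of lines for as long as the predicate still holds."""
    assert interesting(lines), "the starting test case must be interesting"
    chunk = max(len(lines) // 2, 1)
    while chunk >= 1:
        i = 0
        while i < len(lines):
            candidate = lines[:i] + lines[i + chunk:]
            if candidate and interesting(candidate):
                lines = candidate  # keep the smaller case and retry at the same index
            else:
                i += chunk         # this chunk is needed, so move past it
        chunk //= 2
    return lines


def fake_run(source: str, jit: bool) -> Outcome:
    """Hypothetical engine: the 'JIT' breaks programs with both marker statements."""
    if "syntax error" in source:
        return "SyntaxError"
    if jit and "hot_loop()" in source and "x |= 0" in source:
        return "TypeError"
    return None


case = ["var x = 1;", "setup();", "hot_loop();", "log(x);", "x |= 0;", "teardown();"]
reduced = reduce_lines(case, jit_only_failure(fake_run, "TypeError"))
```

A real reducer would work on the parsed tree rather than on lines, and
would call out to actual engine builds instead of `fake_run`.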

>>> The oracle would be a cluster of browsers (multiple vendors/variants)
>>> driven by WebDriver/Selenium that would render the test cases and
>>> screenshot them. Significant discrepancies between renderings would be
>>> considered a failing test case and then standard QuickCheck-style
>>> shrinking would be used to reduce the test case HTML/CSS to a
>>> minimal-ish reproducer.
>>
>> Each browser renders things slightly differently, so pixel by pixel
>> comparison across browsers is probably not going to work well. For our
>> own testing of this kind we instead produce the same result using two
>> different techniques, or in a few cases we make reference images.
>> However making reference images can't account for all rendering
>> differences (like text) and so we avoid it if possible. I imagine it
>> would be quite difficult if the reference image was from another
>> engine, not our own.
> 
> Yes, I imagine specifically font rendering will be a problem, along with
> antialiasing in general and legitimate-per-CSS variations in properties
> such as outline.
> 
> However I think you might make progress with some sort of
> consensus-based approach e.g. take a testcase and render it in
> gecko/blink/webkit/edge. If the difference by some metric (e.g. number
> of differing pixels, although more sophisticated approaches are
> possible) is within some threshold then check whether servo is within
> the same threshold. If it is, consider that a pass; otherwise, a fail.

FWIW, I was talking with a bunch of people in the Chrome team about such
an oracle not that long ago. I think one can almost certainly come up
with a useful oracle even though it'll have very real limitations.
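
For illustration, a consensus oracle of the kind quoted above might look
something like this. It is only a sketch under assumptions: screenshots are
modelled as equally sized grids of grayscale values (the `Image` alias is
hypothetical), the metric is a plain count of differing pixels, and the
names `differing_pixels` and `consensus_verdict` are invented here. When
the reference engines disagree with each other beyond the threshold, the
sketch returns None rather than judging the candidate.

```python
from itertools import combinations
from typing import Dict, List, Optional

Image = List[List[int]]  # rows of grayscale pixel values (stand-in for a screenshot)


def differing_pixels(a: Image, b: Image) -> int:
    """Count the positions at which two same-sized images differ."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("screenshots must have the same dimensions")
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def consensus_verdict(references: Dict[str, Image],
                      candidate: Image,
                      threshold: int) -> Optional[bool]:
    """True = pass, False = fail, None = the reference engines do not agree."""
    for a, b in combinations(references.values(), 2):
        if differing_pixels(a, b) > threshold:
            return None  # no consensus, so this test case cannot judge the candidate
    return all(differing_pixels(ref, candidate) <= threshold
               for ref in references.values())


references = {
    "gecko":  [[0, 0], [0, 255]],
    "blink":  [[0, 0], [0, 255]],
    "webkit": [[0, 0], [1, 255]],  # one pixel off, e.g. an anti-aliasing difference
}
servo_ok = [[0, 0], [0, 255]]
servo_bad = [[255, 255], [255, 255]]

verdict_ok = consensus_verdict(references, servo_ok, threshold=1)
verdict_bad = consensus_verdict(references, servo_bad, threshold=1)
```

The threshold and the metric are the real limitations here: a raw pixel
count treats a one-pixel shift of a large box the same as widespread noise.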

There are plenty of rendering bugs that don't involve text, and
practically if you're generating arbitrary web pages it's easy to solve
all of those problems by simply not including text (though you'll need
to give boxes explicit heights!). Even if you allow text, you can
probably get a long way by simply getting rid of all text and setting
explicit width/height properties on everything such that the layout of
the boxes doesn't change even if they're then empty. You can then
compare the position of the box across browsers.
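
As a rough sketch of that box-position comparison: assume each browser has
already reported, for every element id, a rectangle of (x, y, width,
height) in CSS pixels, for example gathered through WebDriver. The
collection step is not shown, and the `Rect` alias, the `box_mismatches`
name, and the half-pixel tolerance are all assumptions made for the
example.

```python
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # x, y, width, height in CSS pixels


def box_mismatches(baseline: Dict[str, Rect],
                   other: Dict[str, Rect],
                   tolerance: float = 0.5) -> List[str]:
    """Return the ids of boxes that are missing or placed differently."""
    mismatches = []
    for box_id in sorted(baseline.keys() | other.keys()):
        a, b = baseline.get(box_id), other.get(box_id)
        if a is None or b is None:
            mismatches.append(box_id)  # box exists in only one of the layouts
        elif any(abs(p - q) > tolerance for p, q in zip(a, b)):
            mismatches.append(box_id)  # position or size differs beyond tolerance
    return mismatches


gecko = {"a": (0, 0, 100, 50), "b": (0, 50, 100, 50)}
servo = {"a": (0, 0, 100, 50), "b": (0, 58, 100, 50)}

moved = box_mismatches(gecko, servo)
```

Because only geometry is compared, font rendering and anti-aliasing
differences drop out of the comparison entirely.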

>>> Is this idea of interest to the Servo team? Would it be useful for Servo
>>> development/testing? Or perhaps redundant with existing testing I'm not
>>> aware of?
>>
>> The main kind of testing we do is reference testing where the
>> reference is the same content achieved by different means. This is
>> pretty robust to things like font rendering changing slightly between
>> versions. We have some JS level testing where JS APIs are invoked and
>> then results verified, but it sounds like you are more focused on the
>> visual testing aspect. As an aside, I think quickchecking JS APIs is
>> likely to find a ton of bugs and be useful too, plus it probably
>>