Re: Split Gecko in standalone fuzzing-friendly programs.

2016-03-09 Thread twsmith
On Wednesday, March 9, 2016 at 12:47:08 PM UTC-8, decod...@googlemail.com wrote:
> > 
> > > the sample tests (xpcshell-tests) are extremely complicated to adapt
> > 
> > That seems like it would be a problem in any new thing too, right?
> 
> Actually no. I adapted our gtests in less than an hour.
> 
> > 
> > > and we can't easily use it with AFL.
> > 
> > Just to satisfy my curiosity, what is AFL?
> 
> http://lcamtuf.coredump.cx/afl/
> 
> > 
> > > but that still doesn't solve the problem that people have to write the 
> > > necessary code that we can fuzz then.
> > 
> > OK.  This is a problem, certainly, and pretty independent of both the 
> > "split Gecko" thing and the existence of shells, right?
> 
> Not really no. Because some shells and tests we have are very straightforward 
> to use and we can figure it out ourselves. xpcshell is not such an example.
> 
> > 
> > What are the necessary qualities for things you can fuzz?
> 
> 
> It depends on the type of fuzzing. Let's stick to AFL:
> 
> - Program is easy to start (doesn't need profiles or long initialization) and 
> can be packaged
> - Has AFL persistent mode support (requires support on C++ level)
> - Exercises the targeted feature in a similar way compared to how Firefox 
> would do it
> - Optionally has some extra testing features (e.g. gczeal, ion-eager, 
> extra-checks for the JS shell) that make bug finding easier
> - Can be compiled with all sanitizer types (although MSan is not going to 
> work for some stuff even in shells)
> 
> 
> That's just a dump out of my head, might be missing some stuff.

More qualities that benefit fuzzing:
- Fast
- Deterministic (very useful for feedback driven fuzzing)
- Automate-able, easily run from a script from the command line
- Distributable in parallel and side by side. So ideally statically linked, 
with multiple instances able to run at the same time in the same environment.
- Lots of assertions
- Targeted, only meant to test one area of code (very useful for feedback 
driven fuzzing)
- Cross platform, unless functionality is exactly the same on every platform, 
in which case target Linux
- Accepts data from a file or stdin. Having to depend on web servers etc. does 
complicate things, but is sometimes required
- Simple build system, now I feel like I'm just stating the obvious :)
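
A few of the qualities above (deterministic, scriptable, side-by-side
instances, input on stdin) can be checked mechanically. The following is only
a rough sketch in Python, not an existing Mozilla tool: run_once and
is_deterministic are hypothetical names, and the target is assumed to read its
input from stdin and to be given as an absolute path.

```python
import hashlib
import subprocess
import sys
import tempfile
from concurrent.futures import ThreadPoolExecutor


def run_once(cmd, data, timeout=10):
    """Run one instance of the target with `data` on stdin.

    Hypothetical helper: each instance gets a private scratch directory as
    its working directory so parallel runs cannot trip over each other.
    `cmd[0]` should be an absolute path because the cwd changes.
    """
    with tempfile.TemporaryDirectory() as scratch:
        proc = subprocess.run(
            cmd, input=data, cwd=scratch, capture_output=True, timeout=timeout
        )
    # Fingerprint the observable behaviour: exit status plus a hash of stdout.
    return proc.returncode, hashlib.sha256(proc.stdout).hexdigest()


def is_deterministic(cmd, data, runs=4):
    """Launch `runs` instances side by side and compare their fingerprints."""
    with ThreadPoolExecutor(max_workers=runs) as pool:
        results = list(pool.map(lambda _: run_once(cmd, data), range(runs)))
    return len(set(results)) == 1


if __name__ == "__main__":
    # Stand-in target for illustration: reverses whatever arrives on stdin.
    target = [
        sys.executable,
        "-c",
        "import sys; sys.stdout.write(sys.stdin.read()[::-1])",
    ]
    print(is_deterministic(target, b"GIF89a"))
```

A target that fails this kind of check on a fixed input is likely to be a poor
fit for feedback driven fuzzing until the source of the variation is found.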

 
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Split Gecko in standalone fuzzing-friendly programs.

2016-03-09 Thread twsmith
This seems to be the general response whenever this topic is brought up, with 
good reason. I don't think modularizing Gecko just for the sake of it makes 
much sense. What I think the takeaway from this should be is: when developing 
a new component or maintaining an old one, keep in mind that making it 
accessible to fuzzers is valuable. In the short term this may mean writing 
gtests that we can leverage.

Decoder has started work on gtest support so that gtests can be used as an 
alternative to the full browser. I believe he has a couple of simple fuzzers 
working at the moment; however, I'm not sure what they have been finding. I 
have tried this myself with varying levels of success.

I think a good starting point for this work would be media and image processing 
(codecs, parsers, demuxers, decoders, etc.). We (the fuzzing team) have already 
talked to the media team about this. Christoph and I recently fuzzed the new 
BMP decoder and repeatedly hit issues that we could not reproduce in the full 
browser but that would have been easily reproducible in a shell. The issues 
were in the image processing code but were obscured by the rest of the system. 
The main issue in these cases was threading/timing.

Maybe others have ideas for how we can make code more fuzz-able?
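
One possible shape for a fuzz-able entry point, as a rough sketch only: a
single-threaded function that takes bytes and either returns a result or
rejects the input, so that a failure reproduces from the input alone. This is
a toy BMP header check in Python, not Gecko's decoder. parse_bmp_header and
RejectedInput are hypothetical names, and the accepted bit depths are an
assumption made for the example.

```python
import struct


class RejectedInput(Exception):
    """Malformed input; a fuzzer should treat this as an uninteresting result."""


def parse_bmp_header(data: bytes) -> dict:
    """Parse the 14-byte BMP file header and the first 16 bytes of the DIB header.

    No threads, no globals, no I/O: the result depends only on `data`.
    """
    if len(data) < 30:
        raise RejectedInput("truncated header")

    # File header: magic, file size, two reserved fields, pixel data offset.
    magic, file_size, _res1, _res2, pixel_offset = struct.unpack_from(
        "<2sIHHI", data, 0
    )
    if magic != b"BM":
        raise RejectedInput("bad magic")

    # Start of the DIB header: header size, width, height, planes, bits/pixel.
    dib_size, width, height, planes, bpp = struct.unpack_from("<IiiHH", data, 14)
    if dib_size < 40:
        raise RejectedInput("unsupported DIB header")
    if width <= 0 or height == 0 or planes != 1:
        raise RejectedInput("bad dimensions")
    if bpp not in (1, 4, 8, 16, 24, 32):
        raise RejectedInput("bad bit depth")
    if pixel_offset > len(data):
        raise RejectedInput("pixel data offset past end of input")

    # Invariant a later decoding stage would rely on; cheap to assert here.
    assert 0 <= pixel_offset <= len(data)
    return {
        "file_size": file_size,
        "pixel_offset": pixel_offset,
        "width": width,
        "height": height,
        "bpp": bpp,
    }
```

A driver could feed this from a file or from sys.stdin.buffer.read(), which
keeps the same function usable from a script, a unit test, or a fuzzer.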


On Wednesday, March 9, 2016 at 10:15:16 AM UTC-8, Bobby Holley wrote:
> Can you elaborate on which Gecko components you're hoping to fuzz
> separately? A lot of the core is pretty heavily-intertwined, so I'm pretty
> skeptical that we'd ever be able to separate out DOM, style, and layout
> from each other (for example). There are basically two barriers:
> (1) These components are enormous, and were not built to be very modular.
> Any such efforts would require a huge amount of engineering resources,
> which we would probably not spend just to make the component fuzzable.
> (2) The performance cost of adding an abstraction layer between
> tightly-coupled components in C++11 would probably be prohibitive (the
> situation is different for Rust/Servo because of Traits - we could
> potentially do this with C++ Concepts/Modules in around a decade).
> 
> This is all to say that I think a general call to "modularize Gecko" isn't
> really helpful. But if there are specific leaf-y components that you want
> to fuzz separately but can't, that might be a good starting point.
> 
> On Wed, Mar 9, 2016 at 9:54 AM,  wrote:
> 
> > On Wednesday, March 9, 2016 at 9:38:55 AM UTC-8, Nicolas B. Pierron wrote:
> > > This discussion is a follow-up discussion to some emails sent privately
> > by
> > > accident.
> > >
> > > If you have not followed, I will quote David Bryant:
> > >  > Improving release quality is one of the three fundamental goals
> > Platform
> > >  > Engineering committed to this year. To this end, lmandel built a
> > Bugzilla
> > >  > dashboard that allows us to track regressions found in any given
> > release
> > >  > cycle. This dashboard [...] can
> > >  > also be found at: http://mozilla.github.io/releasehealth/
> > >
> > > To David's email, I answered the following:
> > >
> > > --
> > > tl;dr: If we want to improve the quality of our products we should
> > > split Gecko in standalone programs which are fuzzing-friendly.
> > >
> > > One thing which strikes me, is the ratio of regressions per component
> > > that we have for each versions, and more over who are the persons
> > > opening these bugs:
> > >   - Release:
> > >
> > https://bugzilla.mozilla.org/buglist.cgi?columnlist=product%2Ccomponent%2Creporter=cf_status_firefox44=OP=cf_status_firefox43=cf_status_firefox43=cf_status_firefox43=CP_fields=id=OR=regression%2C_type=allwords_id=12898533=equals=equals=equals=equals_format=advanced=---=affected=unaffected=%3F=---_based_on=
> > >   - Beta:
> > >
> > https://bugzilla.mozilla.org/buglist.cgi?columnlist=product%2Ccomponent%2Creporter=cf_status_firefox45=OP=cf_status_firefox44=cf_status_firefox44=cf_status_firefox44=CP_fields=id=OR=regression%2C_type=allwords_id=12898534=equals=equals=equals=equals_format=advanced=---=affected=unaffected=%3F=---_based_on=
> > >   - Aurora:
> > >
> > https://bugzilla.mozilla.org/buglist.cgi?columnlist=product%2Ccomponent%2Creporter=cf_status_firefox46=OP=cf_status_firefox45=cf_status_firefox45=cf_status_firefox45=CP_fields=id=OR=regression%2C_type=allwords_id=12898536=equals=equals=equals=equals_format=advanced=---=affected=unaffected=%3F=---_based_on=
> > >   - Nightly:
> > >
> > https://bugzilla.mozilla.org/buglist.cgi?columnlist=product%2Ccomponent%2Creporter=cf_status_firefox47=OP=cf_status_firefox46=cf_status_firefox46=cf_status_firefox46=CP_fields=id=OR=regression%2C_type=allwords_id=12898528=equals=equals=equals=equals_format=advanced=---=affected=unaffected=%3F=---_based_on=
> > >
> > > To be more precise:
> > >   - The small number of regression we have in the JS engine on the
> > > release channel, versus the Extremely Huge number of regressions we
> > > have on nightly.
> > >   - And the fact that 

Re: Split Gecko in standalone fuzzing-friendly programs.

2016-03-09 Thread twsmith
On Wednesday, March 9, 2016 at 9:38:55 AM UTC-8, Nicolas B. Pierron wrote:
> This discussion is a follow-up discussion to some emails sent privately by 
> accident.
> 
> If you have not followed, I will quote David Bryant:
>  > Improving release quality is one of the three fundamental goals Platform
>  > Engineering committed to this year. To this end, lmandel built a Bugzilla
>  > dashboard that allows us to track regressions found in any given release
>  > cycle. This dashboard [...] can
>  > also be found at: http://mozilla.github.io/releasehealth/
> 
> To David's email, I answered the following:
> 
> --
> tl;dr: If we want to improve the quality of our products we should
> split Gecko in standalone programs which are fuzzing-friendly.
> 
> One thing which strikes me, is the ratio of regressions per component
> that we have for each versions, and more over who are the persons
> opening these bugs:
>   - Release: 
> https://bugzilla.mozilla.org/buglist.cgi?columnlist=product%2Ccomponent%2Creporter=cf_status_firefox44=OP=cf_status_firefox43=cf_status_firefox43=cf_status_firefox43=CP_fields=id=OR=regression%2C_type=allwords_id=12898533=equals=equals=equals=equals_format=advanced=---=affected=unaffected=%3F=---_based_on=
>   - Beta: 
> https://bugzilla.mozilla.org/buglist.cgi?columnlist=product%2Ccomponent%2Creporter=cf_status_firefox45=OP=cf_status_firefox44=cf_status_firefox44=cf_status_firefox44=CP_fields=id=OR=regression%2C_type=allwords_id=12898534=equals=equals=equals=equals_format=advanced=---=affected=unaffected=%3F=---_based_on=
>   - Aurora: 
> https://bugzilla.mozilla.org/buglist.cgi?columnlist=product%2Ccomponent%2Creporter=cf_status_firefox46=OP=cf_status_firefox45=cf_status_firefox45=cf_status_firefox45=CP_fields=id=OR=regression%2C_type=allwords_id=12898536=equals=equals=equals=equals_format=advanced=---=affected=unaffected=%3F=---_based_on=
>   - Nightly: 
> https://bugzilla.mozilla.org/buglist.cgi?columnlist=product%2Ccomponent%2Creporter=cf_status_firefox47=OP=cf_status_firefox46=cf_status_firefox46=cf_status_firefox46=CP_fields=id=OR=regression%2C_type=allwords_id=12898528=equals=equals=equals=equals_format=advanced=---=affected=unaffected=%3F=---_based_on=
> 
> To be more precise:
>   - The small number of regression we have in the JS engine on the
> release channel, versus the Extremely Huge number of regressions we
> have on nightly.
>   - And the fact that (almost) all the bugs opened against the JS
> engine are opened by our fuzzing team.
> 
> What I want to remark is the fact that our automated fuzzing is better
> at finding recently introduced regressions.  And as far as I know,
> Alice is not a bot.
> 
>  From what I know, the reason fuzzing team is so efficient on the JS
> engine is because we have a *standalone* JS shell.
> The *standalone* JS shell is also the reason why our build time is
> below 2 minutes as opposed to 18 minutes.
> 
> So, I think that if we want to improve our quality we should focus on
> making fuzzing-friendly standalone programs for the different
> components of the platform.
> Thus reducing, the compilation time, reducing the test suite time, and
> improving the ability of the fuzzing team to find recently added
> regressions.
> 
> Maybe I am wrong, in which case the other alternative might be to
> staff the JS Team to get rid of all these nightly issues before they
> ride the train to release.
> --
> 
> To which I got the following replies:
> 
> On Wed, Mar 9, 2016 at 3:05 PM, Kyle Huey wrote:
>  > The ratio of engineers to code in the js engine is so much higher than
>  > the rest of the product that I'm not sure this is a sensible comparison.
>  > The js engine also doesn't depend on things like 3rd party gfx drivers ...
> 
> On Wed, Mar 9, 2016 at 3:05 PM, Olli Pettay wrote:
>  > Fuzzing captures only a fraction of issues.
> 
> On Wed, Mar 9, 2016 at 3:42 PM, Chris Hofmann wrote:
>  > On Wed, Mar 9, 2016 at 7:05 AM, Kyle Huey wrote:
>  >>
>  >> The ratio of engineers to code in the js engine is so much higher than the
>  >> rest of the product that I'm not sure this is a sensible comparison.  The 
> js
>  >> engine also doesn't depend on things like 3rd party gfx drivers ...
>  >
>  > This is probably not the only step that we need to take to substantially
>  > improve quality so setting up a place to have those discussions is good.  
> It
>  > really is worth some time and effort to brainstorm about all the things we
>  > might do to raise the bar, poke some holes in those ideas, then decide on
>  > and push forward on a few more in the next few quarters.
> 
> On Wed, Mar 9, 2016 at 5:23 PM, Al Billings wrote:
>  > On 3/9/16 6:58 AM, Nicolas B. Pierron wrote:
>  >> So, I think that if we want to improve our quality we should focus on
>  >> making fuzzing-friendly standalone programs for the different
>  >> components of the platform.
>  >> Thus reducing, the compilation time, reducing the test suite time, and
>  >> improving the