Re: Fuzzed - a program to find DMDFE parser crash

2018-12-17 Thread Stefan Koch via Digitalmars-d-announce

On Monday, 17 December 2018 at 10:12:44 UTC, Stefan Koch wrote:
On Sunday, 16 December 2018 at 14:24:54 UTC, Jacob Carlborg 
wrote:

On 2018-12-15 16:37, Basile B. wrote:

This poisoning kills the interest of using a fuzzer. 99% of 
the crashes will be in hdrgen.


Does that matter as long as the bug is found?


Well it's hard to tell if it's begin.


meant to say benign.


Re: Fuzzed - a program to find DMDFE parser crash

2018-12-17 Thread Stefan Koch via Digitalmars-d-announce

On Sunday, 16 December 2018 at 14:24:54 UTC, Jacob Carlborg wrote:

On 2018-12-15 16:37, Basile B. wrote:

This poisoning kills the interest of using a fuzzer. 99% of 
the crashes will be in hdrgen.


Does that matter as long as the bug is found?


Well it's hard to tell if it's begin.
Generally in a compiler which is focused on speed, you accept 
crashing an really bogus input if it makes the parser run faster, 
because there is no chance of accepting corrupted code, which is 
what you need to be worried about.


Re: Fuzzed - a program to find DMDFE parser crash

2018-12-16 Thread Jacob Carlborg via Digitalmars-d-announce

On 2018-12-15 12:29, Basile B. wrote:

While the D front end is not yet used to make tools

I've used it to make a tool, DLP [1].

[1] http://github.com/jacob-carlborg/dlp

--
/Jacob Carlborg


Re: Fuzzed - a program to find DMDFE parser crash

2018-12-16 Thread Jacob Carlborg via Digitalmars-d-announce

On 2018-12-15 16:37, Basile B. wrote:

This poisoning kills the interest of using a fuzzer. 99% of the crashes 
will be in hdrgen.


Does that matter as long as the bug is found?

--
/Jacob Carlborg


Re: Fuzzed - a program to find DMDFE parser crash

2018-12-15 Thread Basile B. via Digitalmars-d-announce
On Saturday, 15 December 2018 at 21:09:12 UTC, Sebastiaan Koppe 
wrote:

On Saturday, 15 December 2018 at 15:37:19 UTC, Basile B. wrote:
I think this is what Walter calls "AST poisoning" (never 
understood how it worked before today). And the whole parser 
is like this.


This poisoning kills the interest of using a fuzzer. 99% of 
the crashes will be in hdrgen.


As is common with fuzzing, you'll need to ensure the program 
crashes.


Yes this is done by piping dmd with the random code (i dont use 
dmd as a library for now). If the process returns something 
different of 0 (ok) and 1 (normal compiler error) than the random 
code is saved in a file:


...
ProcessPipes pp = pipeProcess([Options.dc, "-"]);
pp.stdin.writeln(src);
pp.stdin.close;
if (!pp.pid.wait.among(0, 1)) fileName.write(src);
...

Actually it would be less convenient to do that with the front 
end as a library, since SEGFAULTs are supposed to kill the 
program...







Re: Fuzzed - a program to find DMDFE parser crash

2018-12-15 Thread Basile B. via Digitalmars-d-announce

On Sunday, 16 December 2018 at 01:57:17 UTC, Walter Bright wrote:

On 12/15/2018 2:48 PM, Neia Neutuladh wrote:
The way to fix this is to replace the entire parser and get 
rid of the
idea of AST poisoning; at the first error, you give up on 
parsing the
entire file. From there, you can try recovering from specific 
errors with

proper testing.


DMD tries to continue parsing after a syntax error, but it does 
not attempt semantic analysis if there were any errors.


The problem i underlined is more that, like in the code that 
parses typeof, a non null node is returned even if some 
expectations are not verified when parsing.


I'm not sure of what is the right fix. fixing the ast pretty 
printer or the parser ?


Re: Fuzzed - a program to find DMDFE parser crash

2018-12-15 Thread Basile B. via Digitalmars-d-announce
On Saturday, 15 December 2018 at 22:48:01 UTC, Neia Neutuladh 
wrote:
The way to fix this is to replace the entire parser and get rid 
of the idea of AST poisoning; at the first error, you give up 
on parsing the entire file. From there, you can try recovering 
from specific errors with proper testing.


You can still continue parsing after an error but right now many 
sub-parsers always return an AstNode instead of null. The parser 
on null sub parser result could go to the end of the scope or to 
the next statement, depending on what it expected, and continue 
from there. That being said this wouldn't always work, e.g when a 
semi colon or a curly brace misses.


Simple example:

struct Foo
{
int a, b
string c; // error because a type identifier part wasn't 
expected ...
} // ... we're in a aggr body so consume toks 
past the curly brace


struct Bar
{
}





Re: Fuzzed - a program to find DMDFE parser crash

2018-12-15 Thread Walter Bright via Digitalmars-d-announce

On 12/15/2018 2:48 PM, Neia Neutuladh wrote:

The way to fix this is to replace the entire parser and get rid of the
idea of AST poisoning; at the first error, you give up on parsing the
entire file. From there, you can try recovering from specific errors with
proper testing.


DMD tries to continue parsing after a syntax error, but it does not attempt 
semantic analysis if there were any errors.




Re: Fuzzed - a program to find DMDFE parser crash

2018-12-15 Thread Walter Bright via Digitalmars-d-announce

On 12/15/2018 3:29 AM, Basile B. wrote:

The time to write this announce, already 5 "crashers" found.


Great! Please post them to bugzilla.


Re: Fuzzed - a program to find DMDFE parser crash

2018-12-15 Thread Neia Neutuladh via Digitalmars-d-announce
On Sat, 15 Dec 2018 21:09:12 +, Sebastiaan Koppe wrote:
> On Saturday, 15 December 2018 at 15:37:19 UTC, Basile B. wrote:
>> I think this is what Walter calls "AST poisoning" (never understood how
>> it worked before today). And the whole parser is like this.
>>
>> This poisoning kills the interest of using a fuzzer. 99% of the crashes
>> will be in hdrgen.
> 
> As is common with fuzzing, you'll need to ensure the program crashes.
> Sometimes that requires some tweaking.
> 
> Regardless, you still have the input to investigate.

I think the point is that DMD tries to recover from parsing failures in 
order to provide additional error messages. But those parsing failures 
leave the parser in an invalid state, and invalid states are fertile ground 
for crashes.

The way to fix this is to replace the entire parser and get rid of the 
idea of AST poisoning; at the first error, you give up on parsing the 
entire file. From there, you can try recovering from specific errors with 
proper testing.


Re: Fuzzed - a program to find DMDFE parser crash

2018-12-15 Thread Sebastiaan Koppe via Digitalmars-d-announce

On Saturday, 15 December 2018 at 15:37:19 UTC, Basile B. wrote:
I think this is what Walter calls "AST poisoning" (never 
understood how it worked before today). And the whole parser is 
like this.


This poisoning kills the interest of using a fuzzer. 99% of the 
crashes will be in hdrgen.


As is common with fuzzing, you'll need to ensure the program 
crashes. Sometimes that requires some tweaking.


Regardless, you still have the input to investigate.


Re: Fuzzed - a program to find DMDFE parser crash

2018-12-15 Thread Basile B. via Digitalmars-d-announce
On Saturday, 15 December 2018 at 14:22:48 UTC, Johan Engelen 
wrote:

On Saturday, 15 December 2018 at 11:29:45 UTC, Basile B. wrote:

Fuzzed [1] is a simple fuzzer for the D programming language.


Are you familiar with libFuzzer and LDC's integration?
https://johanengelen.github.io/ldc/2018/01/14/Fuzzing-with-LDC.html


No, but i'm not that surprised to see that a fuzzer already 
exists.

I may have even seen this article but completely forgot it.

You can feed libFuzzer with a dictionary of keywords to speed 
up the initial fuzzing phase, where the keywords are the tokens 
strings that you use.
Besides finding crashes, it's also good to enable ASan to find 
memory-related bugs that by luck didn't crash the program.



The time to write this announce, already 5 "crashers" found.


Great :)


I have about 40 now



The other day I was reminded of OSS Fuzz and that it'd be nice 
if we would setup fuzzing for the frontend and phobos there...


-Johan


I started looking at a crasher:

   typeof function function in

which crashes in hdrgen. Actually i realize that i don't like the 
D parser. In many cases it checks for errors but continues 
parsing unconditionally.


In the example, "in" leads to an null contract that the pretty 
formatter dereferences at some point, but parsing should have 
stopped after "typeof" since there is no left paren. Now take a 
look at typeof sub parser



AST.TypeQualified parseTypeof()
{
AST.TypeQualified t;
const loc = token.loc;

nextToken();
check(TOK.leftParentheses); // <--  why continuing if the 
check fails?

if (token.value == TOK.return_)
{
nextToken();
t = new AST.TypeReturn(loc);
}
else
{
AST.Expression exp = parseExpression();
t = new AST.TypeTypeof(loc, exp);
}
check(TOK.rightParentheses);
return t;
}

I think this is what Walter calls "AST poisoning" (never 
understood how it worked before today). And the whole parser is 
like this.


This poisoning kills the interest of using a fuzzer. 99% of the 
crashes will be in hdrgen.


Re: Fuzzed - a program to find DMDFE parser crash

2018-12-15 Thread Sebastiaan Koppe via Digitalmars-d-announce

On Saturday, 15 December 2018 at 11:29:45 UTC, Basile B. wrote:
Fuzzed [1] is a simple fuzzer for the D programming language. 
It allows to detect sequences of tokens that crash the parser. 
While the D front end is not yet used to make tools, if this 
ever happens the parser will have to accept invalid code. As 
experienced with dparse, invalid code tend to crash more a 
parser because of a cognitive bias that lead us, "hoomans", to 
prove that things work rather than the opposite.




Nice. In my experience fuzzing parses works very well. I have 
good memories with afl. So much so that I once wrote a wrapper 
around it to handle running it distributed.


See https://github.com/skoppe/afl-dist
Could use a readme and a how-to though.



Re: Fuzzed - a program to find DMDFE parser crash

2018-12-15 Thread Johan Engelen via Digitalmars-d-announce

On Saturday, 15 December 2018 at 11:29:45 UTC, Basile B. wrote:

Fuzzed [1] is a simple fuzzer for the D programming language.


Are you familiar with libFuzzer and LDC's integration?
https://johanengelen.github.io/ldc/2018/01/14/Fuzzing-with-LDC.html
You can feed libFuzzer with a dictionary of keywords to speed up 
the initial fuzzing phase, where the keywords are the tokens 
strings that you use.
Besides finding crashes, it's also good to enable ASan to find 
memory-related bugs that by luck didn't crash the program.



The time to write this announce, already 5 "crashers" found.


Great :)

The other day I was reminded of OSS Fuzz and that it'd be nice if 
we would setup fuzzing for the frontend and phobos there...


-Johan




Fuzzed - a program to find DMDFE parser crash

2018-12-15 Thread Basile B. via Digitalmars-d-announce
Fuzzed [1] is a simple fuzzer for the D programming language. It 
allows to detect sequences of tokens that crash the parser. While 
the D front end is not yet used to make tools, if this ever 
happens the parser will have to accept invalid code. As 
experienced with dparse, invalid code tend to crash more a parser 
because of a cognitive bias that lead us, "hoomans", to prove 
that things work rather than the opposite.


You can run it on one your core, report the crasher programs to 
the project issue tracker or fix them yourself:



gdb dmd
run 
bt


And then try to see what happens in the parser at the location 
pointed on top of the back trace. Note that you'll need to build 
dmd debug version.


The time to write this announce, already 5 "crashers" found.

[1] https://github.com/BBasile/fuzzed