On Tuesday, 18 July 2017 at 12:07:27 UTC, Jacob Carlborg wrote:
During the dconf hackathon I set out to create a DUB package
for DMD to be used as a library. This has finally been merged
[1] and is available here [2]. It contains the lexer and the
parser.
This is great news of course!
But I have some bad news ;-)
Now that the Lexer is nicely separated, it is very easy for me to
test-drive libFuzzer+AddressSanitizer on the lexer and... Expect
many bug reports in the coming days. I am testing this code:
```
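import ddmd.lexer;  // Lexer
import ddmd.tokens; // TOKeof

void fuzzDMDLexer(const(char*) data, size_t length)
{
    // Lex the bytes in [data, data + length) with doc comments and
    // comment tokens disabled.
    scope lexer = new Lexer("test", data, 0, length, false, false);

    // Walk every token until end-of-file, reading each token's value.
    lexer.nextToken;
    do
    {
        auto drop = lexer.token.value;
    }
    while (lexer.nextToken != TOKeof);
}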
```
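The post does not show how `fuzzDMDLexer` is handed to libFuzzer. A minimal sketch of the usual glue, assuming the standard libFuzzer entry point `LLVMFuzzerTestOneInput` is what gets used, could look like the following. The `fuzzDMDLexer` body here is only a hypothetical stand-in so the sketch compiles on its own; in the real harness it would be the function above.

```d
// Hypothetical stand-in so this sketch compiles without DMD-as-a-library.
// Replace it with the fuzzDMDLexer shown above, which builds the real Lexer.
void fuzzDMDLexer(const(char*) data, size_t length)
{
    // Read only the bytes inside [data, data + length).
    foreach (i; 0 .. length)
    {
        auto drop = data[i];
    }
}

// libFuzzer calls this C-linkage function once per generated input.
// It passes a raw byte buffer plus its size; the buffer is not
// guaranteed to carry any terminator after the last byte.
extern (C) int LLVMFuzzerTestOneInput(const(ubyte*) data, size_t size)
{
    fuzzDMDLexer(cast(const(char*)) data, size);
    return 0; // libFuzzer expects 0 from the target function
}
```

With this shape, each input in the list below corresponds to one call such as `LLVMFuzzerTestOneInput(ptr, 2)` for the two bytes `")\xf7"`.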
A short list of inputs that trigger heap-overflow memory access bugs
(the `data` and `length` parameters are consistent in each case):
1. length == 0
2. data == "\n" (line feed, 0xa)
3. data == "only_ascii*" (the problem is that nothing follows the "*")
4. data == "%%"
5. data == "*ô"
6. data == "\nÜÜÜ"
7. data == "\x0a''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''"
8. data == ")\xf7"
`void scan(Token* t)` is to blame for most of the bugs I have found so
far. See e.g. line 980, which causes bug 3:
https://github.com/dlang/dmd/blob/154aa1bfd36333a8777d571e39690511e670bfcf/src/ddmd/lexer.d#L979-L980
Example stack trace (bug 8):
```
==11222==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000000952 at pc 0x0001028915b5 bp 0x7fff5d3941f0 sp 0x7fff5d3941e8
READ of size 1 at 0x602000000952 thread T0
    #0 0x1028915b4 in _D4ddmd5lexer5Lexer9decodeUTFMFZk lexer.d:2314
    #1 0x102887cae in _D4ddmd5lexer5Lexer4scanMFPS4ddmd6tokens5TokenZv lexer.d:1019
    #2 0x102875089 in _D4ddmd5lexer5Lexer9nextTokenMFZE4ddmd6tokens3TOK lexer.d:222
    #3 0x1028c5d20 in _D9fuzzlexer12fuzzDMDLexerFxPhmZv fuzzlexer.d:31
```
I am very excited to see the fuzzer+asan working so nicely! :-)
Johan