On Tuesday, 18 July 2017 at 12:07:27 UTC, Jacob Carlborg wrote:
During the dconf hackathon I set out to create a DUB package for DMD to be used as a library. This has finally been merged [1] and is available here [2]. It contains the lexer and the parser.

This is great news of course!

But I have some bad news ;-)
Now that the lexer is nicely separated, it is very easy for me to test-drive libFuzzer+AddressSanitizer on it and... expect many bug reports in the next few days. I am testing this code:

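```
import ddmd.lexer;  // Lexer
import ddmd.tokens; // TOKeof

void fuzzDMDLexer(const(char*) data, size_t length)
{
    // Lex the raw fuzzer input; `data` and `length` describe the same buffer.
    scope lexer = new Lexer("test", data, 0, length, false, false);
    lexer.nextToken;

    // Walk every token until end of file, touching each token's value.
    do
    {
        auto drop = lexer.token.value;
    }
    while (lexer.nextToken != TOKeof);
}
```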
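For readers who have not used libFuzzer: it repeatedly calls a C-linkage function named `LLVMFuzzerTestOneInput` with generated inputs. A minimal sketch of how a function like the one above could be hooked up follows. `fuzzTarget` and `bytesSeen` are hypothetical stand-ins so the sketch compiles without the DMD library; this is not necessarily the harness used for the results below.

```d
// Hypothetical counter, only here so the stub has an observable effect.
size_t bytesSeen;

// Hypothetical stand-in for fuzzDMDLexer: it just walks the bytes it is
// given, always staying inside [0, length).
void fuzzTarget(const(char)* data, size_t length)
{
    foreach (i; 0 .. length)
    {
        auto drop = data[i];
        ++bytesSeen;
    }
}

// Entry point that libFuzzer looks up by name. It receives a pointer to the
// generated input and the input's size in bytes.
extern (C) int LLVMFuzzerTestOneInput(const(ubyte)* data, size_t size)
{
    fuzzTarget(cast(const(char)*) data, size);
    return 0; // libFuzzer expects 0 from a normal run
}
```

In a real harness, the call to `fuzzTarget` would be replaced by a call to `fuzzDMDLexer`, and AddressSanitizer would report any read outside the `size` bytes handed to the entry point.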

A short list of heap-overflow memory access bugs (params data and length are consistent):
1. length == 0
2. data == "\n" (line feed, 0xa)
3. data == "only_ascii*" (the problem is that nothing follows the "*")
4. data == "%%"
5. data == "*ô"
6. data == "\nÜÜÜ"
7. data == "\x0a''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''"
8. data == ")\xf7"

`void scan(Token* t)` is to blame for most of the bugs I found so far. See e.g. line 980 that causes bug 3:
https://github.com/dlang/dmd/blob/154aa1bfd36333a8777d571e39690511e670bfcf/src/ddmd/lexer.d#L979-L980
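To illustrate the class of bug behind case 3 (looking at the byte after `*` when the input ends there), here is a small sketch of a length-checked lookahead. This is not the DMD code: `peekNext` and `isMulAssign` are hypothetical names, and the sketch works on a slice rather than a raw pointer.

```d
// Hypothetical helper, not part of DMD: return the byte after `pos`,
// or a NUL sentinel when `pos` is the last byte of the input.
char peekNext(const(char)[] input, size_t pos) pure nothrow @nogc @safe
{
    return pos + 1 < input.length ? input[pos + 1] : '\0';
}

// Hypothetical check of the kind a lexer performs to tell "*" from "*=".
bool isMulAssign(const(char)[] input, size_t pos) pure nothrow @nogc @safe
{
    return input[pos] == '*' && peekNext(input, pos) == '=';
}

unittest
{
    // The "*" is the last byte (index 10), so the lookahead must not read on.
    assert(peekNext("only_ascii*", 10) == '\0');
    assert(!isMulAssign("only_ascii*", 10));

    // A following "=" is still recognized when it is actually present.
    assert(isMulAssign("a*=b", 1));
}
```

An unchecked `p[1]` on a raw pointer reads whatever byte follows the buffer, which is the kind of access AddressSanitizer flags as a heap-buffer-overflow.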

Example stack trace (bug 8):
```
==11222==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000000952 at pc 0x0001028915b5 bp 0x7fff5d3941f0 sp 0x7fff5d3941e8
READ of size 1 at 0x602000000952 thread T0
    #0 0x1028915b4 in _D4ddmd5lexer5Lexer9decodeUTFMFZk lexer.d:2314
    #1 0x102887cae in _D4ddmd5lexer5Lexer4scanMFPS4ddmd6tokens5TokenZv lexer.d:1019
    #2 0x102875089 in _D4ddmd5lexer5Lexer9nextTokenMFZE4ddmd6tokens3TOK lexer.d:222
    #3 0x1028c5d20 in _D9fuzzlexer12fuzzDMDLexerFxPhmZv fuzzlexer.d:31
```

I am very excited to see the fuzzer+asan working so nicely!
:-)
  Johan

