It's pretty easy to DoS a D program that uses File.readln or File.byLine:

msl@james:~/d$ prlimit --as=4000000000 time ./tinycat.d tinycat.d
#!/usr/bin/rdmd

import std.stdio;

void main(in string[] argv) {
    foreach (const filename; argv[1..$])
        foreach (line; File(filename).byLine)
            writeln(line);
}

0.00user 0.00system 0:00.00elapsed 66%CPU (0avgtext+0avgdata 4280maxresident)k
0inputs+0outputs (0major+292minor)pagefaults 0swaps
msl@james:~/d$ prlimit --as=4000000000 time ./tinycat.d /dev/zero
0.87user 1.45system 0:02.51elapsed 92%CPU (0avgtext+0avgdata 2100168maxresident)k
0inputs+0outputs (0major+524721minor)pagefaults 0swaps
msl@james:~/d$

This trivial program runs in about 4MiB when asked to print itself, but chewed up 2GiB of memory in under three seconds when handed an infinitely long input line, and it would have kept going if the address-space limit set by prlimit hadn't stopped it.

D is in good company: C++'s getline() and Perl's diamond operator have the same vulnerability.

msl@james:~/d$ prlimit --as=4000000000 time ./a.out tinycat.cpp
#include <fstream>
#include <iostream>
#include <string>

int main(int const argc, char const *argv[]) {
    for (auto i = 1;  i < argc;  ++i) {
        std::ifstream fh {argv[i]};
        for (std::string line;  getline(fh, line, '\n');  )
            std::cout << line << '\n';
    }

    return 0;
}

0.00user 0.00system 0:00.00elapsed 0%CPU (0avgtext+0avgdata 2652maxresident)k
0inputs+0outputs (0major+113minor)pagefaults 0swaps
msl@james:~/d$ prlimit --as=4000000000 time ./a.out /dev/zero
1.12user 1.76system 0:02.92elapsed 98%CPU (0avgtext+0avgdata 1575276maxresident)k
0inputs+0outputs (0major+786530minor)pagefaults 0swaps
msl@james:~/d$ prlimit --as=4000000000 time perl -wpe '' tinycat.d
#!/usr/bin/rdmd

import std.stdio;

void main(in string[] argv) {
    foreach (const filename; argv[1..$])
        foreach (line; File(filename).byLine)
            writeln(line);
}

0.00user 0.00system 0:00.00elapsed 0%CPU (0avgtext+0avgdata 3908maxresident)k
0inputs+0outputs (0major+192minor)pagefaults 0swaps
msl@james:~/d$ prlimit --as=4000000000 time perl -wpe '' /dev/zero
Out of memory!
Command exited with non-zero status 1
4.82user 2.34system 0:07.43elapsed 96%CPU (0avgtext+0avgdata 3681400maxresident)k
0inputs+0outputs (0major+919578minor)pagefaults 0swaps
msl@james:~/d$

But I digress.

What would a safer API look like? Perhaps we'd slip in a maximum line length as an optional argument to readln, byLine and friends:

enum size_t MaxLength = 1 << 20;    // 1MiB
fh.readln(buf, MaxLength);
buf = fh.readln(MaxLength);
auto range = fh.byLine(MaxLength);

Obviously, we wouldn't want to break compatibility with existing code by demanding a maximum line length at every call site. Perhaps the default maximum length should change from its current value -- infinity -- to something like 4MiB: longer than lines in most text files, but still affordably small on most modern machines.
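To make that concrete, here's a minimal sketch of what a capped reader might look like as a free function. The names -- boundedReadln, LineTooLongException, DefaultMaxLength -- are invented for illustration, not existing std.stdio API, and it reads a byte at a time for clarity rather than speed. The defaulted maxLength parameter is the compatibility story: existing call sites compile unchanged.

import core.stdc.stdio : EOF, fgetc;
import std.stdio : File;

enum size_t DefaultMaxLength = 4 << 20;    // 4MiB, as proposed above

// Thrown when a line exceeds the caller's limit.
class LineTooLongException : Exception {
    this(size_t limit, string file = __FILE__, size_t line = __LINE__) {
        import std.conv : to;
        super("line longer than " ~ limit.to!string ~ " bytes", file, line);
    }
}

// Read one line (terminator included), refusing to buffer more than
// maxLength bytes.  Returns null at end of file.
char[] boundedReadln(File fh, size_t maxLength = DefaultMaxLength) {
    auto fp = fh.getFP();
    char[] buf;
    for (int c; (c = fgetc(fp)) != EOF; ) {
        buf ~= cast(char) c;
        if (c == '\n')
            return buf;
        if (buf.length >= maxLength)
            throw new LineTooLongException(maxLength);
    }
    return buf.length ? buf : null;
}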

What should happen when readln encounters an excessively long line? Throw an exception?
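
For what it's worth, here's how the throwing behaviour of the sketch above would look at a call site (tinycat again, still using the made-up boundedReadln):

void main(string[] argv) {
    import std.stdio;

    foreach (filename; argv[1 .. $]) {
        auto fh = File(filename);
        try {
            for (char[] line; (line = fh.boundedReadln()) !is null; )
                write(line);    // line keeps its terminator
        } catch (LineTooLongException e) {
            stderr.writeln(filename, ": ", e.msg);
        }
    }
}

Fed /dev/zero, this version would report an error after buffering at most 4MiB instead of exhausting the address space.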

Markus
