https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63466
--- Comment #2 from Andi Kleen <andi-gcc at firstfloor dot org> ---
Looking at the profile there's plenty of room for optimization, e.g. not using
getc/ungetc but directly accessing the buffer, or maybe even some kind of
template specialization.

With the variables pulled out it's faster, but still a lot slower than C:

% time ./a.out < testfile

real    0m0.400s
user    0m0.397s
sys     0m0.002s

% time ./tstream-c < testfile

real    0m0.033s
user    0m0.028s
sys     0m0.004s
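
To illustrate the "directly accessing the buffer" idea at user level, here is a
minimal sketch. It is not the test program from this report and not a libstdc++
patch. It only contrasts token-by-token formatted extraction with pulling raw
blocks out of the stream buffer via sgetn() and scanning them in place. The
function names (count_tokens_extract, count_tokens_block, check_agreement) and
the 4096-byte block size are assumptions made for the example.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>

// Baseline: formatted extraction, one operator>> call per token.
std::size_t count_tokens_extract(std::istream& in) {
    std::size_t n = 0;
    std::string tok;
    while (in >> tok) ++n;
    return n;
}

// Sketch: bypass formatted input and read raw blocks from the stream
// buffer with sgetn(), then scan the block directly.
// Assumes the default "C" locale notion of whitespace.
std::size_t count_tokens_block(std::istream& in) {
    std::streambuf* sb = in.rdbuf();
    char buf[4096];                 // block size is an arbitrary choice
    std::size_t n = 0;
    bool in_token = false;          // carries token state across blocks
    std::streamsize got;
    while ((got = sb->sgetn(buf, sizeof buf)) > 0) {
        for (std::streamsize i = 0; i < got; ++i) {
            bool space = std::isspace(static_cast<unsigned char>(buf[i])) != 0;
            if (!space && !in_token) ++n;   // first character of a new token
            in_token = !space;
        }
    }
    return n;
}

// Hypothetical self-check: both counters should agree on the same text.
void check_agreement(const std::string& text) {
    std::istringstream a(text), b(text);
    assert(count_tokens_extract(a) == count_tokens_block(b));
}
```

Either function can be pointed at std::cin to compare timings on a file such as
the testfile used above. Any speedup depends on the library version and on
whether std::ios::sync_with_stdio(false) is in effect, so no numbers are
claimed here.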