http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38199
--- Comment #13 from Janne Blomqvist <jb at gcc dot gnu.org> 2011-05-10 09:41:08 UTC ---

Here's something for formatted writes; consider the write-many.f (from some
other PR, I'm too lazy to check which now)

      program main
      open(10,status='SCRATCH')
      a = 0.3858204
      do i=1,1000000
         a = a + 0.4761748164
         write(10, '(G12.5)'),a
      end do
      end program main

Profiling this with 'perf' shows the top offenders as

# Overhead  Command     Shared Object                                                      Symbol
# ........  ..........  .................................................................  ......
#
    21.56%  write-many  /lib/libc-2.11.1.so                                                [.] __mpn_divrem
    14.72%  write-many  /lib/libc-2.11.1.so                                                [.] ___printf_fp
    13.42%  write-many  /lib/libc-2.11.1.so                                                [.] hack_digit.15661
     7.75%  write-many  /lib/libc-2.11.1.so                                                [.] __GI_vfprintf
     3.81%  write-many  /home/janne/src/gfortran/trunk/install/lib64/libgfortran.so.3.0.0  [.] output_float.isra.7.constprop.16
     2.81%  write-many  /home/janne/src/gfortran/trunk/install/lib64/libgfortran.so.3.0.0  [.] write_float
     2.38%  write-many  /lib/libc-2.11.1.so                                                [.] _IO_default_xsputn_internal
     2.10%  write-many  /home/janne/src/gfortran/trunk/install/lib64/libgfortran.so.3.0.0  [.] data_transfer_init
     1.96%  write-many  /home/janne/src/gfortran/trunk/install/lib64/libgfortran.so.3.0.0  [.] formatted_transfer
     1.37%  write-many  /home/janne/src/gfortran/trunk/install/lib64/libgfortran.so.3.0.0  [.] next_format0

That is, most of the time seems to be spent somewhere related to the libc
formatting (as we're using snprintf to convert the real numbers to ASCII).
Next, consider

    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[])
    {
      /* The number of digits after the decimal point is the only argument. */
      if (argc < 2)
        {
          fprintf(stderr, "usage: %s ndigits\n", argv[0]);
          return 1;
        }
      int ndigits = atoi(argv[1]);
      printf("Doing test with %d digits\n", ndigits);
      /* Room for "d." + ndigits + "e-01" + NUL, with a little slack. */
      size_t bufsz = ndigits + 9;
      char *buf = malloc(bufsz);
      if (buf == NULL)
        return 1;
      for (int i = 0; i < 10000000; i++)
        snprintf(buf, bufsz, "%#-.*e", ndigits, 1./3);
      printf("%s\n", buf);
      free(buf);
      return 0;
    }

$ time ./snprintfbench 0
Doing test with 0 digits
3.e-01

real    0m2.608s
user    0m2.610s
sys     0m0.000s

$ time ./snprintfbench 20
Doing test with 20 digits
3.33333333333333314830e-01

real    0m4.746s
user    0m4.740s
sys     0m0.010s

$ time ./snprintfbench 40
Doing test with 40 digits
3.3333333333333331482961625624739099293947e-01

real    0m6.362s
user    0m6.360s
sys     0m0.000s

$ time ./snprintfbench 60
Doing test with 60 digits
3.333333333333333148296162562473909929394721984863281250000000e-01

real    0m8.155s
user    0m8.160s
sys     0m0.000s

That is, while there is a constant cost for snprintf(), each additional digit
increases the time approximately linearly. Now, in io/write_float.def we always
convert with a constant 41 digits (when REAL(16) is available). Instead, we
could first figure out how many digits we need, and only then call snprintf(),
generating only as many digits as needed. Or as many as requested + 1 if the
user has chosen a non-default rounding mode, since in that case we need an
extra digit in order to do the rounding ourselves.