[Bug libfortran/78549] [8 Regression] Very slow formatted internal file output
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78549 --- Comment #27 from Jerry DeLisle --- This should be fixed now fairly well. At this point there are not a lot of major issues in our own libgfortran library. I will leave this bug report open for a while if any issues arise. Here is my latest profile. On my machine I am actuall getting slightly better performance on trunk (8) vs. gfortran 6 Flat profile: Each sample counts as 0.01 seconds. % cumulative self time seconds seconds name 15.11 4.30 4.30 printf_positional 14.65 8.47 4.17 __printf_fp_l 7.13 10.50 2.03 __parse_one_specmb 6.08 12.23 1.73 build_float_string 4.16 13.42 1.19 formatted_transfer_scalar_write 2.39 14.10 0.68 next_format0 2.28 14.75 0.65 _IO_no_init 2.23 15.38 0.64 __mpn_mul_1 2.21 16.01 0.63 strtol_l_internal 2.18 16.63 0.62 vfprintf 1.86 17.16 0.53 __strchrnul_sse2 1.79 17.67 0.51 _gfortrani_next_format 1.77 18.18 0.51 _IO_str_init_static_internal 1.76 18.68 0.50 __mpn_extract_double 1.72 19.17 0.49 vsnprintf 1.72 19.66 0.49 _IO_default_xsputn 1.69 20.14 0.48 _int_malloc 1.62 20.60 0.46 __memmove_sse2_unaligned_erms 1.28 20.96 0.37 get_float_string 1.12 21.28 0.32 _IO_padn 1.02 21.57 0.29 __memset_sse2_unaligned 1.02 21.86 0.29 __wmemset_sse2_unaligned 0.91 22.12 0.26 flt128_ais
[Bug libfortran/78549] [8 Regression] Very slow formatted internal file output
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78549 --- Comment #26 from Jerry DeLisle --- Author: jvdelisle Date: Thu Nov 23 17:19:18 2017 New Revision: 255108 URL: https://gcc.gnu.org/viewcvs?rev=255108&root=gcc&view=rev Log: 2017-11-23 Jerry DeLisle Backport from trunk PR libgfortran/78549 * io/io.h (newunit_free): Add declaration. Clean some whitespace. * io/transfer.c (st_read_done, st_write_done): Call newunit_free. * io/unit.c (newunit_free): Change type from static void to void. Modified: branches/gcc-7-branch/libgfortran/ChangeLog branches/gcc-7-branch/libgfortran/io/io.h branches/gcc-7-branch/libgfortran/io/transfer.c branches/gcc-7-branch/libgfortran/io/unit.c
[Bug libfortran/78549] [8 Regression] Very slow formatted internal file output
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78549 --- Comment #25 from Jerry DeLisle --- Author: jvdelisle Date: Tue Nov 21 02:17:11 2017 New Revision: 254982 URL: https://gcc.gnu.org/viewcvs?rev=254982&root=gcc&view=rev Log: 2017-11-20 Jerry DeLisle PR libgfortran/78549 * io/io.h (newunit_free): Add declaration. Clean some whitespace. * io/transfer.c (st_read_done, st_write_done): Call newunit_free. * io/unit.c (newunit_free): Change type from static void to void. Modified: trunk/libgfortran/ChangeLog trunk/libgfortran/io/io.h trunk/libgfortran/io/transfer.c trunk/libgfortran/io/unit.c
[Bug libfortran/78549] [8 Regression] Very slow formatted internal file output
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78549 --- Comment #24 from Jerry DeLisle --- After a minor test tweak. $ gfc -static pr78549.f $ time ./a.out real0m24.049s user0m22.941s sys 0m0.962s $ gfc6 -static pr78549.f $ time ./a.out real0m22.838s user0m22.718s sys 0m0.005s Without the tweak on trunk: $ gfc -static pr78549.f $ time ./a.out real0m28.916s user0m27.697s sys 0m1.024s With the minor tweak we are still a bit slower than 6, but then we added a few features since then also, so not too bad. My initial tweak is going to require a little logic to be added. Next report will be after I have that sorted out.
[Bug libfortran/78549] [8 Regression] Very slow formatted internal file output
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78549 --- Comment #23 from Jerry DeLisle --- Well, instrumenting a little bit I see that delete_root is getting called many many many times. So, every call to newunit_alloc is assigning a new unit number which is getting added to the treap and never reused in anyway. The result of this is that at the end of the program when any units left on the treap are deleted, we loop through over one million deletes. So I now have a better idea of what we need to do.
[Bug libfortran/78549] [8 Regression] Very slow formatted internal file output
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78549 Thomas Koenig changed: What|Removed |Added CC||manfred99 at gmx dot ch --- Comment #22 from Thomas Koenig --- *** Bug 82938 has been marked as a duplicate of this bug. ***
[Bug libfortran/78549] [8 Regression] Very slow formatted internal file output
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78549 --- Comment #21 from Thomas Koenig --- Jerry, I think you know more about this than I do, so please go ahead.
[Bug libfortran/78549] [8 Regression] Very slow formatted internal file output
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78549 --- Comment #20 from Jerry DeLisle --- (In reply to Thomas Koenig from comment #19) > Created attachment 42320 [details] > patch which failes with dtio_14 and dtio_15, among others > > Well, this one doesn't work yet. Do you want to continue with it or would you like me to pick up from where you are?
[Bug libfortran/78549] [8 Regression] Very slow formatted internal file output
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78549 --- Comment #19 from Thomas Koenig --- Created attachment 42320 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42320&action=edit patch which failes with dtio_14 and dtio_15, among others Well, this one doesn't work yet.
[Bug libfortran/78549] [8 Regression] Very slow formatted internal file output
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78549 --- Comment #18 from Dominique d'Humieres --- I do not see any timing difference between 7.2.1 (r253502) and trunk.
[Bug libfortran/78549] [8 Regression] Very slow formatted internal file output
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78549 --- Comment #17 from Thomas Koenig --- I mean apart from the fact that this creates a huge memory leak :-)
[Bug libfortran/78549] [8 Regression] Very slow formatted internal file output
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78549 --- Comment #16 from Thomas Koenig --- Jerry, what do you think of this approach? This creates a local copy of the gfc_unit, without putting it into the tree. Index: transfer.c === --- transfer.c (Revision 253377) +++ transfer.c (Arbeitskopie) @@ -4170,6 +4170,9 @@ st_write_done (st_parameter_dt *dtp) } unlock_unit (dtp->u.p.current_unit); } + if (is_internal_unit (dtp)) +free (dtp->u.p.current_unit); + library_end (); } Index: unit.c === --- unit.c (Revision 253377) +++ unit.c (Arbeitskopie) @@ -225,7 +225,7 @@ insert (gfc_unit *new, gfc_unit *t) /* insert_unit()-- Create a new node, insert it into the treap. */ static gfc_unit * -insert_unit (int n) +insert_unit (int n, int internal) { gfc_unit *u = xcalloc (1, sizeof (gfc_unit)); u->unit_number = n; @@ -238,8 +238,11 @@ static gfc_unit * __GTHREAD_MUTEX_INIT_FUNCTION (&u->lock); #endif __gthread_mutex_lock (&u->lock); - u->priority = pseudo_random (); - unit_root = insert (u, unit_root); + if (!internal) +{ + u->priority = pseudo_random (); + unit_root = insert (u, unit_root); +} return u; } @@ -316,14 +319,21 @@ delete_unit (gfc_unit *old) /* get_gfc_unit()-- Given an integer, return a pointer to the unit structure. Returns NULL if the unit does not exist, - otherwise returns a locked unit. */ + otherwise returns a locked unit. If internal, do not put it + in the unit cache. */ static gfc_unit * -get_gfc_unit (int n, int do_create) +get_gfc_unit (int n, int do_create, int internal) { gfc_unit *p; int c, created = 0; + if (internal) +{ + p = insert_unit (n, 1); + return p; +} + __gthread_mutex_lock (&unit_lock); retry: for (c = 0; c < CACHE_SIZE; c++) @@ -347,7 +357,7 @@ retry: if (p == NULL && do_create) { - p = insert_unit (n); + p = insert_unit (n, 0); created = 1; } @@ -405,7 +415,7 @@ found: gfc_unit * find_unit (int n) { - return get_gfc_unit (n, 0); + return get_gfc_unit (n, 0, 0); } @@ -412,7 +422,7 @@ find_unit (int n) gfc_unit * find_or_create_unit (int n) { - return get_gfc_unit (n, 1); + return get_gfc_unit (n, 1, 0); } @@ -548,7 +558,7 @@ get_unit (st_parameter_dt *dtp, int do_create) dtp->u.p.unit_is_internal = 1; dtp->common.unit = newunit_alloc (); - unit = get_gfc_unit (dtp->common.unit, do_create); + unit = get_gfc_unit (dtp->common.unit, do_create, 1); set_internal_unit (dtp, unit, kind); fbuf_init (unit, 128); return unit; @@ -563,9 +573,9 @@ get_unit (st_parameter_dt *dtp, int do_create) is not allowed, such units must be created with OPEN(NEWUNIT=...). */ if (dtp->common.unit < 0) -return get_gfc_unit (dtp->common.unit, 0); +return get_gfc_unit (dtp->common.unit, 0, 0); - return get_gfc_unit (dtp->common.unit, do_create); + return get_gfc_unit (dtp->common.unit, do_create, 0); } @@ -592,7 +602,7 @@ init_units (void) if (options.stdin_unit >= 0) { /* STDIN */ - u = insert_unit (options.stdin_unit); + u = insert_unit (options.stdin_unit, 0); u->s = input_stream (); u->flags.action = ACTION_READ; @@ -624,7 +634,7 @@ init_units (void) if (options.stdout_unit >= 0) { /* STDOUT */ - u = insert_unit (options.stdout_unit); + u = insert_unit (options.stdout_unit, 0); u->s = output_stream (); u->flags.action = ACTION_WRITE; @@ -655,7 +665,7 @@ init_units (void) if (options.stderr_unit >= 0) { /* STDERR */ - u = insert_unit (options.stderr_unit); + u = insert_unit (options.stderr_unit, 0); u->s = error_stream (); u->flags.action = ACTION_WRITE;
[Bug libfortran/78549] [8 Regression] Very slow formatted internal file output
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78549 --- Comment #15 from Jerry DeLisle --- For my own baseline: gcc6: real 0m6.948s gcc7: real 0m9.906s gcc8: real 0m10.415s I backported removal of the caching mentioned in comment #14 to gcc7. The two should be identical except that gcc7 is a released branch. IIRC released branches have debugging code removed. I do not know if this accounts for the difference between gcc7 and gcc8.
[Bug libfortran/78549] [8 Regression] Very slow formatted internal file output
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78549 --- Comment #14 from Jerry DeLisle --- The real issue is that to support DTIO with internal units I had to actually use a gfc_unit structure. Before DTIO we never did this. At the time of doing DTIO I did not have a 'better idea' since by the nature of it, DTIO passes a UNIT to the child procedures. I do have something percolating in my head on this that will be better. I have not finished thinking it through yet. I have to do some factoring on the structures. (In reply to Dominique d'Humieres from comment #9) > Gfortran is even slower now (~9.5s), likely revision r251374 (pr78387). This is because the gfc_unit caching scheme I tried before was incompatble with large numbers of threads. I decided that whole approach to the issue was a rabbit down a rabbit hole. I removed it, and we are slower. Working on it.
[Bug libfortran/78549] [8 Regression] Very slow formatted internal file output
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78549 Thomas Koenig changed: What|Removed |Added Status|WAITING |NEW Target Milestone|--- |8.0 Summary|Very slow formatted |[8 Regression] Very slow |internal file output|formatted internal file ||output --- Comment #13 from Thomas Koenig --- Here is the timing for gcc 6: Each sample counts as 0.01 seconds. % cumulative self self total time seconds secondscalls ms/call ms/call name 32.15 0.99 0.99 output_float 12.34 1.37 0.38 formatted_transfer 11.04 1.71 0.34 next_format0 7.14 1.93 0.22 write_float 3.73 2.05 0.12 _gfortrani_write_block 3.25 2.15 0.10 next_char 2.92 2.24 0.09 data_transfer_init 2.60 2.32 0.08 _gfortrani_next_format 2.60 2.40 0.08 get_fnode 2.60 2.48 0.08 write_integer 2.27 2.55 0.07 _gfortrani_parse_format 2.27 2.62 0.07 finalize_transfer 2.27 2.69 0.07 parse_format_list 1.95 2.75 0.06 _gfortrani_next_record 1.14 2.78 0.04 _gfortran_transfer_array 0.97 2.81 0.03 _gfortrani_mem_alloc_r4 So, this does appear to be a regression for gcc8 at least. The time is also longer (~20 s for trunk, ~14 s for gcc 6).