Re: [PATCH] Make integer output faster in libgfortran

2021-12-27 Thread FX via Gcc-patches
Follow-up patch committed, after my use of the one-argument variant of 
static_assert() broke bootstrap on Solaris (sorry Rainer!).
The one-arg form is new since C23, while Solaris  only supports the 
two-arg form (C11).

I have confirmed that other target libraries use the two-arg form, and 
bootstrapped the attached patch on x86_64-pc-linux-gnu.

FX



static_assert.diff
Description: Binary data


Re: [PATCH] Make integer output faster in libgfortran

2021-12-26 Thread Thomas Koenig via Gcc-patches

Hi FX,



(We could also do something like that for a 32-bit system, but
that is another kettle of fish).


We probably wouldn’t get a speed-up that big. Even on 32-bit targets
(at least common ones), the 64-bit type and its operations (notably
division) are implemented via CPU instructions, not library calls.


I'll look at this a bit more closely and report :-)


 At this point, the output of integers is probably bound by the
many layers of indirection of libgfortran's I/O system (which
are necessary because of the rich I/O formatting allowed by
the standard).


There are a few things we could do.  Getting a LTO-capable version
of libgfortran would be a huge step, because the compiler could
then strip out all of these layers.  The speed of

  character:: c


  write (*,'(A)', advance="no") c

could stand some improvement :-)

Regards

Thomas


Re: [PATCH] Make integer output faster in libgfortran

2021-12-26 Thread FX via Gcc-patches
Hi,

> I tested this on x86_64-pc-linux-gnu with
> make -k -j8 check-fortran RUNTESTFLAGS="--target_board=unix'{-m32,-m64}'"
> and didn't see any problems.

Thanks Thomas! Pushed.


> (We could also do something like that for a 32-bit system, but
> that is another kettle of fish).

We probably wouldn’t get a speed-up that big. Even on 32-bit targets (at least 
common ones), the 64-bit type and its operations (notably division) are 
implemented via CPU instructions, not library calls.

At this point, the output of integers is probably bound by the many layers of 
indirection of libgfortran's I/O system (which are necessary because of the 
rich I/O formatting allowed by the standard).

Best,
FX

Re: [PATCH] Make integer output faster in libgfortran

2021-12-25 Thread Thomas Koenig via Gcc-patches

Hi fX,


right now I don’t have a Linux system with 32-bit support. I’ll see how I can 
connect to gcc45, but if someone who is already set up to do can fire a quick 
regtest, that would be great;)


I tested this on x86_64-pc-linux-gnu with

make -k -j8 check-fortran RUNTESTFLAGS="--target_board=unix'{-m32,-m64}'"

and didn't see any problems.

So, OK for trunk.

(We could also do something like that for a 32-bit system, but
that is another kettle of fish).

Thanks for taking this up!

Best regards

Thomas



Re: [PATCH] Make integer output faster in libgfortran

2021-12-25 Thread FX via Gcc-patches
Hi Thomas,

> There are two possibilities: Either use gcc45 on the compile farm, or
> run it with
> make -k -j8 check-fortran RUNTESTFLAGS="--target_board=unix'{-m32,-m64}'"

Thanks, right now I don’t have a Linux system with 32-bit support. I’ll see how 
I can connect to gcc45, but if someone who is already set up to do can fire a 
quick regtest, that would be great ;)

FX

Re: [PATCH] Make integer output faster in libgfortran

2021-12-25 Thread Thomas Koenig via Gcc-patches

Hi FX,


The patch has been bootstrapped and regtested on two 64-bit targets: 
aarch64-apple-darwin21 (development branch) and x86_64-pc-gnu-linux. I would 
like it to be tested on a 32-bit target without 128-bit integer type. Does 
someone have access to that?


There are two possibilities: Either use gcc45 on the compile farm, or
run it with

make -k -j8 check-fortran RUNTESTFLAGS="--target_board=unix'{-m32,-m64}'"

which is the magic incantation to also use -m32 binaries.  You'll need
the 32-bit support on your Linux system, of course (which you can
check quickly with a "hello world" kind of program with -m32).

Regards

Thomas


[PATCH] Make integer output faster in libgfortran

2021-12-25 Thread FX via Gcc-patches
Hi,

Integer output in libgfortran is done by passing values as the largest integer 
type available. This is what our gfc_itoa() function for conversion to decimal 
form uses, as well, performing series of divisions by 10. On targets with a 
128-bit integer type (which is most targets, really, nowadays), division is 
slow, because it is implemented in software and requires a call to a libgcc 
function.

We can speed this up in two easy ways:
- If the value fits into 64-bit, use a simple 64-bit itoa() function, which 
does the series of divisions by 10 with hardware. Most I/O will actually fall 
into that case, in real-life, unless you’re printing very big 128-bit integers.
- If the value does not fit into 64-bit, perform only one slow division, by 
10^19, and use two calls to the 64-bit function to output each part (the low 
part needing zero-padding).


What is the speed-up? It really depends on the exact nature of the I/O done. 
For the most common-case, list-directed I/O with no special format, the patch 
does not speed (or slow!) things for values up to HUGE(KIND=4), but speeds 
things up for larger values. For very large 128-bit values, it can cut the I/O 
time in half.

I attach my own timing code to this email. Results before the patch (with 
previous itoa-patch applied, though):

 Timing for INTEGER(KIND=1)
 Value 0, time:  0.191409990
 Value HUGE(KIND=1), time:  0.173687011
 Timing for INTEGER(KIND=4)
 Value 0, time:  0.171809018
 Value 1049, time:  0.177439988
 Value HUGE(KIND=4), time:  0.217984974
 Timing for INTEGER(KIND=8)
 Value 0, time:  0.178072989
 Value HUGE(KIND=4), time:  0.214841008
 Value HUGE(KIND=8), time:  0.276726007
 Timing for INTEGER(KIND=16)
 Value 0, time:  0.175235987
 Value HUGE(KIND=4), time:  0.217689037
 Value HUGE(KIND=8), time:  0.280257106
 Value HUGE(KIND=16), time:  0.420036077

Results after the patch:

 Timing for INTEGER(KIND=1)
 Value 0, time:  0.194633007
 Value HUGE(KIND=1), time:  0.172436997
 Timing for INTEGER(KIND=4)
 Value 0, time:  0.167517006
 Value 1049, time:  0.176503003
 Value HUGE(KIND=4), time:  0.172892988
 Timing for INTEGER(KIND=8)
 Value 0, time:  0.171101034
 Value HUGE(KIND=4), time:  0.174461007
 Value HUGE(KIND=8), time:  0.180289030
 Timing for INTEGER(KIND=16)
 Value 0, time:  0.175765991
 Value HUGE(KIND=4), time:  0.181162953
 Value HUGE(KIND=8), time:  0.186082959
 Value HUGE(KIND=16), time:  0.207401991

Times are CPU times in seconds, for one million integer writes into a buffer 
string. With the patch, we see that integer decimal output is almost 
independent of the value written, meaning the I/O library overhead is dominant, 
not the decimal conversion. For this reason, I don’t think we really need a 
faster implementation of the 64-bit itoa, and can keep the current 
series-of-division-by-10 approach.

---

This patch applies on top of my previous itoa-related patch at 
https://gcc.gnu.org/pipermail/fortran/2021-December/057218.html

The patch has been bootstrapped and regtested on two 64-bit targets: 
aarch64-apple-darwin21 (development branch) and x86_64-pc-gnu-linux. I would 
like it to be tested on a 32-bit target without 128-bit integer type. Does 
someone have access to that?

Once tested on a 32-bit target, OK to commit?

FX



itoa-faster.patch
Description: Binary data


timing.f90
Description: Binary data