Re: [PATCH] PR 78534 Change character length from int to size_t
On 1/3/18 11:43 AM, Janne Blomqvist wrote: On Wed, Jan 3, 2018 at 8:34 PM, Bob Deen wrote: On 12/29/17 5:31 AM, Janne Blomqvist wrote: In order to handle large character lengths on (L)LP64 targets, switch the GFortran character length from an int to a size_t. This is an ABI change, as procedures with character arguments take hidden arguments with the character length. Did this change not make it into gcc 7 then? No, it caused regressions on big endian targets, and it was so late in the gcc 7 cycle that there was no time to fix them (at the time I didn't have access to a big endian target to test on, and getting said access took time), so I had to revert it. Those regressions have now been fixed, and the ABI has been broken anyway due to other changes, so I'm trying again for gcc 8. Okay. If for some reason it doesn't make 8, it would be nice to know. I lurk here and try to pay attention but obviously I missed that it was pulled from 7. (which is why I'd prefer something specific to test rather than a version number, but it is what it is). Thanks... -Bob I am one of those who still use these hidden arguments for Fortran <-> C interfaces. Based on discussions a year ago, I added this to my code: #if defined(__GNUC__) && (__GNUC__ > 6) #include #define FORSTR_STDARG_TYPE size_t #else #define FORSTR_STDARG_TYPE int #endif I infer from this thread that I should change this to __GNUC__ > 7 now. Is this still the correct/best way to determine the hidden argument size? Yes, I would say so. (note that we're still using 4.x... ugh, don't ask... so the >6 check hasn't actually been used yet, I just want to future-proof things as much as possible without having to rewrite the entire Fortran <-> C interface.) Thanks... -Bob Bob Deen @ NASA-JPL Multmission Image Processing Lab bob.d...@jpl.nasa.gov
Re: [PATCH] PR 78534 Change character length from int to size_t
On Thu, Jan 4, 2018 at 4:21 AM, Jerry DeLisle wrote: > On 01/03/2018 03:37 AM, Janne Blomqvist wrote: >> On Sat, Dec 30, 2017 at 10:58 PM, Jerry DeLisle >> wrote: >>> On 12/30/2017 12:35 PM, Janne Blomqvist wrote: On Sat, Dec 30, 2017 at 7:16 PM, Thomas Koenig wrote: >>> ---snip--- I can provide that stuff as a separate patch, or merge it into the original megapatch and resubmit that, whichever way you prefer. >>> >>> I would prefer we split into two patches. This will make review of the >>> library >>> I/O changes easier. The int len is used in a lot of places also where it >>> really >>> happens to also be the kind (which is a length in our implementation). >> >> Hi, >> >> attached is a patch that makes the two attached testcases work. It >> applies on top of the charlen->size_t patch. In the formatted I/O >> stuff, I have mostly used ptrdiff_t to avoid having to deal with >> signed/unsigned issues, as the previous code was using int. >> >> >> > > Well I have started reviewing. One concern I have is that in many places you > are changing formatted field widths, like w in read_l, to ptrdiff_t. I don't > think this is necessary. If someone is printing a value that has a field width > that big, it will never finish. > > I did not check the definitions of format data structures that use these > values > all over. So I think in the least, the patch goes too far. I don't expect to support format widths larger than INT_MAX. The reason why ptrdiff_t is used is that read_block_form_* and many similar functions take a pointer to a ptrdiff_t (to handle the one case where a scalar variable can be wider than INT_MAX, namely a character), and we can't pass a pointer to an incompatible type. Now, you could argue that the internal API should be refactored so that we pass integers by value rather than as pointers, and then only the character case would need to care to use a larger type and the rest could keep using plain int. And I would agree that would be good, but I think such a refactor would be stage 1 material, not stage 3. > Another issue I have is that the readability of the code just sucks to me. The > type ptrdiff_t is so not intuitive in a very many places. If this is really > necessary lets hide it behind a macro or something so I don't have to cringe > every time I look at this code. (such as gfc_int) > > Sometimes we can get too pedantic about things like this. A lot of these > variables have nothing to do with pointers, though they may be used as > modifiers > to offsets in large memory spaces. Well, ptrdiff_t is one of the two "pointer-sized integer" typedefs introduced in C89, the other being size_t. So I would argue that if one is doing stuff like pointer arithmetic, or calculating array indices etc., and the variable might for some reason be negative, ptrdiff_t is the natural type to use. I do agree that ptrdiff_t is a bit of a mouthful, but I don't have any really good alternatives either; I'm not convinced that a libgfortran-specific typedef would make things clearer. Or we can blame Microsoft which made win64 LLP64 and not LP64 like the rest of the world, and thus spoiled the use of long for portable code. :) FWIW, we have "index_type" which is a typedef for ptrdiff_t, but that's used mostly in code that deals with array descriptors, so I thought it would be cleaner to not use it here. -- Janne Blomqvist
Re: [PATCH] PR 78534 Change character length from int to size_t
On 01/03/2018 03:37 AM, Janne Blomqvist wrote: > On Sat, Dec 30, 2017 at 10:58 PM, Jerry DeLisle wrote: >> On 12/30/2017 12:35 PM, Janne Blomqvist wrote: >>> On Sat, Dec 30, 2017 at 7:16 PM, Thomas Koenig >>> wrote: >> ---snip--- >>> >>> I can provide that stuff as a separate patch, or merge it into the >>> original megapatch and resubmit that, whichever way you prefer. >> >> I would prefer we split into two patches. This will make review of the >> library >> I/O changes easier. The int len is used in a lot of places also where it >> really >> happens to also be the kind (which is a length in our implementation). > > Hi, > > attached is a patch that makes the two attached testcases work. It > applies on top of the charlen->size_t patch. In the formatted I/O > stuff, I have mostly used ptrdiff_t to avoid having to deal with > signed/unsigned issues, as the previous code was using int. > > > Well I have started reviewing. One concern I have is that in many places you are changing formatted field widths, like w in read_l, to ptrdiff_t. I don't think this is necessary. If someone is printing a value that has a field width that big, it will never finish. I did not check the definitions of format data structures that use these values all over. So I think in the least, the patch goes too far. Another issue I have is that the readability of the code just sucks to me. The type ptrdiff_t is so not intuitive in a very many places. If this is really necessary lets hide it behind a macro or something so I don't have to cringe every time I look at this code. (such as gfc_int) Sometimes we can get too pedantic about things like this. A lot of these variables have nothing to do with pointers, though they may be used as modifiers to offsets in large memory spaces. I need a day or two to go through all of this. Sorry if I am just a downer about it. Regards, Jerry
Re: [PATCH] PR 78534 Change character length from int to size_t
On Wed, Jan 3, 2018 at 8:34 PM, Bob Deen wrote: > On 12/29/17 5:31 AM, Janne Blomqvist wrote: >> >> In order to handle large character lengths on (L)LP64 targets, switch >> the GFortran character length from an int to a size_t. >> >> This is an ABI change, as procedures with character arguments take >> hidden arguments with the character length. > > > Did this change not make it into gcc 7 then? No, it caused regressions on big endian targets, and it was so late in the gcc 7 cycle that there was no time to fix them (at the time I didn't have access to a big endian target to test on, and getting said access took time), so I had to revert it. Those regressions have now been fixed, and the ABI has been broken anyway due to other changes, so I'm trying again for gcc 8. > I am one of those who still > use these hidden arguments for Fortran <-> C interfaces. Based on > discussions a year ago, I added this to my code: > > #if defined(__GNUC__) && (__GNUC__ > 6) > #include > #define FORSTR_STDARG_TYPE size_t > #else > #define FORSTR_STDARG_TYPE int > #endif > > I infer from this thread that I should change this to __GNUC__ > 7 now. Is > this still the correct/best way to determine the hidden argument size? Yes, I would say so. > (note that we're still using 4.x... ugh, don't ask... so the >6 check hasn't > actually been used yet, I just want to future-proof things as much as > possible without having to rewrite the entire Fortran <-> C interface.) > > Thanks... > > -Bob > > Bob Deen @ NASA-JPL Multmission Image Processing Lab > bob.d...@jpl.nasa.gov > -- Janne Blomqvist
Re: [PATCH] PR 78534 Change character length from int to size_t
On 12/29/17 5:31 AM, Janne Blomqvist wrote: In order to handle large character lengths on (L)LP64 targets, switch the GFortran character length from an int to a size_t. This is an ABI change, as procedures with character arguments take hidden arguments with the character length. Did this change not make it into gcc 7 then? I am one of those who still use these hidden arguments for Fortran <-> C interfaces. Based on discussions a year ago, I added this to my code: #if defined(__GNUC__) && (__GNUC__ > 6) #include #define FORSTR_STDARG_TYPE size_t #else #define FORSTR_STDARG_TYPE int #endif I infer from this thread that I should change this to __GNUC__ > 7 now. Is this still the correct/best way to determine the hidden argument size? (note that we're still using 4.x... ugh, don't ask... so the >6 check hasn't actually been used yet, I just want to future-proof things as much as possible without having to rewrite the entire Fortran <-> C interface.) Thanks... -Bob Bob Deen @ NASA-JPL Multmission Image Processing Lab bob.d...@jpl.nasa.gov
Re: [PATCH] PR 78534 Change character length from int to size_t
On Wed, Jan 3, 2018 at 2:10 PM, Thomas Koenig wrote: > Hi Janne, > >> attached is a patch that makes the two attached testcases work. It >> applies on top of the charlen->size_t patch. In the formatted I/O >> stuff, I have mostly used ptrdiff_t to avoid having to deal with >> signed/unsigned issues, as the previous code was using int. > > > Did you regression-test? Ah yes, I forgot to mention that. Yes, I did, though only on x86_64-linux-gnu > If yes, I'd say this patch is OK (all the changes look obvious > enough). > > With this, your character length patch is also OK. We can then > open individual PRs for the other issues. > > However, I'd ask you to wait for a day or so with committing > so that other people also have a chance for a (final) look > at this. Thanks! Dominique mentioned on IRC about some incoming comments, so I guess it makes sense to wait a few days. -- Janne Blomqvist
Re: [PATCH] PR 78534 Change character length from int to size_t
Hi Janne, attached is a patch that makes the two attached testcases work. It applies on top of the charlen->size_t patch. In the formatted I/O stuff, I have mostly used ptrdiff_t to avoid having to deal with signed/unsigned issues, as the previous code was using int. Did you regression-test? If yes, I'd say this patch is OK (all the changes look obvious enough). With this, your character length patch is also OK. We can then open individual PRs for the other issues. However, I'd ask you to wait for a day or so with committing so that other people also have a chance for a (final) look at this. Regards Thomas
Re: [PATCH] PR 78534 Change character length from int to size_t
On Sat, Dec 30, 2017 at 10:58 PM, Jerry DeLisle wrote: > On 12/30/2017 12:35 PM, Janne Blomqvist wrote: >> On Sat, Dec 30, 2017 at 7:16 PM, Thomas Koenig wrote: > ---snip--- >> >> I can provide that stuff as a separate patch, or merge it into the >> original megapatch and resubmit that, whichever way you prefer. > > I would prefer we split into two patches. This will make review of the library > I/O changes easier. The int len is used in a lot of places also where it > really > happens to also be the kind (which is a length in our implementation). Hi, attached is a patch that makes the two attached testcases work. It applies on top of the charlen->size_t patch. In the formatted I/O stuff, I have mostly used ptrdiff_t to avoid having to deal with signed/unsigned issues, as the previous code was using int. -- Janne Blomqvist From 72799c5587ee1e830bb424278286cff309d0c4a7 Mon Sep 17 00:00:00 2001 From: Janne Blomqvist Date: Tue, 2 Jan 2018 14:17:57 +0200 Subject: [PATCH] Use ptrdiff_t for formatted I/O sizes --- libgfortran/io/fbuf.c | 44 ++--- libgfortran/io/fbuf.h | 16 +-- libgfortran/io/io.h| 16 +-- libgfortran/io/list_read.c | 15 +- libgfortran/io/read.c | 70 +++--- libgfortran/io/transfer.c | 36 libgfortran/io/unix.c | 20 ++--- libgfortran/io/unix.h | 12 libgfortran/io/write.c | 21 +++--- 9 files changed, 126 insertions(+), 124 deletions(-) diff --git a/libgfortran/io/fbuf.c b/libgfortran/io/fbuf.c index 944469d..d38d003 100644 --- a/libgfortran/io/fbuf.c +++ b/libgfortran/io/fbuf.c @@ -33,7 +33,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see void -fbuf_init (gfc_unit *u, int len) +fbuf_init (gfc_unit *u, ptrdiff_t len) { if (len == 0) len = 512; /* Default size. */ @@ -64,9 +64,9 @@ fbuf_debug (gfc_unit *u, const char *format, ...) va_start(args, format); vfprintf(stderr, format, args); va_end(args); - fprintf (stderr, "fbuf_debug pos: %d, act: %d, buf: ''", - u->fbuf->pos, u->fbuf->act); - for (int ii = 0; ii < u->fbuf->act; ii++) + fprintf (stderr, "fbuf_debug pos: %ld, act: %ld, buf: ''", + (long) u->fbuf->pos, (long) u->fbuf->act); + for (ptrdiff_t ii = 0; ii < u->fbuf->act; ii++) { putc (u->fbuf->buf[ii], stderr); } @@ -84,10 +84,10 @@ fbuf_debug (gfc_unit *u __attribute__ ((unused)), underlying device. Returns how much the physical position was modified. */ -int +ptrdiff_t fbuf_reset (gfc_unit *u) { - int seekval = 0; + ptrdiff_t seekval = 0; if (!u->fbuf) return 0; @@ -99,7 +99,7 @@ fbuf_reset (gfc_unit *u) if (u->mode == READING && u->fbuf->act > u->fbuf->pos) { seekval = - (u->fbuf->act - u->fbuf->pos); - fbuf_debug (u, "fbuf_reset seekval %d, ", seekval); + fbuf_debug (u, "fbuf_reset seekval %ld, ", (long) seekval); } u->fbuf->act = u->fbuf->pos = 0; return seekval; @@ -111,11 +111,11 @@ fbuf_reset (gfc_unit *u) reallocating if necessary. */ char * -fbuf_alloc (gfc_unit *u, int len) +fbuf_alloc (gfc_unit *u, ptrdiff_t len) { - int newlen; + ptrdiff_t newlen; char *dest; - fbuf_debug (u, "fbuf_alloc len %d, ", len); + fbuf_debug (u, "fbuf_alloc len %ld, ", (long) len); if (u->fbuf->pos + len > u->fbuf->len) { /* Round up to nearest multiple of the current buffer length. */ @@ -138,7 +138,7 @@ fbuf_alloc (gfc_unit *u, int len) int fbuf_flush (gfc_unit *u, unit_mode mode) { - int nwritten; + ptrdiff_t nwritten; if (!u->fbuf) return 0; @@ -177,7 +177,7 @@ fbuf_flush (gfc_unit *u, unit_mode mode) int fbuf_flush_list (gfc_unit *u, unit_mode mode) { - int nwritten; + ptrdiff_t nwritten; if (!u->fbuf) return 0; @@ -206,8 +206,8 @@ fbuf_flush_list (gfc_unit *u, unit_mode mode) } -int -fbuf_seek (gfc_unit *u, int off, int whence) +ptrdiff_t +fbuf_seek (gfc_unit *u, ptrdiff_t off, int whence) { if (!u->fbuf) return -1; @@ -226,7 +226,7 @@ fbuf_seek (gfc_unit *u, int off, int whence) return -1; } - fbuf_debug (u, "fbuf_seek, off %d ", off); + fbuf_debug (u, "fbuf_seek, off %ld ", (long) off); /* The start of the buffer is always equal to the left tab limit. Moving to the left past the buffer is illegal in C and would also imply moving past the left tab limit, which is never @@ -248,21 +248,21 @@ fbuf_seek (gfc_unit *u, int off, int whence) of bytes actually processed. */ char * -fbuf_read (gfc_unit *u, int *len) +fbuf_read (gfc_unit *u, ptrdiff_t *len) { char *ptr; - int oldact, oldpos; - int readlen = 0; + ptrdiff_t oldact, oldpos; + ptrdiff_t readlen = 0; - fbuf_debug (u, "fbuf_read, len %d: ", *len); + fbuf_debug (u, "fbuf_read, len %ld: ", (long) *len); oldact = u->fbuf->act; oldpos = u->fbuf->pos; ptr = fbuf_alloc (u, *l
Re: [PATCH] PR 78534 Change character length from int to size_t
Hi Janne, FWIW, by changing your example to use unformatted I/O, it works correctly. Not for me (again, on gcc110): program main character(len=2_8**33), parameter :: a = "" write (10) a end program main with strace results in open("fort.10", O_RDWR|O_CREAT|O_CLOEXEC, 0666) = 3 fstat(3, {st_mode=S_IFREG|0664, st_size=8, ...}) = 0 write(3, "\0\0\0\0\0\0\0\0", 8) = 8 ftruncate(3, 8) = 0 close(3)= 0 Regards Thomas
Re: [PATCH] PR 78534 Change character length from int to size_t
On 12/30/2017 12:35 PM, Janne Blomqvist wrote: > On Sat, Dec 30, 2017 at 7:16 PM, Thomas Koenig wrote: ---snip--- > > I can provide that stuff as a separate patch, or merge it into the > original megapatch and resubmit that, whichever way you prefer. I would prefer we split into two patches. This will make review of the library I/O changes easier. The int len is used in a lot of places also where it really happens to also be the kind (which is a length in our implementation). Janne I trust your judgment about where it makes sense to change to size_t. On Thomas earlier comment about adding a pointer to void to hold space for async. I am OK with this too, though I am not convinced we have really defined our async objective here. More later on that topic. Regards, Jerry
Re: [PATCH] PR 78534 Change character length from int to size_t
On Sat, Dec 30, 2017 at 7:16 PM, Thomas Koenig wrote: > Hi Janne, > >> To be honest, I haven't really done much testing with big strings, so >> far my focus has been on getting the existing testsuite to pass and >> getting the ABI right. > > > I agree that some of the test cases can be fixed later. However, I > would really prefer if the I/O worked, because that is very basic > functionality, but also because this is (likely to be) an ABI issue. > If this bug remains unfixed for any reason at all, then we are left with > no choice but to break the ABI when we fix that bug. I would like to > avoid that, if possible. Fair enough. I took a look at the I/O example you provided, and at least that particular case is not due to an ABI issue, but rather that the formatted I/O stuff inside libgfortran extensively uses int for lengths. I managed to hack around it quickly to make your testcase work, but a proper fix, while straightforward, implies fixing up the types a bit more. But still only in the internals, the external ABI visible interface is ok. I can provide that stuff as a separate patch, or merge it into the original megapatch and resubmit that, whichever way you prefer. FWIW, by changing your example to use unformatted I/O, it works correctly. And as unformatted uses the same library entry points for transferring the data, this is thus further evidence that the problem is in the internals of the formatted I/O rather than in the ABI. -- Janne Blomqvist
Re: [PATCH] PR 78534 Change character length from int to size_t
Hi Janne, To be honest, I haven't really done much testing with big strings, so far my focus has been on getting the existing testsuite to pass and getting the ABI right. I agree that some of the test cases can be fixed later. However, I would really prefer if the I/O worked, because that is very basic functionality, but also because this is (likely to be) an ABI issue. If this bug remains unfixed for any reason at all, then we are left with no choice but to break the ABI when we fix that bug. I would like to avoid that, if possible. By the way, we also should forsee a few more ABI-breaking things while we're at it. We should - Put a "reserved" field in the array descriptor for things like "did this come from an ALLOCATE statement or not", there is a PR for this - Put a pointer to void into the I/O structures, which we are certain to need for async I/O - Increase the maximum number of array dimensions to 15, as per f2008 - Insert a "BACK" argument in minloc, maxloc, minval, maxval, even if we do not use it at the moment As the library ABI has been broken for GCC 8 already by other changes, I'd like to piggyback this ABI change in for the GCC 8 release as well. As the patch is already pretty big as is, I'd prefer that other fixes to enable big strings would be done as separate patches rather than trying to make everything perfect on the first try. I tend to concur for the other bugs, but not for the I/O issue. Regards Thomas
Re: [PATCH] PR 78534 Change character length from int to size_t
On Sat, Dec 30, 2017 at 4:59 PM, Thomas Koenig wrote: > That's all I could find for the moment. I will continue looking. > Thanks for tackling this! Thanks for testing! To be honest, I haven't really done much testing with big strings, so far my focus has been on getting the existing testsuite to pass and getting the ABI right. As the library ABI has been broken for GCC 8 already by other changes, I'd like to piggyback this ABI change in for the GCC 8 release as well. As the patch is already pretty big as is, I'd prefer that other fixes to enable big strings would be done as separate patches rather than trying to make everything perfect on the first try. Slightly related to your testcases, I think it would make sense to create some separate "GFortran huge" testsuite, where we could collect testcases that need lots of memory, or cpu time, or disk space and can thus not be part of the default testsuite. -- Janne Blomqvist
Re: [PATCH] PR 78534 Change character length from int to size_t
Hi Janne, In order to handle large character lengths on (L)LP64 targets, switch the GFortran character length from an int to a size_t. Just running some tests on gcc110 (the big-endian PPC machine from the compile farm). Something does not seem to work with I/O with long strings. With program main character(len=2_8**33), parameter :: a = "" character(len=2_8**33) :: b print '(A),a b = "" print '(A)',b end program main I get $ strace ./a.out > /dev/null ... write(1, "\n", 1) = 1 write(1, "\n", 1) = 1 so I suspect there still is a 32-bit quantity for string lengths lurking somewhere in the I/O library. The following program causes a segfault on compilation: program main integer(8), parameter :: n=2_8**32 character(len=*), parameter :: a1 = repeat('x',n) end program main The program program main integer(8), parameter :: n=2_8**32 character(len=n), parameter :: a1 = repeat('x',n), a2 = repeat('y',n) character(len=*), parameter :: a3 = a1 // a2 end program main is rejected with tkoenig@gcc1-power7 String]$ gfortran concat.f90 concat.f90:4:37: character(len=*), parameter :: a3 = a1 // a2 1 Error: Function ‘a1’ in initialization expression at (1) must be an intrinsic function That's all I could find for the moment. I will continue looking. Thanks for tackling this! Regards Thomas
[PATCH] PR 78534 Change character length from int to size_t
In order to handle large character lengths on (L)LP64 targets, switch the GFortran character length from an int to a size_t. This is an ABI change, as procedures with character arguments take hidden arguments with the character length. I also changed the _size member in vtables from int to size_t, as there were some cases where character lengths and sizes were apparently mixed up and caused regressions otherwise. Although I haven't tested, this might enable very large derived types as well. Also, as there are some places in the frontend were negative character lengths are used as special flag values, in the frontend the character length is handled as a signed variable of the same size as a size_t, although in the runtime library it really is size_t. I haven't changed the character length variables for the co-array intrinsics, as this is something that may need to be synchronized with OpenCoarrays. This is v5 of the patch. v4 was applied but caused breakage on big endian targets. These have been fixed and tested, thanks to access to the GCC compile farm. Overview of v4 of the patch: v3 was applied but had to reverted due to breaking bootstrap. The fix is in resolve.c:resolve_charlen, where it's necessary to check that an expression is constant before using mpz_sgn. Overview of v3 of the patch: All the issues pointed out by FX's review of v2 have been fixed. In particular, there are now new functions gfc_mpz_get_hwi and gfc_mpz_set_hwi, similar to the GMP functions mpz_get_si and mpz_set_si, except that they get/set a HOST_WIDE_INT instead of a long value. Similarly, gfc_get_int_expr now takes a HOST_WIDE_INT instead of a long, gfc_extract_long is replaced by gfc_extract_hwi. Also, the preliminary work to handle gfc_charlen_type_node being unsigned has been removed. Regtested on x86_64-pc-linux-gnu, i686-pc-linux-gnu and powerpc64-unknown-linux-gnu. Also regtested all three targets by modifying gfortran-dg.exp to also test with "-g -flto", no new failures observed. frontend: 2017-12-29 Janne Blomqvist PR fortran/78534 PR fortran/66310 * array.c (got_charlen): Use gfc_charlen_int_kind. * class.c (gfc_find_derived_vtab): Use gfc_size_kind instead of hardcoded kind. (find_intrinsic_vtab): Likewise. * decl.c (match_char_length): Use gfc_charlen_int_kind. (add_init_expr_to_sym): Use gfc_charlen_t and gfc_charlen_int_kind. (gfc_match_implicit): Use gfc_charlen_int_kind. * dump-parse-tree.c (show_char_const): Use gfc_charlen_t and size_t. (show_expr): Use HOST_WIDE_INT_PRINT_DEC. * expr.c (gfc_get_character_expr): Length parameter of type gfc_charlen_t. (gfc_get_int_expr): Value argument of type HOST_WIDE_INT. (gfc_extract_hwi): New function. (simplify_const_ref): Make string_len of type gfc_charlen_t. (gfc_simplify_expr): Use HOST_WIDE_INT for substring refs. * frontend-passes.c (optimize_trim): Use gfc_charlen_int_kind. * gfortran.h (gfc_mpz_get_hwi): New prototype. (gfc_mpz_set_hwi): Likewise. (gfc_charlen_t): New typedef. (gfc_expr): Use gfc_charlen_t for character lengths. (gfc_size_kind): New extern variable. (gfc_extract_hwi): New prototype. (gfc_get_character_expr): Use gfc_charlen_t for character length. (gfc_get_int_expr): Use HOST_WIDE_INT type for value argument. * gfortran.texi: Update description of hidden string length argument. * iresolve.c (check_charlen_present): Use gfc_charlen_int_kind. (gfc_resolve_char_achar): Likewise. (gfc_resolve_repeat): Pass string length directly without temporary, use gfc_charlen_int_kind. (gfc_resolve_transfer): Use gfc_charlen_int_kind. * match.c (select_intrinsic_set_tmp): Use HOST_WIDE_INT for charlen. * misc.c (gfc_mpz_get_hwi): New function. (gfc_mpz_set_hwi): New function. * module.c (atom_int): Change type from int to HOST_WIDE_INT. (parse_integer): Don't complain about large integers. (write_atom): Use HOST_WIDE_INT for integers. (mio_integer): Handle integer type mismatch. (mio_hwi): New function. (mio_intrinsic_op): Use HOST_WIDE_INT. (mio_array_ref): Likewise. (mio_expr): Likewise. * primary.c (match_substring): Use gfc_charlen_int_kind. * resolve.c (resolve_substring_charlen): Use gfc_charlen_int_kind. (resolve_character_operator): Likewise. (resolve_assoc_var): Likewise. (resolve_select_type): Use HOST_WIDE_INT for charlen, use snprintf. (resolve_charlen): Use mpz_sgn to determine sign. * simplify.c (gfc_simplify_repeat): Use HOST_WIDE_INT/gfc_charlen_t instead of long. * symbol.c (generate_isocbinding_symbol): Use gfc_charlen_int_kind. * target-memory.c (size_character): Length argument of type gfc_charlen_t.
Re: [PATCH] PR 78534 Change character length from int to size_t
On 01/14/2017 12:46 AM, Janne Blomqvist wrote: > On Sat, Jan 14, 2017 at 1:13 AM, Jerry DeLisle wrote: >> On 01/13/2017 11:46 AM, David Edelsohn wrote: >>> On Fri, Jan 13, 2017 at 12:09 PM, Janne Blomqvist >>> wrote: On Thu, Jan 12, 2017 at 10:46 AM, FX wrote: >> I was finally able to get a 32-bit i686 compiler going (my attempts to >> do this on a x86_64-pc-linux-gnu host failed, in the end I resorted to >> running 32-bit builds/tests on a i686 container). At least on i686, >> the patch below on top of the big charlen->size_t patch fixes the >> failures > > Patch approved. The old code used gfc_extract_int, which bails out if a > non-constant expression is passed, so this is the right thing to do. > > FX Committed as r28. David, can you verify that it works alright on ppc? >>> >>> Unfortunately, no. This patch broke bootstrap again with the same >>> failure as earlier. >>> >>> - David >>> >> >> Janne, can you access gcc112 on the compile farm? If not, I can, maybe I can >> test the patch. >> >> Jerry > > No, I don't have such access. I'd appreciate it very much if you'd > have a go at it. Attached is the r28 patch combined with the > patch to fix the sanitizer bugs found by Dominique. > I will try to do this tonight Jerry
Re: [PATCH] PR 78534 Change character length from int to size_t
On Sat, Jan 14, 2017 at 10:46 AM, Janne Blomqvist wrote: > On Sat, Jan 14, 2017 at 1:13 AM, Jerry DeLisle wrote: >> On 01/13/2017 11:46 AM, David Edelsohn wrote: >>> On Fri, Jan 13, 2017 at 12:09 PM, Janne Blomqvist >>> wrote: On Thu, Jan 12, 2017 at 10:46 AM, FX wrote: >> I was finally able to get a 32-bit i686 compiler going (my attempts to >> do this on a x86_64-pc-linux-gnu host failed, in the end I resorted to >> running 32-bit builds/tests on a i686 container). At least on i686, >> the patch below on top of the big charlen->size_t patch fixes the >> failures > > Patch approved. The old code used gfc_extract_int, which bails out if a > non-constant expression is passed, so this is the right thing to do. > > FX Committed as r28. David, can you verify that it works alright on ppc? >>> >>> Unfortunately, no. This patch broke bootstrap again with the same >>> failure as earlier. >>> >>> - David >>> >> >> Janne, can you access gcc112 on the compile farm? If not, I can, maybe I can >> test the patch. >> >> Jerry > > No, I don't have such access. I'd appreciate it very much if you'd > have a go at it. Attached is the r28 patch combined with the > patch to fix the sanitizer bugs found by Dominique. > > -- > Janne Blomqvist Since David had problems with module generation, I took a look at the parts of the patch that changes module.c, and found a few places for improvements. module.c is a collection of all the ugly type punning tricks one can think of, in particular wrongly punning to a wider type might cause a different result on a big endian machine, whereas on a little endian it would apparently just work.. So could you also try the attached small patch on top of the previous one? (I also sent Laurent Guerby an email requesting an account on the compile farm). -- Janne Blomqvist diff --git a/gcc/fortran/module.c b/gcc/fortran/module.c index ca53016..90b77ca 100644 --- a/gcc/fortran/module.c +++ b/gcc/fortran/module.c @@ -143,7 +143,7 @@ enum gfc_wsym_state typedef struct pointer_info { BBT_HEADER (pointer_info); - int integer; + HOST_WIDE_INT integer; pointer_t type; /* The first component of each member of the union is the pointer @@ -372,7 +372,7 @@ get_pointer (void *gp) creating the node if not found. */ static pointer_info * -get_integer (int integer) +get_integer (HOST_WIDE_INT integer) { pointer_info *p, t; int c; @@ -472,7 +472,7 @@ associate_integer_pointer (pointer_info *p, void *gp) sometime later. Returns the pointer_info structure. */ static pointer_info * -add_fixup (int integer, void *gp) +add_fixup (HOST_WIDE_INT integer, void *gp) { pointer_info *p; fixup_t *f; @@ -2729,7 +2729,8 @@ mio_pointer_ref (void *gp) if (iomode == IO_OUTPUT) { p = get_pointer (*((char **) gp)); - write_atom (ATOM_INTEGER, &p->integer); + HOST_WIDE_INT hwi = p->integer; + write_atom (ATOM_INTEGER, &hwi); } else { @@ -2766,18 +2767,18 @@ static void mio_component (gfc_component *c, int vtype) { pointer_info *p; - int n; mio_lparen (); if (iomode == IO_OUTPUT) { p = get_pointer (c); - mio_integer (&p->integer); + mio_hwi (&p->integer); } else { - mio_integer (&n); + HOST_WIDE_INT n; + mio_hwi (&n); p = get_integer (n); associate_integer_pointer (p, c); } @@ -5924,7 +5925,7 @@ write_symtree (gfc_symtree *st) mio_pool_string (&st->name); mio_integer (&st->ambiguous); - mio_integer (&p->integer); + mio_hwi (&p->integer); }
Re: [PATCH] PR 78534 Change character length from int to size_t
> The following patch fixes these issues for me, does it work for you? Yes, it does! Dominique > Janne Blomqvist
Re: [PATCH] PR 78534 Change character length from int to size_t
On Sat, Jan 14, 2017 at 1:13 AM, Jerry DeLisle wrote: > On 01/13/2017 11:46 AM, David Edelsohn wrote: >> On Fri, Jan 13, 2017 at 12:09 PM, Janne Blomqvist >> wrote: >>> On Thu, Jan 12, 2017 at 10:46 AM, FX wrote: > I was finally able to get a 32-bit i686 compiler going (my attempts to > do this on a x86_64-pc-linux-gnu host failed, in the end I resorted to > running 32-bit builds/tests on a i686 container). At least on i686, > the patch below on top of the big charlen->size_t patch fixes the > failures Patch approved. The old code used gfc_extract_int, which bails out if a non-constant expression is passed, so this is the right thing to do. FX >>> >>> Committed as r28. >>> >>> David, can you verify that it works alright on ppc? >> >> Unfortunately, no. This patch broke bootstrap again with the same >> failure as earlier. >> >> - David >> > > Janne, can you access gcc112 on the compile farm? If not, I can, maybe I can > test the patch. > > Jerry No, I don't have such access. I'd appreciate it very much if you'd have a go at it. Attached is the r28 patch combined with the patch to fix the sanitizer bugs found by Dominique. -- Janne Blomqvist 0001-PR-78534-Change-character-length-from-int-to-size_t.patch.gz Description: GNU Zip compressed data
Re: [PATCH] PR 78534 Change character length from int to size_t
On Sun, Jan 8, 2017 at 4:29 PM, Dominique d'Humières wrote: >> r244027 reverts r244011. Sorry for the breakage. It seems to affect >> all i686 as well in addition to power, maybe all 32-bit hosts. > > For the record, I see the following failures with an instrumented r244026 (as > in pr78672) > > FAIL: gfortran.dg/char_length_20.f90 -O* execution test > FAIL: gfortran.dg/char_length_21.f90 -O* execution test > FAIL: gfortran.dg/repeat_2.f90 -O1 execution test > … > FAIL: gfortran.dg/repeat_2.f90 -Os execution test > FAIL: gfortran.dg/widechar_6.f90 -O1 execution test > … > FAIL: gfortran.dg/widechar_6.f90 -Os execution test > FAIL: gfortran.dg/widechar_intrinsics_6.f90 -O* execution test > > The run time failures are all of the kind Sorry, I missed that I had to use an instrumented build to catch these, and assumed everything was Ok once I managed to regtest cleanly on i686. The following patch fixes these issues for me, does it work for you? diff --git a/libgfortran/intrinsics/string_intrinsics_inc.c b/libgfortran/intrinsics/string_intrinsics_inc.c index 0da5130..74a994b 100644 --- a/libgfortran/intrinsics/string_intrinsics_inc.c +++ b/libgfortran/intrinsics/string_intrinsics_inc.c @@ -177,23 +177,25 @@ string_trim (gfc_charlen_type *len, CHARTYPE **dest, gfc_charlen_type slen, gfc_charlen_type string_len_trim (gfc_charlen_type len, const CHARTYPE *s) { - const gfc_charlen_type long_len = (gfc_charlen_type) sizeof (unsigned long); - gfc_charlen_type i; + if (len <= 0) +return 0; + + const size_t long_len = sizeof (unsigned long); - i = len - 1; + size_t i = len - 1; /* If we've got the standard (KIND=1) character type, we scan the string in long word chunks to speed it up (until a long word is hit that does not consist of ' 's). */ if (sizeof (CHARTYPE) == 1 && i >= long_len) { - int starting; + size_t starting; unsigned long blank_longword; /* Handle the first characters until we're aligned on a long word boundary. Actually, s + i + 1 must be properly aligned, because s + i will be the last byte of a long word read. */ - starting = ((unsigned long) + starting = ( #ifdef __INTPTR_TYPE__ (__INTPTR_TYPE__) #endif -- Janne Blomqvist
Re: [PATCH] PR 78534 Change character length from int to size_t
On 01/13/2017 11:46 AM, David Edelsohn wrote: > On Fri, Jan 13, 2017 at 12:09 PM, Janne Blomqvist > wrote: >> On Thu, Jan 12, 2017 at 10:46 AM, FX wrote: I was finally able to get a 32-bit i686 compiler going (my attempts to do this on a x86_64-pc-linux-gnu host failed, in the end I resorted to running 32-bit builds/tests on a i686 container). At least on i686, the patch below on top of the big charlen->size_t patch fixes the failures >>> >>> Patch approved. The old code used gfc_extract_int, which bails out if a >>> non-constant expression is passed, so this is the right thing to do. >>> >>> FX >> >> Committed as r28. >> >> David, can you verify that it works alright on ppc? > > Unfortunately, no. This patch broke bootstrap again with the same > failure as earlier. > > - David > Janne, can you access gcc112 on the compile farm? If not, I can, maybe I can test the patch. Jerry
Re: [PATCH] PR 78534 Change character length from int to size_t
On Fri, Jan 13, 2017 at 12:09 PM, Janne Blomqvist wrote: > On Thu, Jan 12, 2017 at 10:46 AM, FX wrote: >>> I was finally able to get a 32-bit i686 compiler going (my attempts to >>> do this on a x86_64-pc-linux-gnu host failed, in the end I resorted to >>> running 32-bit builds/tests on a i686 container). At least on i686, >>> the patch below on top of the big charlen->size_t patch fixes the >>> failures >> >> Patch approved. The old code used gfc_extract_int, which bails out if a >> non-constant expression is passed, so this is the right thing to do. >> >> FX > > Committed as r28. > > David, can you verify that it works alright on ppc? Unfortunately, no. This patch broke bootstrap again with the same failure as earlier. - David
Re: [PATCH] PR 78534 Change character length from int to size_t
On Thu, Jan 12, 2017 at 10:46 AM, FX wrote: >> I was finally able to get a 32-bit i686 compiler going (my attempts to >> do this on a x86_64-pc-linux-gnu host failed, in the end I resorted to >> running 32-bit builds/tests on a i686 container). At least on i686, >> the patch below on top of the big charlen->size_t patch fixes the >> failures > > Patch approved. The old code used gfc_extract_int, which bails out if a > non-constant expression is passed, so this is the right thing to do. > > FX Committed as r28. David, can you verify that it works alright on ppc? -- Janne Blomqvist
Re: [PATCH] PR 78534 Change character length from int to size_t
> I was finally able to get a 32-bit i686 compiler going (my attempts to > do this on a x86_64-pc-linux-gnu host failed, in the end I resorted to > running 32-bit builds/tests on a i686 container). At least on i686, > the patch below on top of the big charlen->size_t patch fixes the > failures Patch approved. The old code used gfc_extract_int, which bails out if a non-constant expression is passed, so this is the right thing to do. FX
Re: [PATCH] PR 78534 Change character length from int to size_t
On Sun, Jan 8, 2017 at 4:29 PM, Dominique d'Humières wrote: >> r244027 reverts r244011. Sorry for the breakage. It seems to affect >> all i686 as well in addition to power, maybe all 32-bit hosts. > > For the record, I see the following failures with an instrumented r244026 (as > in pr78672) [snip] I was finally able to get a 32-bit i686 compiler going (my attempts to do this on a x86_64-pc-linux-gnu host failed, in the end I resorted to running 32-bit builds/tests on a i686 container). At least on i686, the patch below on top of the big charlen->size_t patch fixes the failures: diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c index be63038..82319ed 100644 --- a/gcc/fortran/resolve.c +++ b/gcc/fortran/resolve.c @@ -4726,7 +4726,7 @@ gfc_resolve_substring_charlen (gfc_expr *e) /* Length = (end - start + 1). */ e->ts.u.cl->length = gfc_subtract (end, start); e->ts.u.cl->length = gfc_add (e->ts.u.cl->length, - gfc_get_int_expr (gfc_default_integer_kind, + gfc_get_int_expr (gfc_charlen_int_kind, NULL, 1)); /* F2008, 6.4.1: Both the starting point and the ending point shall @@ -11420,9 +11420,10 @@ resolve_charlen (gfc_charlen *cl) /* F2008, 4.4.3.2: If the character length parameter value evaluates to a negative value, the length of character entities declared is zero. */ - if (cl->length && mpz_sgn (cl->length->value.integer) < 0) + if (cl->length && cl->length->expr_type == EXPR_CONSTANT + && mpz_sgn (cl->length->value.integer) < 0) gfc_replace_expr (cl->length, - gfc_get_int_expr (gfc_default_integer_kind, NULL, 0)); + gfc_get_int_expr (gfc_charlen_int_kind, NULL, 0)); /* Check that the character length is not too large. */ k = gfc_validate_kind (BT_INTEGER, gfc_charlen_int_kind, false); So what happened was that without the EXPR_CONSTANT check, I was accessing uninitialized memory (for some reason probably due to memory layout or such, this didn't cause failures on x86_64-pc-linux-gnu). Also, I found a couple of additional places where gfc_charlen_int_kind should be used instead of gfc_default_integer_kind which is included in the patch above, although AFAICT they have nothing to do with the testcase failures. Unless there are objections, I'll commit the fixed patch in a few days. -- Janne Blomqvist
Re: [PATCH] PR 78534 Change character length from int to size_t
> r244027 reverts r244011. Sorry for the breakage. It seems to affect > all i686 as well in addition to power, maybe all 32-bit hosts. For the record, I see the following failures with an instrumented r244026 (as in pr78672) FAIL: gfortran.dg/char_length_20.f90 -O* execution test FAIL: gfortran.dg/char_length_21.f90 -O* execution test FAIL: gfortran.dg/repeat_2.f90 -O1 execution test … FAIL: gfortran.dg/repeat_2.f90 -Os execution test FAIL: gfortran.dg/widechar_6.f90 -O1 execution test … FAIL: gfortran.dg/widechar_6.f90 -Os execution test FAIL: gfortran.dg/widechar_intrinsics_6.f90 -O* execution test The run time failures are all of the kind = ==43614==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60200168 at pc 0x00010abfb578 bp 0x7fff55735250 sp 0x7fff55735248 READ of size 8 at 0x60200168 thread T0 #0 0x10abfb577 in _gfortran_string_len_trim (/opt/gcc/gcc7gp/lib/libgfortran.4.dylib+0x728577) #1 0x10a4cae2c in MAIN__ (/Users/dominiq/Documents/Fortran/g95bench/win/f90/bug/a.out+0x10e2c) #2 0x10a4caea6 in main (/Users/dominiq/Documents/Fortran/g95bench/win/f90/bug/a.out+0x10ea6) #3 0x7fffbd674254 in start (/usr/lib/system/libdyld.dylib+0x5254) 0x60200168 is located 8 bytes to the left of 1-byte region [0x60200170,0x60200171) allocated by thread T0 here: #0 0x10c043319 in wrap_malloc (/opt/gcc/gcc7a/lib/libasan.4.dylib+0x61319) #1 0x10a4cad11 in MAIN__ (/Users/dominiq/Documents/Fortran/g95bench/win/f90/bug/a.out+0x10d11) #2 0x10a4caea6 in main (/Users/dominiq/Documents/Fortran/g95bench/win/f90/bug/a.out+0x10ea6) #3 0x7fffbd674254 in start (/usr/lib/system/libdyld.dylib+0x5254) SUMMARY: AddressSanitizer: heap-buffer-overflow (/opt/gcc/gcc7gp/lib/libgfortran.4.dylib+0x728577) in _gfortran_string_len_trim Shadow bytes around the buggy address: 0x1c03ffd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x1c03ffe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x1c03fff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x1c04: fa fa fd fd fa fa fd fd fa fa 00 07 fa fa 00 06 0x1c040010: fa fa 03 fa fa fa 00 00 fa fa 00 06 fa fa 06 fa =>0x1c040020: fa fa 07 fa fa fa 07 fa fa fa fd fa fa[fa]01 fa 0x1c040030: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x1c040040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x1c040050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x1c040060: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x1c040070: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user:f7 Container overflow: fc Array cookie:ac Intra object redzone:bb ASan internal: fe Left alloca redzone: ca Right alloca redzone:cb ==43614==ABORTING Program received signal SIGABRT: Process abort signal. Backtrace for this error: #0 0x10a4d8558 #1 0x10a4d65f5 #2 0x7fffbd881bb9 Abort Dominique
Re: [PATCH] PR 78534 Change character length from int to size_t
On Tue, Jan 3, 2017 at 9:21 PM, FX wrote: >> r244027 reverts r244011. Sorry for the breakage. It seems to affect >> all i686 as well in addition to power, maybe all 32-bit hosts. > > The breakage is surprising, as the rejects-valid does not involve character > length at all. > Jane, any chance you might have accidentally committed some unrelated change > along? Yes, I'm surprised, and I don't understand it yet. I'm planning to fire up a 32-bit VM and build there in order see what the issue is. -- Janne Blomqvist
Re: [PATCH] PR 78534 Change character length from int to size_t
> r244027 reverts r244011. Sorry for the breakage. It seems to affect > all i686 as well in addition to power, maybe all 32-bit hosts. The breakage is surprising, as the rejects-valid does not involve character length at all. Jane, any chance you might have accidentally committed some unrelated change along? FX
Re: [PATCH] PR 78534 Change character length from int to size_t
On Tue, Jan 3, 2017 at 4:07 PM, David Edelsohn wrote: > This patch broke bootstrap. I now am seeing numerous errors when > building libgomp. > > Please fix or revert immediately. r244027 reverts r244011. Sorry for the breakage. It seems to affect all i686 as well in addition to power, maybe all 32-bit hosts. -- Janne Blomqvist
Re: [PATCH] PR 78534 Change character length from int to size_t
This patch broke bootstrap. I now am seeing numerous errors when building libgomp. Please fix or revert immediately. Thanks, David omp_lib.f90:184:40: logical (4) :: omp_test_lock 1 Error: Symbol 'omp_test_lock' at (1) has already been host associated omp_lib.f90:216:45: integer (4) :: omp_test_nest_lock 1 Error: Symbol 'omp_test_nest_lock' at (1) has already been host associated omp_lib.f90:329:61: integer (omp_proc_bind_kind) :: omp_get_proc_bind 1 Error: Symbol 'omp_get_proc_bind' at (1) has already been host associated and /nasfarm/edelsohn/src/src/libgomp/openacc.f90:466:6: use openacc_internal 1 Error: 'acc_device_nvidia' of module 'openacc_internal', imported at (1), is als o the name of the current program unit /nasfarm/edelsohn/src/src/libgomp/openacc.f90:466:6: use openacc_internal 1 Error: Alternate return specifier in function 'acc_async_test_h' at (1) is not a llowed /nasfarm/edelsohn/src/src/libgomp/openacc.f90:466:6: use openacc_internal 1 Error: Alternate return specifier in function 'acc_async_test_
[PATCH] PR 78534 Change character length from int to size_t
In order to handle large character lengths on (L)LP64 targets, switch the GFortran character length from an int to a size_t. This is an ABI change, as procedures with character arguments take hidden arguments with the character length. I also changed the _size member in vtables from int to size_t, as there were some cases where character lengths and sizes were apparently mixed up and caused regressions otherwise. Although I haven't tested, this might enable very large derived types as well. Also, as there are some places in the frontend were negative character lengths are used as special flag values, in the frontend the character length is handled as a signed variable of the same size as a size_t, although in the runtime library it really is size_t. I haven't changed the character length variables for the co-array intrinsics, as this is something that may need to be synchronized with OpenCoarrays. Another change in this place is that the algorithm for gfc_trans_string_copy has been rewritten to avoid a -Wstringop-overflow warning. This is v3 of the patch. All the issues pointed out by FX's review of v2 have been fixed. In particular, there are now new functions gfc_mpz_get_hwi and gfc_mpz_set_hwi, similar to the GMP functions mpz_get_si and mpz_set_si, except that they get/set a HOST_WIDE_INT instead of a long value. Similarly, gfc_get_int_expr now takes a HOST_WIDE_INT instead of a long, gfc_extract_long is replaced by gfc_extract_hwi. Also, the preliminary work to handle gfc_charlen_type_node being unsigned has been removed. Regtested on x86_64-pc-linux-gnu, Ok for trunk? frontend: 2017-01-01 Janne Blomqvist PR fortran/78534 PR fortran/66310 * class.c (gfc_find_derived_vtab): Use gfc_size_kind instead of hardcoded kind. (find_intrinsic_vtab): Likewise. * expr.c (gfc_get_character_expr): Length parameter of type gfc_charlen_t. (gfc_get_int_expr): Value argument of type HOST_WIDE_INT. (gfc_extract_hwi): New function. (simplify_const_ref): Make string_len of type gfc_charlen_t. (gfc_simplify_expr): Use HOST_WIDE_INT for substring refs. * gfortran.h (gfc_mpz_get_hwi): New prototype. (gfc_mpz_set_hwi): Likewise. (gfc_charlen_t): New typedef. (gfc_expr): Use gfc_charlen_t for character lengths. (gfc_size_kind): New extern variable. (gfc_extract_hwi): New prototype. (gfc_get_character_expr): Use gfc_charlen_t for character length. (gfc_get_int_expr): Use HOST_WIDE_INT type for value argument. * iresolve.c (gfc_resolve_repeat): Pass string length directly without temporary, use gfc_charlen_int_kind. * match.c (select_intrinsic_set_tmp): Use HOST_WIDE_INT for charlen. * misc.c (gfc_mpz_get_hwi): New function. (gfc_mpz_set_hwi): New function. * module.c (atom_int): Change type from int to HOST_WIDE_INT. (parse_integer): Don't complain about large integers. (write_atom): Use HOST_WIDE_INT for integers. (mio_integer): Handle integer type mismatch. (mio_hwi): New function. (mio_intrinsic_op): Use HOST_WIDE_INT. (mio_array_ref): Likewise. (mio_expr): Likewise. * resolve.c (resolve_select_type): Use HOST_WIDE_INT for charlen, use snprintf. (resolve_charlen): Use mpz_sgn to determine sign. * simplify.c (gfc_simplify_repeat): Use HOST_WIDE_INT/gfc_charlen_t instead of long. * target-memory.c (size_character): Length argument of type gfc_charlen_t. (gfc_encode_character): Likewise. (gfc_interpret_character): Use gfc_charlen_t. * target-memory.h (gfc_encode_character): Modify prototype. * trans-array.c (get_array_ctor_var_strlen): Use gfc_conv_mpz_to_tree_type. * trans-const.c (gfc_conv_mpz_to_tree_type): New function. * trans-const.h (gfc_conv_mpz_to_tree_type): New prototype. * trans-expr.c (gfc_class_len_or_zero_get): Build const of type gfc_charlen_type_node. (gfc_conv_intrinsic_to_class): Use gfc_charlen_int_kind instead of 4, fold_convert to correct type. (gfc_conv_class_to_class): Build const of type size_type_node for size. (gfc_copy_class_to_class): Likewise. (gfc_conv_string_length): Use same type in expression. (gfc_conv_substring): Likewise, use HOST_WIDE_INT for charlen. (gfc_conv_string_tmp): Make sure len is of the right type. (gfc_conv_concat_op): Use same type in expression. (gfc_conv_procedure_call): Likewise. (gfc_trans_string_copy): Rewrite to avoid -Wstringop-overflow warning in generated code. (alloc_scalar_allocatable_for_subcomponent_assignment): fold_convert to right type. (gfc_trans_subcomponent_assign): Likewise. (trans_class_vptr_len_assignment): Build const of correct type.
Re: [PATCH] PR 78534 Change character length from int to size_t
On Tue, 20 Dec 2016, FX wrote: > Finally, if we’re making this change, we welcome any feedback on how > to make it as easy as possible to handle in user code. Documentation, > preprocessor macros, etc. I believe including this in the (yet to be created) gcc-7/porting_to.html, would be great. Historically the porting_to.html documents have mostly covered C and C++, since that is the source language of the majority of packages in a GNU/Linux distribution that GCC touches. Adding more focus on Fortran users as well feels like a good idea, though. (If you want to go ahead, but prefer the page to be created first, let me know, and I'll take care.) Gerald
Re: [PATCH] PR 78534 Change character length from int to size_t
Hi Janne, hi FX, On Tue, 27 Dec 2016 12:56:19 +0200 Janne Blomqvist wrote: > >> I also changed the _size member in vtables from int to size_t, as > >> there were some cases where character lengths and sizes were > >> apparently mixed up and caused regressions otherwise. Although I I can confirm this. Being responsible for adding the _len component for char arrays in unlimited polymorphic objects. This is the only use case where the _len component is used to my knowledge. The separation should have been: - _size: store the size in bytes of a single character - _len: the number of characters stored in the char array in the unlimited polymorphic object. Unfortunately there were some case, which Janne also experienced, where these go stray. I at least succeeded to remove the length from the vtab's-name that is generated for storing in the unlimited polymorphic object. Over time I hope to get the separation of concerns correctly modeled as told above, but for the time being we have to stick with _size have the array size sometimes. I think that is the case when a fixed length char array is stored in the unlimited polymorphic object. Regards, Andre -- Andre Vehreschild * Email: vehre ad gmx dot de
Re: [PATCH] PR 78534 Change character length from int to size_t
> The _size member is the size of the object; for polymorphic objects > the size isn't known at compile time, so it is to be stored in the > vtable (in principle, since the vtable tells the exact class of a > polymorphic object, it should be possible to figure out the size > without an explicit _size member, but I'll let the OOP experts handle > that, if they want). There is another vtable member _len which is used > for character lengths, and that is indeed of type > gfc_charlen_type_node. Understood, thanks. > I think when the dust settles, I might make sense to get rid of > gfc_charlen_type_node/gfc_charlen_type and just use > size_type_node/size_t also for string lengths, but at least for now I > think it's clearer to use separate names. I agree with keeping them for now. > So strictly speaking it's not necessary, as long as > gfc_charlen_type_node is a signed integer. > > OTOH, if/when one wants to make gfc_charlen_type_node unsigned, > perhaps some more far-reaching changes are needed and that work is not > necessary anymore. Say, introducing BT_UNSIGNED_INTEGER (??) and > teaching gfc_typespec to handle unsigned integers? Or maybe it's > better to put a tree node specifying the type in the typespec and use > that instead of the bt type + kind to tell what type it is? > > Do you want me to remove that for the time being? Let’s remove that, at least for now. >> There are other cases (resolve.c, simplify.c) where you introduce a >> dependency on middle-end entities (tree.h, trans-types.h) in what are pure >> Fortran front-end stages. This breaks the separation that currently exists, >> and which I strongly think we should keep. > > These changes are similar to the above, i.e. a check that uses > get_type_static_bounds() and works also if gfc_charlen_type_node is > changed to be an unsigned type. OK then let’s remove them too. > - Should I remove the so-far preliminary work to handle > gfc_charlen_type_node being unsigned? I think it makes more sense. > - Should I fix the uses of mpz_{get,set}_{s,u}i? I think so, otherwise there is little reason to break the ABI and not support long strings :) >> - trans-types.h: why do we now need to include trans.h? > > IIRC this was due to some of the new .c files including trans-types.h > but not trans.h and failing to compile. AFAIU the convention is that > headers should include whatever is necessary to use said header. So > this is some latent bug that has been exposed by my other changes. If, with the final version of the patch, you can remove it, please do. And if you don’t, please remove the trans.h includes from source files that already include trans-types.h FX
Re: [PATCH] PR 78534 Change character length from int to size_t
On Tue, Dec 27, 2016 at 12:47 PM, Andre Vehreschild wrote: > Hi Janne, > > sorry for being late in voicing my opinion, but I personally would prefer to > have this patch in a separately. That way bisecting for performance > regressions points only to the offending code and not to the change of the > character length (the latter might add a tiny bit of cost, too). No worries. I included this in the updated charlen-size_t patch I posted and which FX reviewed, but I can certainly commit this part separately. > > Regards, > Andre > > > On Fri, 23 Dec 2016 11:27:10 +0200 > Janne Blomqvist wrote: > >> On Wed, Dec 21, 2016 at 3:14 PM, Andre Vehreschild wrote: >> >> Now when I think about this some more, I have a vague recollection >> >> that a long time ago it used to be something like that. The problem >> >> is that MIN_EXPR will of course be >> >> NON-CONSTANT, so the memcpy call can't be inlined. Hence it was >> >> changed to two separate __builtin_memmove() calls to have better >> >> opportunity to inline. So probably a no-go to change it back. :( >> > >> > I don't get that. From the former only one of the memmove's could have been >> > inlined assuming that only CONSTANT sizes are inlined. >> >> Yes. Which is better than not being able to inline, assuming the >> inlined branch is more likely to be taken, no? >> >> > The other one had a >> > NON-CONSTANT as long as pointer following and constant propagation was not >> > effective together. In our case the "NON-CONSTANT branch" would have been >> > used, which is resolved by constant propagation (of the size of constant >> > memory p points to). I assume the warning is triggered, because dead-code >> > elimination has not removed the else part. >> >> Probably yes, in this case. My performance worries were more about the >> general case. But perhaps they are unfounded, IDK really. Does anybody >> have a battery of string operation benchmarks that mirrors how real >> Fortran code does string handling? >> >> Anyway, the attached patch accomplishes that. It turns the tree dump I >> showed two days ago into something like >> >> { >> integer(kind=8) D.3484; >> unsigned long D.3485; >> >> D.3484 = *_p; >> D.3485 = (unsigned long) D.3484 + 18446744073709551608; >> if (D.3484 != 0) >> { >> __builtin_memmove ((void *) *p, (void *) &"12345679"[1]{lb: 1 >> sz: 1}, MIN_EXPR <(unsigned long) D.3484, 8>); >> if ((unsigned long) D.3484 > 8) >> { >> __builtin_memset ((void *) *p + 8, 32, D.3485); >> } >> } >> } >> >> and gets rid of the -Wstringop-overflow warning. (It causes a two new >> testsuite failures (gfortran.dg/dependency_49.f90 and >> gfortran.dg/transfer_intrinsic_1.f90), but those are just tests that >> grep for patterns in the .original dump and need to be adjusted). >> >> If that is Ok, do you prefer it as part of the charlen-size_t patch, >> or would you (GFortran maintainers in general) prefer that it's a >> separate patch? >> > > > -- > Andre Vehreschild * Email: vehre ad gmx dot de -- Janne Blomqvist
Re: [PATCH] PR 78534 Change character length from int to size_t
On Mon, Dec 26, 2016 at 12:32 PM, FX wrote: > Hi Janne, > > Thanks for the patch, it is hard and tedious work. Here is the formal review. > I don’t want to be a pain, but I have several questions about the patch, and > given its size and the importance I think we should be double-sure :) Thanks for the in-depth review, I appreciate it! >> I also changed the _size member in vtables from int to size_t, as >> there were some cases where character lengths and sizes were >> apparently mixed up and caused regressions otherwise. Although I >> haven't tested, this might enable very large derived types as well. > > Regarding that one, why are you making it an explicit size_t and not a > charlen type? I know the two will be the same, at least for now, but given > that it’s explicitly a character length we should use that variable type. > This is a preexisting issue with the front-end and library, where we > generally use a mix of types (because they end up being the same anyway, such > as C int and GFC_INTEGER_4). The _size member is the size of the object; for polymorphic objects the size isn't known at compile time, so it is to be stored in the vtable (in principle, since the vtable tells the exact class of a polymorphic object, it should be possible to figure out the size without an explicit _size member, but I'll let the OOP experts handle that, if they want). There is another vtable member _len which is used for character lengths, and that is indeed of type gfc_charlen_type_node. I think when the dust settles, I might make sense to get rid of gfc_charlen_type_node/gfc_charlen_type and just use size_type_node/size_t also for string lengths, but at least for now I think it's clearer to use separate names. > Regarding the introduction of is_charlen in gfc_typespec, I am unclear as to > why it is needed. It is used exclusively in arith.c, which is not where we > should be checking character lengths I think. It is visible by the fact that > we normally shouldn’t need access to middle-end headers (tree.h and > trans-types.h) at that level. So, can’t we make the check where we currently > do it, i.e. later when we translate the constant string? That sounds more > reasonable that introducing a new special-cased entity. This is, well, a leftover from my attempts to make gfc_charlen_type_node an alias for size_type_node, an unsigned type. Since the gfc_typespec doesn't understand unsigned integers, it had to be extended in some fashion. You can see that the check in arith.c checks that the charlen is in the range [0, TYPE_MAX_VALUE(gfc_charlen_type_node)], which the "normal" check isn't able to do. So strictly speaking it's not necessary, as long as gfc_charlen_type_node is a signed integer. OTOH, if/when one wants to make gfc_charlen_type_node unsigned, perhaps some more far-reaching changes are needed and that work is not necessary anymore. Say, introducing BT_UNSIGNED_INTEGER (??) and teaching gfc_typespec to handle unsigned integers? Or maybe it's better to put a tree node specifying the type in the typespec and use that instead of the bt type + kind to tell what type it is? Do you want me to remove that for the time being? > There are other cases (resolve.c, simplify.c) where you introduce a > dependency on middle-end entities (tree.h, trans-types.h) in what are pure > Fortran front-end stages. This breaks the separation that currently exists, > and which I strongly think we should keep. These changes are similar to the above, i.e. a check that uses get_type_static_bounds() and works also if gfc_charlen_type_node is changed to be an unsigned type. > ** libgfortran ** > > - in io/write.c, the “for” clauses in in namelist_write() have weird spacing > around their semicolons (i.e. space should be after, not before) Ah, I hadn't noticed that. As one can see, it's a pre-existing issue, but I might as well fix it when I'm changing that line. Will do. > - in intrinsics/extends_type_of.c, use gfc_charlen_type instead of size_t > vtype->size As I explained earlier, this is because the size member is the size of the object rather than the charlen, so I think it should stay a size_t. > ** front-end ** > > - class.c: use gfc_charlen_int_kind instead of gfc_size_kind Same here. > - class.c, in find_intrinsic_vtab(): in the call to gfc_get_int_expr, the > third argument cannot be cast from (size_t) to (long), as this would fail on > LLP64 hosts Yes, but... the issue is that gfc_get_int_expr uses mpz_set_si, so can't handle more than long anyway. But hmm, maybe that typecast should be removed anyway, so that when/if gfc_get_int_expr is fixed this would be automatically fixed as well. > - expr.c, regarding gfc_extract_long(): we could definitely extract an host > wide int or an mpz value. This new function is called twice: once in > resolve_charlen() where we could use the GMP function mpz_sgn() to check if > the constant value is negative; the second time in gfc_simplify_repeat, where
Re: [PATCH] PR 78534 Change character length from int to size_t
Hi Janne, sorry for being late in voicing my opinion, but I personally would prefer to have this patch in a separately. That way bisecting for performance regressions points only to the offending code and not to the change of the character length (the latter might add a tiny bit of cost, too). Regards, Andre On Fri, 23 Dec 2016 11:27:10 +0200 Janne Blomqvist wrote: > On Wed, Dec 21, 2016 at 3:14 PM, Andre Vehreschild wrote: > >> Now when I think about this some more, I have a vague recollection > >> that a long time ago it used to be something like that. The problem > >> is that MIN_EXPR will of course be > >> NON-CONSTANT, so the memcpy call can't be inlined. Hence it was > >> changed to two separate __builtin_memmove() calls to have better > >> opportunity to inline. So probably a no-go to change it back. :( > > > > I don't get that. From the former only one of the memmove's could have been > > inlined assuming that only CONSTANT sizes are inlined. > > Yes. Which is better than not being able to inline, assuming the > inlined branch is more likely to be taken, no? > > > The other one had a > > NON-CONSTANT as long as pointer following and constant propagation was not > > effective together. In our case the "NON-CONSTANT branch" would have been > > used, which is resolved by constant propagation (of the size of constant > > memory p points to). I assume the warning is triggered, because dead-code > > elimination has not removed the else part. > > Probably yes, in this case. My performance worries were more about the > general case. But perhaps they are unfounded, IDK really. Does anybody > have a battery of string operation benchmarks that mirrors how real > Fortran code does string handling? > > Anyway, the attached patch accomplishes that. It turns the tree dump I > showed two days ago into something like > > { > integer(kind=8) D.3484; > unsigned long D.3485; > > D.3484 = *_p; > D.3485 = (unsigned long) D.3484 + 18446744073709551608; > if (D.3484 != 0) > { > __builtin_memmove ((void *) *p, (void *) &"12345679"[1]{lb: 1 > sz: 1}, MIN_EXPR <(unsigned long) D.3484, 8>); > if ((unsigned long) D.3484 > 8) > { > __builtin_memset ((void *) *p + 8, 32, D.3485); > } > } > } > > and gets rid of the -Wstringop-overflow warning. (It causes a two new > testsuite failures (gfortran.dg/dependency_49.f90 and > gfortran.dg/transfer_intrinsic_1.f90), but those are just tests that > grep for patterns in the .original dump and need to be adjusted). > > If that is Ok, do you prefer it as part of the charlen-size_t patch, > or would you (GFortran maintainers in general) prefer that it's a > separate patch? > -- Andre Vehreschild * Email: vehre ad gmx dot de
Re: [PATCH] PR 78534 Change character length from int to size_t
Hi Janne, Thanks for the patch, it is hard and tedious work. Here is the formal review. I don’t want to be a pain, but I have several questions about the patch, and given its size and the importance I think we should be double-sure :) > I also changed the _size member in vtables from int to size_t, as > there were some cases where character lengths and sizes were > apparently mixed up and caused regressions otherwise. Although I > haven't tested, this might enable very large derived types as well. Regarding that one, why are you making it an explicit size_t and not a charlen type? I know the two will be the same, at least for now, but given that it’s explicitly a character length we should use that variable type. This is a preexisting issue with the front-end and library, where we generally use a mix of types (because they end up being the same anyway, such as C int and GFC_INTEGER_4). Regarding the introduction of is_charlen in gfc_typespec, I am unclear as to why it is needed. It is used exclusively in arith.c, which is not where we should be checking character lengths I think. It is visible by the fact that we normally shouldn’t need access to middle-end headers (tree.h and trans-types.h) at that level. So, can’t we make the check where we currently do it, i.e. later when we translate the constant string? That sounds more reasonable that introducing a new special-cased entity. There are other cases (resolve.c, simplify.c) where you introduce a dependency on middle-end entities (tree.h, trans-types.h) in what are pure Fortran front-end stages. This breaks the separation that currently exists, and which I strongly think we should keep. ** libgfortran ** - in io/write.c, the “for” clauses in in namelist_write() have weird spacing around their semicolons (i.e. space should be after, not before) - in intrinsics/extends_type_of.c, use gfc_charlen_type instead of size_t vtype->size ** front-end ** - class.c: use gfc_charlen_int_kind instead of gfc_size_kind - class.c, in find_intrinsic_vtab(): in the call to gfc_get_int_expr, the third argument cannot be cast from (size_t) to (long), as this would fail on LLP64 hosts - expr.c, regarding gfc_extract_long(): we could definitely extract an host wide int or an mpz value. This new function is called twice: once in resolve_charlen() where we could use the GMP function mpz_sgn() to check if the constant value is negative; the second time in gfc_simplify_repeat, where we should simply bail out (return NULL) if the integer is too big to fit into a long (we would bail out a few lines later anyway, see “semi-arbitrary limit”). - iresolve.c, extra space after NULL in call to gfc_get_int_expr() in gfc_resolve_repeat() - match.c, in select_intrinsic_set_tmp(), charlen should be a gfc_charlen_t and mpz_get_si will break for long string sizes - in resolve.c, like in arith.c, we should not use tree.h and trans-types.h. We should do the comparison by looking at integer kinds, not through the charlen_type_node - in resolve.c, in resolve_select_type(), another case of mpz_get_si() that will break for long string sizes - in simplify.c, again, we should not use tree.h and trans-types.h - trans-decl.c seems like unrelated changes - trans-types.h: why do we now need to include trans.h? Thanks again for working on that! FX
[PATCH] PR 78534 Change character length from int to size_t
In order to handle large character lengths on (L)LP64 targets, switch the GFortran character length from an int to a size_t. This is an ABI change, as procedures with character arguments take hidden arguments with the character length. I also changed the _size member in vtables from int to size_t, as there were some cases where character lengths and sizes were apparently mixed up and caused regressions otherwise. Although I haven't tested, this might enable very large derived types as well. Also, as there are some places in the frontend were negative character lengths are used as special flag values, in the frontend the character length is handled as a signed variable of the same size as a size_t, although in the runtime library it really is size_t. I haven't changed the character length variables for the co-array intrinsics, as this is something that may need to be synchronized with OpenCoarrays. Another change in this place is that the algorithm for gfc_trans_string_copy has been rewritten to avoid a -Wstringop-overflow warning. Regtested on x86_64-pc-linux-gnu, Ok for trunk? frontend: 2016-12-23 Janne Blomqvist PR fortran/78534 PR fortran/66310 * arith.c (gfc_check_charlen_range): New function. (gfc_range_check): Use gfc_check_charlen_range. * class.c (gfc_find_derived_vtab): Use gfc_size_kind instead of hardcoded kind. (find_intrinsic_vtab): Likewise. * expr.c (gfc_get_character_expr): Length parameter of type gfc_charlen_t. (gfc_get_int_expr): Value argument of type long. (gfc_extract_long): New function. * gfortran.h (gfc_typespec): New member is_charlen. (gfc_charlen_t): New typedef. (gfc_expr): Use gfc_charlen_t for character lengths. (gfc_size_kind): New extern variable. (gfc_extract_long): New prototype. (gfc_get_character_expr): Use gfc_charlen_t for character length. (gfc_get_int_expr): Use long type for value argument. * iresolve.c (gfc_resolve_repeat): Use gfc_charlen_t, gfc_charlen_int_kind, set is_charlen. * match.c (select_intrinsic_set_tmp): Use long for charlen. * module.c (atom_int): Change type from int to HOST_WIDE_INT. (parse_integer): Don't complain about large integers. (write_atom): Use HOST_WIDE_INT for integers. (mio_integer): Handle integer type mismatch. (mio_hwi): New function. (mio_intrinsic_op): Use HOST_WIDE_INT. (mio_array_ref): Likewise. (mio_expr): Likewise. * resolve.c (resolve_substring): Use get_type_static_bounds. (resolve_select_type): Use long for charlen. (resolve_charlen): Use long for charlen, get_type_static_bounds. * simplify.c (gfc_simplify_repeat): Likewise. * target-memory.c (gfc_interpret_character): Use gfc_charlen_t. * trans-array.c (get_array_ctor_var_strlen): Use gfc_conv_mpz_to_tree_type. * trans-const.c (gfc_conv_mpz_to_tree_type): New function. * trans-const.h (gfc_conv_mpz_to_tree_type): New prototype. * trans-decl.c (create_function_arglist): Assert that length is not NULL_TREE. * trans-expr.c (gfc_class_len_or_zero_get): Build const of type gfc_charlen_type_node. (gfc_conv_intrinsic_to_class): Use gfc_charlen_int_kind instead of 4, fold_convert to correct type. (gfc_conv_class_to_class): Build const of type size_type_node for size. (gfc_copy_class_to_class): Likewise. (gfc_conv_string_length): Use same type in expression. (gfc_conv_substring): Likewise, use long for charlen. (gfc_conv_string_tmp): Make sure len is of the right type. (gfc_conv_concat_op): Use same type in expression. (gfc_conv_procedure_call): Likewise. (gfc_trans_string_copy): Rewrite to avoid -Wstringop-overflow warning in generated code. (alloc_scalar_allocatable_for_subcomponent_assignment): fold_convert to right type. (gfc_trans_subcomponent_assign): Likewise. (trans_class_vptr_len_assignment): Build const of correct type. (gfc_trans_pointer_assignment): Likewise. (alloc_scalar_allocatable_for_assignment): fold_convert to right type in expr. (trans_class_assignment): Build const of correct type. * trans-intrinsic.c (gfc_conv_associated): Likewise. (gfc_conv_intrinsic_repeat): Do calculation in sizetype. * trans-io.c (gfc_build_io_library_fndecls): Use gfc_charlen_type_node for character lengths. * trans-stmt.c (gfc_trans_label_assign): Build const of gfc_charlen_type_node. (gfc_trans_character_select): Likewise. (gfc_trans_allocate): Likewise, don't typecast strlen result. (gfc_trans_deallocate): Don't typecast strlen result. * trans-types.c (gfc_size_kind): New variable. (gfc_init_types):
Re: [PATCH] PR 78534 Change character length from int to size_t
On Wed, Dec 21, 2016 at 3:14 PM, Andre Vehreschild wrote: >> Now when I think about this some more, I have a vague recollection >> that a long time ago it used to be something like that. The problem >> is that MIN_EXPR will of course be >> NON-CONSTANT, so the memcpy call can't be inlined. Hence it was >> changed to two separate __builtin_memmove() calls to have better >> opportunity to inline. So probably a no-go to change it back. :( > > I don't get that. From the former only one of the memmove's could have been > inlined assuming that only CONSTANT sizes are inlined. Yes. Which is better than not being able to inline, assuming the inlined branch is more likely to be taken, no? > The other one had a > NON-CONSTANT as long as pointer following and constant propagation was not > effective together. In our case the "NON-CONSTANT branch" would have been > used, > which is resolved by constant propagation (of the size of constant memory p > points to). I assume the warning is triggered, because dead-code elimination > has not removed the else part. Probably yes, in this case. My performance worries were more about the general case. But perhaps they are unfounded, IDK really. Does anybody have a battery of string operation benchmarks that mirrors how real Fortran code does string handling? Anyway, the attached patch accomplishes that. It turns the tree dump I showed two days ago into something like { integer(kind=8) D.3484; unsigned long D.3485; D.3484 = *_p; D.3485 = (unsigned long) D.3484 + 18446744073709551608; if (D.3484 != 0) { __builtin_memmove ((void *) *p, (void *) &"12345679"[1]{lb: 1 sz: 1}, MIN_EXPR <(unsigned long) D.3484, 8>); if ((unsigned long) D.3484 > 8) { __builtin_memset ((void *) *p + 8, 32, D.3485); } } } and gets rid of the -Wstringop-overflow warning. (It causes a two new testsuite failures (gfortran.dg/dependency_49.f90 and gfortran.dg/transfer_intrinsic_1.f90), but those are just tests that grep for patterns in the .original dump and need to be adjusted). If that is Ok, do you prefer it as part of the charlen-size_t patch, or would you (GFortran maintainers in general) prefer that it's a separate patch? -- Janne Blomqvist diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c index 3767a28..6708ee03 100644 --- a/gcc/fortran/trans-expr.c +++ b/gcc/fortran/trans-expr.c @@ -6455,33 +6455,19 @@ gfc_trans_string_copy (stmtblock_t * block, tree dlength, tree dest, return; } + /* The string copy algorithm below generates code like + + if (dlen > 0) { + memmove (dest, src, min(dlen, slen)); + if (slen < dlen) + memset(&dest[slen], ' ', dlen - slen); + } + */ + /* Do nothing if the destination length is zero. */ cond = fold_build2_loc (input_location, GT_EXPR, boolean_type_node, dlen, build_int_cst (size_type_node, 0)); - /* The following code was previously in _gfortran_copy_string: - - // The two strings may overlap so we use memmove. - void - copy_string (GFC_INTEGER_4 destlen, char * dest, -GFC_INTEGER_4 srclen, const char * src) - { - if (srclen >= destlen) - { - // This will truncate if too long. - memmove (dest, src, destlen); - } - else - { - memmove (dest, src, srclen); - // Pad with spaces. - memset (&dest[srclen], ' ', destlen - srclen); - } - } - - We're now doing it here for better optimization, but the logic - is the same. */ - /* For non-default character kinds, we have to multiply the string length by the base type size. */ chartype = gfc_get_char_type (dkind); @@ -6504,17 +6490,19 @@ gfc_trans_string_copy (stmtblock_t * block, tree dlength, tree dest, else src = gfc_build_addr_expr (pvoid_type_node, src); - /* Truncate string if source is too long. */ - cond2 = fold_build2_loc (input_location, GE_EXPR, boolean_type_node, slen, - dlen); + /* First do the memmove. */ + tmp2 = fold_build2_loc (input_location, MIN_EXPR, TREE_TYPE (dlen), dlen, + slen); tmp2 = build_call_expr_loc (input_location, builtin_decl_explicit (BUILT_IN_MEMMOVE), - 3, dest, src, dlen); + 3, dest, src, tmp2); + stmtblock_t tmpblock2; + gfc_init_block (&tmpblock2); + gfc_add_expr_to_block (&tmpblock2, tmp2); - /* Else copy and pad with spaces. */ - tmp3 = build_call_expr_loc (input_location, - builtin_decl_explicit (BUILT_IN_MEMMOVE), - 3, dest, src, slen); + /* If the destination is longer, fill the end with spaces. */ + cond2 = fold_build2_loc (input_location, LT_EXPR, boolean_type_node, slen, +
Re: [PATCH] PR 78534 Change character length from int to size_t
> Now when I think about this some more, I have a vague recollection > that a long time ago it used to be something like that. The problem > is that MIN_EXPR will of course be > NON-CONSTANT, so the memcpy call can't be inlined. Hence it was > changed to two separate __builtin_memmove() calls to have better > opportunity to inline. So probably a no-go to change it back. :( I don't get that. From the former only one of the memmove's could have been inlined assuming that only CONSTANT sizes are inlined. The other one had a NON-CONSTANT as long as pointer following and constant propagation was not effective together. In our case the "NON-CONSTANT branch" would have been used, which is resolved by constant propagation (of the size of constant memory p points to). I assume the warning is triggered, because dead-code elimination has not removed the else part. Following this thought the MIN_EXPR would be propagated to 5 and the inliner can do its magic. Albeit it may be that now some other optimization level will trigger a warning, because some part has not been removed/constant replaced. What do you think of that? -- Andre Vehreschild * Email: vehre ad gmx dot de
Re: [PATCH] PR 78534 Change character length from int to size_t
On Wed, Dec 21, 2016 at 1:50 PM, Andre Vehreschild wrote: >> Here p is the character variable, and _p is the charlen. My guess is >> that the problem is that with -O1 it sees that the second memmove >> would overflow p, but it doesn't realize that branch is never taken. >> Cranking up the optimization level to -O2 and beyond makes it realize >> it, and thus the warning disappears. >> >> Perhaps one could rewrite that to something like >> >> __builtin_memmove ((void *) *p, (void *) &"12345679"[1]{lb: 1 sz: 1}, >> MIN_EXPR<(unsigned long) D.3598,8>); >> if ((unsigned long) D.3598 > 8) >> { >> __builtin_memset ((void*) *p + 8, 32, D.3599); >> } > > That looks interesting. It assumes though, that D.3598 will *never* be > negative. Because when it is negative 8 characters (cast to unsigned makes the > negative number huge) will be copied, while in the former code memmove will > reject the coping of a negative number of bytes. Therefore I propose to omit > the cast in the MIN_EXPR and make the constant 8 signed, too. That should > comply and mimick the former behavior more closely. What do you think? Who's > going to try? Now when I think about this some more, I have a vague recollection that a long time ago it used to be something like that. The problem is that MIN_EXPR will of course be NON-CONSTANT, so the memcpy call can't be inlined. Hence it was changed to two separate __builtin_memmove() calls to have better opportunity to inline. So probably a no-go to change it back. :( -- Janne Blomqvist
Re: [PATCH] PR 78534 Change character length from int to size_t
> Here p is the character variable, and _p is the charlen. My guess is > that the problem is that with -O1 it sees that the second memmove > would overflow p, but it doesn't realize that branch is never taken. > Cranking up the optimization level to -O2 and beyond makes it realize > it, and thus the warning disappears. > > Perhaps one could rewrite that to something like > > __builtin_memmove ((void *) *p, (void *) &"12345679"[1]{lb: 1 sz: 1}, > MIN_EXPR<(unsigned long) D.3598,8>); > if ((unsigned long) D.3598 > 8) > { > __builtin_memset ((void*) *p + 8, 32, D.3599); > } That looks interesting. It assumes though, that D.3598 will *never* be negative. Because when it is negative 8 characters (cast to unsigned makes the negative number huge) will be copied, while in the former code memmove will reject the coping of a negative number of bytes. Therefore I propose to omit the cast in the MIN_EXPR and make the constant 8 signed, too. That should comply and mimick the former behavior more closely. What do you think? Who's going to try? -- Andre Vehreschild * Email: vehre ad gmx dot de
Re: [PATCH] PR 78534 Change character length from int to size_t
On Wed, Dec 21, 2016 at 12:33 PM, Andre Vehreschild wrote: > Hi Janne, > >> But yes, I'm still seeing the warning messages with -O1 for >> allocate_deferred_char_scalar_1.f03. AFAICT it's a bogus warning, but >> I don't know how to get rid of it.. > > No, that warning isn't at all bogus. The warning in fact is astonishingly > precise. When I remember correctly, then the warning complains about trying to > put a string of length 8 into memory of length 5. There is never a memory > access error at runtime, because the code generated ensures that only 5 chars > are copied, but I am impressed by the analysis done by some intermediate step > of gcc. It figures, that memory is available for 5 characters only derefing a > "static/constant" pointer and then learning that initially 8 chars are to be > copied. I already tried to fix this by only generating code to copy the 5 > characters and make this knowledge available to the gimplifier, but I failed > to > deref the pointer and get the information statically. So IMHO the warning is > not bogus. It is absolutely correct and it is quite sophisticated to learn all > the necessary facts, but I didn't find a way to get this done in the > front-end. > We might be able to prevent the warning when there is a chance to add some > hook > into the middle stages of the compiler, telling it, that everything is fine. > But I have no idea what is possible and available there. I suspect it's complaining about (from the -fdump-tree-original): { integer(kind=8) D.3598; unsigned long D.3599; D.3598 = *_p; D.3599 = (unsigned long) D.3598 + 18446744073709551608; if (D.3598 != 0) { if ((unsigned long) D.3598 <= 8) { __builtin_memmove ((void *) *p, (void *) &"12345679"[1]{lb: 1 sz: 1}, (unsigned long) D.3598); } else { __builtin_memmove ((void *) *p, (void *) &"12345679"[1]{lb: 1 sz: 1}, 8); __builtin_memset ((void *) *p + 8, 32, D.3599); } } } Here p is the character variable, and _p is the charlen. My guess is that the problem is that with -O1 it sees that the second memmove would overflow p, but it doesn't realize that branch is never taken. Cranking up the optimization level to -O2 and beyond makes it realize it, and thus the warning disappears. Perhaps one could rewrite that to something like __builtin_memmove ((void *) *p, (void *) &"12345679"[1]{lb: 1 sz: 1}, MIN_EXPR<(unsigned long) D.3598,8>); if ((unsigned long) D.3598 > 8) { __builtin_memset ((void*) *p + 8, 32, D.3599); } ? -- Janne Blomqvist
Re: [PATCH] PR 78534 Change character length from int to size_t
Hi Janne, > But yes, I'm still seeing the warning messages with -O1 for > allocate_deferred_char_scalar_1.f03. AFAICT it's a bogus warning, but > I don't know how to get rid of it.. No, that warning isn't at all bogus. The warning in fact is astonishingly precise. When I remember correctly, then the warning complains about trying to put a string of length 8 into memory of length 5. There is never a memory access error at runtime, because the code generated ensures that only 5 chars are copied, but I am impressed by the analysis done by some intermediate step of gcc. It figures, that memory is available for 5 characters only derefing a "static/constant" pointer and then learning that initially 8 chars are to be copied. I already tried to fix this by only generating code to copy the 5 characters and make this knowledge available to the gimplifier, but I failed to deref the pointer and get the information statically. So IMHO the warning is not bogus. It is absolutely correct and it is quite sophisticated to learn all the necessary facts, but I didn't find a way to get this done in the front-end. We might be able to prevent the warning when there is a chance to add some hook into the middle stages of the compiler, telling it, that everything is fine. But I have no idea what is possible and available there. Regards, Andre -- Andre Vehreschild * Email: vehre ad gmx dot de
Re: [PATCH] PR 78534 Change character length from int to size_t
On Wed, Dec 21, 2016 at 12:05 PM, Andre Vehreschild wrote: > Hi all, > > so I have learned that proposing to write "speaking code" is not very well > taken. If you want to make a patch introducing gfc_size_t_zero_node, go ahead, at least I won't object. I don't think build_zero_cst(size_type_node) is that terrible myself, but I don't have any hard opinions on this. > Anyway, there is a patch (in two parts) hanging about changing the character > length from int to size_t. It looks ok to me, but I do not have the privilege > to ok it. Furthermore am I still not convinced that we can't do anything about > the failing testcase allocate_deferred_char_scalar_1. So how do we proceed? I have just verified that my fix for PR 78867 fixes the -flto failures Dominique noticed. I have some other minor cleanup to do to the charlen->size_t patch, and then I'll resubmit it. But yes, I'm still seeing the warning messages with -O1 for allocate_deferred_char_scalar_1.f03. AFAICT it's a bogus warning, but I don't know how to get rid of it.. -- Janne Blomqvist
Re: [PATCH] PR 78534 Change character length from int to size_t
Hi all, so I have learned that proposing to write "speaking code" is not very well taken. Anyway, there is a patch (in two parts) hanging about changing the character length from int to size_t. It looks ok to me, but I do not have the privilege to ok it. Furthermore am I still not convinced that we can't do anything about the failing testcase allocate_deferred_char_scalar_1. So how do we proceed? - Andre On Tue, 20 Dec 2016 17:08:54 +0100 Jakub Jelinek wrote: > On Tue, Dec 20, 2016 at 05:04:54PM +0100, Andre Vehreschild wrote: > > Well, then how about: > > > > #define gfc_size_t_zero_node build_int_cst (size_type_node, 0) > > > > We can't get any faster and for new and old gfortran-hackers one > > identifier's meaning is faster to grasp than two's. > > Such macros can cause various maintenance issues, so I'm not in favor of > that. But if you as fortran maintainers want it, I won't object strongly. > > Jakub -- Andre Vehreschild * Email: vehre ad gmx dot de
Re: [PATCH] PR 78534 Change character length from int to size_t
On Tue, Dec 20, 2016 at 05:04:54PM +0100, Andre Vehreschild wrote: > Well, then how about: > > #define gfc_size_t_zero_node build_int_cst (size_type_node, 0) > > We can't get any faster and for new and old gfortran-hackers one identifier's > meaning is faster to grasp than two's. Such macros can cause various maintenance issues, so I'm not in favor of that. But if you as fortran maintainers want it, I won't object strongly. Jakub
Re: [PATCH] PR 78534 Change character length from int to size_t
On Tue, 20 Dec 2016 16:40:13 +0100 Jakub Jelinek wrote: > On Tue, Dec 20, 2016 at 04:29:07PM +0100, Andre Vehreschild wrote: > > > The first one is GCC internal type for representing sizes, the latter is > > > the C size_t (usually they have the same precision, they always have the > > > same signedness (unsigned)). > > > In the past sizetype actually has been a signed type with very special > > > behavior. > > > > I am still wondering if it does not make sense to have something like > > gfc_size_t_zero_node to prevent us from repeating build_zero_cst > > (size_type_node) all the time. I had to use it 16 times, i.e., 16 times the > > code for building a zero size type node is generated instead of a reference > > to a "constant". And I don't want to know how often size_zero_node is used > > in the wrong location. > > built_int_cst (size_type_node, 0) is actually faster than build_zero_cst, > one fewer level of indirection. > The 0 constant is cached in the type itself, so it actually in the end > is basically just: > return TREE_VEC_ELT (TYPE_CACHED_VALUES (type), 1); > > Adding a variable to hold gfc_size_t_zero_node would mean you'd need to add > a GC root to hold it. > > Note, sizetype should be still what is usually used, only if you in the ABI > have something declared as C size_t, then size_type_node should be used. Well, then how about: #define gfc_size_t_zero_node build_int_cst (size_type_node, 0) We can't get any faster and for new and old gfortran-hackers one identifier's meaning is faster to grasp than two's. - Andre -- Andre Vehreschild * Email: vehre ad gmx dot de
Re: [PATCH] PR 78534 Change character length from int to size_t
On Tue, Dec 20, 2016 at 04:29:07PM +0100, Andre Vehreschild wrote: > > The first one is GCC internal type for representing sizes, the latter is > > the C size_t (usually they have the same precision, they always have the > > same signedness (unsigned)). > > In the past sizetype actually has been a signed type with very special > > behavior. > > I am still wondering if it does not make sense to have something like > gfc_size_t_zero_node to prevent us from repeating build_zero_cst > (size_type_node) all the time. I had to use it 16 times, i.e., 16 times the > code for building a zero size type node is generated instead of a reference to > a "constant". And I don't want to know how often size_zero_node is used in the > wrong location. built_int_cst (size_type_node, 0) is actually faster than build_zero_cst, one fewer level of indirection. The 0 constant is cached in the type itself, so it actually in the end is basically just: return TREE_VEC_ELT (TYPE_CACHED_VALUES (type), 1); Adding a variable to hold gfc_size_t_zero_node would mean you'd need to add a GC root to hold it. Note, sizetype should be still what is usually used, only if you in the ABI have something declared as C size_t, then size_type_node should be used. Jakub
Re: [PATCH] PR 78534 Change character length from int to size_t
On Tue, 20 Dec 2016 16:00:19 +0100 Jakub Jelinek wrote: > On Tue, Dec 20, 2016 at 04:55:29PM +0200, Janne Blomqvist wrote: > > On Tue, Dec 20, 2016 at 3:42 PM, Andre Vehreschild wrote: > > > Hi all, > > > > > >> I think you should use build_zero_cst(size_type_node) instead of > > >> size_zero_node as size_zero_node is of type sizetype which is not the > > >> same as size_type_node. Otherwise looks good. > > > > > > In the software design classes I took this was called a design error: Not > > > choosing sufficiently different names for different artifacts. It was > > > considered a beginner's error. > > > > Yeah, sizetype vs. size_type_node is confusing, to say the least.. > > The first one is GCC internal type for representing sizes, the latter is > the C size_t (usually they have the same precision, they always have the > same signedness (unsigned)). > In the past sizetype actually has been a signed type with very special > behavior. I am still wondering if it does not make sense to have something like gfc_size_t_zero_node to prevent us from repeating build_zero_cst (size_type_node) all the time. I had to use it 16 times, i.e., 16 times the code for building a zero size type node is generated instead of a reference to a "constant". And I don't want to know how often size_zero_node is used in the wrong location. - Andre -- Andre Vehreschild * Email: vehre ad gmx dot de
Re: [PATCH] PR 78534 Change character length from int to size_t
On Tue, Dec 20, 2016 at 04:55:29PM +0200, Janne Blomqvist wrote: > On Tue, Dec 20, 2016 at 3:42 PM, Andre Vehreschild wrote: > > Hi all, > > > >> I think you should use build_zero_cst(size_type_node) instead of > >> size_zero_node as size_zero_node is of type sizetype which is not the > >> same as size_type_node. Otherwise looks good. > > > > In the software design classes I took this was called a design error: Not > > choosing sufficiently different names for different artifacts. It was > > considered a beginner's error. > > Yeah, sizetype vs. size_type_node is confusing, to say the least.. The first one is GCC internal type for representing sizes, the latter is the C size_t (usually they have the same precision, they always have the same signedness (unsigned)). In the past sizetype actually has been a signed type with very special behavior. Jakub
Re: [PATCH] PR 78534 Change character length from int to size_t
On Tue, Dec 20, 2016 at 3:42 PM, Andre Vehreschild wrote: > Hi all, > >> I think you should use build_zero_cst(size_type_node) instead of >> size_zero_node as size_zero_node is of type sizetype which is not the >> same as size_type_node. Otherwise looks good. > > In the software design classes I took this was called a design error: Not > choosing sufficiently different names for different artifacts. It was > considered a beginner's error. Yeah, sizetype vs. size_type_node is confusing, to say the least.. > So now I have to repeat myself 16 times only to work around this b***. Nothing > that will improve gfortran's maintainability. > > Second version of the changes needed for caf attached. Bootstrapped and > regtested fine besides prior known > > FAIL: gfortran.dg/allocate_deferred_char_scalar_1.f03 -O1 (test for excess > errors) > > on x86_64-linux/f23. Ok, looks good. -- Janne Blomqvist
Re: [PATCH] PR 78534 Change character length from int to size_t
Hi all, > I think you should use build_zero_cst(size_type_node) instead of > size_zero_node as size_zero_node is of type sizetype which is not the > same as size_type_node. Otherwise looks good. In the software design classes I took this was called a design error: Not choosing sufficiently different names for different artifacts. It was considered a beginner's error. So now I have to repeat myself 16 times only to work around this b***. Nothing that will improve gfortran's maintainability. Second version of the changes needed for caf attached. Bootstrapped and regtested fine besides prior known FAIL: gfortran.dg/allocate_deferred_char_scalar_1.f03 -O1 (test for excess errors) on x86_64-linux/f23. - Andre -- Andre Vehreschild * Email: vehre ad gmx dot de pr78534_caf_v2.clog Description: Binary data diff --git a/gcc/fortran/gfortran.texi b/gcc/fortran/gfortran.texi index 5b05a3d..1604bc8 100644 --- a/gcc/fortran/gfortran.texi +++ b/gcc/fortran/gfortran.texi @@ -4211,7 +4211,7 @@ size or one for a scalar. @item @emph{Syntax}: @code{void caf_register (size_t size, caf_register_t type, caf_token_t *token, -gfc_descriptor_t *desc, int *stat, char *errmsg, int errmsg_len)} +gfc_descriptor_t *desc, int *stat, char *errmsg, size_t errmsg_len)} @item @emph{Arguments}: @multitable @columnfractions .15 .70 @@ -4263,7 +4263,7 @@ library is only expected to free memory it allocated itself during a call to @item @emph{Syntax}: @code{void caf_deregister (caf_token_t *token, caf_deregister_t type, -int *stat, char *errmsg, int errmsg_len)} +int *stat, char *errmsg, size_t errmsg_len)} @item @emph{Arguments}: @multitable @columnfractions .15 .70 @@ -4322,7 +4322,8 @@ to a remote image identified by the image_index. @item @emph{Syntax}: @code{void _gfortran_caf_send (caf_token_t token, size_t offset, int image_index, gfc_descriptor_t *dest, caf_vector_t *dst_vector, -gfc_descriptor_t *src, int dst_kind, int src_kind, bool may_require_tmp)} +gfc_descriptor_t *src, int dst_kind, int src_kind, bool may_require_tmp, +int *stat)} @item @emph{Arguments}: @multitable @columnfractions .15 .70 @@ -4345,6 +4346,9 @@ time that the @var{dest} and @var{src} either cannot overlap or overlap (fully or partially) such that walking @var{src} and @var{dest} in element wise element order (honoring the stride value) will not lead to wrong results. Otherwise, the value is true. +@item @var{stat} @tab intent(out) when non-NULL give the result of the +operation, i.e., zero on success and non-zero on error. When NULL and error +occurs, then an error message is printed and the program is terminated. @end multitable @item @emph{NOTES} @@ -4375,7 +4379,8 @@ image identified by the image_index. @item @emph{Syntax}: @code{void _gfortran_caf_get (caf_token_t token, size_t offset, int image_index, gfc_descriptor_t *src, caf_vector_t *src_vector, -gfc_descriptor_t *dest, int src_kind, int dst_kind, bool may_require_tmp)} +gfc_descriptor_t *dest, int src_kind, int dst_kind, bool may_require_tmp, +int *stat)} @item @emph{Arguments}: @multitable @columnfractions .15 .70 @@ -4398,6 +4403,9 @@ time that the @var{dest} and @var{src} either cannot overlap or overlap (fully or partially) such that walking @var{src} and @var{dest} in element wise element order (honoring the stride value) will not lead to wrong results. Otherwise, the value is true. +@item @var{stat} @tab intent(out) when non-NULL give the result of the +operation, i.e., zero on success and non-zero on error. When NULL and error +occurs, then an error message is printed and the program is terminated. @end multitable @item @emph{NOTES} @@ -4430,7 +4438,7 @@ dst_image_index. int dst_image_index, gfc_descriptor_t *dest, caf_vector_t *dst_vector, caf_token_t src_token, size_t src_offset, int src_image_index, gfc_descriptor_t *src, caf_vector_t *src_vector, int dst_kind, int src_kind, -bool may_require_tmp)} +bool may_require_tmp, int *stat)} @item @emph{Arguments}: @multitable @columnfractions .15 .70 @@ -4461,6 +4469,9 @@ time that the @var{dest} and @var{src} either cannot overlap or overlap (fully or partially) such that walking @var{src} and @var{dest} in element wise element order (honoring the stride value) will not lead to wrong results. Otherwise, the value is true. +@item @var{stat} @tab intent(out) when non-NULL give the result of the +operation, i.e., zero on success and non-zero on error. When NULL and error +occurs, then an error message is printed and the program is terminated. @end multitable @item @emph{NOTES} @@ -4673,7 +4684,7 @@ been locked by the same image is an error. @item @emph{Syntax}: @code{void _gfortran_caf_lock (caf_token_t token, size_t index, int image_index, -int *aquired_lock, int *stat, char *errmsg, int errmsg_len)} +int *aquired_lock, int *stat, char *errmsg, size_t errmsg_len)} @item @emph{Arguments}: @multitable @columnfractions .15 .70 @@ -4708,7 +4719,7 @@ which is unlocked or has bee
Re: [PATCH] PR 78534 Change character length from int to size_t
Dear Bob, First, regarding the ABI vs. API question: there is no consistent API for how to pass between Fortran and C strings, unless one uses Fortran 2003’s ISO_C_BINDING. It’s an ABI detail, in the sense that every compiler will choose to do things their own way: most compilers who pass a hidden length parameter, although its size (32-bit or 64-bit or size_t) and position (either after the char pointer, or at the end of the argument list) are variable between compilers. So, any code that does this is already compiler-specific. Second, there are good reasons we might want to change this. One is possible use cases (although there are few, by definition, because we simply don’t support those right now). The second one is compatibility with C string-handling functions, who operate on size_t arguments, which means we can now use those functions without casting types around all the time. Finally, if we’re making this change, we welcome any feedback on how to make it as easy as possible to handle in user code. Documentation, preprocessor macros, etc. In particular, one of the things we will need to address is on helping widely used code to adapt to the change, so that. One example I am thinking of, that uses old-style C/Fortran interfaces, is MPI libraries (openmpi & mpich). We definitely need to test those to make sure nothing breaks if we are going to proceed — or they need to be fixed upstream well before we release, and with due note of the incompatibility in our release notes. Cheers, FX
Re: [PATCH] PR 78534 Change character length from int to size_t
On 12/19/16 11:33 AM, Janne Blomqvist wrote: On Mon, Dec 19, 2016 at 6:43 PM, Bob Deen wrote: Hi all... I never saw any followup on this...? It's one thing to break the ABI between the compiler and the gfortran library; those can generally be expected to be in sync. It's another to break the ABI between two *languages*, when there might be no such expectation (especially if gcc does NOT break their ABI at the same version number transition). Yes, the pre-ISO_C_BINDING method may be old-fashioned, but it is a de-facto standard, and breaking it should not be done lightly. First: No, it's not done "lightly". And secondly, cross-language interfacing is always tricky, which is why most languages, including Fortran with ISO_C_BINDING, have devised standardized ways for communication with C so users don't need to rely on various cross-call mechanisms that are not guaranteed to work. Apologies if I offended (and to Steve too). I see all the deliberation you're doing for breaking the language->library ABI, and appreciate that. It's well-justified. My point, however, is that with this change you are breaking an entirely *different* ABI - that between Fortran and C - and the sum total of discussion was one message from Janne pointing out that it was breaking (thanks for that heads-up, I had missed it!), with no followup. Janne, you yourself in that message questioned the need for large strings, and had no use cases in response to FX's inquiry. Now that I think about it, it's not even an ABI change, it's an API change... requiring a code change, not just a recompile. So in this case, this change represents (AFAIK) the only breakage in the old-style Fortran<->C ABI/API, with no known use cases... and thus my question about whether it's justified. It's a fair question. I'm not arguing the language->library ABI at all. C changed to use size_t for string lengths instead of int with ANSI C in, what, 1989. With 2-socket servers, typically used e.g. in HPC clusters, today easily having hundreds of gigs of RAM, limiting GFortran char lengths to 2 GB for all eternity in the name of compatibility seems quaint at best. Maybe in your organization Fortran is legacy code that YE SHALL NOT TOUCH, but GFortran also has to cater to users who have chosen to write new code in Fortran. I understand that. It just seems that opening up an entirely *new* ABI/API for breakage deserved a little more discussion. Y'all are the ones doing the (mostly volunteer) work on gfortran, and I appreciate it. You're also much more invested in the future of the language than I (yeah, it's mostly legacy code for us). If you end up deciding that it needs to be done, then I'll deal with it. I just wanted to chime in that there are users who will be affected. If I'm the only one, I wouldn't want to stand in the way of progress - but also don't want to get steamrolled if it's not an important change, or if there are other affected users. So... ARE there any other affected users out there?? Oh, you have macros rather than hard-coded int all over the place? Shouldn't it be a relatively trivial affair then to define that macro appropriately depending on which compiler and which version you're using? I wouldn't call it trivial by any means... it's tricky code I haven't had to look at in 10 years. But in the end, probably doable. Steve showed how you can do it for Fortran. From the C side, just check the version from the __GNUC__ macro. I dislike having to check for version numbers (feels kludgy) but that's a personal preference. That will probably work, with a bit of futzing. Thanks for your attention... -Bob Bob Deen @ NASA-JPL Multimission Image Processing Lab bob.d...@jpl.nasa.gov
Re: [PATCH] PR 78534 Change character length from int to size_t
On Mon, Dec 19, 2016 at 6:43 PM, Bob Deen wrote: > Hi all... > > I never saw any followup on this...? > > It's one thing to break the ABI between the compiler and the gfortran > library; those can generally be expected to be in sync. It's another to > break the ABI between two *languages*, when there might be no such > expectation (especially if gcc does NOT break their ABI at the same version > number transition). Yes, the pre-ISO_C_BINDING method may be old-fashioned, > but it is a de-facto standard, and breaking it should not be done lightly. First: No, it's not done "lightly". And secondly, cross-language interfacing is always tricky, which is why most languages, including Fortran with ISO_C_BINDING, have devised standardized ways for communication with C so users don't need to rely on various cross-call mechanisms that are not guaranteed to work. That the charlen is a hidden argument added at the end of type int is AFAIK a fairly common implementation choice, though I'm not sure if it can be called a de-facto standard. Considering that Intel Fortran has switched to size_t several years ago (5-ish?), and AFAIU it's the most used Fortran compiler around in addition to GFortran, and the world hasn't crashed down due to it, I suspect users can adapt to the change with relatively little drama. That gcc would change it's ABI at all, and especially in conjunction with gfortran, is a pipe dream. C changed to use size_t for string lengths instead of int with ANSI C in, what, 1989. With 2-socket servers, typically used e.g. in HPC clusters, today easily having hundreds of gigs of RAM, limiting GFortran char lengths to 2 GB for all eternity in the name of compatibility seems quaint at best. Maybe in your organization Fortran is legacy code that YE SHALL NOT TOUCH, but GFortran also has to cater to users who have chosen to write new code in Fortran. > If you do proceed with changing the size, I would request that there at > least be a facility to reliably tell at compile time (on the C side) which > definition is being used, so I can adjust our macros accordingly. Oh, you have macros rather than hard-coded int all over the place? Shouldn't it be a relatively trivial affair then to define that macro appropriately depending on which compiler and which version you're using? Steve showed how you can do it for Fortran. From the C side, just check the version from the __GNUC__ macro. > Our code > does depend on the size, and it has to cross-platform (and now, if this > change is made, cross-version), so with this change I would have to support > both int and size_t. Well, if you add the option to use size_t you should be able to use ifort as well. :) > A C-side preprocessor symbol definition would do the trick. Of course that > assumes the versions of gcc/g++ and gfortran are in sync, which is never > guaranteed. But that assumption is better than nothing. Unless someone has > a better idea...? Yeah, I think that's the best idea. Another option would be to implement some kind of -fcharacter-length=[int,size_t] command-line option. But that would make the patch a lot more complicated since one would need to typecast the character length argument when calling libgfortran. And, you'd still have to have some version-dependent checks to see if gfortran would accept that option. And like other similar options like -fdefault-this-or-that it would change the ABI, so code compiled with that option would be incompatible with code compiled without it. So in the end I'm not convinced such an option would actually make life any easier for our users. > Perhaps it might be best to wait until a time when gcc is also breaking > their ABI, so that there's no question of code (on either side) working > across the transition...? AFAIK there is no ABI change planned for gcc. For better or worse, the C language is relatively stable and doesn't change much. > P.S. I'm just a lurker here, but I lurk specifically to look for things > that will break our code base, like this ;-) Well, then you ought to be aware the ABI cleanup page on the wiki, where the char length issue has been listed for, what, 5 years or so, so it can't really be a surprise that it will happen at some point, can it...? > > Bob.Deen @ NASA-JPL Multimission Image Processing Lab > bob.d...@jpl.nasa.gov > > > > On 12/12/16 10:26 AM, Bob Deen wrote: >> >> >>> However, this will also affect people doing C->Fortran calls the >>> old-fashioned way without ISO_C_BINDING, as they will have to change >>> the string length argument from int to size_t in their prototypes. >>> Then again, Intel Fortran did this some years ago so I guess at least >>> people who care about portability to several compilers are aware. >> >> >> We do a ton of this (old fashioned c-fortran binding) and changing the >> string length argument size will have a big impact on us. We don't use the >> Intel compiler so we never noticed a change there. >> >> Is there really a use case
Re: [PATCH] PR 78534 Change character length from int to size_t
On Mon, Dec 19, 2016 at 08:43:01AM -0800, Bob Deen wrote: > > It's one thing to break the ABI between the compiler and the gfortran > library; those can generally be expected to be in sync. It's another to > break the ABI between two *languages*, when there might be no such > expectation (especially if gcc does NOT break their ABI at the same > version number transition). Yes, the pre-ISO_C_BINDING method may be > old-fashioned, but it is a de-facto standard, and breaking it should not > be done lightly. Do you really think that those of us who actively contribute to gfortran development take breaking the ABI lightly? We have put off changes to gfortran's library for several years to specifically avoid ABI breakage. It seems that there is never a "Good Time" to break the ABI. However, in this case, support for F2008 9.6.4.8, Defined Input/Output, necessitates a change in the ABI. Instead of breaking the ABI multiple times, it has been decided to try to cleanup some long standing issues with libgfortran. > If you do proceed with changing the size, I would request that there at > least be a facility to reliably tell at compile time (on the C side) > which definition is being used, so I can adjust our macros accordingly. > Our code does depend on the size, and it has to cross-platform (and now, > if this change is made, cross-version), so with this change I would have > to support both int and size_t. As the breakage is going to occur with gfortran 7.0, you do % cat a.F90 #if defined(__GFORTRAN__) && (__GNUC__ > 6) print *, '7' #else print *, 'not 7' #endif end % gfc7 -E a.F90 | cat -s ] gfc7 -E a.F90 | cat -s # 1 "a.F90" # 1 "" # 1 "" # 1 "a.F90" print *, '7' end % gfortran6 -E a.F90 | cat -s # 1 "a.F90" # 1 "" # 1 "" # 1 "a.F90" print *, 'not 7' end > Perhaps it might be best to wait until a time when gcc is also breaking > their ABI, so that there's no question of code (on either side) working > across the transition...? There is never a good time. If we are to wait for gcc, should we remove support for Defined Input/Output from the compiler? -- Steve
Re: [PATCH] PR 78534 Change character length from int to size_t
Hi all... I never saw any followup on this...? It's one thing to break the ABI between the compiler and the gfortran library; those can generally be expected to be in sync. It's another to break the ABI between two *languages*, when there might be no such expectation (especially if gcc does NOT break their ABI at the same version number transition). Yes, the pre-ISO_C_BINDING method may be old-fashioned, but it is a de-facto standard, and breaking it should not be done lightly. If you do proceed with changing the size, I would request that there at least be a facility to reliably tell at compile time (on the C side) which definition is being used, so I can adjust our macros accordingly. Our code does depend on the size, and it has to cross-platform (and now, if this change is made, cross-version), so with this change I would have to support both int and size_t. A C-side preprocessor symbol definition would do the trick. Of course that assumes the versions of gcc/g++ and gfortran are in sync, which is never guaranteed. But that assumption is better than nothing. Unless someone has a better idea...? Perhaps it might be best to wait until a time when gcc is also breaking their ABI, so that there's no question of code (on either side) working across the transition...? Thanks... -Bob P.S. I'm just a lurker here, but I lurk specifically to look for things that will break our code base, like this ;-) Bob.Deen @ NASA-JPL Multimission Image Processing Lab bob.d...@jpl.nasa.gov On 12/12/16 10:26 AM, Bob Deen wrote: However, this will also affect people doing C->Fortran calls the old-fashioned way without ISO_C_BINDING, as they will have to change the string length argument from int to size_t in their prototypes. Then again, Intel Fortran did this some years ago so I guess at least people who care about portability to several compilers are aware. We do a ton of this (old fashioned c-fortran binding) and changing the string length argument size will have a big impact on us. We don't use the Intel compiler so we never noticed a change there. Is there really a use case for strings > 2 GB that justifies the breakage? I certainly understand wanting to do it "right" but I'm probably not the only one with practical considerations that argue against it if there are no compelling use cases. Thanks... -Bob Bob Deen @ NASA-JPL Multimission Image Processing Lab bob.d...@jpl.nasa.gov
Re: [PATCH] PR 78534 Change character length from int to size_t
> Le 16 déc. 2016 à 19:06, Janne Blomqvist a écrit : > > On Fri, Dec 16, 2016 at 5:50 PM, Dominique d'Humières > wrote: >> Hi Janne, >> >> I have applied your two patches and found that I had to skip the patches for >> resolve.c and match.c due to the errors >> >> ../../p_work/gcc/fortran/resolve.c: In function 'void >> resolve_select_type(gfc_code*, gfc_namespace*)': >> ../../p_work/gcc/fortran/resolve.c:8731:58: error: format '%lld' expects >> argument of type 'long long int', but argument 4 has type 'long int' >> [-Werror=format=] >> gfc_basic_typename (c->ts.type), charlen, c->ts.kind); >> >> and >> >> ../../p_work/gcc/fortran/match.c: In function 'gfc_symtree* >> select_intrinsic_set_tmp(gfc_typespec*)': >> ../../p_work/gcc/fortran/match.c:5786:55: error: format '%lld' expects >> argument of type 'long long int', but argument 4 has type 'long int' >> [-Werror=format=] >> gfc_basic_typename (ts->type), charlen, ts->kind); >> > > Oh, blast. Of course you're right, from the second patch, the part > that touched resolve.c and match.c are incorrect. > >> while the patch for dump-parse-tree.c was needed. >> >> With the patches applied I see several failures in the test suite compiled >> with ‘-m64 -flto’, but not with ‘-m32 -flto’ of the kind > > Ok, thanks for the hint. I haven't tested with -flto. Do you just run > it with something like > > CFLAGS="-flto" make -j $NCPUS check-fortran > > or how do you do it? Due to recurrent problems with -flto, I have the following change in my working tree --- ../_clean/gcc/testsuite/lib/gfortran-dg.exp 2016-02-12 20:21:04.0 +0100 +++ gcc/testsuite/lib/gfortran-dg.exp 2016-02-13 10:03:11.0 +0100 @@ -140,8 +140,16 @@ proc gfortran-dg-runtest { testcases fla # we cycle through the option list, otherwise we don't if [expr [search_for $test "dg-do run"]] { set option_list $torture_with_loops + if [check_effective_target_lto] { + lappend option_list { -g -flto } + # lappend option_list { -g -O3 -fwhole-program -flto } + } } else { set option_list [list { -O } ] + # if [check_effective_target_lto] { + # lappend option_list { -g -flto } + # lappend option_list { -g -O3 -fwhole-program -flto } + # } } set nshort [file tail [file dirname $test]]/[file tail $test] i.e., I add the options ‘-g -flto’ to the list of options used by the gfortran test suite. > > I don't have lldb, but I guess I can see roughly the same with gdb.. For what I do, it is close to gdb (I did not succeed to authorize it on recent darwin). Dominique >> The affected tests are >> >> array_constructor_17.f90 >> auto_char_len_3.f90 >> char_length_14.f90 >> char_length_5.f90 >> char_result_*.f90 >> charlen_03.f90 >> deferred_type_param_4.f90 >> dummy_procedure_3.f90 >> mapping_[12].f90 >> module_read_[12].f90 >> parens_5.f90 >> pr57910.f90 >> proc_ptr_comp_16.f90 >> result_in_spec_2.f90 >> spec_expr_7.f90 >> string_length_1.f90 >> transfer_intrinsic_3.f90 >> widechar_6.f90 >> zero_length_1.f90 >> >> Note that I did not run lldb on all of them, thus I cannot guarantee that >> all fail along the same pattern. >> >> Cheers, >> >> Dominique >> > > > > -- > Janne Blomqvist
Re: [PATCH] PR 78534 Change character length from int to size_t
On Fri, Dec 16, 2016 at 5:50 PM, Dominique d'Humières wrote: > Hi Janne, > > I have applied your two patches and found that I had to skip the patches for > resolve.c and match.c due to the errors > > ../../p_work/gcc/fortran/resolve.c: In function 'void > resolve_select_type(gfc_code*, gfc_namespace*)': > ../../p_work/gcc/fortran/resolve.c:8731:58: error: format '%lld' expects > argument of type 'long long int', but argument 4 has type 'long int' > [-Werror=format=] > gfc_basic_typename (c->ts.type), charlen, c->ts.kind); > > and > > ../../p_work/gcc/fortran/match.c: In function 'gfc_symtree* > select_intrinsic_set_tmp(gfc_typespec*)': > ../../p_work/gcc/fortran/match.c:5786:55: error: format '%lld' expects > argument of type 'long long int', but argument 4 has type 'long int' > [-Werror=format=] >gfc_basic_typename (ts->type), charlen, ts->kind); > Oh, blast. Of course you're right, from the second patch, the part that touched resolve.c and match.c are incorrect. > while the patch for dump-parse-tree.c was needed. > > With the patches applied I see several failures in the test suite compiled > with ‘-m64 -flto’, but not with ‘-m32 -flto’ of the kind Ok, thanks for the hint. I haven't tested with -flto. Do you just run it with something like CFLAGS="-flto" make -j $NCPUS check-fortran or how do you do it? I don't have lldb, but I guess I can see roughly the same with gdb.. > > [Book15] f90/bug% gfc > /opt/gcc/work/gcc/testsuite/gfortran.dg/char_result_1.f90 -flto -c > [Book15] f90/bug% lldb > /opt/gcc/gcc7w/libexec/gcc/x86_64-apple-darwin16.1.0/7.0.0/lto1 > (lldb) target create > "/opt/gcc/gcc7w/libexec/gcc/x86_64-apple-darwin16.1.0/7.0.0/lto1" > Current executable set to > '/opt/gcc/gcc7w/libexec/gcc/x86_64-apple-darwin16.1.0/7.0.0/lto1' (x86_64). > (lldb) run char_result_1.o > Process 1310 launched: > '/opt/gcc/gcc7w/libexec/gcc/x86_64-apple-darwin16.1.0/7.0.0/lto1' (x86_64) > Reading object files: char_result_1.o {GC start 2195k} > Reading the callgraph > Merging declarations > Reading summaries > Reading function bodies: > Performing interprocedural optimizations > > Assembling functions: > f2 f1 double test > MAIN__Process 1310 stopped > * thread #1: tid = 0x54ac, 0x0001008f76cf > lto1`lto_input_tree_ref(ib=, data_in=0x000142621090, > fn=0x, tag=) + 127 at lto-streamer-in.c:332, > queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, > address=0x18) > frame #0: 0x0001008f76cf lto1`lto_input_tree_ref(ib=, > data_in=0x000142621090, fn=0x, tag=) + 127 > at lto-streamer-in.c:332 >329 >330 case LTO_ssa_name_ref: >331ix_u = streamer_read_uhwi (ib); > -> 332result = (*SSANAMES (fn))[ix_u]; >333break; >334 >335 case LTO_field_decl_ref: > (lldb) bt > * thread #1: tid = 0x54ac, 0x0001008f76cf > lto1`lto_input_tree_ref(ib=, data_in=0x000142621090, > fn=0x, tag=) + 127 at lto-streamer-in.c:332, > queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, > address=0x18) > * frame #0: 0x0001008f76cf lto1`lto_input_tree_ref(ib=, > data_in=0x000142621090, fn=0x, tag=) + 127 > at lto-streamer-in.c:332 > frame #1: 0x0001008f78f1 lto1`lto_input_tree_1(ib=0x7fff5fbfeeb0, > data_in=0x000142621090, tag=, hash=) + 177 at > lto-streamer-in.c:1446 > frame #2: 0x0001008f7cb8 lto1`lto_input_tree(ib=0x7fff5fbfeeb0, > data_in=0x000142621090) + 88 at lto-streamer-in.c:1492 > frame #3: 0x000100ce8b43 > lto1`streamer_read_tree_body(lto_input_block*, data_in*, tree_node*) + 51 at > tree-streamer-in.c:893 > frame #4: 0x000100ce8b10 > lto1`streamer_read_tree_body(ib=0x7fff5fbfeeb0, > data_in=0x000142621090, expr=0x000143944af0) + 2096 > frame #5: 0x0001008f7110 > lto1`::lto_read_tree_1(ib=0x7fff5fbfeeb0, data_in=0x000142621090, > expr=0x000143944af0) + 32 at lto-streamer-in.c:1333 > frame #6: 0x0001008f78b9 lto1`lto_input_tree_1(lto_input_block*, > data_in*, LTO_tags, unsigned int) + 38 at lto-streamer-in.c:1363 > frame #7: 0x0001008f7893 lto1`lto_input_tree_1(lto_input_block*, > data_in*, LTO_tags, unsigned int) + 20 > frame #8: 0x0001008f787f lto1`lto_input_tree_1(ib=0x7fff5fbfeeb0, > data_in=0x000142621090, tag=, hash=2871463685) + 63 > frame #9: 0x0001008f7c03 lto1`lto_input_scc(ib=0x7fff5fbfeeb0, > data_in=0x000142621090, len=0x7fff5fbfede8, > entry_len=0x7fff5fbfedec) + 371 at lto-streamer-in.c:1387 > frame #10: 0x0001008f7c91 lto1`lto_input_tree(ib=0x7fff5fbfeeb0, > data_in=0x000142621090) + 49 at lto-streamer-in.c:1490 > frame #11: 0x0001008f8d93 > lto1`::lto_read_body_or_constructor(file_data=0x000143721000, > data=, node=, > section_type=LTO_section_function_body) + 931 at lto-streamer-in.c:1045 >
Re: [PATCH] PR 78534 Change character length from int to size_t
On Fri, Dec 16, 2016 at 5:34 PM, Andre Vehreschild wrote: > Hi Janne, hi all, > > as promised please find attached the change from int32 to size_t for the > caf-libraries. Because the caf-libraries do not require special notions > indicated by negative values I went for using size_t there. I assume this will > be easier to keep in sync for all caf-libraries, because the size_t is > available on all modern platforms. I also took the liberty to fix the > specifiers in trans-decl for the caf-function-declarations and update the > documentation on the caf-functions in gfortran.texi where some parameters > where > missing. > > These additional changes bootstrap fine and induce no new regressions on > x86_64-linux/f23. I think you should use build_zero_cst(size_type_node) instead of size_zero_node as size_zero_node is of type sizetype which is not the same as size_type_node. Otherwise looks good. And yes, I think it makes sense to use size_t directly instead of introducing the GFortran specific gfc_charlen_type typedef. > I am still not sure, whether we shouldn't address the regression in > allocate_deferred_char_scalar_1. I did some research, but could not yet come > to > a practical solution. The backend somehow deduces that the memory pointed to > has size 5 only. But I haven't found a way to do this in the front end. > > Comments on these changes? > > - Andre > > On Tue, 13 Dec 2016 21:08:51 +0200 > Janne Blomqvist wrote: > >> On Mon, Dec 12, 2016 at 9:39 PM, Janne Blomqvist >> wrote: >> > On Mon, Dec 12, 2016 at 7:39 PM, Andre Vehreschild wrote: >> >> Hi Janne, >> >> >> >> I found that you are favoring build_int_cst (size_type_node, 0) over >> >> size_zero_node. Is there a reason to this? >> > >> > Yes. AFAIU size_zero_node is a zero constant for sizetype which is not >> > the same as size_type_node. >> > >> > AFAIK the difference is that size_type_node is the C size_t type, >> > whereas sizetype is a GCC internal type used for address expressions. >> > On a "normal" target I understand that they are the same size, but >> > there are some slight differences in semantics, e.g. size_type_node >> > like C unsigned integer types is defined to wrap around on overflow >> > whereas sizetype is undefined on overflow. >> > >> > I don't know if GCC supports some strange targets with some kind of >> > segmented memory where the size of sizetype would be different from >> > size_type_node, but I guess it's possible in theory at least. >> > >> > That being said, now that you mention in I should be using >> > build_zero_cst (some_type_node) instead of >> > build_int_cst(some_type_node, 0). There's also build_one_cst that I >> > should use. >> > >> >> Furthermore did I have to patch this: >> >> >> >> diff --git a/gcc/fortran/dump-parse-tree.c b/gcc/fortran/dump-parse-tree.c >> >> index 585f25d..f374558 100644 >> >> --- a/gcc/fortran/dump-parse-tree.c >> >> +++ b/gcc/fortran/dump-parse-tree.c >> >> @@ -465,7 +465,7 @@ show_expr (gfc_expr *p) >> >> break; >> >> >> >> case BT_HOLLERITH: >> >> - fprintf (dumpfile, "%dH", p->representation.length); >> >> + fprintf (dumpfile, "%zdH", p->representation.length); >> >> c = p->representation.string; >> >> for (i = 0; i < p->representation.length; i++, c++) >> >> { >> >> >> >> to bootstrap on x86_64-linux/f23. >> > >> > Ah, thanks for the catch. I'll fix it by using HOST_WIDE_INT_PRINT_DEC >> > since I'll have to change gfc_charlen_t to be a typedef form >> > HOST_WIDE_INT (see my answer to FX). >> > >> >> And I have this regression: >> >> >> >> FAIL: gfortran.dg/allocate_deferred_char_scalar_1.f03 -O1 (test for >> >> excess errors) >> >> >> >> allocate_deferred_char_scalar_1.f03:184:0: >> >> >> >> p = '12345679' >> >> >> >> Warning: '__builtin_memcpy' writing 8 bytes into a region of size 5 >> >> overflows the destination [-Wstringop-overflow=] >> >> allocate_deferred_char_scalar_1.f03:242:0: >> >> >> >> p = 4_'12345679' >> >> >> >> Warning: '__builtin_memcpy' writing 32 bytes into a region of size 20 >> >> overflows the destination [-Wstringop-overflow=] >> > >> > I'm seeing that too, but I assumed they would be fixed by Paul's >> > recent patch which I don't yet have in my tree yet due to the git >> > mirror being stuck.. >> > >> >> Btw, the patch for changing the ABI of the coarray-libs is already nearly >> >> done. I just need to figure that what the state of regressions is with and >> >> without my change. >> > >> > Thanks. >> > >> > I'll produce an updated patch with the changes discussed so far. >> > >> > >> > -- >> > Janne Blomqvist >> >> Hi, >> >> attached is the updated patch that applies on top of the original. I >> didn't do the charlen_zero_node etc, I just fixed the relatively few >> places in my previous patch rather than everywhere in the entire >> frontend. >> >> Now that the git mirror is working again, I see though those warnings >> with -O1 from gfortran.dg/allocate_defe
Re: [PATCH] PR 78534 Change character length from int to size_t
Hi Janne, I have applied your two patches and found that I had to skip the patches for resolve.c and match.c due to the errors ../../p_work/gcc/fortran/resolve.c: In function 'void resolve_select_type(gfc_code*, gfc_namespace*)': ../../p_work/gcc/fortran/resolve.c:8731:58: error: format '%lld' expects argument of type 'long long int', but argument 4 has type 'long int' [-Werror=format=] gfc_basic_typename (c->ts.type), charlen, c->ts.kind); and ../../p_work/gcc/fortran/match.c: In function 'gfc_symtree* select_intrinsic_set_tmp(gfc_typespec*)': ../../p_work/gcc/fortran/match.c:5786:55: error: format '%lld' expects argument of type 'long long int', but argument 4 has type 'long int' [-Werror=format=] gfc_basic_typename (ts->type), charlen, ts->kind); while the patch for dump-parse-tree.c was needed. With the patches applied I see several failures in the test suite compiled with ‘-m64 -flto’, but not with ‘-m32 -flto’ of the kind [Book15] f90/bug% gfc /opt/gcc/work/gcc/testsuite/gfortran.dg/char_result_1.f90 -flto -c [Book15] f90/bug% lldb /opt/gcc/gcc7w/libexec/gcc/x86_64-apple-darwin16.1.0/7.0.0/lto1 (lldb) target create "/opt/gcc/gcc7w/libexec/gcc/x86_64-apple-darwin16.1.0/7.0.0/lto1" Current executable set to '/opt/gcc/gcc7w/libexec/gcc/x86_64-apple-darwin16.1.0/7.0.0/lto1' (x86_64). (lldb) run char_result_1.o Process 1310 launched: '/opt/gcc/gcc7w/libexec/gcc/x86_64-apple-darwin16.1.0/7.0.0/lto1' (x86_64) Reading object files: char_result_1.o {GC start 2195k} Reading the callgraph Merging declarations Reading summaries Reading function bodies: Performing interprocedural optimizations Assembling functions: f2 f1 double test MAIN__Process 1310 stopped * thread #1: tid = 0x54ac, 0x0001008f76cf lto1`lto_input_tree_ref(ib=, data_in=0x000142621090, fn=0x, tag=) + 127 at lto-streamer-in.c:332, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x18) frame #0: 0x0001008f76cf lto1`lto_input_tree_ref(ib=, data_in=0x000142621090, fn=0x, tag=) + 127 at lto-streamer-in.c:332 329 330 case LTO_ssa_name_ref: 331ix_u = streamer_read_uhwi (ib); -> 332result = (*SSANAMES (fn))[ix_u]; 333break; 334 335 case LTO_field_decl_ref: (lldb) bt * thread #1: tid = 0x54ac, 0x0001008f76cf lto1`lto_input_tree_ref(ib=, data_in=0x000142621090, fn=0x, tag=) + 127 at lto-streamer-in.c:332, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x18) * frame #0: 0x0001008f76cf lto1`lto_input_tree_ref(ib=, data_in=0x000142621090, fn=0x, tag=) + 127 at lto-streamer-in.c:332 frame #1: 0x0001008f78f1 lto1`lto_input_tree_1(ib=0x7fff5fbfeeb0, data_in=0x000142621090, tag=, hash=) + 177 at lto-streamer-in.c:1446 frame #2: 0x0001008f7cb8 lto1`lto_input_tree(ib=0x7fff5fbfeeb0, data_in=0x000142621090) + 88 at lto-streamer-in.c:1492 frame #3: 0x000100ce8b43 lto1`streamer_read_tree_body(lto_input_block*, data_in*, tree_node*) + 51 at tree-streamer-in.c:893 frame #4: 0x000100ce8b10 lto1`streamer_read_tree_body(ib=0x7fff5fbfeeb0, data_in=0x000142621090, expr=0x000143944af0) + 2096 frame #5: 0x0001008f7110 lto1`::lto_read_tree_1(ib=0x7fff5fbfeeb0, data_in=0x000142621090, expr=0x000143944af0) + 32 at lto-streamer-in.c:1333 frame #6: 0x0001008f78b9 lto1`lto_input_tree_1(lto_input_block*, data_in*, LTO_tags, unsigned int) + 38 at lto-streamer-in.c:1363 frame #7: 0x0001008f7893 lto1`lto_input_tree_1(lto_input_block*, data_in*, LTO_tags, unsigned int) + 20 frame #8: 0x0001008f787f lto1`lto_input_tree_1(ib=0x7fff5fbfeeb0, data_in=0x000142621090, tag=, hash=2871463685) + 63 frame #9: 0x0001008f7c03 lto1`lto_input_scc(ib=0x7fff5fbfeeb0, data_in=0x000142621090, len=0x7fff5fbfede8, entry_len=0x7fff5fbfedec) + 371 at lto-streamer-in.c:1387 frame #10: 0x0001008f7c91 lto1`lto_input_tree(ib=0x7fff5fbfeeb0, data_in=0x000142621090) + 49 at lto-streamer-in.c:1490 frame #11: 0x0001008f8d93 lto1`::lto_read_body_or_constructor(file_data=0x000143721000, data=, node=, section_type=LTO_section_function_body) + 931 at lto-streamer-in.c:1045 frame #12: 0x00010051a136 lto1`cgraph_node::get_untransformed_body(this=0x00014391ee60) + 278 at cgraph.c:3581 frame #13: 0x000100526d9a lto1`cgraph_node::expand(this=0x00014391ee60) + 74 at cgraphunit.c:1971 frame #14: 0x000100527f00 lto1`::output_in_order(no_reorder=) + 528 at cgraphunit.c:2244 frame #15: 0x000100528528 lto1`symbol_table::compile(this=0x000143526100) + 952 at cgraphunit.c:2488 frame #16: 0x000100034c68 lto1`lto_main() + 7112 at lto.c:3330 frame #17: 0x000100a87f1a lto1`::compile_file() + 58 at toplev.c:463 frame #18
Re: [PATCH] PR 78534 Change character length from int to size_t
Hi Janne, hi all, as promised please find attached the change from int32 to size_t for the caf-libraries. Because the caf-libraries do not require special notions indicated by negative values I went for using size_t there. I assume this will be easier to keep in sync for all caf-libraries, because the size_t is available on all modern platforms. I also took the liberty to fix the specifiers in trans-decl for the caf-function-declarations and update the documentation on the caf-functions in gfortran.texi where some parameters where missing. These additional changes bootstrap fine and induce no new regressions on x86_64-linux/f23. I am still not sure, whether we shouldn't address the regression in allocate_deferred_char_scalar_1. I did some research, but could not yet come to a practical solution. The backend somehow deduces that the memory pointed to has size 5 only. But I haven't found a way to do this in the front end. Comments on these changes? - Andre On Tue, 13 Dec 2016 21:08:51 +0200 Janne Blomqvist wrote: > On Mon, Dec 12, 2016 at 9:39 PM, Janne Blomqvist > wrote: > > On Mon, Dec 12, 2016 at 7:39 PM, Andre Vehreschild wrote: > >> Hi Janne, > >> > >> I found that you are favoring build_int_cst (size_type_node, 0) over > >> size_zero_node. Is there a reason to this? > > > > Yes. AFAIU size_zero_node is a zero constant for sizetype which is not > > the same as size_type_node. > > > > AFAIK the difference is that size_type_node is the C size_t type, > > whereas sizetype is a GCC internal type used for address expressions. > > On a "normal" target I understand that they are the same size, but > > there are some slight differences in semantics, e.g. size_type_node > > like C unsigned integer types is defined to wrap around on overflow > > whereas sizetype is undefined on overflow. > > > > I don't know if GCC supports some strange targets with some kind of > > segmented memory where the size of sizetype would be different from > > size_type_node, but I guess it's possible in theory at least. > > > > That being said, now that you mention in I should be using > > build_zero_cst (some_type_node) instead of > > build_int_cst(some_type_node, 0). There's also build_one_cst that I > > should use. > > > >> Furthermore did I have to patch this: > >> > >> diff --git a/gcc/fortran/dump-parse-tree.c b/gcc/fortran/dump-parse-tree.c > >> index 585f25d..f374558 100644 > >> --- a/gcc/fortran/dump-parse-tree.c > >> +++ b/gcc/fortran/dump-parse-tree.c > >> @@ -465,7 +465,7 @@ show_expr (gfc_expr *p) > >> break; > >> > >> case BT_HOLLERITH: > >> - fprintf (dumpfile, "%dH", p->representation.length); > >> + fprintf (dumpfile, "%zdH", p->representation.length); > >> c = p->representation.string; > >> for (i = 0; i < p->representation.length; i++, c++) > >> { > >> > >> to bootstrap on x86_64-linux/f23. > > > > Ah, thanks for the catch. I'll fix it by using HOST_WIDE_INT_PRINT_DEC > > since I'll have to change gfc_charlen_t to be a typedef form > > HOST_WIDE_INT (see my answer to FX). > > > >> And I have this regression: > >> > >> FAIL: gfortran.dg/allocate_deferred_char_scalar_1.f03 -O1 (test for > >> excess errors) > >> > >> allocate_deferred_char_scalar_1.f03:184:0: > >> > >> p = '12345679' > >> > >> Warning: '__builtin_memcpy' writing 8 bytes into a region of size 5 > >> overflows the destination [-Wstringop-overflow=] > >> allocate_deferred_char_scalar_1.f03:242:0: > >> > >> p = 4_'12345679' > >> > >> Warning: '__builtin_memcpy' writing 32 bytes into a region of size 20 > >> overflows the destination [-Wstringop-overflow=] > > > > I'm seeing that too, but I assumed they would be fixed by Paul's > > recent patch which I don't yet have in my tree yet due to the git > > mirror being stuck.. > > > >> Btw, the patch for changing the ABI of the coarray-libs is already nearly > >> done. I just need to figure that what the state of regressions is with and > >> without my change. > > > > Thanks. > > > > I'll produce an updated patch with the changes discussed so far. > > > > > > -- > > Janne Blomqvist > > Hi, > > attached is the updated patch that applies on top of the original. I > didn't do the charlen_zero_node etc, I just fixed the relatively few > places in my previous patch rather than everywhere in the entire > frontend. > > Now that the git mirror is working again, I see though those warnings > with -O1 from gfortran.dg/allocate_deferred_char_scalar_1.f03 are > still there, and Paul's patch didn't get rid of them. :(. I have > looked at the tree dumps, however, and AFAICS it's a bogus warning. > > Ok for trunk? > -- Andre Vehreschild * Email: vehre ad gmx dot de gcc/fortran/ChangeLog: 2016-12-16 Andre Vehreschild * gfortran.texi: Changed string length to size_t and added missing documenation of parameters. * trans-array.c (gfc_alloc_allocatable_for_assignment): Adapted to
Re: [PATCH] PR 78534 Change character length from int to size_t
On Mon, Dec 12, 2016 at 9:39 PM, Janne Blomqvist wrote: > On Mon, Dec 12, 2016 at 7:39 PM, Andre Vehreschild wrote: >> Hi Janne, >> >> I found that you are favoring build_int_cst (size_type_node, 0) over >> size_zero_node. Is there a reason to this? > > Yes. AFAIU size_zero_node is a zero constant for sizetype which is not > the same as size_type_node. > > AFAIK the difference is that size_type_node is the C size_t type, > whereas sizetype is a GCC internal type used for address expressions. > On a "normal" target I understand that they are the same size, but > there are some slight differences in semantics, e.g. size_type_node > like C unsigned integer types is defined to wrap around on overflow > whereas sizetype is undefined on overflow. > > I don't know if GCC supports some strange targets with some kind of > segmented memory where the size of sizetype would be different from > size_type_node, but I guess it's possible in theory at least. > > That being said, now that you mention in I should be using > build_zero_cst (some_type_node) instead of > build_int_cst(some_type_node, 0). There's also build_one_cst that I > should use. > >> Furthermore did I have to patch this: >> >> diff --git a/gcc/fortran/dump-parse-tree.c b/gcc/fortran/dump-parse-tree.c >> index 585f25d..f374558 100644 >> --- a/gcc/fortran/dump-parse-tree.c >> +++ b/gcc/fortran/dump-parse-tree.c >> @@ -465,7 +465,7 @@ show_expr (gfc_expr *p) >> break; >> >> case BT_HOLLERITH: >> - fprintf (dumpfile, "%dH", p->representation.length); >> + fprintf (dumpfile, "%zdH", p->representation.length); >> c = p->representation.string; >> for (i = 0; i < p->representation.length; i++, c++) >> { >> >> to bootstrap on x86_64-linux/f23. > > Ah, thanks for the catch. I'll fix it by using HOST_WIDE_INT_PRINT_DEC > since I'll have to change gfc_charlen_t to be a typedef form > HOST_WIDE_INT (see my answer to FX). > >> And I have this regression: >> >> FAIL: gfortran.dg/allocate_deferred_char_scalar_1.f03 -O1 (test for excess >> errors) >> >> allocate_deferred_char_scalar_1.f03:184:0: >> >> p = '12345679' >> >> Warning: '__builtin_memcpy' writing 8 bytes into a region of size 5 overflows >> the destination [-Wstringop-overflow=] >> allocate_deferred_char_scalar_1.f03:242:0: >> >> p = 4_'12345679' >> >> Warning: '__builtin_memcpy' writing 32 bytes into a region of size 20 >> overflows >> the destination [-Wstringop-overflow=] > > I'm seeing that too, but I assumed they would be fixed by Paul's > recent patch which I don't yet have in my tree yet due to the git > mirror being stuck.. > >> Btw, the patch for changing the ABI of the coarray-libs is already nearly >> done. >> I just need to figure that what the state of regressions is with and without >> my >> change. > > Thanks. > > I'll produce an updated patch with the changes discussed so far. > > > -- > Janne Blomqvist Hi, attached is the updated patch that applies on top of the original. I didn't do the charlen_zero_node etc, I just fixed the relatively few places in my previous patch rather than everywhere in the entire frontend. Now that the git mirror is working again, I see though those warnings with -O1 from gfortran.dg/allocate_deferred_char_scalar_1.f03 are still there, and Paul's patch didn't get rid of them. :(. I have looked at the tree dumps, however, and AFAICS it's a bogus warning. Ok for trunk? -- Janne Blomqvist diff --git a/gcc/fortran/dump-parse-tree.c b/gcc/fortran/dump-parse-tree.c index 585f25d..7aafe54 100644 --- a/gcc/fortran/dump-parse-tree.c +++ b/gcc/fortran/dump-parse-tree.c @@ -348,12 +348,10 @@ show_constructor (gfc_constructor_base base) static void -show_char_const (const gfc_char_t *c, int length) +show_char_const (const gfc_char_t *c, gfc_charlen_t length) { - int i; - fputc ('\'', dumpfile); - for (i = 0; i < length; i++) + for (gfc_charlen_t i = 0; i < length; i++) { if (c[i] == '\'') fputs ("''", dumpfile); @@ -465,7 +463,8 @@ show_expr (gfc_expr *p) break; case BT_HOLLERITH: - fprintf (dumpfile, "%dH", p->representation.length); + fprintf (dumpfile, HOST_WIDE_INT_PRINT_DEC "H", + p->representation.length); c = p->representation.string; for (i = 0; i < p->representation.length; i++, c++) { diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h index 6a05e9e..e82c320 100644 --- a/gcc/fortran/gfortran.h +++ b/gcc/fortran/gfortran.h @@ -2067,10 +2067,12 @@ gfc_intrinsic_sym; typedef splay_tree gfc_constructor_base; -/* This should be a size_t. But occasionally the string length field - is used as a flag with values -1 and -2, see - e.g. gfc_add_assign_aux_vars. */ -typedef ptrdiff_t gfc_charlen_t; +/* This should be an unsigned variable of type size_t. But to handle + compiling to a 64-bit target from a 32-bit host, we need to use a + HOST_WIDE
Re: [PATCH] PR 78534 Change character length from int to size_t
Hi Janne, How about adding charlen_zero_node and one_node like the others have it to prevent repeating ourselves? - Andre Am 12. Dezember 2016 20:39:38 MEZ, schrieb Janne Blomqvist : >On Mon, Dec 12, 2016 at 7:39 PM, Andre Vehreschild >wrote: >> Hi Janne, >> >> I found that you are favoring build_int_cst (size_type_node, 0) over >> size_zero_node. Is there a reason to this? > >Yes. AFAIU size_zero_node is a zero constant for sizetype which is not >the same as size_type_node. > >AFAIK the difference is that size_type_node is the C size_t type, >whereas sizetype is a GCC internal type used for address expressions. >On a "normal" target I understand that they are the same size, but >there are some slight differences in semantics, e.g. size_type_node >like C unsigned integer types is defined to wrap around on overflow >whereas sizetype is undefined on overflow. > >I don't know if GCC supports some strange targets with some kind of >segmented memory where the size of sizetype would be different from >size_type_node, but I guess it's possible in theory at least. > >That being said, now that you mention in I should be using >build_zero_cst (some_type_node) instead of >build_int_cst(some_type_node, 0). There's also build_one_cst that I >should use. > >> Furthermore did I have to patch this: >> >> diff --git a/gcc/fortran/dump-parse-tree.c >b/gcc/fortran/dump-parse-tree.c >> index 585f25d..f374558 100644 >> --- a/gcc/fortran/dump-parse-tree.c >> +++ b/gcc/fortran/dump-parse-tree.c >> @@ -465,7 +465,7 @@ show_expr (gfc_expr *p) >> break; >> >> case BT_HOLLERITH: >> - fprintf (dumpfile, "%dH", p->representation.length); >> + fprintf (dumpfile, "%zdH", p->representation.length); >> c = p->representation.string; >> for (i = 0; i < p->representation.length; i++, c++) >> { >> >> to bootstrap on x86_64-linux/f23. > >Ah, thanks for the catch. I'll fix it by using HOST_WIDE_INT_PRINT_DEC >since I'll have to change gfc_charlen_t to be a typedef form >HOST_WIDE_INT (see my answer to FX). > >> And I have this regression: >> >> FAIL: gfortran.dg/allocate_deferred_char_scalar_1.f03 -O1 (test >for excess >> errors) >> >> allocate_deferred_char_scalar_1.f03:184:0: >> >> p = '12345679' >> >> Warning: '__builtin_memcpy' writing 8 bytes into a region of size 5 >overflows >> the destination [-Wstringop-overflow=] >> allocate_deferred_char_scalar_1.f03:242:0: >> >> p = 4_'12345679' >> >> Warning: '__builtin_memcpy' writing 32 bytes into a region of size 20 >overflows >> the destination [-Wstringop-overflow=] > >I'm seeing that too, but I assumed they would be fixed by Paul's >recent patch which I don't yet have in my tree yet due to the git >mirror being stuck.. > >> Btw, the patch for changing the ABI of the coarray-libs is already >nearly done. >> I just need to figure that what the state of regressions is with and >without my >> change. > >Thanks. > >I'll produce an updated patch with the changes discussed so far. -- Andre Vehreschild * Kreuzherrenstr. 8 * 52062 Aachen Tel.: +49 241 929 10 18 * ve...@gmx.de
Re: [PATCH] PR 78534 Change character length from int to size_t
On Mon, Dec 12, 2016 at 7:39 PM, Andre Vehreschild wrote: > Hi Janne, > > I found that you are favoring build_int_cst (size_type_node, 0) over > size_zero_node. Is there a reason to this? Yes. AFAIU size_zero_node is a zero constant for sizetype which is not the same as size_type_node. AFAIK the difference is that size_type_node is the C size_t type, whereas sizetype is a GCC internal type used for address expressions. On a "normal" target I understand that they are the same size, but there are some slight differences in semantics, e.g. size_type_node like C unsigned integer types is defined to wrap around on overflow whereas sizetype is undefined on overflow. I don't know if GCC supports some strange targets with some kind of segmented memory where the size of sizetype would be different from size_type_node, but I guess it's possible in theory at least. That being said, now that you mention in I should be using build_zero_cst (some_type_node) instead of build_int_cst(some_type_node, 0). There's also build_one_cst that I should use. > Furthermore did I have to patch this: > > diff --git a/gcc/fortran/dump-parse-tree.c b/gcc/fortran/dump-parse-tree.c > index 585f25d..f374558 100644 > --- a/gcc/fortran/dump-parse-tree.c > +++ b/gcc/fortran/dump-parse-tree.c > @@ -465,7 +465,7 @@ show_expr (gfc_expr *p) > break; > > case BT_HOLLERITH: > - fprintf (dumpfile, "%dH", p->representation.length); > + fprintf (dumpfile, "%zdH", p->representation.length); > c = p->representation.string; > for (i = 0; i < p->representation.length; i++, c++) > { > > to bootstrap on x86_64-linux/f23. Ah, thanks for the catch. I'll fix it by using HOST_WIDE_INT_PRINT_DEC since I'll have to change gfc_charlen_t to be a typedef form HOST_WIDE_INT (see my answer to FX). > And I have this regression: > > FAIL: gfortran.dg/allocate_deferred_char_scalar_1.f03 -O1 (test for excess > errors) > > allocate_deferred_char_scalar_1.f03:184:0: > > p = '12345679' > > Warning: '__builtin_memcpy' writing 8 bytes into a region of size 5 overflows > the destination [-Wstringop-overflow=] > allocate_deferred_char_scalar_1.f03:242:0: > > p = 4_'12345679' > > Warning: '__builtin_memcpy' writing 32 bytes into a region of size 20 > overflows > the destination [-Wstringop-overflow=] I'm seeing that too, but I assumed they would be fixed by Paul's recent patch which I don't yet have in my tree yet due to the git mirror being stuck.. > Btw, the patch for changing the ABI of the coarray-libs is already nearly > done. > I just need to figure that what the state of regressions is with and without > my > change. Thanks. I'll produce an updated patch with the changes discussed so far. -- Janne Blomqvist
Re: [PATCH] PR 78534 Change character length from int to size_t
> However, this will also affect people doing C->Fortran calls the > old-fashioned way without ISO_C_BINDING, as they will have to change > the string length argument from int to size_t in their prototypes. > Then again, Intel Fortran did this some years ago so I guess at least > people who care about portability to several compilers are aware. We do a ton of this (old fashioned c-fortran binding) and changing the string length argument size will have a big impact on us. We don't use the Intel compiler so we never noticed a change there. Is there really a use case for strings > 2 GB that justifies the breakage? I certainly understand wanting to do it "right" but I'm probably not the only one with practical considerations that argue against it if there are no compelling use cases. Thanks... -Bob Bob Deen @ NASA-JPL Multimission Image Processing Lab bob.d...@jpl.nasa.gov
Re: [PATCH] PR 78534 Change character length from int to size_t
Hi Janne, I found that you are favoring build_int_cst (size_type_node, 0) over size_zero_node. Is there a reason to this? Furthermore did I have to patch this: diff --git a/gcc/fortran/dump-parse-tree.c b/gcc/fortran/dump-parse-tree.c index 585f25d..f374558 100644 --- a/gcc/fortran/dump-parse-tree.c +++ b/gcc/fortran/dump-parse-tree.c @@ -465,7 +465,7 @@ show_expr (gfc_expr *p) break; case BT_HOLLERITH: - fprintf (dumpfile, "%dH", p->representation.length); + fprintf (dumpfile, "%zdH", p->representation.length); c = p->representation.string; for (i = 0; i < p->representation.length; i++, c++) { to bootstrap on x86_64-linux/f23. And I have this regression: FAIL: gfortran.dg/allocate_deferred_char_scalar_1.f03 -O1 (test for excess errors) allocate_deferred_char_scalar_1.f03:184:0: p = '12345679' Warning: '__builtin_memcpy' writing 8 bytes into a region of size 5 overflows the destination [-Wstringop-overflow=] allocate_deferred_char_scalar_1.f03:242:0: p = 4_'12345679' Warning: '__builtin_memcpy' writing 32 bytes into a region of size 20 overflows the destination [-Wstringop-overflow=] I also tried with a sanitized gfortran and am seeing some issues there. I have to sort through these, but thought to let you know about the above already. Btw, the patch for changing the ABI of the coarray-libs is already nearly done. I just need to figure that what the state of regressions is with and without my change. - Andre -- Andre Vehreschild * Email: vehre ad gmx dot de
Re: [PATCH] PR 78534 Change character length from int to size_t
On Mon, Dec 12, 2016 at 3:35 PM, Janus Weil wrote: > Hi guys, > >> there is already an ABI change. DTIO needed it. > > maybe it would be a good idea to document this in places like: > * https://gcc.gnu.org/wiki/GFortran/News > * https://gcc.gnu.org/gcc-7/changes.html > > On the first page there are "Compatibility notices" for several > earlier versions which mention stuff like this ... Yes, absolutely. I was planning to do it when/if the patch is accepted and merged. -- Janne Blomqvist
Re: [PATCH] PR 78534 Change character length from int to size_t
Hi guys, > there is already an ABI change. DTIO needed it. maybe it would be a good idea to document this in places like: * https://gcc.gnu.org/wiki/GFortran/News * https://gcc.gnu.org/gcc-7/changes.html On the first page there are "Compatibility notices" for several earlier versions which mention stuff like this ... Cheers, Janus > On Mon, 12 Dec 2016 11:20:06 +0100 > FX wrote: > >> Hi Janne, >> >> This is an ABI change, so it is serious… it will require people to recompile >> older code and libraries with the new compiler. Do we already plan to break >> the ABI in this cycle, or is this the first ABI-breaking patch of the cycle? >> And do we have real-life examples of character strings larger than 2 GB? >> >> > Also, as there are some places in the frontend were negative character >> > lengths are used as special flag values, in the frontend the character >> > length is handled as a signed variable of the same size as a size_t, >> > although in the runtime library it really is size_t. >> >> First, I thought: we should really make it size_t, and have the negative >> values be well-defined constants, e.g. (size_t) -1 >> >> On the other hand, there is the problem of the case where the front-end has >> different size_t than the target: think 32-bit on 64-bit i386 (front-end >> size_t larger than target size_t), or cross-compiling for 64-bit on a 32-bit >> machine (front-end size_t smaller than target size_t). So the charlen type >> bounds need to be determined when the front-end runs, not when it is compiled >> (i.e. it is not a fixed type). >> >> In iresolve.c, the "Why is this fixup needed?” comment is kinda scary. >> >> >> > I haven't changed the character length variables for the co-array >> > intrinsics, as this is something that may need to be synchronized with >> > OpenCoarrays. >> >> Won’t that mean that coarray programs will fail due to ABI mismatch? >> >> >> FX > > -- > Andre Vehreschild * Email: vehre ad gmx dot de
Re: [PATCH] PR 78534 Change character length from int to size_t
On Mon, Dec 12, 2016 at 12:29 PM, Andre Vehreschild wrote: > I will take on the coarray ABI changes in the next days and also emit a > pull-request to the opencoarrays to get them to sync. Janne, please wait until > I have added those changes to prevent people from having to re-compile > multiple > times. Ok, thanks for taking care of this! -- Janne Blomqvist
Re: [PATCH] PR 78534 Change character length from int to size_t
On Mon, Dec 12, 2016 at 12:20 PM, FX wrote: > Hi Janne, > > This is an ABI change, so it is serious… it will require people to recompile > older code and libraries with the new compiler. Do we already plan to break > the ABI in this cycle, or is this the first ABI-breaking patch of the cycle? As Andre mentioned, the ABI has already been broken, Gfortran 7 will have libgfortran.so.4. However, this will also affect people doing C->Fortran calls the old-fashioned way without ISO_C_BINDING, as they will have to change the string length argument from int to size_t in their prototypes. Then again, Intel Fortran did this some years ago so I guess at least people who care about portability to several compilers are aware. > And do we have real-life examples of character strings larger than 2 GB? Well, people who have needed such will have figured out some work-around since we haven't supported it, so how would we know? :) It could be splitting the data into several strings, or switching to ifort, using C instead of Fortran, or something else. In any case, I don't expect characters larger than 2 GB to be common (particularly with the Fortran standard-mandated behaviour of space-padding to the end in many cases), but as the ABI has been broken anyways, we might as well fix it. IIRC at some point there was some discussion of this on comp.lang.fortran, and somebody mentioned analysis of genomic data as a use case where large characters can be useful. I don't have any personal usecase though, at least at the moment. >> Also, as there are some places in the frontend were negative character >> lengths are used as special flag values, in the frontend the character >> length is handled as a signed variable of the same size as a size_t, >> although in the runtime library it really is size_t. > > First, I thought: we should really make it size_t, and have the negative > values be well-defined constants, e.g. (size_t) -1 I tried it, but in addition to the issue with negative characters used as flag values, there's issues like we have stuff such as gfc_get_int_expr() that take a kind value, and an integer constant, and produces a gfc_expr. But that doesn't understand stuff like unsigned types. So in the end I decided it's better to get this patch in working shape and merged with the ABI changes, then one can fix the unsigned-ness later (in the end it's just a factor of two in sizes we can handle, so not a huge deal). > On the other hand, there is the problem of the case where the front-end has > different size_t than the target: think 32-bit on 64-bit i386 (front-end > size_t larger than target size_t), or cross-compiling for 64-bit on a 32-bit > machine (front-end size_t smaller than target size_t). So the charlen type > bounds need to be determined when the front-end runs, not when it is compiled > (i.e. it is not a fixed type). True. Although things like gfc_charlen_type_node should be correct for the target, the type gfc_charlen_t that I introduced in the frontend might be too small if one is doing a 32->64 bit cross-compile. So that should be changed from a typedef of ptrdiff_t to a typedef of HOST_WIDE_INT which AFAIK is guaranteed to be 64-bit everywhere. > In iresolve.c, the "Why is this fixup needed?” comment is kinda scary. Hmm, I think it's a leftover from some earlier experimentation, should be removed. >> I haven't changed the character length variables for the co-array >> intrinsics, as this is something that may need to be synchronized with >> OpenCoarrays. > > Won’t that mean that coarray programs will fail due to ABI mismatch? No, the co-array intrinsics are, well, intrinsics, so they're handled specially in the frontend and don't need to follow the normal argument-passing conventions. But I think it'd be easier if they did, and might prevent some obscure corner-case bugs. Say, create a character variable with length 2**31+9, then typecasting to plain int when calling the intrinsic would wrap around and the library would see a negative length. -- Janne Blomqvist
Re: [PATCH] PR 78534 Change character length from int to size_t
Hi FX, there is already an ABI change. DTIO needed it. I will take on the coarray ABI changes in the next days and also emit a pull-request to the opencoarrays to get them to sync. Janne, please wait until I have added those changes to prevent people from having to re-compile multiple times. - Andre On Mon, 12 Dec 2016 11:20:06 +0100 FX wrote: > Hi Janne, > > This is an ABI change, so it is serious… it will require people to recompile > older code and libraries with the new compiler. Do we already plan to break > the ABI in this cycle, or is this the first ABI-breaking patch of the cycle? > And do we have real-life examples of character strings larger than 2 GB? > > > Also, as there are some places in the frontend were negative character > > lengths are used as special flag values, in the frontend the character > > length is handled as a signed variable of the same size as a size_t, > > although in the runtime library it really is size_t. > > First, I thought: we should really make it size_t, and have the negative > values be well-defined constants, e.g. (size_t) -1 > > On the other hand, there is the problem of the case where the front-end has > different size_t than the target: think 32-bit on 64-bit i386 (front-end > size_t larger than target size_t), or cross-compiling for 64-bit on a 32-bit > machine (front-end size_t smaller than target size_t). So the charlen type > bounds need to be determined when the front-end runs, not when it is compiled > (i.e. it is not a fixed type). > > In iresolve.c, the "Why is this fixup needed?” comment is kinda scary. > > > > I haven't changed the character length variables for the co-array > > intrinsics, as this is something that may need to be synchronized with > > OpenCoarrays. > > Won’t that mean that coarray programs will fail due to ABI mismatch? > > > FX -- Andre Vehreschild * Email: vehre ad gmx dot de
Re: [PATCH] PR 78534 Change character length from int to size_t
Hi Janne, This is an ABI change, so it is serious… it will require people to recompile older code and libraries with the new compiler. Do we already plan to break the ABI in this cycle, or is this the first ABI-breaking patch of the cycle? And do we have real-life examples of character strings larger than 2 GB? > Also, as there are some places in the frontend were negative character > lengths are used as special flag values, in the frontend the character > length is handled as a signed variable of the same size as a size_t, > although in the runtime library it really is size_t. First, I thought: we should really make it size_t, and have the negative values be well-defined constants, e.g. (size_t) -1 On the other hand, there is the problem of the case where the front-end has different size_t than the target: think 32-bit on 64-bit i386 (front-end size_t larger than target size_t), or cross-compiling for 64-bit on a 32-bit machine (front-end size_t smaller than target size_t). So the charlen type bounds need to be determined when the front-end runs, not when it is compiled (i.e. it is not a fixed type). In iresolve.c, the "Why is this fixup needed?” comment is kinda scary. > I haven't changed the character length variables for the co-array > intrinsics, as this is something that may need to be synchronized with > OpenCoarrays. Won’t that mean that coarray programs will fail due to ABI mismatch? FX
[PATCH] PR 78534 Change character length from int to size_t
In order to handle large character lengths on (L)LP64 targets, switch the GFortran character length from an int to a size_t. This is an ABI change, as procedures with character arguments take hidden arguments with the character length. I also changed the _size member in vtables from int to size_t, as there were some cases where character lengths and sizes were apparently mixed up and caused regressions otherwise. Although I haven't tested, this might enable very large derived types as well. Also, as there are some places in the frontend were negative character lengths are used as special flag values, in the frontend the character length is handled as a signed variable of the same size as a size_t, although in the runtime library it really is size_t. I haven't changed the character length variables for the co-array intrinsics, as this is something that may need to be synchronized with OpenCoarrays. A caveat here is the testcase char_result_8.f90, which I changed slightly to work around PR 78757. It's a somewhat obscure corner case, and while this patch admittedly makes it more likely to trigger this bug, it's, well, still a corner case. Regtested on x86_64-pc-linux-gnu, Ok for trunk? frontend: 2016-12-12 Janne Blomqvist PR fortran/78534 PR fortran/66310 * arith.c (gfc_check_charlen_range): New function. (gfc_range_check): Use gfc_check_charlen_range. * class.c (gfc_find_derived_vtab): Use gfc_size_kind instead of hardcoded kind. (find_intrinsic_vtab): Likewise. * expr.c (gfc_get_character_expr): Length parameter of type gfc_charlen_t. (gfc_get_int_expr): Value argument of type long. (gfc_extract_long): New function. * gfortran.h (gfc_typespec): New member is_charlen. (gfc_charlen_t): New typedef. (gfc_expr): Use gfc_charlen_t for character lengths. (gfc_size_kind): New extern variable. (gfc_extract_long): New prototype. (gfc_get_character_expr): Use gfc_charlen_t for character length. (gfc_get_int_expr): Use long type for value argument. * iresolve.c (gfc_resolve_repeat): Use gfc_charlen_t, gfc_charlen_int_kind, set is_charlen. * match.c (select_intrinsic_set_tmp): Use long for charlen. * module.c (atom_int): Change type from int to HOST_WIDE_INT. (parse_integer): Don't complain about large integers. (write_atom): Use HOST_WIDE_INT for integers. (mio_integer): Handle integer type mismatch. (mio_hwi): New function. (mio_intrinsic_op): Use HOST_WIDE_INT. (mio_array_ref): Likewise. (mio_expr): Likewise. * resolve.c (resolve_substring): Use get_type_static_bounds. (resolve_select_type): Use long for charlen. (resolve_charlen): Use long for charlen, get_type_static_bounds. * simplify.c (gfc_simplify_repeat): Likewise. * target-memory.c (gfc_interpret_character): Use gfc_charlen_t. * trans-array.c (get_array_ctor_var_strlen): Use gfc_conv_mpz_to_tree_type. * trans-const.c (gfc_conv_mpz_to_tree_type): New function. * trans-const.h (gfc_conv_mpz_to_tree_type): New prototype. * trans-decl.c (create_function_arglist): Assert that length is not NULL_TREE. * trans-expr.c (gfc_class_len_or_zero_get): Build const of type gfc_charlen_type_node. (gfc_conv_intrinsic_to_class): Use gfc_charlen_int_kind instead of 4, fold_convert to correct type. (gfc_conv_class_to_class): Build const of type size_type_node for size. (gfc_copy_class_to_class): Likewise. (gfc_conv_string_length): Use same type in expression. (gfc_conv_substring): Likewise, use long for charlen. (gfc_conv_string_tmp): Make sure len is of the right type. (gfc_conv_concat_op): Use same type in expression. (gfc_conv_procedure_call): Likewise. (alloc_scalar_allocatable_for_subcomponent_assignment): fold_convert to right type. (gfc_trans_subcomponent_assign): Likewise. (trans_class_vptr_len_assignment): Build const of correct type. (gfc_trans_pointer_assignment): Likewise. (alloc_scalar_allocatable_for_assignment): fold_convert to right type in expr. (trans_class_assignment): Build const of correct type. * trans-intrinsic.c (gfc_conv_associated): Likewise. (gfc_conv_intrinsic_repeat): Do calculation in sizetype. * trans-io.c (gfc_build_io_library_fndecls): Use gfc_charlen_type_node for character lengths. * trans-stmt.c (gfc_trans_label_assign): Build const of gfc_charlen_type_node. (gfc_trans_character_select): Likewise. (gfc_trans_allocate): Likewise, don't typecast strlen result. (gfc_trans_deallocate): Don't typecast strlen result. * trans-types.c (gfc_size_kind): New variable. (gfc_init_types)