[Bug libfortran/35863] [F2003] Implement ENCODING=UTF-8
--- Comment #13 from burnus at gcc dot gnu dot org 2008-08-19 06:02 ---
Subject: Bug 35863

Author: burnus
Date: Tue Aug 19 06:00:51 2008
New Revision: 139223

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=139223
Log:
2008-08-19  Tobias Burnus  [EMAIL PROTECTED]

    PR libfortran/35863
    * io/write.c (write_a_char4): Add missing variable declaration in
    HAVE_CRLF block.

Modified:
    trunk/libgfortran/ChangeLog
    trunk/libgfortran/io/write.c

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35863
--- Comment #9 from jvdelisle at gcc dot gnu dot org 2008-08-16 06:11 ---
Subject: Bug 35863

Author: jvdelisle
Date: Sat Aug 16 03:38:31 2008
New Revision: 139147

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=139147
Log:
2008-08-15  Jerry DeLisle  [EMAIL PROTECTED]

    PR libfortran/35863
    * intrinsics/selected_char_kind.c: Enable iso_10646.
    * io/read.c (typedef uchar): New type.
    (read_utf8): New function to read a single UTF-8 encoded character.
    (read_utf8_char1): New function to read UTF-8 into a KIND=1 string.
    (read_default_char1): New function to read default into a KIND=1 string.
    (read_utf8_char4): New function to read UTF-8 into a KIND=4 string.
    (read_default_char4): New function to read default into a KIND=4 string.
    (read_a): Modify to use the new functions.
    (read_a_char4): Modify to use the new functions.
    * io/write.c (error.h): Add include.
    (typedef uchar): New type.
    (write_default_char4): New function to default write KIND=4 string.
    (write_utf8_char4): New function to UTF-8 write KIND=4 string.
    (write_a_char4): Modify to use new functions.
    (write_character): Modify to use new functions.

Modified:
    trunk/libgfortran/ChangeLog
    trunk/libgfortran/intrinsics/selected_char_kind.c
    trunk/libgfortran/io/read.c
    trunk/libgfortran/io/write.c
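The log above only describes read_utf8 as reading "a single UTF-8 encoded character". As a rough illustration of what that step involves, here is a minimal C sketch that decodes one character from a byte buffer. The name utf8_decode and its return convention are hypothetical, not libgfortran's actual interface, and rejecting overlong forms and surrogates is a choice of this sketch rather than something the ChangeLog states.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not libgfortran's read_utf8): decode one UTF-8
   sequence from buf[0..len-1].  On success, store the code point in *cp
   and return the number of bytes consumed (1-4).  Return 0 for a
   truncated, malformed, overlong or out-of-range sequence.  */
static size_t
utf8_decode (const unsigned char *buf, size_t len, uint32_t *cp)
{
  /* Smallest code point that legitimately needs 2, 3 or 4 bytes.  */
  static const uint32_t min_value[5] = { 0, 0, 0x80, 0x800, 0x10000 };
  size_t nbytes, i;
  uint32_t c;

  if (len == 0)
    return 0;

  if (buf[0] < 0x80)
    {
      *cp = buf[0];             /* plain ASCII byte */
      return 1;
    }
  else if ((buf[0] & 0xE0) == 0xC0)
    { nbytes = 2; c = buf[0] & 0x1F; }
  else if ((buf[0] & 0xF0) == 0xE0)
    { nbytes = 3; c = buf[0] & 0x0F; }
  else if ((buf[0] & 0xF8) == 0xF0)
    { nbytes = 4; c = buf[0] & 0x07; }
  else
    return 0;                   /* stray continuation or invalid lead byte */

  if (len < nbytes)
    return 0;                   /* sequence cut off */

  for (i = 1; i < nbytes; i++)
    {
      if ((buf[i] & 0xC0) != 0x80)
        return 0;               /* expected a 10xxxxxx continuation byte */
      c = (c << 6) | (buf[i] & 0x3F);
    }

  /* Reject overlong forms, UTF-16 surrogates and values past U+10FFFF.  */
  if (c < min_value[nbytes] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
    return 0;

  *cp = c;
  return nbytes;
}
```

A caller reading a record would call this repeatedly, advancing by the returned byte count and storing each code point into one element of a KIND=4 string.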
--- Comment #10 from jvdelisle at gcc dot gnu dot org 2008-08-16 06:11 ---
Subject: Bug 35863

Author: jvdelisle
Date: Sat Aug 16 03:42:54 2008
New Revision: 139148

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=139148
Log:
2008-08-15  Jerry DeLisle  [EMAIL PROTECTED]

    PR fortran/35863
    * gfortran.dg/utf8_1.f03: New test.
    * gfortran.dg/utf8_2.f03: New test.

Added:
    trunk/gcc/testsuite/gfortran.dg/utf8_1.f03
    trunk/gcc/testsuite/gfortran.dg/utf8_2.f03
Modified:
    trunk/gcc/testsuite/ChangeLog
--- Comment #11 from jvdelisle at gcc dot gnu dot org 2008-08-16 06:11 ---
Subject: Bug 35863

Author: jvdelisle
Date: Sat Aug 16 03:36:32 2008
New Revision: 139146

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=139146
Log:
2008-08-15  Jerry DeLisle  [EMAIL PROTECTED]

    PR fortran/35863
    * io.c (gfc_match_open): Enable UTF-8 in checks.
    * simplify.c (gfc_simplify_selected_char_kind): Enable iso_10646.

Modified:
    trunk/gcc/fortran/ChangeLog
    trunk/gcc/fortran/io.c
    trunk/gcc/fortran/simplify.c
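Enabling iso_10646 in SELECTED_CHAR_KIND essentially means mapping that name to the UCS-4 character kind instead of reporting it as unsupported. The following hypothetical C sketch shows that mapping, assuming gfortran's kind numbers (1 for ASCII and default character, 4 for ISO 10646) and returning -1 for any other name. It only matches exact lowercase spellings; how the real intrinsic treats letter case and blanks is not modeled here.

```c
#include <string.h>

/* Hypothetical sketch of the name-to-kind mapping behind
   SELECTED_CHAR_KIND once iso_10646 is enabled.  Returns the
   character kind for a supported name, or -1 otherwise.
   Simplification: exact lowercase names only.  */
static int
char_kind_for_name (const char *name)
{
  if (strcmp (name, "ascii") == 0 || strcmp (name, "default") == 0)
    return 1;                   /* one byte per character */
  if (strcmp (name, "iso_10646") == 0)
    return 4;                   /* UCS-4: four bytes per character */
  return -1;                    /* unsupported character set */
}
```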
--- Comment #6 from jvdelisle at gcc dot gnu dot org 2008-06-13 20:28 ---
Subject: Bug 35863

Author: jvdelisle
Date: Fri Jun 13 20:28:08 2008
New Revision: 136763

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=136763
Log:
2008-06-13  Jerry DeLisle  [EMAIL PROTECTED]

    PR fortran/35863
    * libgfortran.h: Change l8_to_l4_offset to big_endian and add endian_off.
    * runtime/main.c: Fix error in comment.  Change l8_to_l4_offset to
    big_endian.
    (determine_endianness): Add endian_off and set its value according
    to big_endian.
    * gfortran.map: Add symbol for new _gfortran_transfer_character_wide.
    * io/io.h: Add prototype declarations for new functions.
    * io/list_read.c (list_formatted_read_scalar): Modify to handle kind=4.
    (list_formatted_read): Calculate stride based on kind for character
    type and use it when calling list_formatted_read_scalar.
    * io/inquire.c (inquire_via_unit): Change l8_to_l4_offset to big_endian.
    * io/open.c (st_open): Change l8_to_l4_offset to big_endian.
    * io/read.c (read_a_char4): New function to handle formatted read.
    * io/write.c: Define GFC_CHAR4(x) to improve readability of code.
    (write_a_char4): New function to handle formatted write.
    (write_character): Modify to accept the kind parameter and adjust for
    endianness of the machine.
    (list_formatted_write): Calculate the stride resulting from the kind
    and adjust the list_formatted_write_scalar call accordingly.
    (nml_write_obj): Adjust calls to write_character.
    (namelist_write): Likewise.
    * io/transfer.c (formatted_transfer_scalar): Rename 'len' argument to
    'kind' argument to better describe what it is.  Add calls to new
    functions for kind == 4.
    (formatted_transfer): Modify to handle the case of type character and
    kind equals 4 to pass in the kind to the transfer routines.
    (transfer_character_wide): Add this new function.
    (transfer_array): Don't set kind to the character string length.
    Adjust strides based on character kind.
    (unformatted_read): Adjust size based on kind for character types.
    (unformatted_write): Likewise.
    (data_transfer_init): Change l8_to_l4_offset to big_endian.

Modified:
    trunk/libgfortran/ChangeLog
    trunk/libgfortran/gfortran.map
    trunk/libgfortran/io/fbuf.c
    trunk/libgfortran/io/inquire.c
    trunk/libgfortran/io/io.h
    trunk/libgfortran/io/list_read.c
    trunk/libgfortran/io/open.c
    trunk/libgfortran/io/read.c
    trunk/libgfortran/io/transfer.c
    trunk/libgfortran/io/write.c
    trunk/libgfortran/libgfortran.h
    trunk/libgfortran/runtime/main.c
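The log replaces l8_to_l4_offset with big_endian plus a new endian_off, and write_character now adjusts "for endianness of the machine". It does not say exactly what endian_off holds. The sketch below assumes it is the byte offset of the least significant byte inside a 4-byte KIND=4 character (3 on big-endian hosts, 0 on little-endian ones); the function names are hypothetical and only illustrate the idea of detecting byte order at run time.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a determine_endianness()-style check:
   store 1 in a 32-bit integer and look at its first byte in memory.  */
static int
is_big_endian (void)
{
  uint32_t probe = 1;
  unsigned char first;
  memcpy (&first, &probe, 1);
  return first == 0;            /* big-endian stores the 1 last */
}

/* Assumed meaning of an endian_off-like value: where the low byte of a
   4-byte character lives within its storage.  */
static int
low_byte_offset (void)
{
  return is_big_endian () ? 3 : 0;
}

/* Fetch the low byte of a KIND=4 character by inspecting its bytes,
   e.g. to emit an ASCII-range character as a single output byte.  */
static unsigned char
char4_low_byte (const uint32_t *c)
{
  return ((const unsigned char *) c)[low_byte_offset ()];
}
```

Computing the offset once at startup, as the log suggests determine_endianness does, avoids repeating the byte-order test for every character written.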
--- Comment #7 from jvdelisle at gcc dot gnu dot org 2008-06-13 20:31 ---
Subject: Bug 35863

Author: jvdelisle
Date: Fri Jun 13 20:30:48 2008
New Revision: 136764

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=136764
Log:
2008-06-13  Jerry DeLisle  [EMAIL PROTECTED]

    PR fortran/35863
    * trans-io.c (gfc_build_io_library_fndecls): Build declaration for
    transfer_character_wide which includes passing in the character kind
    to support wide character IO.
    (transfer_expr): If the kind == 4, create the argument and build the
    call.
    * gfortran.texi: Fix typo.

Modified:
    trunk/gcc/fortran/ChangeLog
    trunk/gcc/fortran/gfortran.texi
    trunk/gcc/fortran/trans-io.c
--- Comment #8 from jvdelisle at gcc dot gnu dot org 2008-06-13 20:35 ---
Subject: Bug 35863

Author: jvdelisle
Date: Fri Jun 13 20:35:12 2008
New Revision: 136766

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=136766
Log:
2008-06-13  Jerry DeLisle  [EMAIL PROTECTED]

    PR fortran/35863
    * gfortran.dg/widechar_IO_1.f90: New test.
    * gfortran.dg/widechar_IO_2.f90: New test.
    * gfortran.dg/widechar_IO_3.f90: New test.
    * gfortran.dg/widechar_IO_4.f90: New test.

Added:
    trunk/gcc/testsuite/gfortran.dg/widechar_IO_1.f90
    trunk/gcc/testsuite/gfortran.dg/widechar_IO_2.f90
    trunk/gcc/testsuite/gfortran.dg/widechar_IO_3.f90
    trunk/gcc/testsuite/gfortran.dg/widechar_IO_4.f90
Modified:
    trunk/gcc/testsuite/ChangeLog
--- Comment #5 from jvdelisle at gcc dot gnu dot org 2008-06-07 20:18 ---
Working on this now.

-- 
jvdelisle at gcc dot gnu dot org changed:

What                  | Removed                           | Added
AssignedTo            | unassigned at gcc dot gnu dot org | jvdelisle at gcc dot gnu dot org
Status                | NEW                               | ASSIGNED
Last reconfirmed date | 2008-04-14 18:55:43               | 2008-06-07 20:18:41
--- Comment #2 from fxcoudert at gcc dot gnu dot org 2008-04-15 10:45 ---
(In reply to comment #0)
> Front end and library are ready to handle this when implemented.

Front-end is ready? Is ENCODING=UTF-8 related to UCS-4 support? Because if it is, then the front-end is not ready: it only supports a single character kind.

(In reply to comment #1)
> This could be a bit tricky to get right. OTOH Fortran is fortunate enough that
> there are real strings and not char arrays like in C, so from a user
> perspective it should be pretty transparent.

Well, I'm not so sure it's hard. We are not required to support UTF-8 strings as a character kind (that would be really hard), only UCS-4 strings (i.e. UTF-32), which basically means (as I see it):
 - remove the front-end limitation that there is only one character kind
 - make a new character kind, as an array of 32-bit integers and a length
 - adjust library functions

Then, I/O with UTF-8 encoding just needs UTF-8 <-> UTF-32 conversions, which is only a few dozen lines of code (unless I'm confused).

-- 
fxcoudert at gcc dot gnu dot org changed:

What | Removed | Added
CC   |         | fxcoudert at gcc dot gnu dot org
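To give a sense of the "few dozen lines" estimate, here is a hedged C sketch of the UTF-32 to UTF-8 direction, i.e. turning the 32-bit integers of a UCS-4 string into bytes for a file. The function names are hypothetical, and treating surrogate code points and values above U+10FFFF as errors is a choice of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one UCS-4 code point as UTF-8 into out,
   which must have room for 4 bytes.  Returns the number of bytes
   written, or 0 for surrogates and values above U+10FFFF.  */
static size_t
utf8_encode (uint32_t cp, unsigned char *out)
{
  if (cp < 0x80)
    {
      out[0] = (unsigned char) cp;                        /* 0xxxxxxx */
      return 1;
    }
  if (cp < 0x800)
    {
      out[0] = (unsigned char) (0xC0 | (cp >> 6));        /* 110xxxxx */
      out[1] = (unsigned char) (0x80 | (cp & 0x3F));
      return 2;
    }
  if (cp >= 0xD800 && cp <= 0xDFFF)
    return 0;                   /* UTF-16 surrogates are not characters */
  if (cp < 0x10000)
    {
      out[0] = (unsigned char) (0xE0 | (cp >> 12));       /* 1110xxxx */
      out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      out[2] = (unsigned char) (0x80 | (cp & 0x3F));
      return 3;
    }
  if (cp <= 0x10FFFF)
    {
      out[0] = (unsigned char) (0xF0 | (cp >> 18));       /* 11110xxx */
      out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
      out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      out[3] = (unsigned char) (0x80 | (cp & 0x3F));
      return 4;
    }
  return 0;
}

/* Convert n UCS-4 characters to UTF-8; out needs room for 4*n bytes.
   Returns the number of bytes written, or (size_t) -1 on an invalid
   code point.  */
static size_t
ucs4_to_utf8 (const uint32_t *s, size_t n, unsigned char *out)
{
  size_t total = 0;
  for (size_t i = 0; i < n; i++)
    {
      size_t k = utf8_encode (s[i], out + total);
      if (k == 0)
        return (size_t) -1;
      total += k;
    }
  return total;
}
```

Because each character can need up to four bytes, sizing the output buffer as 4*n keeps the conversion to a single pass with no reallocation.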
--- Comment #3 from burnus at gcc dot gnu dot org 2008-04-15 19:46 ---
> > Front end and library are ready to handle this when implemented.
> Front-end is ready?

Yes, it is: ENCODING= is supported, and the rest is implemented neither in the library nor in the front end. Though I would not call this "ready".

> Is ENCODING=UTF-8 related to UCS-4 support?

I think it is, in the end. You can easily use UTF-8 encoding already now, but '(a2)' might print one (non-ASCII) or two (ASCII) characters. To have something well defined, only one-byte-wide characters can currently be used. For anything beyond that, UCS-4 is needed in the front end.

Actually, I do not understand how to write things like
  character(kind=myUCS4,len=20) :: foo = myUCS4_'Some UCS4 string'
(The problem is switching the encoding within the same file; good luck finding an editor which supports this.) If one does not need non-ASCII character literals (i.e. one only reads from / writes to files), there is no problem.

Possible solutions?
a) Have a UCS-4 input file; then both default_'foo' and ucs4_'foo' work.
b) Expect that for myUCS4_'foo' literals the characters in the quotes are actually UTF-8.

I'm personally in favour of (b). I'm not quite sure whether this is really compatible with the Fortran standard, but I like this way of inputting the string.

Otherwise, I think Fortran misses a good way of inputting non-ASCII characters in an ASCII file. C99 offers '\u' but unless I missed something in Fortran the equivalent would be:

I think (c) is what most programmers want, but I actually do not see how this should work syntax-wise; or should an ASCII literal automatically be handled as UTF-8? Then it would work: when assigned to a UCS-4 string, the UTF-8 gets properly converted and a non-ASCII character then has length one, while if one assigns to an ASCII string, non-ASCII characters of course need more bytes and thus len('§') == 2.

(b) is also an interesting problem. And (a) of course works, but it is quite cumbersome to use: Fortran lacks C's \u way of specifying a Unicode character. One can probably work with
  myUCS4string = char(int(z/A0FF/),kind=myUCS4)
but this is awful. (Actually, I think the standard does not even guarantee that this works, as CHAR is processor dependent.)
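The byte/character mismatch described above can be shown directly in C: the UTF-8 encoding of '§' (U+00A7) is the two bytes 0xC2 0xA7, so counting bytes (which is what LEN reports for a one-byte-kind string holding UTF-8 data) gives 2, while counting code points gives 1. The counter below is a minimal sketch with a hypothetical name; it assumes the input is valid UTF-8 and simply skips continuation bytes.

```c
#include <stddef.h>

/* Count code points in a NUL-terminated UTF-8 string by counting every
   byte that is not a continuation byte (10xxxxxx).  Assumes valid UTF-8.  */
static size_t
utf8_count_chars (const char *s)
{
  size_t n = 0;
  for (; *s; s++)
    if (((unsigned char) *s & 0xC0) != 0x80)
      n++;
  return n;
}
```

This is exactly why an '(a2)' edit applied to UTF-8 bytes in a KIND=1 string can cover either two ASCII characters or a single non-ASCII one.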
--- Comment #4 from fxcoudert at gcc dot gnu dot org 2008-04-15 20:53 ---
(In reply to comment #3)
> Actually, I do not understand how to write things like
>   character(kind=myUCS4,len=20) :: foo = myUCS4_'Some UCS4 string'

Ah, I'm glad that I'm not alone! I was thinking of asking for advice on c.l.f when I get some time to write. I agree with you that it is not clear at all.

> (The problem is switching the encoding within the same file; good luck in
> finding an editor which supports this.)

I don't think there is such a thing as a file with multiple encodings, and we shouldn't create such a beast just for Fortran.

> a) Have a UCS-4 input file; then both default_'foo' and ucs4_'foo' work.

I'd suggest going for that.

> b) Expect that for myUCS4_'foo' literals the characters in the quotes are
> actually UTF-8.

See above: I don't think we want to mix encodings. But we can support both (a) and (b): if the file is UCS-4, go for (a); if the file is UTF-8, go for (b). On a personal note, I would use (b) more than (a): UTF-8 is the way forward, and fixed-width encodings are a real pain for file representation (which is different from internal representation).

> Otherwise, I think Fortran misses a good way of inputting non-ascii characters
> in an ASCII file. C99 offers '\u'

We already have -fbackslash; I can see us accepting that kind of code with a given option. It would really be useful.
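For the -fbackslash idea, a C99-style \uXXXX escape has a simple shape: a backslash, the letter u, and exactly four hexadecimal digits. The sketch below parses just that form into a code point, which could then be stored in a UCS-4 literal. The function names are hypothetical, and how such an option would actually be handled in the gfortran front end is an assumption, not something this thread specifies.

```c
#include <stddef.h>
#include <stdint.h>

/* Value of one hexadecimal digit, or -1 if c is not a hex digit.  */
static int
hex_value (char c)
{
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

/* Hypothetical sketch: if p starts with a well-formed "\uXXXX" escape,
   store the code point in *cp and return the number of source characters
   consumed (6); otherwise return 0.  Scanning stops at the first non-hex
   character, so a terminating NUL is never read past.  */
static size_t
parse_u_escape (const char *p, uint32_t *cp)
{
  uint32_t v = 0;
  if (p[0] != '\\' || p[1] != 'u')
    return 0;
  for (int i = 2; i < 6; i++)
    {
      int d = hex_value (p[i]);
      if (d < 0)
        return 0;               /* too few hex digits */
      v = (v << 4) | (uint32_t) d;
    }
  *cp = v;
  return 6;
}
```

With such an escape, '§' could be written as \u00A7 in a plain ASCII source file, which avoids the mixed-encoding problem discussed above.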
--- Comment #1 from jb at gcc dot gnu dot org 2008-04-14 18:55 ---
Confirmed.

This could be a bit tricky to get right. OTOH Fortran is fortunate enough that there are real strings and not char arrays like in C, so from a user perspective it should be pretty transparent. But the implementation can certainly be tricky. Perhaps we should ask for advice from, e.g., the Python developers, who have already implemented Unicode support in a language whose runtime library is written in C?

http://www.cl.cam.ac.uk/~mgk25/unicode.html

Specifically:
http://www.cl.cam.ac.uk/~mgk25/unicode.html#mod
http://www.cl.cam.ac.uk/~mgk25/unicode.html#c

-- 
jb at gcc dot gnu dot org changed:

What                  | Removed             | Added
Status                | UNCONFIRMED         | NEW
Ever Confirmed        | 0                   | 1
Last reconfirmed date | 0000-00-00 00:00:00 | 2008-04-14 18:55:43