[Bug libfortran/35863] [F2003] Implement ENCODING=UTF-8

2008-08-19 burnus at gcc dot gnu dot org


--- Comment #13 from burnus at gcc dot gnu dot org  2008-08-19 06:02 ---
Subject: Bug 35863

Author: burnus
Date: Tue Aug 19 06:00:51 2008
New Revision: 139223

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=139223
Log:
2008-08-19  Tobias Burnus  [EMAIL PROTECTED]

   PR libfortran/35863
   * io/write.c (write_a_char4): Add missing variable declaration
   in HAVE_CRLF block.


Modified:
trunk/libgfortran/ChangeLog
trunk/libgfortran/io/write.c


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35863



[Bug libfortran/35863] [F2003] Implement ENCODING=UTF-8

2008-08-16 jvdelisle at gcc dot gnu dot org


--- Comment #9 from jvdelisle at gcc dot gnu dot org  2008-08-16 06:11 ---
Subject: Bug 35863

Author: jvdelisle
Date: Sat Aug 16 03:38:31 2008
New Revision: 139147

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=139147
Log:
2008-08-15  Jerry DeLisle  [EMAIL PROTECTED]

PR libfortran/35863
* intrinsics/selected_char_kind.c: Enable iso_10646.
* io/read.c (typedef uchar): New type.
(read_utf8): New function to read a single UTF-8 encoded character.
(read_utf8_char1): New function to read UTF-8 into a KIND=1 string.
(read_default_char1): New function to read default into KIND=1 string.
(read_utf8_char4): New function to read UTF-8 into a KIND=4 string.
(read_default_char4): New function to read default into a KIND=4 string.
(read_a): Modify to use the new functions.
(read_a_char4): Modify to use the new functions.
* io/write.c (error.h): Add include. (typedef uchar): New type.
(write_default_char4): New function to default write KIND=4 string.
(write_utf8_char4): New function to UTF-8 write KIND=4 string.
(write_a_char4): Modify to use new functions.
(write_character): Modify to use new functions.

Modified:
trunk/libgfortran/ChangeLog
trunk/libgfortran/intrinsics/selected_char_kind.c
trunk/libgfortran/io/read.c
trunk/libgfortran/io/write.c
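
For readers unfamiliar with what "read a single UTF-8 encoded character" involves, here is a minimal sketch in C. It is not the read_utf8 code from this commit; the function name utf8_decode_one and its interface are hypothetical, chosen only to illustrate the decoding step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence from s (at most n bytes) into *cp.
   Returns the number of bytes consumed, or 0 on malformed,
   truncated, overlong, or out-of-range input.
   Hypothetical helper, not the libgfortran implementation. */
size_t utf8_decode_one(const unsigned char *s, size_t n, uint32_t *cp)
{
    /* Smallest code point that legitimately needs each length. */
    static const uint32_t min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t len, i;
    uint32_t c;

    assert(s != NULL && cp != NULL);
    if (n == 0)
        return 0;

    if (s[0] < 0x80)                 { len = 1; c = s[0]; }
    else if ((s[0] & 0xE0) == 0xC0)  { len = 2; c = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0)  { len = 3; c = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0)  { len = 4; c = s[0] & 0x07; }
    else
        return 0;                    /* stray continuation or invalid lead */

    if (len > n)
        return 0;                    /* sequence is cut short */

    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                /* not a continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }

    /* Reject overlong forms, values past U+10FFFF, and surrogates. */
    if (c < min_cp[len] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;

    *cp = c;
    return len;
}
```

A reader filling a KIND=4 string would store each decoded value directly, while a reader filling a KIND=1 string would have to decide what to do with values that do not fit in one byte.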


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35863



[Bug libfortran/35863] [F2003] Implement ENCODING=UTF-8

2008-08-16 jvdelisle at gcc dot gnu dot org


--- Comment #10 from jvdelisle at gcc dot gnu dot org  2008-08-16 06:11 ---
Subject: Bug 35863

Author: jvdelisle
Date: Sat Aug 16 03:42:54 2008
New Revision: 139148

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=139148
Log:
2008-08-15  Jerry DeLisle  [EMAIL PROTECTED]

PR fortran/35863
* gfortran.dg/utf8_1.f03: New test.
* gfortran.dg/utf8_2.f03: New test.

Added:
trunk/gcc/testsuite/gfortran.dg/utf8_1.f03
trunk/gcc/testsuite/gfortran.dg/utf8_2.f03
Modified:
trunk/gcc/testsuite/ChangeLog


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35863



[Bug libfortran/35863] [F2003] Implement ENCODING=UTF-8

2008-08-16 jvdelisle at gcc dot gnu dot org


--- Comment #11 from jvdelisle at gcc dot gnu dot org  2008-08-16 06:11 ---
Subject: Bug 35863

Author: jvdelisle
Date: Sat Aug 16 03:36:32 2008
New Revision: 139146

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=139146
Log:
2008-08-15  Jerry DeLisle  [EMAIL PROTECTED]

PR fortran/35863
* io.c (gfc_match_open): Enable UTF-8 in checks.
* simplify.c (gfc_simplify_selected_char_kind): Enable iso_10646.

Modified:
trunk/gcc/fortran/ChangeLog
trunk/gcc/fortran/io.c
trunk/gcc/fortran/simplify.c


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35863



[Bug libfortran/35863] [F2003] Implement ENCODING=UTF-8

2008-06-13 jvdelisle at gcc dot gnu dot org


--- Comment #6 from jvdelisle at gcc dot gnu dot org  2008-06-13 20:28 ---
Subject: Bug 35863

Author: jvdelisle
Date: Fri Jun 13 20:28:08 2008
New Revision: 136763

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=136763
Log:
2008-06-13  Jerry DeLisle  [EMAIL PROTECTED]

PR fortran/35863
* libgfortran.h: Change l8_to_l4_offset to big_endian and add
endian_off.
* runtime/main.c: Fix error in comment. Change l8_to_l4_offset to
big_endian. (determine_endianness): Add endian_off and set its value
according to big_endian.
* gfortran.map: Add symbol for new _gfortran_transfer_character_wide.
* io/io.h: Add prototype declarations for new functions.
* io/list_read.c (list_formatted_read_scalar): Modify to handle kind=4.
(list_formatted_read): Calculate stride based on kind for character
type and use it when calling list_formatted_read_scalar.
* io/inquire.c (inquire_via_unit): Change l8_to_l4_offset to
big_endian.
* io/open.c (st_open): Change l8_to_l4_offset to big_endian.
* io/read.c (read_a_char4): New function to handle formatted read.
* io/write.c: Define GFC_CHAR4(x) to improve readability of code.
(write_a_char4): New function to handle formatted write.
(write_character): Modify to accept the kind parameter and adjust for
endianness of the machine. (list_formatted_write): Calculate the stride
resulting from the kind and adjust the list_formatted_write_scalar call
accordingly. (nml_write_obj): Adjust calls to write_character.
(namelist_write): Likewise.
* io/transfer.c (formatted_transfer_scaler): Rename 'len' argument to
'kind' argument to better describe what it is. Add calls to new
functions for kind == 4. (formatted_transfer): Modify to handle the
case of type character and kind equals 4 to pass in the kind to the
transfer routines. (transfer_character_wide): Add this new function.
(transfer_array): Don't set kind to the character string length. Adjust
strides based on character kind.
(unformatted_read): Adjust size based on kind for character types.
(unformatted_write): Likewise. (data_transfer_init): Change
l8_to_l4_offset to big_endian. 

Modified:
trunk/libgfortran/ChangeLog
trunk/libgfortran/gfortran.map
trunk/libgfortran/io/fbuf.c
trunk/libgfortran/io/inquire.c
trunk/libgfortran/io/io.h
trunk/libgfortran/io/list_read.c
trunk/libgfortran/io/open.c
trunk/libgfortran/io/read.c
trunk/libgfortran/io/transfer.c
trunk/libgfortran/io/write.c
trunk/libgfortran/libgfortran.h
trunk/libgfortran/runtime/main.c
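
The log above mentions a big_endian flag and an endian_off value set in determine_endianness. As a rough illustration of the general technique only, the following C sketch detects host byte order at run time and derives the offset of the least significant byte of a 4-byte character. The names host_is_big_endian and low_byte_offset are hypothetical, and the assumption that endian_off plays a similar role is mine, not something the log states.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return 1 on a big-endian host, 0 on a little-endian one. */
int host_is_big_endian(void)
{
    const uint32_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);      /* look at the lowest-addressed byte */
    return first == 0;
}

/* Byte offset of the least significant byte inside a 4-byte character.
   Hypothetical helper: the role of the real endian_off is an assumption. */
int low_byte_offset(void)
{
    return host_is_big_endian() ? 3 : 0;
}
```

With such an offset, code that wants to emit only the low byte of a KIND=4 character (for example when the value is known to fit in one byte) can index the character's storage without caring which byte order the machine uses.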


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35863



[Bug libfortran/35863] [F2003] Implement ENCODING=UTF-8

2008-06-13 jvdelisle at gcc dot gnu dot org


--- Comment #7 from jvdelisle at gcc dot gnu dot org  2008-06-13 20:31 ---
Subject: Bug 35863

Author: jvdelisle
Date: Fri Jun 13 20:30:48 2008
New Revision: 136764

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=136764
Log:
2008-06-13  Jerry DeLisle  [EMAIL PROTECTED]

PR fortran/35863
* trans-io.c (gfc_build_io_library_fndecls): Build declaration for
transfer_character_wide which includes passing in the character kind to
support wide character IO. (transfer_expr): If the kind == 4, create
the argument and build the call.
* gfortran.texi: Fix typo.

Modified:
trunk/gcc/fortran/ChangeLog
trunk/gcc/fortran/gfortran.texi
trunk/gcc/fortran/trans-io.c


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35863



[Bug libfortran/35863] [F2003] Implement ENCODING=UTF-8

2008-06-13 jvdelisle at gcc dot gnu dot org


--- Comment #8 from jvdelisle at gcc dot gnu dot org  2008-06-13 20:35 ---
Subject: Bug 35863

Author: jvdelisle
Date: Fri Jun 13 20:35:12 2008
New Revision: 136766

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=136766
Log:
2008-06-13  Jerry DeLisle  [EMAIL PROTECTED]

PR fortran/35863
* gfortran.dg/widechar_IO_1.f90: New test.
* gfortran.dg/widechar_IO_2.f90: New test.
* gfortran.dg/widechar_IO_3.f90: New test.
* gfortran.dg/widechar_IO_4.f90: New test.

Added:
trunk/gcc/testsuite/gfortran.dg/widechar_IO_1.f90
trunk/gcc/testsuite/gfortran.dg/widechar_IO_2.f90
trunk/gcc/testsuite/gfortran.dg/widechar_IO_3.f90
trunk/gcc/testsuite/gfortran.dg/widechar_IO_4.f90
Modified:
trunk/gcc/testsuite/ChangeLog


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35863



[Bug libfortran/35863] [F2003] Implement ENCODING=UTF-8

2008-06-07 jvdelisle at gcc dot gnu dot org


--- Comment #5 from jvdelisle at gcc dot gnu dot org  2008-06-07 20:18 ---
Working on this now.


-- 

jvdelisle at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |jvdelisle at gcc dot gnu dot
                   |dot org                     |org
             Status|NEW                         |ASSIGNED
   Last reconfirmed|2008-04-14 18:55:43         |2008-06-07 20:18:41
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35863



[Bug libfortran/35863] [F2003] Implement ENCODING=UTF-8

2008-04-15 fxcoudert at gcc dot gnu dot org


--- Comment #2 from fxcoudert at gcc dot gnu dot org  2008-04-15 10:45 ---
(In reply to comment #0)
> Front end and library are ready to handle this when implemented.

Front-end is ready? Is ENCODING=UTF-8 related to UCS-4 support? Because if it
is, then the front-end is not ready, it only supports a single character kind.

(In reply to comment #1)
> This could be a bit tricky to get right. OTOH Fortran is fortunate enough that
> there are real strings and not char arrays like in C, so from a user
> perspective it should be pretty transparent.

Well, I'm not too sure it's hard. We are not required to support UTF-8 strings
as a character kind (that would be really hard) but just UCS-4 strings (i.e.
UTF-32), which is basically (as I see it):
  - remove limitations in the front-end that there is only one character kind
  - make a new character kind, as an array of 32-bit integers and a length
  - adjust library functions

Then, I/O with UTF-8 encoding just needs UTF-8 <-> UTF-32 conversions, which
is only a few dozen lines of code (unless I'm confused).
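
To give a sense of scale for the "few dozen lines" estimate, here is a minimal sketch of the UCS-4 to UTF-8 direction in C. The function name ucs4_to_utf8 and its interface are hypothetical and are not taken from libgfortran; the sketch only shows the standard bit layout of one- to four-byte sequences.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one UCS-4 code point as UTF-8 into out (room for 4 bytes).
   Returns the number of bytes written, or 0 if cp is not encodable
   (a surrogate or a value above U+10FFFF).
   Hypothetical helper, shown for illustration only. */
size_t ucs4_to_utf8(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp < 0x80) {                   /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                  /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                      /* surrogates are not characters */
    if (cp < 0x10000) {                /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {              /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

The reverse direction is of similar size, so the estimate looks plausible as long as the character kind itself is already a fixed-width 32-bit type.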


-- 

fxcoudert at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |fxcoudert at gcc dot gnu dot
                   |                            |org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35863



[Bug libfortran/35863] [F2003] Implement ENCODING=UTF-8

2008-04-15 burnus at gcc dot gnu dot org


--- Comment #3 from burnus at gcc dot gnu dot org  2008-04-15 19:46 ---
> > Front end and library are ready to handle this when implemented.
> Front-end is ready?
Yes, it is: ENCODING= is supported, and the rest is implemented neither in the
library nor in the front-end. Though I would not call this ready.

> Is ENCODING=UTF-8 related to UCS-4 support?

I think it is, in the end. You can easily use UTF-8 encoding already now, but
'(a2)' might print one (non-ASCII) or two (ASCII) characters. To have something
well-defined, only one-byte-wide characters can be used currently. For anything
beyond, UCS4 is needed in the front end.
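
The '(a2)' ambiguity comes from counting bytes rather than characters. A small C sketch of the difference, using a hypothetical helper utf8_count_chars that assumes its input is already valid UTF-8:

```c
#include <assert.h>
#include <stddef.h>

/* Count code points in n bytes of (assumed valid) UTF-8 by counting
   every byte that is not a continuation byte (10xxxxxx).
   Hypothetical helper, shown for illustration only. */
size_t utf8_count_chars(const unsigned char *s, size_t n)
{
    size_t i, count = 0;

    assert(s != NULL || n == 0);
    for (i = 0; i < n; i++)
        if ((s[i] & 0xC0) != 0x80)
            count++;
    return count;
}
```

Two bytes of ASCII count as two characters here, while the two bytes C2 A7 count as one, which is exactly the case where a byte-oriented '(a2)' edit descriptor and a character-oriented one disagree.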

Actually, I do not understand how to write things like 

   character(kind=myUCS4,len=20) :: foo = myUCS4_'Some UCS4 string'

(The problem is switching the encoding within the same file; good luck in
finding an editor which supports this.)

If one does not need non-ascii character literals (i.e. reading from / writing
to files), there is no problem.

Possible solutions?
a) Have a UCS-4 input file; then both default_'foo' and ucs4_'foo' work.
b) Expect that for myUCS4_'foo' literals the characters in the quotes are
actually UTF-8.

I'm personally in favour of (b). I'm not quite sure whether this is really
compatible with the Fortran standard, but I like the way of inputting the
string.

Otherwise, I think Fortran misses a good way of inputting non-ascii characters
in an ASCII file. C99 offers '\u' but unless I missed something in Fortran
the equivalent would be:

I think (c) is what most programmers want, but I actually do not see how this
should work syntax-wise; or should an ASCII literal automatically be handled as
UTF-8? Then it would work: when assigning to a UCS-4 string, the UTF-8 gets
properly converted and a non-ASCII character then has length one (len(char()
while if one assigns to an ASCII string, non-ASCII characters of course need more
bytes and thus len('§') == 2.

(b) is also an interesting problem. And (a) of course works, but it is quite
cumbersome to use - Fortran misses the \u way of C for specifying a
Unicode character; one can probably work with
   myUCS4string = char(int(z'A0FF'),kind=myUCS4)
but this is awful. (Actually, I think the standard does not even guarantee that
it does this as char is processor dependent.)


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35863



[Bug libfortran/35863] [F2003] Implement ENCODING=UTF-8

2008-04-15 fxcoudert at gcc dot gnu dot org


--- Comment #4 from fxcoudert at gcc dot gnu dot org  2008-04-15 20:53 ---
(In reply to comment #3)
> Actually, I do not understand how to write things like
>    character(kind=myUCS4,len=20) :: foo = myUCS4_'Some UCS4 string'

Ah, I'm glad that I'm not alone! I was thinking of asking advice on c.l.f when
I get some time to write. I agree with you that it is not clear at all.

> (The problem is switching the encoding within the same file; good luck in
> finding an editor which supports this.)

I don't think there is such a thing as a file with multiple encodings, and we
shouldn't create such a beast just for Fortran.

> a) Have a UCS-4 input file; then both default_'foo' and ucs4_'foo' work.

I'd suggest going for that.

> b) Expect that for myUCS4_'foo' literals the characters in the quotes are
> actually UTF-8.

See above, I don't think we want to mix encodings. But, we can support both (a)
and (b): if the file is UCS4, go for (a), if the file is UTF-8, go for (b).

On a personal note, I would use (b) more than (a): UTF-8 is the way forward,
and fixed-width encodings are a real pain for file representation (which is
different than internal representation).

> Otherwise, I think Fortran misses a good way of inputting non-ascii characters
> in an ASCII file. C99 offers '\u'

We already have -fbackslash, I can see us accepting that kind of code with a
given option; it would really be useful.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35863



[Bug libfortran/35863] [F2003] Implement ENCODING=UTF-8

2008-04-14 jb at gcc dot gnu dot org


--- Comment #1 from jb at gcc dot gnu dot org  2008-04-14 18:55 ---
Confirmed.

This could be a bit tricky to get right. OTOH Fortran is fortunate enough that
there are real strings and not char arrays like in C, so from a user
perspective it should be pretty transparent. But certainly the implementation
can be tricky. Perhaps we should ask advice from e.g. python developers who
already have implemented unicode support in some language with a runtime
library written in C?

http://www.cl.cam.ac.uk/~mgk25/unicode.html

Specifically

http://www.cl.cam.ac.uk/~mgk25/unicode.html#mod

http://www.cl.cam.ac.uk/~mgk25/unicode.html#c


-- 

jb at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|-00-00 00:00:00             |2008-04-14 18:55:43
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35863