[1003.1(2016/18)/Issue7+TC2 0001635]: iconv: please be more explicit in input-not-convertible case

Austin Group Bug Tracker via austin-group-l at The Open Group Mon, 20 Feb 2023 16:16:11 -0800


The following issue has been SUBMITTED. 
====================================================================== 
https://austingroupbugs.net/view.php?id=1635 
====================================================================== 
Reported By:                steffen
Assigned To:                
====================================================================== 
Project:                    1003.1(2016/18)/Issue7+TC2
Issue ID:                   1635
Category:                   Base Definitions and Headers
Type:                       Clarification Requested
Severity:                   Editorial
Priority:                   normal
Status:                     New
Name:                       steffen 
Organization:                
User Reference:              
Section:                    iconv 
Page Number:                1123 
Line Number:                38014 
Interp Status:              --- 
Final Accepted Text:         
====================================================================== 
Date Submitted:             2023-02-21 00:14 UTC
Last Modified:              2023-02-21 00:14 UTC
====================================================================== 
Summary:                    iconv: please be more explicit in input-not-convertible case
Description: 
Issue 1007 resolves this to:


    If iconv() encounters a character in the input buffer that is valid,
    but for which an identical character does not exist in the output
    codeset:

        If either the //IGNORE or the //NON_IDENTICAL_DISCARD indicator
        suffix was specified when the conversion descriptor cd was opened,
        the character shall be discarded but shall still be counted in the
        return value of the iconv() call.

        If the //TRANSLIT indicator suffix was specified when the
        conversion descriptor cd was opened, an implementation-defined
        transliteration shall be performed, if possible, to convert the
        character into one or more characters of the output codeset that
        best resemble the input character. The character shall be counted
        as one character in the return value of the iconv() call,
        regardless of the number of output characters.

        If no indicator suffix was specified when the conversion
        descriptor cd was opened, or the //TRANSLIT indicator suffix was
        specified but no transliteration of the character is possible,
        iconv() shall perform an implementation-defined conversion on the
        character and it shall be counted in the return value of the
        iconv() call.
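
To make the quoted wording concrete, here is a minimal sketch of how a caller
would select one of these behaviours and read back the count that iconv()
returns. The helper name count_nonidentical() is hypothetical, and which
suffixes a given implementation accepts is an assumption that has to be
checked at run time (iconv_open() may simply fail for an unknown suffix).

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert the NUL-terminated string `in` from codeset
   `from` to `to_with_suffix` (for example "ISO-8859-1//TRANSLIT").
   Returns the iconv() return value, i.e. the number of non-identical
   conversions, or -1 with errno set if iconv_open() or iconv() failed. */
long count_nonidentical(const char *to_with_suffix, const char *from,
                        const char *in)
{
    char out[128];                  /* output is discarded in this sketch */
    char *inptr = (char *)in;
    char *outptr = out;
    size_t inleft = strlen(in);
    size_t outleft = sizeof out;
    size_t res;
    int saved;

    iconv_t cd = iconv_open(to_with_suffix, from);
    if (cd == (iconv_t)-1)
        return -1;                  /* e.g. suffix or codeset not supported */

    res = iconv(cd, &inptr, &inleft, &outptr, &outleft);
    saved = errno;                  /* keep iconv()'s errno across the close */
    iconv_close(cd);
    errno = saved;

    return res == (size_t)-1 ? -1 : (long)res;
}
```

With pure ASCII input the result should be 0 for any target codeset that
contains ASCII. What is returned for, say, the UTF-8 euro sign with a
"//TRANSLIT" or "//IGNORE" target depends on the implementation.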

However, as Martin Sebor stated in the issue description,

    The specification for the iconv() function assumes that every input
    sequence that is valid in the source codeset is convertible to some
    sequence in the destination codeset. In particular, the specification
    doesn't allow the function to fail when a valid sequence in the source
    codeset cannot be represented in the destination codeset. As an example
    where this assumption doesn't hold, consider a conversion from UTF-8 to
    ISO-8859 where a large number of source characters don't have
    equivalents in the destination codeset.

    A survey of a subset of existing implementations shows that they fail
    with EILSEQ in such cases, despite the specification defining the error
    condition as "Input conversion stopped due to an input byte that does
    not belong to the input codeset."

And this is true: the GNU C library and GNU libiconv seem to fail output
conversion immediately, with the same EILSEQ error that denotes invalid
input data.
(Which is a much more drastic error, is it not?)
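
The ambiguity can be observed with a few lines of C. This is only a sketch:
the helper name convert_errno() is hypothetical, and the outcome for the
valid-but-unconvertible case is implementation-defined, so the code reports
it instead of assuming it.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert the NUL-terminated string `in` from codeset
   `from` to codeset `to`, discarding the output.  Returns 0 on success,
   otherwise the errno value set by iconv_open() or iconv(). */
int convert_errno(const char *to, const char *from, const char *in)
{
    char out[64];
    char *inptr = (char *)in;
    char *outptr = out;
    size_t inleft = strlen(in);
    size_t outleft = sizeof out;
    int err = 0;

    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return errno;

    if (iconv(cd, &inptr, &inleft, &outptr, &outleft) == (size_t)-1)
        err = errno;                /* EILSEQ, EINVAL or E2BIG */

    iconv_close(cd);
    return err;
}
```

On an implementation that behaves as described above,
convert_errno("ISO-8859-1", "UTF-8", "\xff") (invalid UTF-8) and
convert_errno("ISO-8859-1", "UTF-8", "\xe2\x82\xac") (a valid euro sign that
ISO-8859-1 lacks) would both be expected to yield EILSEQ, so the caller cannot
tell the two cases apart from errno alone.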
Desired Action: 
Please be more explicit and state that implementations exist which behave
like GNU C library iconv / libiconv.
That is to say that "implementation-defined conversion" may mean no
conversion at all, but an immediate stop.

It would be tremendous if the standard could define indications that
programmers can react upon, because, due to restrictions of the iconv
interface, it is impossible to decide what the error was.
A programmer knows nothing about the input or output character set, how
many bytes may make up a character, how many were consumed / produced, or
whether conversion replacements were stored.  (In practice, all other
implementations known to me do place some character and continue.)
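
To illustrate what a programmer has to do today in order to tell the two
EILSEQ cases apart, here is one possible workaround, sketched under loud
assumptions: that UTF-8 can represent every character of the source codeset,
and that the source codeset is stateless. The names conv_status,
starts_with_valid_char() and classify() are hypothetical. This is not what
any particular library does; it only probes the failing position with a
second descriptor.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

enum conv_status {
    CONV_OK,             /* whole input was converted (or substituted) */
    CONV_INVALID_INPUT,  /* bytes are not valid in the source codeset */
    CONV_UNCONVERTIBLE,  /* valid character, not representable in target */
    CONV_OTHER_ERROR     /* iconv_open() failure, EINVAL, ... */
};

/* Probe: does the input at `p` start with a character that is valid in
   `from`?  Assumes that UTF-8 can represent it.  Returns 1, 0, or -1 if
   the probe descriptor could not be opened. */
int starts_with_valid_char(const char *from, char *p, size_t len)
{
    char scratch[64];
    char *start = p;
    char *outptr = scratch;
    size_t outleft = sizeof scratch;

    iconv_t probe = iconv_open("UTF-8", from);
    if (probe == (iconv_t)-1)
        return -1;
    (void)iconv(probe, &p, &len, &outptr, &outleft);
    iconv_close(probe);
    return p > start;    /* any progress means the first character was valid */
}

/* Classify the first conversion failure; the output itself is discarded. */
enum conv_status classify(const char *to, const char *from,
                          const char *in, size_t inlen)
{
    char scratch[256];
    char *inptr = (char *)in;
    enum conv_status status = CONV_OTHER_ERROR;

    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return CONV_OTHER_ERROR;

    for (;;) {
        char *outptr = scratch;
        size_t outleft = sizeof scratch;

        if (iconv(cd, &inptr, &inlen, &outptr, &outleft) != (size_t)-1) {
            status = CONV_OK;
            break;
        }
        if (errno == E2BIG)
            continue;    /* scratch buffer full: throw it away and go on */
        if (errno == EILSEQ) {
            /* inptr now points at the offending sequence */
            int valid = starts_with_valid_char(from, inptr, inlen);
            if (valid >= 0)
                status = valid ? CONV_UNCONVERTIBLE : CONV_INVALID_INPUT;
        }
        break;
    }
    iconv_close(cd);
    return status;
}
```

A second descriptor and a second pass over the input just to learn which kind
of error occurred is exactly the sort of overhead a distinct error indication
in the standard would make unnecessary.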

This refers to GNU library bug report

  https://sourceware.org/bugzilla/show_bug.cgi?id=29913

And yes, the GNU approach has lots of merits, but it should be possible to
differentiate between the errors:

  Better even would be an explicit //CONVERR-STOP-WITH-ENODATA modifier.

In that report the honourable author of GNU iconv refers to gnulib source
files where the same approach is implemented portably, it seems, and the
cost is tremendous, because of all the shortcomings of the iconv interface.
Like approaching cautiously byte-by-byte until a conversion succeeds:

      for (insize = 1; inptr + insize <= inptr_end; insize++)
        {
          res = iconv (cd,
                       (ICONV_CONST char **) &inptr, &insize,
                       &outptr, &outsize);
          if (!(res == (size_t)(-1) && errno == EINVAL))
            break;
          /* iconv can eat up a shift sequence but give EINVAL while
             attempting to convert the first character.  E.g. libiconv
             does this.  */
          if (inptr > inptr_before)
            {
              res = 0;
              break;
            }
        }

This is ridiculous!
====================================================================== 

Issue History 
Date Modified    Username       Field                    Change               
====================================================================== 
2023-02-21 00:14 steffen        New Issue                                    
2023-02-21 00:14 steffen        Name                      => steffen         
2023-02-21 00:14 steffen        Section                   => iconv           
2023-02-21 00:14 steffen        Page Number               => 1123            
2023-02-21 00:14 steffen        Line Number               => 38014           
======================================================================
