std::codecvt has in() and out() member functions, which convert from one
codeset to another.  They return one of the following statuses:

  std::codecvt_base::ok or std::codecvt_base::partial
     on success, or on partial success where only part of the input was
     converted

  std::codecvt_base::noconv
     if no conversion is necessary

  std::codecvt_base::error
     on error (e.g. a character could not be converted because it is an
     invalid byte sequence)
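For illustration, here is a minimal sketch of a single in() call, assuming the
char/wchar_t facet std::codecvt<wchar_t, char, std::mbstate_t>.  The names
in_step, convert_once and status_name are hypothetical helpers, not part of
the attached testcase or of libstdc++.  The from_next and to_next
out-parameters report how far the call got, and a caller relies on them to
decide where to resume.

```cpp
#include <cstddef>   // std::size_t
#include <cwchar>    // std::mbstate_t
#include <locale>
#include <string>

// Facet converting external char sequences to internal wchar_t.
typedef std::codecvt<wchar_t, char, std::mbstate_t> wide_cvt;

// Outcome of one codecvt::in() call (hypothetical helper type).
struct in_step {
    std::codecvt_base::result status;
    std::size_t consumed;  // bytes read:          from_next - src
    std::size_t produced;  // wide chars written:  to_next - dst
};

// Calls in() exactly once on [src, src + len) and reports how far it got.
in_step convert_once(const std::locale& loc, const char* src, std::size_t len,
                     wchar_t* dst, std::size_t cap) {
    const wide_cvt& cvt = std::use_facet<wide_cvt>(loc);
    std::mbstate_t state = std::mbstate_t();  // initial conversion state
    const char* from_next = src;
    wchar_t* to_next = dst;
    std::codecvt_base::result r =
        cvt.in(state, src, src + len, from_next, dst, dst + cap, to_next);
    in_step step = { r, static_cast<std::size_t>(from_next - src),
                     static_cast<std::size_t>(to_next - dst) };
    return step;
}

// Readable name for each status listed above.
std::string status_name(std::codecvt_base::result r) {
    switch (r) {
    case std::codecvt_base::ok:      return "ok";
    case std::codecvt_base::partial: return "partial";
    case std::codecvt_base::error:   return "error";
    case std::codecvt_base::noconv:  return "noconv";
    }
    return "unknown";
}
```

With plain ASCII input such as "foo" in std::locale::classic(), a single call
is expected to return ok and advance both pointers by three.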

When converting between char and wchar_t in UTF-8 and Latin-1 locales, this
appears to behave correctly.  However, when run in the C locale with UTF-8
characters in the input (which are invalid US-ASCII), it does not return an
error.  Instead it returns partial or ok, but does not advance the pointers to
the next character, so the caller never completes the conversion and loops
forever.
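A sketch of a caller-side conversion loop, assuming the same
std::codecvt<wchar_t, char, std::mbstate_t> facet.  widen is a hypothetical
name and is not the attached testcase.  It treats a call that returns ok or
partial without advancing either pointer as a failure, so input that triggers
the behaviour described above ends the loop instead of spinning forever.

```cpp
#include <cstddef>   // std::size_t
#include <cwchar>    // std::mbstate_t
#include <locale>
#include <string>

typedef std::codecvt<wchar_t, char, std::mbstate_t> wide_cvt;

// Converts all of `in` to wide characters using the codecvt facet of `loc`.
// Returns false on error, or when a call makes no progress at all, rather
// than calling in() again with the same pointers forever.
bool widen(const std::string& in, const std::locale& loc, std::wstring& out) {
    const wide_cvt& cvt = std::use_facet<wide_cvt>(loc);
    std::mbstate_t state = std::mbstate_t();  // carried across calls
    const char* from = in.data();
    const char* const from_end = in.data() + in.size();
    wchar_t buf[16];  // small output chunk; long input takes several calls
    out.clear();

    while (from != from_end) {
        const char* from_next = from;
        wchar_t* to_next = buf;
        std::codecvt_base::result r =
            cvt.in(state, from, from_end, from_next, buf, buf + 16, to_next);

        // error: invalid input.  noconv is not expected for this facet.
        if (r == std::codecvt_base::error || r == std::codecvt_base::noconv)
            return false;

        out.append(buf, static_cast<std::size_t>(to_next - buf));

        // Guard: a call that advanced neither pointer will never finish.
        if (from_next == from && to_next == buf)
            return false;
        from = from_next;
    }
    return true;
}
```

The no-progress check also rejects input ending in an incomplete multibyte
sequence, since in() then returns partial without consuming it; a streaming
caller expecting more input later would need to keep those bytes rather than
fail.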

A testcase is attached.  Try running it in a UTF-8 locale, then in the C
locale to compare (or comment out the first line of main).  Next, remove the
UTF-8 characters from the string "foo" in main and repeat; this works
correctly in both the UTF-8 and C locales.

I think in this case codecvt is failing to correctly report an error when given
invalid input.


Regards,
Roger


-- 
           Summary: codecvt causes infinite loop in C locale by not
                    returning an error status on failure
           Product: gcc
           Version: 4.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: rleigh at debian dot org
 GCC build triplet: powerpc-linux-gnu
  GCC host triplet: powerpc-linux-gnu
GCC target triplet: powerpc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28155
