https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112652

--- Comment #9 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to r...@cebitec.uni-bielefeld.de from comment #8)
> FWIW, the iconv conversion tables in /usr/lib/iconv can be regenerated
> from the OpenSolaris sources, modified not to do that '?' conversion.
> Worked for a quick check for the UTF-8 -> ASCII example, but the '?' is
> more prevalent and would need to be eradicated upstream.

If '?' is always what gets substituted for an unknown character, we could also
have some hack on the libcpp side for it.
Like (but limited to Solaris hosts) in convert_using_iconv, when converting from
SOURCE_CHARSET to some other character set, don't try to convert the whole UTF-8
string at once, but split it into chunks at '?' characters, so
foo???bar?baz?qux
would be iconv converted as
foo
???
bar
?
baz
?
qux
chunks.  When converting the non-'?' chunks, it would check the output after the
conversion for the '?' character (as encoded in the destination character set;
that encoding could perhaps be queried once during initialization, right after
iconv_open) and treat it as an error if it appeared there.  Alternatively, always
convert the result back to UTF-8 as well and check whether it has more '?'
characters than the source.
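
A rough sketch of the splitting step and of the '?' counting needed for the
round-trip check might look like the following.  The helper names chunk_len
and count_qmarks are made up for illustration and are not existing libcpp
functions; the actual iconv calls and the Solaris-only guard are left out.
Splitting byte-wise is safe for UTF-8 input because the byte 0x3F never
occurs inside a multibyte sequence.

```c
#include <stddef.h>

/* Hypothetical helper (not in libcpp): return the length in bytes of the
   chunk starting at P, where a chunk is a maximal run of '?' bytes or a
   maximal run of non-'?' bytes.  *IS_QMARK is set to 1 for a '?' run and
   to 0 otherwise.  Returns 0 when LEN is 0.  */
static size_t
chunk_len (const unsigned char *p, size_t len, int *is_qmark)
{
  size_t n = 0;

  if (len == 0)
    {
      *is_qmark = 0;
      return 0;
    }

  *is_qmark = p[0] == '?';
  while (n < len && (p[n] == '?') == *is_qmark)
    n++;
  return n;
}

/* Hypothetical helper for the round-trip variant: count the '?' bytes in
   a UTF-8 buffer, so the count before conversion can be compared with the
   count after converting there and back.  */
static size_t
count_qmarks (const unsigned char *p, size_t len)
{
  size_t count = 0;

  for (size_t i = 0; i < len; i++)
    count += p[i] == '?';
  return count;
}
```

A caller would loop over the input with chunk_len, pass each chunk to iconv
separately, and for chunks where *is_qmark is 0 reject any '?' that shows up
in the converted output.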
