Abdelrazak Younes wrote:
> Peter Kümmel wrote:
>> Peter Kümmel wrote:
>>
>>> for values which are not surrogates "if (ch >= UNI_SUR_HIGH_START &&
>>> ch <= UNI_SUR_LOW_END)" (2048 values)
>>
>> read: only the 2048 surrogate values out of 65536 are not allowed, and
>> for the rest a cast converts from UTF-32 to UTF-16.
>
> I think QChar will automatically replace those with question marks
> anyway.
>
> But I could also check for these values explicitly in my conversion
> routine and return a '?' character for those unknown characters:
>
> char_type const UNI_SUR_HIGH_START = 0xD800;
> char_type const UNI_SUR_LOW_END = 0xDFFF;
>
> QChar const UnknownChar(...);
>
> QChar const ucs4_to_qchar(char_type const & ucs4)
> {
> 	if (ucs4 >= 0xFFFE
> 	    || (ucs4 >= UNI_SUR_HIGH_START && ucs4 <= UNI_SUR_LOW_END))
> 		return UnknownChar;
>
> return QChar(static_cast<unsigned short>(ucs4));
> }
>
> Abdel.
>
>
Could we not replace the current implementation of
unsigned short ucs4_to_ucs2(boost::uint32_t c)
with an inline implementation like this one? iconv must
in principle do the same thing.
char_type const UNI_REPLACEMENT_CHAR = 0xFFFD;
char_type const UNI_SUR_HIGH_START = 0xD800;
char_type const UNI_SUR_LOW_END = 0xDFFF;

inline unsigned short ucs4_to_ucs2(boost::uint32_t ucs4)
{
	// Surrogates and values from 0xFFFE upwards are not valid UCS-2 characters.
	if (ucs4 >= 0xFFFE
	    || (ucs4 >= UNI_SUR_HIGH_START && ucs4 <= UNI_SUR_LOW_END))
		return static_cast<unsigned short>(UNI_REPLACEMENT_CHAR);
	return static_cast<unsigned short>(ucs4);
}
Compare with
http://www.unicode.org/Public/PROGRAMS/CVTUTF/ConvertUTF.c
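
For checking the boundary cases, here is a standalone sketch of the same
idea that does not depend on boost or on our char_type. The names
kSurrogateFirst, kSurrogateLast, kReplacement and is_valid_ucs2 are made up
for this example and are not taken from our sources or from ConvertUTF.c;
I also assume U+FFFD is the replacement we want rather than '?'.

```cpp
#include <cstdint>

// Hypothetical names, chosen for this sketch only.
constexpr std::uint32_t kSurrogateFirst = 0xD800; // first high surrogate
constexpr std::uint32_t kSurrogateLast  = 0xDFFF; // last low surrogate
constexpr std::uint16_t kReplacement    = 0xFFFD; // U+FFFD REPLACEMENT CHARACTER

// The surrogate block covers 0xD800..0xDFFF inclusive.
static_assert(kSurrogateLast - kSurrogateFirst + 1 == 2048,
              "surrogate block has 2048 code points");

// True if the value can be stored unchanged in a single 16-bit unit:
// below 0xFFFE and not inside the surrogate block.
constexpr bool is_valid_ucs2(std::uint32_t ucs4)
{
	return ucs4 < 0xFFFE
		&& !(ucs4 >= kSurrogateFirst && ucs4 <= kSurrogateLast);
}

// Narrow a UCS-4 value to UCS-2, substituting kReplacement for
// anything that is_valid_ucs2() rejects.
constexpr std::uint16_t ucs4_to_ucs2(std::uint32_t ucs4)
{
	return is_valid_ucs2(ucs4)
		? static_cast<std::uint16_t>(ucs4)
		: kReplacement;
}
```

With this, ucs4_to_ucs2(0x41) stays 0x41, while 0xD800, 0xFFFE and
0x10000 all map to 0xFFFD. The values just outside the surrogate block
(0xD7FF and 0xE000) pass through unchanged.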
Peter