On Thursday 09 March 2006 14:14, David Bertoni wrote:
> Steven T. Hatton wrote:

> I guess I don't understand what you mean by "I believe an individual
> 16-bit XMLCh will occupy 32-bits of storage."  How can a 16-bit XMLCh
> ever occupy 32 bits of storage?

What is the CPU going to put in the other 16 bits of a 32-bit word when it 
stores a single XMLCh?
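
To make the question concrete, here is a minimal sketch. XMLCh16 is a hypothetical stand-in I made up for a 16-bit XMLCh (the real typedef depends on how Xerces-C is configured for the platform), and the helper and struct names are illustrative only. It shows the two cases I have in mind: code units packed back to back in an array, and a lone code unit sitting next to a wider member.

```cpp
#include <cstddef>

// Hypothetical stand-in for a 16-bit XMLCh; not the Xerces-C typedef itself.
typedef unsigned short XMLCh16;

// Array elements are contiguous, so n code units occupy exactly
// n * sizeof(XMLCh16) bytes with no padding between them.
inline std::size_t arrayBytes(std::size_t n)
{
    return n * sizeof(XMLCh16);
}

// A lone code unit followed by a wider member. The compiler may insert
// padding after 'unit' so that 'other' stays aligned; that padding is
// where otherwise unused bits can show up.
struct LoneUnit
{
    XMLCh16 unit;
    unsigned int other;
};
```

Comparing sizeof(LoneUnit) with sizeof(XMLCh16) + sizeof(unsigned int) on a given compiler shows whether any padding was added. Inside an array or a string buffer, no such padding exists.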

> I agree it's a big problem that you cannot use it with
> std::basic_string, but there's no reason why you can't use it with the
> other containers.  What other facilities do you want to use?

Well, I'm still learning the Standard Library, so I don't really know what I 
can get out of std::basic_string.  I know it has a bunch of searching and 
manipulation functions.  In all likelihood, I will end up using QString for 
my UI.  I'm working on a C++ project management infrastructure, and felt 
somewhat compromised by having to rely on Qt.  Not that I have anything 
against Qt.  I think it's fantastic.  I just wanted to build the basics of 
the program using Standard C++.
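
For what it's worth, here is a rough sketch of what searching might look like with one of the other containers. XMLCh16, XStr and findUnits are hypothetical names of my own, not anything from Xerces-C. The idea is just that std::search over a std::vector covers roughly the same ground as basic_string::find.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

typedef unsigned short XMLCh16;      // hypothetical stand-in for a 16-bit XMLCh
typedef std::vector<XMLCh16> XStr;   // hypothetical "string" of code units

// Return the index of the first occurrence of needle in haystack,
// or haystack.size() if needle does not occur.
inline std::size_t findUnits(const XStr& haystack, const XStr& needle)
{
    XStr::const_iterator it = std::search(haystack.begin(), haystack.end(),
                                          needle.begin(), needle.end());
    return static_cast<std::size_t>(it - haystack.begin());
}
```

This compares code units one by one, so it says nothing about case folding or normalization.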
 
> UTF-16 is an encoding of the 10646/Unicode character set, and you've
>
> stated previously that the C++ standard does not talk about encodings:
>  > The C++ Standard only specifies character sets.  It does not specify
>  > encodings.
>
> There is no requirement that a character specified with a universal
> character name be encoded in any particular way -- it's just another way
> to name a character.

There's an isomorphism in there somewhere which, in principle, could be 
leveraged to bridge between the encodings.  I'm not saying it would be worth 
doing.

> My version of the standard also has this to say:
>
> "If the hexadecimal value for a universal character name is less than
> 0x20 or in the range 0x7F-0x9F (inclusive), or if the universal
> character name designates a character in the basic source character set,
> then the program is ill-formed."
>
> That restricts the usage of universal character names too severely for
> Xerces-C's purposes.

I am under the impression that the stipulation you quoted only applies to 
character literals.  AFAIK Xerces-C doesn't support character literals of any 
kind.  Correct?

What I really want to know is whether there is a significant cost associated 
with using UTF-16 with support for characters outside of the BMP.  In 
some operations that would require the program to sniff every code unit to 
detect whether it is part of a multi-unit character.  From thinking through 
scenarios, it seems likely that you could get away with ignoring that aspect 
of the encoding.
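
The kind of sniffing I mean can be sketched as below. The surrogate ranges are the ones UTF-16 defines (0xD800-0xDBFF for the first unit of a pair, 0xDC00-0xDFFF for the second); the type and function names are hypothetical, and this is not how Xerces-C does it as far as I know.

```cpp
#include <cstddef>

typedef unsigned short XMLCh16;  // hypothetical stand-in for a 16-bit XMLCh

// UTF-16 surrogate ranges: a character outside the BMP is encoded as a
// high surrogate followed by a low surrogate.
inline bool isHighSurrogate(XMLCh16 u) { return u >= 0xD800 && u <= 0xDBFF; }
inline bool isLowSurrogate(XMLCh16 u)  { return u >= 0xDC00 && u <= 0xDFFF; }

// Count characters (code points) in n code units. Every unit has to be
// inspected, which is the per-character cost in question. An unpaired
// surrogate is counted as one character here; a stricter routine might
// reject it instead.
inline std::size_t countCodePoints(const XMLCh16* s, std::size_t n)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (isHighSurrogate(s[i]) && i + 1 < n && isLowSurrogate(s[i + 1]))
            ++i;  // skip the low half of a well-formed pair
        ++count;
    }
    return count;
}
```

Operations that only copy, compare for equality, or search for BMP-only substrings never need this loop, which is why ignoring surrogates often goes unnoticed. Indexing by character, truncating, or reversing a string does need it.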

Steven 
