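Gabor is right, Max! Unicode defines characters in a code space larger than 16 bits. Code points run from U+0000 to U+10FFFF, which needs 21 bits. ISO/IEC 10646 stores them in a 32-bit form, UCS-4 (Universal Character Set, four octets).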
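Here are some concrete numbers. OLD ITALIC LETTER A, the character from the original question, is U+10300 (hexadecimal). That is larger than 0xFFFF, so it cannot fit in one 16-bit .NET char. The C# sketch below does three things:

- It splits the character into a UTF-16 surrogate pair by hand.
- It checks the result against the framework.
- It prints the character's bytes in UTF-8 and in both UTF-16 byte orders.

It assumes a recent .NET: C# 7 tuples, and `char.ConvertFromUtf32`, which only appeared in .NET 2.0. `ToSurrogates` is a hypothetical helper written for this illustration, not a framework method.

```csharp
using System;
using System.Text;

static class SurrogateDemo
{
    // Hypothetical helper: split a supplementary code point (U+10000..U+10FFFF)
    // into UTF-16 high and low surrogates, as the UTF-16 encoding form defines.
    public static (char High, char Low) ToSurrogates(int codePoint)
    {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF)
            throw new ArgumentOutOfRangeException(nameof(codePoint));
        int offset = codePoint - 0x10000;               // 20-bit value
        char high = (char)(0xD800 + (offset >> 10));    // top 10 bits
        char low  = (char)(0xDC00 + (offset & 0x3FF));  // bottom 10 bits
        return (high, low);
    }

    static void Main()
    {
        const int oldItalicA = 0x10300;           // OLD ITALIC LETTER A
        Console.WriteLine(oldItalicA > 0xFFFF);   // True: does not fit in 16 bits

        var (high, low) = ToSurrogates(oldItalicA);
        Console.WriteLine($"{(int)high:X4} {(int)low:X4}");  // D800 DF00

        // Framework equivalent (.NET 2.0 and later): a string of two chars.
        string s = char.ConvertFromUtf32(oldItalicA);
        Console.WriteLine(s.Length == 2 && s[0] == high && s[1] == low);  // True

        // The same character in the transformation formats; UTF-16 byte order matters.
        Console.WriteLine(BitConverter.ToString(Encoding.UTF8.GetBytes(s)));             // F0-90-8C-80
        Console.WriteLine(BitConverter.ToString(Encoding.Unicode.GetBytes(s)));          // 00-D8-00-DF (little-endian)
        Console.WriteLine(BitConverter.ToString(Encoding.BigEndianUnicode.GetBytes(s))); // D8-00-DF-00 (big-endian)
    }
}
```

UTF-8 needs four bytes for this character, and each UTF-16 form also needs four bytes. Only the order of bytes within each 16-bit unit differs between the two UTF-16 forms.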
For practical reasons, the Unicode standard defines transformation formats, e.g.:

  UTF-8   Unicode transformation format with 8-bit code units
  UTF-16  Unicode transformation format with 16-bit code units; characters above U+FFFF take two units (a surrogate pair)
  UTF-32  Unicode transformation format with 32-bit code units

[Any transformation format whose code units are wider than 8 bits has to handle byte ordering.]

So the original question, as quoted in Max's message, still stands:

| > but what about unicode characters, that are simply above the 16-bit
| > limit?
| >
| > for example:
| > OLD ITALIC LETTER A (unicode code: 10300).
| >
| > how do you represent those in .net?

Cheers!
Fabio Montoya

| -----Original Message-----
| From: [EMAIL PROTECTED]
| [mailto:[EMAIL PROTECTED] On Behalf Of max
| Sent: Sunday, February 08, 2004 10:04 PM
| To: gabor; [EMAIL PROTECTED]
| Subject: Re: [Mono-list] unicode trouble
|
| Hi Gabor,
| I think you're confused. Characters in .NET are 16 bits
| BECAUSE they are unicode. 16 bits = 2 bytes = 65536 values.
|
| a way to check that is simple. here's some C# example code:
|
| string s = "a";
| s += (char)10300;
|
| Console.WriteLine("s = " + s);
| Console.WriteLine("len = " + s.Length);
|
| for (int i = 0; i < s.Length; i++ ) {
| Console.WriteLine("s["+i+"] = " + (int)s[i]);
| }
|
| max
|
| On Sunday 08 February 2004 15:19, gabor wrote:
| > hi,
| >
| > as i understand, characters in .net are 16-bit values.
| >
| > but what about unicode characters, that are simply above the 16-bit
| > limit?
| >
| > for example:
| > OLD ITALIC LETTER A (unicode code: 10300).
| >
| > how do you represent those in .net?
| >
| > i tried to open a textfile containing this old-italic-a:
| >
| > - the length and indexing methods of string all said that old-italic-a
| > is actually 2 letters => it doesn't work
| > - when writing the string back to an utf8 encoded textfile, then it
| > was correctly written.
| >
| > so for me it seems that dotnet (mono) uses utf16 as internal encoding
| > format, but indexing (and length) doesn't use that information.
| >
| > am i correct?
| >
| > are there any ways to handle those characters in dotnet?
| >
| > for example the new java-1.5 contains some new string-methods that can
| > handle these characters. it's not perfect in java, but at least there
| > is something.
| >
| > if someone wants to play with it, i attached a text file containing
| > the text "marrakesh", encoded in utf8, where i replaced the first "a"
| > with old-italic-a (it's easy to do with a little iconv to-from ucs4
| > and hexedit)
| >
| > thanks,
| > gabor farkas
|
| _______________________________________________
| Mono-list maillist - [EMAIL PROTECTED]
| http://lists.ximian.com/mailman/listinfo/mono-list
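Max's test does not actually exercise the case Gabor describes. `(char)10300` is decimal 10300, i.e. U+283C, which already fits in one 16-bit char. U+10300 itself cannot be cast to a single char at all, because its value does not fit in 16 bits.

Below is a sketch of a test that uses the real code point. It rebuilds Gabor's "marrakesh" sample in memory instead of reading his attachment. It then compares `Length` (UTF-16 chars) with a count of actual code points. It assumes .NET 2.0 or later for `char.ConvertFromUtf32`, `char.ConvertToUtf32`, `char.IsSurrogatePair` and `char.IsHighSurrogate`, which are newer than the runtime discussed in this thread. `CountCodePoints` is a hypothetical helper, not a framework method.

```csharp
using System;
using System.Text;

static class CodePointDemo
{
    // Hypothetical helper: count Unicode code points instead of UTF-16 chars.
    // A valid surrogate pair counts once.
    public static int CountCodePoints(string s)
    {
        int count = 0;
        for (int i = 0; i < s.Length; i++)
        {
            if (char.IsSurrogatePair(s, i))
                i++;  // skip the low surrogate of the pair
            count++;
        }
        return count;
    }

    static void Main()
    {
        // Gabor's sample: "marrakesh" with the first 'a' replaced by U+10300.
        string s = "m" + char.ConvertFromUtf32(0x10300) + "rrakesh";

        Console.WriteLine("len         = " + s.Length);                       // 10 UTF-16 chars
        Console.WriteLine("code points = " + CountCodePoints(s));             // 9
        Console.WriteLine("UTF-8 bytes = " + Encoding.UTF8.GetByteCount(s));  // 12

        // Walk the string by code point rather than by char.
        for (int i = 0; i < s.Length; i++)
        {
            int cp = char.ConvertToUtf32(s, i);  // combines a surrogate pair
            if (char.IsHighSurrogate(s[i]))
                i++;                             // pair consumed two chars
            Console.WriteLine("U+" + cp.ToString("X4"));
        }
    }
}
```

This matches what Gabor saw: `Length` and the indexer work in UTF-16 code units, so U+10300 counts as two. The UTF-8 encoder combines the pair back into one four-byte sequence. On .NET Core 3.0 and later, `s.EnumerateRunes()` yields `System.Text.Rune` values and performs the same code-point walk without the manual surrogate check.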