Byte Order Marks
Hi, A quick question relating to the byte order mark of UCS-2. If it's absent, is it safe to assume any particular byte order (i.e. big or little endian)? I am writing a function to convert from big to little endian, but without a byte order mark I'm not sure what the order is. Is there any specification I could refer to? Thanks, Tom.
Tomas McGuinness, Consultant
University Technology Park, Curraheen Rd, Cork
+353 21 4933 277
+353 21 4933 201
[EMAIL PROTECTED]
CMG Telecom Products Division, Product Development, Cork
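For the conversion function mentioned above, a minimal sketch in Python follows. It assumes the byte order of the input is already known and only swaps the two octets of each 16-bit code unit; the function name `swap_ucs2_byte_order` is made up for illustration.

```python
def swap_ucs2_byte_order(data: bytes) -> bytes:
    """Swap the two octets of every 16-bit code unit (big <-> little endian)."""
    if len(data) % 2 != 0:
        raise ValueError("UCS-2 data must contain an even number of octets")
    swapped = bytearray(len(data))
    # Octets at odd positions move to even positions and vice versa.
    swapped[0::2] = data[1::2]
    swapped[1::2] = data[0::2]
    return bytes(swapped)


# '<' followed by 'A' in big endian UCS-2, converted to little endian.
little = swap_ucs2_byte_order(b"\x00\x3c\x00\x41")
```

The swap is its own inverse, so the same function converts in either direction. If a byte order mark is present it is swapped along with everything else, which keeps it consistent with the new order.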
gb2312
Hi, Is the character set GB2312 encoded in a two-octet scheme? If so, does it pad its ASCII characters out to two octets? For example, the character < is 0x3C in ASCII, so does it become 0x003C in GB2312? Regards, Tom.
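One way to check this empirically is with Python's built-in `gb2312` codec. The sketch below assumes the EUC-CN serialization of GB2312, which is what that codec implements; other serializations of the character set may behave differently.

```python
# Encode one ASCII character and one Chinese character with the gb2312 codec.
ascii_bytes = "<".encode("gb2312")
hanzi_bytes = "中".encode("gb2312")

# Show the octets produced for each character.
print(ascii_bytes.hex(), len(ascii_bytes))
print(hanzi_bytes.hex(), len(hanzi_bytes))
```

Under this codec the ASCII character stays a single octet (0x3C) and is not padded, while the Chinese character takes two octets.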
Byte Order Marks
Hi, When looking at a document, would it be safe to assume that if I found one of the following byte order marks, the document is encoded in the corresponding format?
* 0xFFFE (UCS-2 little endian)
* 0xFEFF (UCS-2 big endian)
* 0xEFBBBF (UTF-8)
For example, if the first three octets are EF BB BF, could I assume I am dealing with a UTF-8 document? Apart from the UTF and Unicode/UCS encoding formats, do any other "legacy" character sets use byte order marks? Regards, Tom.
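The check described above can be sketched in Python using the BOM constants from the standard `codecs` module. The function name `sniff_bom` is hypothetical, and the sketch covers only the three marks listed in the question. A matching prefix is a strong hint rather than proof, since a legacy-encoded file could in principle begin with the same octets.

```python
import codecs
from typing import Optional

# The three byte order marks from the question, paired with codec names.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
)


def sniff_bom(data: bytes) -> Optional[str]:
    """Return the encoding suggested by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None
```

For example, `sniff_bom(b"\xef\xbb\xbf<wml>")` suggests UTF-8, while data with no recognised mark yields `None` and needs some other source of encoding information.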
RE: Code charts
Hi all, Could anyone tell me if this is correct: is the UCS-2 hex representation of the US-ASCII character < 0x003C? This character does not seem to be present in GB2312-80, or am I wrong? Tomás
-----Original Message-----
From: Tomas McGuinness [mailto:[EMAIL PROTECTED]]
Sent: 09 April 2001 10:49
To: '[EMAIL PROTECTED]'
Subject: Code charts
Hi all, I have a question relating to UCS-2. I am working on a project that involves converting WML and HTML documents from a character set to UCS-2. The problem is that the character with the UCS-2 value 0x003C (<), for example, does not appear in GB2312 (I mean the same glyph). It's not in the mapping table I have, anyway. Does the Simplified Chinese character set have this character, or is my mapping table incorrect? Could anyone tell me if it's possible to download these code mapping charts from the internet? Thanks in advance, Tomás McGuinness
Code charts
Hi all, I have a question relating to UCS-2. I am working on a project that involves converting WML and HTML documents from a character set to UCS-2. The problem is that the character with the UCS-2 value 0x003C (<), for example, does not appear in GB2312 (I mean the same glyph). It's not in the mapping table I have, anyway. Does the Simplified Chinese character set have this character, or is my mapping table incorrect? Could anyone tell me if it's possible to download these code mapping charts from the internet? Thanks in advance, Tomás McGuinness
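As a cross-check against a hand-built mapping table, the conversion described above can be sketched with Python's built-in codecs. This assumes the input is GB2312 in its EUC-CN form (what the `gb2312` codec decodes) and that big endian output without a byte order mark is wanted; the function name `gb2312_to_ucs2_be` is invented for illustration.

```python
def gb2312_to_ucs2_be(data: bytes) -> bytes:
    """Decode GB2312 (EUC-CN) octets and re-encode them as big endian UCS-2."""
    text = data.decode("gb2312")
    # GB2312 characters all lie in the Basic Multilingual Plane, so the
    # UTF-16 output here is one 16-bit unit per character, the same as UCS-2.
    return text.encode("utf-16-be")


# A '<' followed by the two-octet GB2312 code D6 D0.
converted = gb2312_to_ucs2_be(b"<\xd6\xd0")
print(converted.hex())
```

With this codec the single octet 0x3C is treated as ASCII and comes out as 0x003C, alongside the converted two-octet character.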
[unicode] UCS-2 Files
Hi, I have a question relating to UCS-2. I am currently developing a product that will support UCS-2, and I have been sent several documents encoded in UCS-2. I have no reader or writer for UCS-2, but I have run hex dumps of the files on UNIX. At the beginning of the UCS-2 data there are two rogue octets, 0xFF and 0xFE. Do these octets have any significance? Thanks in advance, Tom McGuinness
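For reading such files without a dedicated UCS-2 viewer, one option is Python's `utf-16` codec, sketched below. It assumes the files contain only characters from the Basic Multilingual Plane, where UTF-16 and UCS-2 coincide; the function name `decode_ucs2_with_bom` is made up for the example.

```python
def decode_ucs2_with_bom(data: bytes) -> str:
    """Decode 16-bit Unicode data, using a leading FF FE or FE FF to pick the byte order."""
    # The "utf-16" codec consumes the leading mark and does not return it
    # as part of the text. Without a mark, Python falls back to the
    # platform's native byte order, which may not match the file.
    return data.decode("utf-16")


# Octets as they might appear in a hex dump: FF FE 3C 00 77 00
text = decode_ucs2_with_bom(b"\xff\xfe<\x00w\x00")
```

Here the two leading octets select little endian decoding, and the remaining octets decode to the text `<w`.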