Byte Order Marks

2001-04-19 Thread Tomas McGuinness

Hi,

A quick question relating to the Byte Order Mark of UCS-2. If it's absent, is
it safe to assume any particular byte order (i.e. big or little endian)?

I am writing a function to rearrange from big to little endian, but without a
byte order mark I'm not sure what the order is. Is there any
specification I could refer to?
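
A minimal sketch of the kind of function described above, in Python. The names swap_ucs2_byte_order and to_little_endian are made up for illustration. The fallback to big endian when no mark is present is an assumption (it follows the recommendation in RFC 2781 for text labelled UTF-16), so it is exposed as a parameter rather than hard-wired.

```python
import codecs


def swap_ucs2_byte_order(data: bytes) -> bytes:
    """Swap the two bytes of every 16-bit code unit."""
    if len(data) % 2:
        raise ValueError("UCS-2 data must have an even number of bytes")
    swapped = bytearray(len(data))
    swapped[0::2] = data[1::2]  # low-order bytes move to the even positions
    swapped[1::2] = data[0::2]  # high-order bytes move to the odd positions
    return bytes(swapped)


def to_little_endian(data: bytes, default_big_endian: bool = True) -> bytes:
    """Return data as little-endian UCS-2, using the BOM when there is one."""
    if data.startswith(codecs.BOM_UTF16_LE):   # FF FE: already little endian
        return data
    if data.startswith(codecs.BOM_UTF16_BE):   # FE FF: big endian, swap it
        return swap_ucs2_byte_order(data)
    # No BOM: the order cannot be read from the data, so fall back on the
    # caller's assumption (big endian by default, an assumption of this sketch).
    return swap_ucs2_byte_order(data) if default_big_endian else data
```

Swapping a big-endian BOM (FE FF) produces the little-endian one (FF FE), so the mark stays valid after conversion.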

Thanks.

Tom

Tomas McGuinness   Consultant
> --
> 
> University Technology Park*   +353 21 4933 277 
>  Curraheen Rd, Cork  *+353 21 4933 201
> * [EMAIL PROTECTED]
> --
> 
> CMG   Telecom Products Division
>   Product Development, Cork 
> --
> 
> 
> 
> 




gb2312

2001-04-10 Thread Tomas McGuinness

Hi,

Is the character set gb2312 encoded in a two-octet scheme? If so, does it pad
out its ASCII characters to two octets, e.g. the character < is 0x3C in ASCII,
so does it become 0x003C in gb2312?

Regards,
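
One way to check this is to encode a couple of characters and look at the bytes. The sketch below uses Python's built-in "gb2312" codec, which implements the EUC-CN form of GB2312; whether a given document uses that form is an assumption. The variable names are illustrative only.

```python
ascii_char = "<"
hanzi = "\u4e2d"  # the character 中, chosen as an arbitrary GB2312 example

gb_ascii = ascii_char.encode("gb2312")      # ASCII stays a single octet
gb_hanzi = hanzi.encode("gb2312")           # a Chinese character takes two octets
ucs2_ascii = ascii_char.encode("utf-16-be") # big-endian 16-bit form for comparison

print(gb_ascii.hex())    # 3c
print(gb_hanzi.hex())    # d6d0
print(ucs2_ascii.hex())  # 003c
```

In this form the ASCII range is not padded; only the 16-bit Unicode encoding produces 00 3C.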

Tom.





Byte Order Marks

2001-04-10 Thread Tomas McGuinness

Hi,

When looking at a document, would it be safe to assume that if you found any
of the following Byte Order Marks
*   0xFFFE (UCS-2 Little Endian)
*   0xFEFF (UCS-2 Big Endian)
*   0xEFBBBF (UTF-8)
then the document is encoded in the corresponding format? For example, if I
found the first 3 octets to be EF BB BF, could I assume I am dealing with a
UTF-8 document?
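
The check described above can be sketched in Python as follows. The function name sniff_bom and the table _BOMS are hypothetical. The sketch only reports which mark is present; it does not prove the rest of the document is valid in that encoding, and it ignores UTF-32, whose little-endian mark also begins with FF FE.

```python
import codecs

# (byte sequence, encoding name) pairs for the three marks discussed here.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
)


def sniff_bom(data: bytes):
    """Return (encoding name, BOM length), or (None, 0) if no mark is found."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

A caller would typically skip the reported number of bytes before decoding the remainder with the reported encoding.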

Apart from the UTF and Unicode/UCS encoding formats, do any other "legacy"
character sets use Byte Order Marks?

Regards,

Tom.





RE: Code charts

2001-04-09 Thread Tomas McGuinness

Hi all,

Could anyone tell me if this is correct: is the UCS-2 hex representation of
the US-ASCII character < 0x003C? This character does not seem to be
present in GB2312-80, or am I wrong?

Tomás

-Original Message-
From: Tomas McGuinness [mailto:[EMAIL PROTECTED]]
Sent: 09 April 2001 10:49
To: '[EMAIL PROTECTED]'
Subject: Code charts


Hi all,

I have a question relating to UCS-2. 

I am working on a project that involves converting WML and HTML documents
from a character set to UCS-2. The problem is that the character whose UCS-2
hex representation is, say, 0x003C (<) is not present in GB2312 [the same glyph, I
mean]. It's not in the mapping table chart I have, anyway. Does the Simplified
Chinese character set have this character, or is my mapping table incorrect?
Could anyone tell me if it's possible to download these code mapping charts
from the internet?

Thanks in advance,

Tomás McGuinness







Code charts

2001-04-09 Thread Tomas McGuinness

Hi all,

I have a question relating to UCS-2. 

I am working on a project that involves converting WML and HTML documents
from a character set to UCS-2. The problem is that the character whose UCS-2
hex representation is, say, 0x003C (<) is not present in GB2312 [the same glyph, I
mean]. It's not in the mapping table chart I have, anyway. Does the Simplified
Chinese character set have this character, or is my mapping table incorrect?
Could anyone tell me if it's possible to download these code mapping charts
from the internet?
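
For illustration, the conversion described above might look like this in Python. It relies on Python's built-in "gb2312" codec (the EUC-CN form, in which ASCII bytes such as 0x3C pass through unchanged) rather than a hand-made mapping chart; that substitution, the function name gb2312_to_ucs2, and the choice of big-endian output with a leading byte order mark are all assumptions of this sketch.

```python
import codecs


def gb2312_to_ucs2(data: bytes, with_bom: bool = True) -> bytes:
    """Convert EUC-CN (GB2312) bytes to big-endian 16-bit code units."""
    text = data.decode("gb2312")
    # Every GB2312 character maps to a code point below U+10000, so the
    # UTF-16 encoder emits exactly one 16-bit unit per character here.
    body = text.encode("utf-16-be")
    return (codecs.BOM_UTF16_BE + body) if with_bom else body


converted = gb2312_to_ucs2(b"<p>\xd6\xd0</p>", with_bom=False)
print(converted.hex())
```

With this input, the markup characters come out as 00 3C, 00 70 and so on, while the two-octet sequence D6 D0 becomes the single unit 4E 2D.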

Thanks in advance,

Tomás McGuinness







[unicode] UCS-2 Files

2001-03-22 Thread Tomas McGuinness


Hi,

I have a question relating to UCS-2. I am currently developing a product
that will support UCS-2, and I have been sent several documents encoded in
UCS-2. I have no reader or writer for UCS-2, but I have performed hex dumps on
UNIX. At the beginning of the UCS-2 characters there are two rogue
bytes, 0xFF and 0xFE. Have these bytes any importance?
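
The two leading bytes match the little-endian form of the byte order mark U+FEFF. A small Python sketch, assuming the files really are 16-bit Unicode text and using a made-up three-character sample, shows how a decoder treats them:

```python
# Hypothetical sample: FF FE followed by "<p>" as little-endian 16-bit units.
sample = b"\xff\xfe<\x00p\x00>\x00"

# The generic "utf-16" codec reads the mark, picks the byte order from it,
# and removes it from the decoded text.
text = sample.decode("utf-16")

# Naming the byte order explicitly leaves the mark in place as U+FEFF.
raw = sample.decode("utf-16-le")

print(repr(text))  # '<p>'
print(repr(raw))   # '\ufeff<p>'
```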

thanks in advance,

Tom McGuinness