Re: [Help-source-highlight] Unicode files ?

Dario Teixeira Fri, 02 Apr 2010 07:03:52 -0700

Hi,

> the html might bring also bad encoding in the head, but I
> guess it is also due to the fact that source-highlight reads
> two bytes, which in unicode represent a single character,
> and interprets them as two characters instead of one. 
> This is unicode, am I right?  Sorry for my ignorance,
> but with unicode in a text file every character is
> represented by two bytes, right?


Nope. There is not one standard Unicode encoding, but several.  The most
common one is UTF-8, a variable-length encoding where each Unicode
character takes from 1 to 4 bytes (originally it was up to 6, but sequences
longer than 4 bytes are no longer valid).  Another variable-length encoding
is UTF-16, where each character occupies either 2 or 4 bytes.  The only
fixed-length encoding is UTF-32 (UCS-4), where each character requires
4 bytes.
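
To make those byte counts concrete, here is a minimal sketch in Python
(picked only for brevity).  The sample characters and the helper name
encoded_sizes are my own choices for illustration, not something from
Source-highlight:

```python
# Sample characters chosen for illustration (an assumption, not from the thread):
# U+0041 LATIN CAPITAL LETTER A, U+00E9 e with acute, U+20AC EURO SIGN,
# U+1D11E MUSICAL SYMBOL G CLEF (outside the Basic Multilingual Plane).
SAMPLES = ["A", "\u00e9", "\u20ac", "\U0001d11e"]


def encoded_sizes(ch):
    """Return the byte length of ch in UTF-8, UTF-16 and UTF-32."""
    # The "-le" codec variants are used so that no byte-order mark is added.
    return (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-le")),
        len(ch.encode("utf-32-le")),
    )


if __name__ == "__main__":
    for ch in SAMPLES:
        utf8, utf16, utf32 = encoded_sizes(ch)
        print(f"U+{ord(ch):04X}: UTF-8={utf8} UTF-16={utf16} UTF-32={utf32}")
```

Only the UTF-32 column stays constant; the other two grow with the code
point.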
 
> I'd like to try with wstring and see whether this solves
> something.

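I haven't used C++ in a long time, but isn't wstring based on wchar_t,
whose size depends on the platform (2 bytes on Windows, usually 4 on
Linux)?  Where it is 2 bytes long, it won't solve anything: no fixed-length
2-byte encoding can represent all of Unicode (UCS-2 covers only the Basic
Multilingual Plane)!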
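
If you want to check this on your own machine, the following rough sketch
(Python again; wchar_size and utf16_code_units are hypothetical helper
names of mine) reports the width of the C wchar_t type as seen through
ctypes, and shows that a character outside the Basic Multilingual Plane
needs two 16-bit units:

```python
import ctypes


def wchar_size():
    """Size in bytes of the platform's C wchar_t, as reported by ctypes."""
    return ctypes.sizeof(ctypes.c_wchar)


def utf16_code_units(text):
    """Number of 16-bit code units needed to store text as UTF-16."""
    # Little-endian variant so no byte-order mark inflates the count.
    return len(text.encode("utf-16-le")) // 2


if __name__ == "__main__":
    print("wchar_t is", wchar_size(), "bytes on this platform")
    clef = "\U0001d11e"  # MUSICAL SYMBOL G CLEF, chosen for illustration
    print(len(clef), "code point needs", utf16_code_units(clef), "UTF-16 code units")
```

So with a 2-byte wchar_t, one wstring element is not guaranteed to be one
character.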

Lorenzo, I think we can give you a hand in implementing this.  However,
if you read through this entire thread you will notice that the best
course of action is dependent on a crucial piece of information which
you are the most qualified person to provide: we need a list of the
manipulations that Source-highlight applies to strings.

Hope that helps!
Best regards,
Dario Teixeira






