Re: [NTG-context] UTF conversion via Lua

luigi scarso Fri, 10 Feb 2012 03:14:47 -0800

2012/2/10 Procházka Lukáš Ing. - Pontex s. r. o. <l...@pontex.cz>:
> ... Well, my information was not correct.
>
> There are characters > 127 in the file, like "ř", "š"...
>
> Each char = 1 byte, and as I'm using Windows with CP 1250, the characters
> are displayed correctly.
>
> But I have a problem loading them into ConTeXt.
>
> I need to convert the bytes > 127 to UTF-8 sequences, which ConTeXt would
> accept.
>
> @Thomas:
>
> The table looks nice but there are no entries for CP 1250 to UTF conversion.
>
> I prepared some tables: character conversion and removal of diacritics (see
> the attachment);
> maybe it would be handy to include them into ConTeXt somehow.
>
> Best regards,
>
> Lukas
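For reference, tables like the ones Lukas describes can also be generated rather than typed in, as long as a CP 1250 codec is at hand. Below is a minimal sketch in Python (chosen only for illustration; it is not ConTeXt or Lua code), assuming Python 3.9+ with its standard `cp1250` codec and `unicodedata` module; the function names are hypothetical:

```python
import unicodedata


def cp1250_table() -> dict[int, str]:
    """Map each byte 0x80-0xFF to its Unicode character under CP 1250.

    A few byte values (e.g. 0x81) are unassigned in CP 1250; Python's
    codec rejects them, so they are simply left out of the table.
    """
    table = {}
    for b in range(0x80, 0x100):
        try:
            table[b] = bytes([b]).decode("cp1250")
        except UnicodeDecodeError:
            pass  # unassigned in CP 1250
    return table


def strip_diacritics(text: str) -> str:
    """Remove combining accents, e.g. 'ř' -> 'r', 'š' -> 's'.

    NFD splits a letter into base letter + combining marks; the marks are
    then dropped. Letters without a decomposition (such as 'ł') pass
    through unchanged and would still need an explicit table entry.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


table = cp1250_table()
print(table[0xF8], table[0x9A])        # ř š
print(strip_diacritics("Příliš"))      # Prilis
```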


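To avoid confusion:
if you mean ASCII with code range 0-127, there is no need for conversion;
if you mean 8-bit text with code range 0-255 *and* ISO-8859-1 (Latin 1)
encoding, no conversion table is needed either, because each byte value is
already its Unicode code point (bytes above 127 must still be written out as
two-byte UTF-8 sequences);
otherwise you need to specify the encoding (e.g. CP 1250).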
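Why the encoding has to be named: the same byte means different characters in different code pages. A minimal Python sketch (for illustration only; the helper name `to_utf8` is hypothetical) that decodes with an explicitly given source encoding and re-encodes as UTF-8:

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Re-encode bytes from an 8-bit encoding (e.g. 'cp1250') as UTF-8."""
    # The byte alone is ambiguous, so decoding needs the source encoding.
    text = raw.decode(source_encoding)
    return text.encode("utf-8")


data = b"\xf8\x9a"  # "řš" as stored by a CP 1250 editor
print(to_utf8(data, "cp1250"))      # b'\xc5\x99\xc5\xa1'
print(to_utf8(b"\xf8", "latin-1"))  # b'\xc3\xb8' ('ø', not 'ř')
```

Reading byte 0xF8 as Latin 1 silently gives "ø" instead of "ř", which is why a CP 1250 file cannot be treated as Latin 1.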


From Wikipedia:
"""
Unicode and the ISO/IEC 10646 Universal Character Set (UCS) have a
much wider array of characters, and their various encoding forms have
begun to supplant ISO/IEC 8859 and ASCII rapidly in many environments.
While ASCII is limited to 128 characters, Unicode and the UCS support
more characters by separating the concepts of unique identification
(using natural numbers called code points) and encoding (to 8-, 16- or
32-bit binary formats, called UTF-8, UTF-16 and UTF-32).
To allow backward compatibility, the 128 ASCII and 256 ISO-8859-1
(Latin 1) characters are assigned Unicode/UCS code points that are the
same as their codes in the earlier standards. Therefore, ASCII can be
considered a 7-bit encoding scheme for a very small subset of
Unicode/UCS, and, conversely, the UTF-8 encoding forms are
binary-compatible with ASCII for code points below 128, meaning all
ASCII is valid UTF-8. The other encoding forms resemble ASCII in how
they represent the first 128 characters of Unicode, but use 16 or 32
bits per character, so they require conversion for compatibility.
(similarly UCS-2 is upwards compatible with UTF-16)
"""
If you have iconv, converting between encodings is easy: you can always
call it as an external program with os.execute(cmd).
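From Lua, the call could look like `os.execute("iconv -f CP1250 -t UTF-8 in.tex > out.tex")`, where the file names are hypothetical and the encoding name assumes an iconv that knows `CP1250` (GNU iconv and libiconv do). The same idea as a minimal Python sketch, with hypothetical helper names, passing the arguments as a list so no shell quoting is involved:

```python
import shutil
import subprocess


def iconv_command(path: str, from_enc: str, to_enc: str = "UTF-8") -> list[str]:
    """Build the argument list for an external iconv call."""
    return ["iconv", "-f", from_enc, "-t", to_enc, path]


def convert_with_iconv(path: str, from_enc: str = "CP1250") -> bytes:
    """Run iconv on `path` and return the converted bytes from stdout."""
    if shutil.which("iconv") is None:
        raise RuntimeError("iconv not found on PATH")
    result = subprocess.run(iconv_command(path, from_enc),
                            capture_output=True, check=True)
    return result.stdout


print(iconv_command("in.tex", "CP1250"))
```

`check=True` makes a failed conversion (e.g. an unknown encoding name) raise an error instead of silently returning empty output.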

-- 
luigi
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the 
Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : http://foundry.supelec.fr/projects/contextrev/
wiki     : http://contextgarden.net
___________________________________________________________________________________
