RE: Unicode / Transliteration

Ed Batutis Mon, 10 Dec 2001 09:21:51 -0800

>   where 'transliteratorX' is the name of a transliteration class to use
>   (i.e. Unicode::Transliterate::ISO_8859_15::ASCII for the ISO_8859_15
>   to ASCII transliteration table).
>


Your main example was accent-stripping. That operation isn't really tied to
encodings. It is a general operation that simply happens to yield an
ASCII subset for certain input. Maybe you want to define your module as
something more specific and more inclusive: a Unicode-to-human-readable-URI
transliterator. That would be a non-trivial module to implement in full! Yet
something pretty useful could be done fairly simply using nothing more than
regexes.
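
To make the "fairly simple" version concrete, here is a rough sketch in
Python (not the Perl module under discussion). It assumes the input is
already decoded Unicode text, and it leans on canonical decomposition (NFD)
plus one regex over the Combining Diacritical Marks block. The names
strip_accents and to_uri_slug are made up for illustration.

```python
import re
import unicodedata

# Assumption: only the Combining Diacritical Marks block (U+0300..U+036F)
# is stripped. That covers common Latin accents, not every combining mark.
_COMBINING = re.compile("[\u0300-\u036f]")


def strip_accents(text):
    """Decompose to NFD, then delete the combining accents with a regex."""
    decomposed = unicodedata.normalize("NFD", text)
    return _COMBINING.sub("", decomposed)


def to_uri_slug(text):
    """Hypothetical helper: accent-strip, lowercase, hyphenate the rest."""
    plain = strip_accents(text).lower()
    # Collapse every run of characters outside a-z0-9 into a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", plain).strip("-")
```

Letters with no canonical decomposition (ß, ø, and so on) pass through
strip_accents unchanged and would need an explicit mapping table, which is
part of why the full version is non-trivial.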

Script-to-script transliterators (Japanese->Latin, for example) would be
useful, but encoding-to-encoding transliterators are not really so useful:
there are too many dimensions to the problem. Fallback characters or
routines are probably the best design for generating useful output when
mismatched encodings are being cross-converted.
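
As an illustration of the fallback-routine idea, here is a minimal Python
sketch, assuming an ASCII-compatible target encoding. The handler name
"strip_or_question" and the function encode_with_fallback are hypothetical;
codecs.register_error is the standard-library hook for this kind of routine.
Each character the target encoding cannot represent is replaced by its
unaccented base letter when that base is ASCII, otherwise by "?".

```python
import codecs
import unicodedata


def _fallback(err):
    # Only encoding failures are handled; anything else is re-raised.
    if not isinstance(err, UnicodeEncodeError):
        raise err
    out = []
    for ch in err.object[err.start:err.end]:
        # First code point of the canonical decomposition is the base letter.
        base = unicodedata.normalize("NFD", ch)[0]
        out.append(base if ord(base) < 128 else "?")
    # Return the replacement text and the position at which to resume.
    return "".join(out), err.end


# "strip_or_question" is an illustrative name, not a built-in handler.
codecs.register_error("strip_or_question", _fallback)


def encode_with_fallback(text, encoding="ascii"):
    """Encode text, applying the fallback routine where the encoding fails."""
    return text.encode(encoding, errors="strip_or_question")
```

Characters the target encoding can already represent are left alone, so the
same routine works for ASCII and for, say, Latin-1 without a separate table
per encoding pair.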

Here's an interesting article on transliteration in general and ICU's
implementation in particular:

http://oss.software.ibm.com/icu/userguide/Transliteration.html

Transliteration itself pre-dates computers by centuries. It is a fascinating
topic for anyone interested in linguistics.

=Ed
