Hello,
On Jan 27, 2006, at 11:44, John Haxby wrote:
I've attached the Perl script -- feed
http://www.unicode.org/Public/4.1.0/ucd/UnicodeData.txt to it.
Thanks! Works great!
It's based on a slightly different principle to yours. You seem to
look for things like "mumble mumble LETTER X mumble" and take "X" as
the base letter.
Yes, here is the mumbling algorithm in its full glory:

    aLetter = aLine:match( ".+%s(%u)%U.*" )
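For illustration, here is a minimal sketch of how that pattern behaves. It assumes aLine is a raw UnicodeData.txt record; the sample records are abbreviated to their first three fields, and the baseLetter wrapper is a hypothetical name used only here.

```lua
-- Hypothetical wrapper around the pattern quoted above.
-- The pattern captures the last single uppercase letter that is
-- preceded by whitespace and followed by a non-uppercase character.
local function baseLetter(aLine)
  return aLine:match(".+%s(%u)%U.*")
end

-- Records abbreviated to code point, name, and general category.
local samples = {
  "0256;LATIN SMALL LETTER D WITH TAIL;Ll",
  "00DF;LATIN SMALL LETTER SHARP S;Ll",
  "01E2;LATIN CAPITAL LETTER AE WITH MACRON;Lu",
}

for _, line in ipairs(samples) do
  print(line, tostring(baseLetter(line)))
end
```

On these abbreviated records the first two yield "D" and "S", while the AE record yields nil because no lone letter stands between a space and a non-uppercase character.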
That means that, for example, ɖ (a "d" with a hook) gets converted
to "d". My script, on the other hand, deals with things like "Ǣ"
(LATIN CAPITAL LETTER AE WITH MACRON) and converts it to AE. There
are some differences of opinion, though: you have ß mapped to "s"
whereas I have "ss" ("straße" to "strasse" instead of "strase" seems
right). I think I'm also over-enthusiastic when it comes to mapping
characters to spaces: I know that there are some Arabic characters
that get mapped to spaces. For the purposes of converting to an
ASCII approximation, though, I suspect a combination of your approach
and mine would be best. What do you think?
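One way such a combination might look, as a rough sketch only: this is neither the attached Perl script nor the LU code. The overrides table, the approximate function, and the two-letter cutoff are all assumptions made for illustration.

```lua
-- Hypothetical combination: explicit overrides first, then the token
-- that follows "LETTER" in the character name.
local overrides = {
  ["LATIN SMALL LETTER SHARP S"] = "ss", -- so "straße" becomes "strasse"
}

local function approximate(name)
  local special = overrides[name]
  if special then return special end

  -- Capture the whole token after LETTER so digraphs such as AE survive.
  local case, token = name:match("^LATIN (%u+) LETTER (%u+)")
  if not token then return nil end

  -- Assumed guard: longer tokens are modifiers (TURNED, SHARP, ...),
  -- not letters, so give up rather than guess.
  if #token > 2 then return nil end

  if case == "SMALL" then return token:lower() end
  return token
end

print(approximate("LATIN CAPITAL LETTER AE WITH MACRON"))
print(approximate("LATIN SMALL LETTER D WITH TAIL"))
print(approximate("LATIN SMALL LETTER SHARP S"))
```

Names that do not fit the assumed "LATIN ... LETTER ..." shape return nil, which leaves the caller free to decide between a space, a placeholder, or dropping the character.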
Overall, I much prefer your approach. Here is the updated Lua table
derived from your handy Perl script:
http://dev.alt.textdrive.com/browser/lu/LUStringBasicLatin.txt
You also mentioned a full Unicode to ASCII
transliteration/transcription module of some sort. Is it something you
would like to share as well? :))
Cheers
--
PA, Onnay Equitursay
http://alt.textdrive.com/