Hello,

On Jan 27, 2006, at 11:44, John Haxby wrote:

I've attached the perl script -- feed http://www.unicode.org/Public/4.1.0/ucd/UnicodeData.txt to it.

Thanks! Works great!

It's based on a slightly different principle to yours. You seem to look for things like "mumble mumble LETTER X mumble" and take "X" as the base letter.

Yes, here is the mumbling algorithm in its full glory: aLetter = aLine:match( ".+%s(%u)%U.*" )

That means that, for example, ɖ (a "d" with a hook) gets converted to "d". My script, on the other hand, deals with things like "Ǣ" (LATIN CAPITAL LETTER AE WITH MACRON) and converts it to AE. There are some differences of opinion though, you have ß mapped to "s" whereas I have "ss" ("strße" to "strasse" instead of "strase" seems right). I think I'm also over-enthusiastic when it comes to mapping characters to spaces: I know that there are some arabic characters that get mapped to spaces. For the purposes of converting to an ASCII approximation, though, I suspect a combination of your approach and mine would be best. What do you think?

Overall, I much prefer your approach. Here is the updated Lua table derived from your handy perl script:

http://dev.alt.textdrive.com/browser/lu/LUStringBasicLatin.txt

You also mentioned a full Unicode to ASCII transliteration/transcription module of some sort. Is it something you would like to share as well? :))

Cheers

--
PA, Onnay Equitursay
http://alt.textdrive.com/


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to