Re: encoding

petite_abeille Sat, 28 Jan 2006 08:20:30 -0800

Hello,

On Jan 27, 2006, at 11:44, John Haxby wrote:

I've attached the perl script -- feed http://www.unicode.org/Public/4.1.0/ucd/UnicodeData.txt to it.


Thanks! Works great!

It's based on a slightly different principle to yours. You seem to look for things like "mumble mumble LETTER X mumble" and take "X" as the base letter.

Yes, here is the mumbling algorithm in its full glory: aLetter = aLine:match( ".+%s(%u)%U.*" )
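To make the pattern concrete, here is a minimal sketch of how it behaves on a few lines. The helper name baseLetter and the sample lines are assumptions for illustration only: the lines are trimmed to three fields, whereas real UnicodeData.txt lines carry more fields and may match differently.

```lua
-- Hypothetical helper wrapping the pattern quoted in the message.
-- Greedy ".+" backtracks from the end of the line, so the capture is the
-- LAST single uppercase letter that follows whitespace and is itself
-- followed by a character that is not an uppercase letter.
local function baseLetter( aLine )
    return aLine:match( ".+%s(%u)%U.*" )
end

-- Illustrative lines, trimmed to "code;NAME;category" (an assumption).
local someLines = {
    "0044;LATIN CAPITAL LETTER D;Lu",
    "0256;LATIN SMALL LETTER D WITH TAIL;Ll",
    "00DF;LATIN SMALL LETTER SHARP S;Ll",
    "01E2;LATIN CAPITAL LETTER AE WITH MACRON;Lu",
}

for _, aLine in ipairs( someLines ) do
    -- tostring() so that a failed match prints as "nil"
    print( aLine, tostring( baseLetter( aLine ) ) )
end
```

On these trimmed lines the single-letter names yield "D", "D" and "S", while the "AE" line yields no match at all, because "A" is followed by another uppercase letter.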

That means that, for example, ɖ (a "d" with a hook) gets converted to "d". My script, on the other hand, deals with things like "Ǣ" (LATIN CAPITAL LETTER AE WITH MACRON) and converts it to AE. There are some differences of opinion though: you have ß mapped to "s" whereas I have "ss" ("straße" to "strasse" instead of "strase" seems right). I think I'm also over-enthusiastic when it comes to mapping characters to spaces: I know that there are some Arabic characters that get mapped to spaces. For the purposes of converting to an ASCII approximation, though, I suspect a combination of your approach and mine would be best. What do you think?
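One way such a combination might look, as a rough sketch rather than either script's actual logic: consult a table of explicit mappings first and fall back to the name pattern otherwise. Everything named here (overrides, someNames, fromName, toASCII) is hypothetical, the name lines are trimmed for illustration, lowercasing on "SMALL" is an added assumption, and the source file is assumed to be saved as UTF-8.

```lua
-- Explicit mappings win (the multi-letter cases mentioned above).
local overrides = { [ "ß" ] = "ss", [ "Ǣ" ] = "AE" }

-- Illustrative name lines keyed by character (trimmed, an assumption).
local someNames = {
    [ "ɖ" ] = "0256;LATIN SMALL LETTER D WITH TAIL;Ll",
}

-- Fallback: the name pattern, lowercased for SMALL letters (assumption).
local function fromName( aLine )
    local aLetter = aLine:match( ".+%s(%u)%U.*" )
    if aLetter and aLine:find( " SMALL ", 1, true ) then
        aLetter = aLetter:lower()
    end
    return aLetter
end

-- Replace each multi-byte UTF-8 sequence; ASCII bytes are left alone.
-- Returning nil from the callback keeps the original character.
local function toASCII( aString )
    return ( aString:gsub( "[\192-\255][\128-\191]*", function( aChar )
        local aLine = someNames[ aChar ]
        return overrides[ aChar ] or ( aLine and fromName( aLine ) )
    end ) )
end

print( toASCII( "straße" ) )
print( toASCII( "Ǣɖ" ) )
```

Characters found in neither table pass through unchanged, which leaves the question of what to map to a space as a separate decision.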

Overall, I much prefer your approach. Here is the updated Lua table derived from your handy perl script:


http://dev.alt.textdrive.com/browser/lu/LUStringBasicLatin.txt

You also mentioned a full Unicode to ASCII transliteration/transcription module of some sort. Is it something you would like to share as well? :))


Cheers

--
PA, Onnay Equitursay
http://alt.textdrive.com/

