On 2010-03-07 <20:07:16>, Thomas A. Schmitz wrote:
> 
> On Mar 7, 2010, at 11:54 AM, Philipp Gesang wrote:
> 
> Just one thought on your transliterator: a couple of years ago, Hans
> set up something a bit similar for Greek. It is based on lpeg, though,
> not gsub and so should be somewhat faster. If you look at
> context/tex/texmf-context/scripts/context/lua/mtx-babel.lua
> you'll see what he did. In theory, this mechanism is general, and all
> sorts of transliteration schemes could be hooked into it. Might give you
> some ideas or not...
I'm afraid lpegs, elegant though they are, would complicate the matter a
bit.  Try this:

\startluacode
  s1, s2, s3 = "abc", "äbz", "аbc"
  p1, p2, p3 = lpeg.P("a")  , lpeg.P("ä")  , lpeg.P("а")
  --                                                 ^ == u1072
  context(lpeg.match(p1, s1))   --> 2, as expected
  context(lpeg.match(p2, s2))   --> 3, not 2: "ä" is two bytes in UTF-8
  context(lpeg.match(p3, s3))   --> 3, not 2: "а" (u1072) is two bytes as well
\stopluacode

You'll see that lpeg isn't Unicode-aware: it matches bytes, so the
positions it returns are byte offsets, not character offsets.  On the
other hand, Roberto has a snippet on his page[1] that gets the Unicode
number out of a UTF-8 octet sequence (up to 4 bytes), though I'm in no
hurry to go down that road: it would mean converting all the tables
into integers, converting the input into an array of ints, then doing
multi-char replacement (= integer substitution) on this array, and
finally converting it back into a sequence of chars.  I'm not sure
transliterating a few single words is worth it.
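
For what it's worth, that round trip could be sketched roughly like
this in plain Lua (no lpeg, nothing newer than 5.1).  This is only an
illustration, not Roberto's snippet and not the actual transliterator:
the names utf8_decode, utf8_encode, translit and transliterate are made
up, the table holds just a few Cyrillic entries, the input is assumed
to be well-formed UTF-8, and only single code points are looked up
(though each may expand to several output characters).

```lua
-- Decode a UTF-8 string into an array of code points.
-- Assumes well-formed input; no validation is done.
local function utf8_decode(s)
  local cps, i, n = {}, 1, #s
  while i <= n do
    local b = s:byte(i)
    local len, cp
    if     b < 0x80 then len, cp = 1, b          -- 0xxxxxxx
    elseif b < 0xE0 then len, cp = 2, b - 0xC0   -- 110xxxxx
    elseif b < 0xF0 then len, cp = 3, b - 0xE0   -- 1110xxxx
    else                 len, cp = 4, b - 0xF0   -- 11110xxx
    end
    -- each continuation byte (10xxxxxx) contributes six more bits
    for j = i + 1, i + len - 1 do
      cp = cp * 64 + (s:byte(j) - 0x80)
    end
    cps[#cps + 1] = cp
    i = i + len
  end
  return cps
end

-- Encode a single code point as a UTF-8 byte sequence.
local function utf8_encode(cp)
  if cp < 0x80 then
    return string.char(cp)
  elseif cp < 0x800 then
    return string.char(0xC0 + math.floor(cp / 64),
                       0x80 + cp % 64)
  elseif cp < 0x10000 then
    return string.char(0xE0 + math.floor(cp / 4096),
                       0x80 + math.floor(cp / 64) % 64,
                       0x80 + cp % 64)
  else
    return string.char(0xF0 + math.floor(cp / 262144),
                       0x80 + math.floor(cp / 4096) % 64,
                       0x80 + math.floor(cp / 64) % 64,
                       0x80 + cp % 64)
  end
end

-- Hypothetical sample table: code point -> replacement string.
local translit = {
  [0x430] = "a",     -- u1072, the Cyrillic letter from the example above
  [0x431] = "b",
  [0x432] = "v",
  [0x449] = "shch",  -- one code point may expand to several characters
}

-- Replace every mapped code point, pass everything else through.
local function transliterate(s)
  local out = {}
  for _, cp in ipairs(utf8_decode(s)) do
    out[#out + 1] = translit[cp] or utf8_encode(cp)
  end
  return table.concat(out)
end

-- "\208\176" is the UTF-8 encoding of u1072
print(transliterate("\208\176bc"))   --> abc
```

Keying the table by integer sidesteps the byte-offset problem, but
sources spanning several code points would still need a longest-match
step on top of this.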

Anyway, I'm glad you pointed me to the babel script as I hadn't noticed
it before.


Philipp


[1] http://www.inf.puc-rio.br/~roberto/lpeg/lpeg.html#ex
> 
> Thomas
> ___________________________________________________________________________________
> If your question is of interest to others as well, please add an entry to the 
> Wiki!
> 
> maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
> webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
> archive  : http://foundry.supelec.fr/projects/contextrev/
> wiki     : http://contextgarden.net
> ___________________________________________________________________________________

-- 
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments
