Re: [NTG-context] Support for Thai in ConTeXt

2013-05-15 Thread Mojca Miklavec
On Tue, May 14, 2013 at 6:17 PM, Hans Hagen wrote:
 On 5/14/2013 6:07 PM, luigi scarso wrote:

 I hope that someone can help here


 As Mojca mentioned Thai at BachoTeX, I'll add the patterns as a start.

 Given specs, examples, and time, adding support for Thai to ConTeXt shouldn't
 be too hard (assuming that there are users).

But it's not trivial either.

There's an open-source project implementing word segmentation:
http://linux.thai.net/projects/swath
The specification (someone's thesis) can be found here:
http://www.cs.cmu.edu/~paisarn/papers/thesis99.pdf

The ugly part of the pdfTeX approach is that it requires an external text
processor to digest an input TeX document and return a copy with word
segmentation applied. pdfTeX is then run on the resulting file. XeTeX can
use the ICU library to do the segmentation.

In LuaTeX one would have to plug the word segmentation in somewhere (but
writing that part is slightly non-trivial).
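
To make the external-filter idea concrete, here is a minimal sketch in
Python. It is not how swath works internally; it only illustrates the
preprocessing step. The function name mark_word_breaks, the pluggable
segment() callable, and the choice of a zero-width space (U+200B) as the
break marker are all assumptions made for illustration.

```python
import re
from typing import Callable, List

# Runs of characters from the Unicode Thai block (U+0E00 to U+0E7F).
THAI_RUN = re.compile(r"[\u0E00-\u0E7F]+")


def mark_word_breaks(text: str,
                     segment: Callable[[str], List[str]],
                     marker: str = "\u200b") -> str:
    """Return a copy of text with marker inserted between the words
    that segment() reports inside each run of Thai characters.

    segment is a hypothetical word segmenter: it takes one Thai run
    and returns the list of words it is made of, in order.
    """
    def replace_run(match):
        # Only the Thai run is touched; everything else is left as-is.
        return marker.join(segment(match.group(0)))

    return THAI_RUN.sub(replace_run, text)
```

The typesetting engine would then be run on the filtered copy, with the
marker acting as a permitted line-break point. A real filter would also
need to skip control sequences, comments, and verbatim material, which
this sketch does not attempt.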

Mojca
___
If your question is of interest to others as well, please add an entry to the 
Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : http://foundry.supelec.fr/projects/contextrev/
wiki : http://contextgarden.net
___


Re: [NTG-context] Support for Thai in ConTeXt

2013-05-15 Thread Hans Hagen

On 5/15/2013 4:09 PM, Mojca Miklavec wrote:

On Tue, May 14, 2013 at 6:17 PM, Hans Hagen wrote:

On 5/14/2013 6:07 PM, luigi scarso wrote:


I hope that someone can help here



As Mojca mentioned Thai at BachoTeX, I'll add the patterns as a start.

Given specs, examples, and time, adding support for Thai to ConTeXt shouldn't
be too hard (assuming that there are users).


But it's not trivial either.


It depends ... we're using a dictionary to determine word boundaries, 
aren't we? I'm pretty sure that I've done more complex coding.
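
As a rough illustration of dictionary-based word boundaries, here is a
minimal sketch of greedy longest matching in Python. This is a
deliberately simple stand-in, not the method used by swath or described
in the thesis, and the function name segment_longest_match and the tiny
word list in the usage note are made up for the example.

```python
from typing import Iterable, List


def segment_longest_match(text: str, words: Iterable[str]) -> List[str]:
    """Split text into pieces by greedy longest match against words.

    At each position the longest dictionary word starting there is
    taken; if none matches, a single character is emitted so that the
    scan always makes progress.
    """
    dictionary = set(words)
    max_len = max((len(w) for w in dictionary), default=1)

    pieces = []
    i = 0
    while i < len(text):
        # Start with the longest candidate and shrink until it is a
        # dictionary word or only one character is left.
        end = min(len(text), i + max_len)
        while end > i + 1 and text[i:end] not in dictionary:
            end -= 1
        pieces.append(text[i:end])
        i = end
    return pieces
```

With the word list ["แมว", "กิน", "ปลา"], the input "แมวกินปลา" comes
back as ["แมว", "กิน", "ปลา"]. Greedy matching can pick the wrong
boundary when words overlap, which is one reason real segmenters are
more involved.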



There's an open-source project implementing word segmentation:
 http://linux.thai.net/projects/swath
The specification (someone's thesis) can be found here:
 http://www.cs.cmu.edu/~paisarn/papers/thesis99.pdf


OK, so there are some text files there with words.


The ugly part of the pdfTeX approach is that it requires an external text
processor to digest an input TeX document and return a copy with word
segmentation applied. pdfTeX is then run on the resulting file. XeTeX can
use the ICU library to do the segmentation.

In LuaTeX one would have to plug the word segmentation in somewhere (but
writing that part is slightly non-trivial).


I just did a quick test using those dictionaries (abusing some code that 
I already had on my machine). Quite doable. It all depends on having the 
dictionaries available (on the garden or in the distribution).


Anyhow, it's not that font-related, just language/script support; 
we already have that for some languages, and adding Thai to it 
doesn't hurt. Of course we'd need some testing. It doesn't make much 
sense to add features to ConTeXt that no one would use at some point.


But ... Luigi is already teaching himself Thai, so ...

Hans

-
  Hans Hagen | PRAGMA ADE
  Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
 | www.pragma-pod.nl
-


Re: [NTG-context] Support for Thai in ConTeXt

2013-05-15 Thread luigi scarso
On Wed, May 15, 2013 at 5:20 PM, Hans Hagen pra...@wxs.nl wrote:


 But ... Luigi is already teaching himself Thai, so ...

No, no, just connecting people on different mailing lists.
Currently I'm in a completely different area.
-- 
luigi

[NTG-context] Support for Thai in ConTeXt

2013-05-14 Thread luigi scarso
On Tue, May 14, 2013 at 5:59 PM, Theppitak Karoonboonyanan 
theppi...@gmail.com wrote:

 On Tue, May 14, 2013 at 9:58 PM, luigi scarso luigi.sca...@gmail.com
 wrote:
 
  On Tue, May 14, 2013 at 4:16 PM, Mojca Miklavec
  mojca.miklavec.li...@gmail.com wrote:
 
  I could also ask differently: suppose that a motivated Thai programmer
  would be willing to work on solving the problem properly. What would
  be the suggested solution?
 
  You can also post on the ConTeXt mailing list; maybe there is a Thai
  user there.

 I am a Thai developer who works on Thai word segmentation tools and
 the thailatex package, so you can send suggestions to me. (Please Cc: me,
 I'm not on the mailing list.)

 I'm totally new to LuaTeX and the Lua programming language, but I can
 learn whatever is necessary to get it done.

 With a quick search, I saw the linebreak_filter callback in the LuaTeX
 reference. Is that relevant to the problem? Or is using an external
 filter already acceptable?

 Regards,
 --
 Theppitak Karoonboonyanan
 http://linux.thai.net/~thep/


I hope that someone can help here

-- 
luigi

Re: [NTG-context] Support for Thai in ConTeXt

2013-05-14 Thread Hans Hagen

On 5/14/2013 6:07 PM, luigi scarso wrote:


I hope that someone can help here


As Mojca mentioned Thai at BachoTeX, I'll add the patterns as a start.

Given specs, examples, and time, adding support for Thai to ConTeXt 
shouldn't be too hard (assuming that there are users).


Hans
