Title: RE: [htdig] how to defind word


I found something that might interest you:
CTTeX, a general-purpose Thai word segmentation program.

Sources are available at: http://thaigate.nacsis.ac.jp/ftp/thaisoft/new/cttex/
A binary package is also available for Linux Mandrake (it may work on Red Hat as well); it should appear on a Mandrake mirror soon.

See: http://www.linux-mandrake.com/en/cookerdevel.php3

Here is the description for the Linux Mandrake RPM :

--=-=-=
Name        : cttex                        Relocations: (not relocateable)
Version     : 1.21                              Vendor: MandrakeSoft
Release     : 1mdk                          Build Date: Thu Jun 29 04:55:09 2000
Install date: (not installed)               Build Host: kenobi.mandrakesoft.com
Group       : System/Internationalization   Source RPM: (none)
Size        : 442255                           License: Distributable
Packager    : Pablo Saratxaga <[EMAIL PROTECTED]>
URL         : http://thaigate.nacsis.ac.jp/files/index.html
Summary     : Cttex, Thai word separator program
Description :
The main part of Cttex is A Thai Word Separator algorithm using
a dictionary. A wrapper for formatting Thai LaTeX document file is provided
to demonstrate the use of this word-sep routine. The program can also
be used as a simple word-sep filter.
--=-=-=
* Wed Jun 28 2000 Pablo Saratxaga <[EMAIL PROTECTED]> 1.21-1mdk
- first rpm version for Mandrake
--=-=-=
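
To give an idea of what a dictionary-based word separator does, here is a minimal sketch in Python. It is not CTTeX's actual algorithm (I have not checked its sources); it is a plain greedy longest-match against a word list, and the function name `segment` and the toy dictionary are made up for illustration. It uses the example from the original message, lowercased.

```python
def segment(text, dictionary):
    """Greedy longest-match segmentation of an unspaced run of text.

    Illustrative only: a hypothetical helper, not code taken from CTTeX.
    """
    max_len = max(len(word) for word in dictionary)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary word matches.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                words.append(text[i:j])
                i = j
                break
        else:
            # No dictionary word starts here: emit a single character as its own token.
            words.append(text[i])
            i += 1
    return words


# Toy dictionary (assumption): the "words" of the example sentence.
dictionary = {"this", "is", "terst1", "test2"}

sentence = "thisisterst1. thisistest2"

# Spaces separate sentences, so split on them first, then segment each run.
tokens = [
    word
    for run in sentence.split()
    for word in segment(run.rstrip("."), dictionary)
]

print(tokens)
print("sister" in sentence)  # substring match on the raw text
print("sister" in tokens)    # whole-word match on the segmented text
```

If the text is passed through a filter like this before indexing, the indexer sees separate words, so a search for "sister" is compared against whole words instead of matching inside the unsegmented string.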


Best regards,
Charles Népote.



> -----Original Message-----
> From: Prisda Gomutputra [mailto:[EMAIL PROTECTED]]
> Sent: Saturday, June 24, 2000 19:51
> To: [EMAIL PROTECTED]
> Subject: [htdig] how to defind word
>
>
> I am currently trying to fine-tune ht://Dig to work more accurately
> with the Thai (8-bit) language. I can get it to work, but the search
> results are not very relevant, because Thai does not use spaces to
> separate words. Spaces are only used to separate sentences.
>
> For example, the English sentence "this is teRst1. this is test2"
> would be written in Thai as follows:
>
>     "thisisteRst1. thisistest2"
>         ^^^^^^
> 1) Is there a way to tell ht://Dig how to identify the words and
> index them properly?
> 2) When words are run together without spaces in between, a new
> problem is introduced. Take "thiSISTERst1" from the example above:
> when a user searches for the word "sister", "thiSISTERst1" will be
> returned too. Is there a way to prevent this from happening?
>
> Highly appreciated
> Prisda
>
>
