Dear Mr. Barkov,

I have read unicode.c and would like to make the
following changes. Before that, I would like to hear
your advice:

1. separate unicode.c into:

   unicode-convert.h  unicode-convert.c
   unicode.h
   unicode.c

   where unicode-convert.h contains all the
code-conversion tables and unicode-convert.c the
conversion routines, while unicode.h contains the
remaining tables and unicode.c the remaining code.

   I would like to write some scripts to generate the
two .h files from the data found at unicode.org. That
way, when a new Unicode version is released, we only
need to regenerate the .h files and recompile.
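   As a rough illustration of what such a generator
could parse (the function name is my own, not existing
mnogosearch code; I am assuming the UnicodeData.txt
layout from unicode.org, where fields are separated by
semicolons, field 0 is the code point and field 13 the
lowercase mapping):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Sketch only: parse one UnicodeData.txt line and extract the
 * code point and its lowercase mapping, for emitting a C table.
 * Returns 1 if the line carries a lowercase mapping, else 0. */
static int parse_lower_mapping(const char *line,
                               unsigned long *code, unsigned long *lower)
{
    char buf[512];
    char *field[15];
    int n = 0;
    char *p = buf;

    strncpy(buf, line, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';

    /* split on ';' by hand -- strtok() would skip empty fields */
    while (n < 15 && p) {
        field[n++] = p;
        p = strchr(p, ';');
        if (p)
            *p++ = '\0';
    }
    if (n < 14 || field[13][0] == '\0')
        return 0;                   /* no lowercase mapping */

    *code  = strtoul(field[0],  NULL, 16);  /* field 0: code point */
    *lower = strtoul(field[13], NULL, 16);  /* field 13: lowercase */
    return 1;
}
```

   A generator along these lines would read the whole
file and print one table initializer per mapping.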

2. add a tonormalize function, similar to tolower, but
one which also maps accented characters to their
lower-case unaccented equivalents. For example, it
would map ç, Ç, C and c all to c.
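   A minimal sketch of what I have in mind, covering
only the ASCII and Latin-1 ranges (the function name
and the folding table are illustrative, not existing
mnogosearch code; a full version would of course be
table-driven from the generated .h files):

```c
#include <assert.h>

/* Sketch: like tolower(), but also fold accented Latin-1
 * letters to their plain lower-case base letter. */
static unsigned int tonormalize(unsigned int c)
{
    /* base letters for 0xE0..0xFF; 0 means "leave unchanged" */
    static const char base[32] = {
        'a','a','a','a','a','a','a','c',   /* e0-e7: a-grave .. c-cedilla */
        'e','e','e','e','i','i','i','i',   /* e8-ef */
        'd','n','o','o','o','o','o', 0,    /* f0-f7 (f7 is the division sign) */
        'o','u','u','u','u','y','t','y'    /* f8-ff */
    };

    if (c >= 'A' && c <= 'Z')              /* plain ASCII case fold */
        return c - 'A' + 'a';
    if (c >= 0xC0 && c <= 0xDE && c != 0xD7)
        c += 0x20;                         /* upper Latin-1 -> lower half */
    if (c >= 0xE0 && c <= 0xFF && base[c - 0xE0])
        return (unsigned char)base[c - 0xE0];
    return c;
}
```

   With this, 0xC7 (Ç), 0xE7 (ç), 'C' and 'c' all
normalize to 'c'.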

3. I have started to construct a variant-equivalence
table for Chinese characters. But if I put that into
tonormalize above, the table becomes very big. I have
thought of doing the mapping when the input is
converted into Unicode: instead of converting
characters to their different variant forms, convert
them all to one chosen variant form. That way, we only
need to modify the Big5, GB and JIS to Unicode tables.
But I am not sure whether this hack is good or bad.
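   The conversion-time mapping could look roughly like
this (the three table entries are just examples, and
the names are my own, not mnogosearch code): after a
Big5/GB/JIS code is converted to Unicode, look the code
point up in a sorted Traditional-to-Simplified table
and substitute the chosen variant.

```c
#include <assert.h>
#include <stdlib.h>

struct variant_map { unsigned int from, to; };

/* Illustrative Traditional -> Simplified pairs; must stay
 * sorted by .from for bsearch(). */
static const struct variant_map t2s[] = {
    { 0x5B78, 0x5B66 },   /* hsueh/xue "study"    */
    { 0x6F22, 0x6C49 },   /* han "Chinese"        */
    { 0x8A9E, 0x8BED },   /* yu "language/speech" */
};

static int cmp(const void *k, const void *e)
{
    unsigned int key = *(const unsigned int *)k;
    const struct variant_map *m = e;
    return (key > m->from) - (key < m->from);
}

/* Fold a Unicode code point to the chosen (Simplified) variant;
 * code points with no entry pass through unchanged. */
static unsigned int fold_variant(unsigned int uni)
{
    const struct variant_map *m =
        bsearch(&uni, t2s, sizeof(t2s) / sizeof(t2s[0]),
                sizeof(t2s[0]), cmp);
    return m ? m->to : uni;
}
```

   The same fold would have to be applied to the search
term, so indexed and queried text meet in the chosen
variant.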

4. As mnogosearch is an open-source project, I have a
little difficulty contributing the code directly: I
cannot get permission from my boss, even though I
write the code in my own time. So before sending you
the patch, I would like to hear from you.

Best regards,

Kent Sin


--- Alexander Barkov <[EMAIL PROTECTED]> wrote:
> Hi!
> 
> Feel free to send us patches or library.
> 
> Thanks!
> 
> Sin Hang Kin wrote:
> 
> > YES!
> > 
> > I set it to utf-8 for a test, and it does support
> > Chinese with single-character search. I also ran a
> > search on some Portuguese text, which turned out
> > not to convert accented characters correctly. For
> > example, only "edicao" yields results, but "edição"
> > does not.
> > 
> > Also, I would like to introduce some new features
> > for Chinese: mainly a word-segmentation process and
> > Traditional/Simplified Chinese cross-search.
> > 
> > There is a LIBTABE module in C which can handle
> > word segmentation in Big5. The code is licensed
> > under the GPL; can I bring it in?
> > 
> > Also, I would like to map all Traditional Chinese
> > characters to their corresponding Simplified
> > versions so that they can be searched. It is also
> > possible to do a Chinese/English or Chinese/Pinyin
> > dictionary lookup after word segmentation and
> > insert the English and Pinyin into the catalog, so
> > the search can be cross-language.
> > 
> > The Traditional/Simplified cross-search also
> > requires that the search term be converted to
> > Simplified codes for complete operation.
> > 
> > I know a little C and am unfamiliar with the
> > mnogosearch code, but I am willing to help with the
> > above changes. If the above is possible with
> > mnogosearch, what code should I look at?
> > 
> > Rgs,
> > 
> 
> 
> 
> -- 
> 
>   bar
> 
> ___________________________________________
> If you want to unsubscribe send "unsubscribe
> general"
> to [EMAIL PROTECTED]
> 

