Dear Mr. Barkov, I have read the unicode.c and would like to make the following changes, before that, I would like to hear your advise:
1. separate unicode.c into: unicode-convert.h unicode-convert.c unicode.h unicode.c which unicode-convert.h contains all code convert tables. and unicode-convert.c contain codes for conversion. unicode.h contain those tables and unicode.c contain the remaining codes. I would like to write some scripts to generated the two .h files from data found from unicode.org. That way, when the new unicode release, we need only to generate the new .h and recompile. 2. add a tonormalize code which is similar to tolower, but it also map the diacreted characters to lower non diacreted equivalents. For example it will change all ç Ç C c to c. 3. I have started to construct a variant equivalent table for Chinese characters. But If I put that into the above tonormalize there will be a very big table. I have think of doing the mapping when the input code is converted into unicode (instead of convert them to different variant equivalent form, convert them to the a chosen variant form. In that way, we need only to modify the big5, gb, jis to unicode table. But I am not very sure is this hack is good or bad. 4. As mnogosearch is a open source project, I have a little difficult to contribute the code directly : I can not get the premission from my boss even I write the code at my own time. So, Before sent you the patch, I would like to hear from you. Best regards, Kent Sin --- Alexander Barkov <[EMAIL PROTECTED]> wrote: > Hi! > > Feel free to send us patches or library. > > Thanks! > > Sin Hang Kin wrote: > > > YES! > > > > I put it on utf-8 for test, it does support > chinese with single character > > search. I also make a search on some Portuguese, > which turn out it does not > > convert the character with accents correctly. For > example, only edicao yield > > the result, but edicao does not. > > > > Also, I would like to introduce some new features > for Chinese, mainly I > > would like to introduce a word segmentation > process and > > Traditional/Simplified Chinese code cross-search. > > > > There is an LIBTABE module in C which can handle > the word segmentation in > > Big-5. The code is licenced under GPL, can I bring > them in? > > > > Also, I would like to map all the traditional > Chinese characters to its > > corresponding simplified version. So they can be > searched. It is also > > possible to include a Chinese/English > Chinese/Pinyin dictionary lookup after > > the word segmentation to insert the English and > Pinyin into the catalog so > > the search can be cross-language. > > > > The Traditional/Simplified cross-search required > the search term be > > converted to simplified code also for complete > operation. > > > > I know a little about C and unfimilar with the > code of mnogosearch, but I am > > willing to help with the above change. If the > above is possible with > > mnogosearch, what code should I look for? > > > > Rgs, > > > > > > -- > > bar > > ___________________________________________ > If you want to unsubscribe send "unsubscribe > general" > to [EMAIL PROTECTED] > __________________________________________________ Do You Yahoo!? Yahoo! Sports - Coverage of the 2002 Olympic Games http://sports.yahoo.com ___________________________________________ If you want to unsubscribe send "unsubscribe general" to [EMAIL PROTECTED]