> > 3. I have started to construct a variant-equivalence
> > table for Chinese characters. But if I put that into
> > the above tonormalize, the table will be very big.
> > I have thought of doing the mapping when the input code
> > is converted into Unicode: instead of converting the
> > characters to different variant-equivalent forms, convert
> > them to one chosen variant form. That way, we need only
> > modify the Big5, GB, and JIS to Unicode tables. But I am
> > not sure whether this hack is good or bad.
> >
> I think this table should be done not in Big5 or GB
> form, but in Unicode format. Like toupper/tolower.
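
A minimal sketch of what a toupper/tolower-style mapping done on Unicode code points (rather than inside the Big5, GB, or JIS conversion tables) might look like. The table names, the level names, and the `normalize` function are hypothetical, and the few entries are hand-picked samples, not data extracted from unihan.txt:

```python
# Hypothetical per-level tables keyed by Unicode code point.
# The entries are a few hand-picked samples, not real unihan.txt data.
SIMPLIFIED_TO_TRADITIONAL = {
    0x56FD: 0x570B,  # 国 -> 國
    0x5B66: 0x5B78,  # 学 -> 學
}

NUMERIC = {
    0x4E00: ord("1"),  # 一
    0x4E8C: ord("2"),  # 二
    0x4E09: ord("3"),  # 三
}

# Fullwidth ASCII variants U+FF01..U+FF5E sit at a fixed offset
# from their ASCII counterparts U+0021..U+007E.
FULLWIDTH = {cp: cp - 0xFEE0 for cp in range(0xFF01, 0xFF5F)}
FULLWIDTH[0x3000] = 0x20  # ideographic space -> ASCII space

# Assumed level names that an operator could switch on or off.
LEVELS = {
    "simp_trad": SIMPLIFIED_TO_TRADITIONAL,
    "numeric": NUMERIC,
    "fullwidth": FULLWIDTH,
}


def normalize(text, levels):
    """Map each character through the tables of the selected levels, in order."""
    for name in levels:
        text = text.translate(LEVELS[name])
    return text
```

Because the tables are keyed by code point, the same data would serve every input charset once the text has been converted to Unicode, and each level stays independent of the others.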

So, I will contribute the table. There will be a few levels for the
operator to choose from:

1. Simplified and Traditional variants, taken directly from
   unihan.txt.
2. Variants identified by CCCII, which can also be extracted from
   unihan.txt.
3. Similar-meaning characters, which are not identical variant forms
   but are very similar in usage, or are often used for one another
   by mistake. These would be useful for search purposes only. This
   level is built by manually looking up the dictionary and the
   character frequency table.
4. Numeric variants, which map all numeric characters to 1, 2, 3, ...
5. Punctuation and full-width alphabets.

> > 4. As mnogosearch is an open source project, I have a
> > little difficulty contributing the code directly: I
> > cannot get permission from my boss even if I write
> > the code on my own time. So, before sending you the
> > patch, I would like to hear from you.
> > Can you hear me? :-)
> >
> By the way, just interesting...
>
> Why doesn't your boss allow you to contribute to an open
> source project?

I work in a government agency, and my boss has no idea about code and
software. They fear any unusual acts and responsibilities. Our
contract prohibits us from doing any part-time work, even unpaid. It
is really difficult to get them to sign the paper required to release
the code.

Rgs,
Kent Sin