On Sat, 29 Jul 2000, Chen-hsiu Huang wrote:
> OK. I've also checked libunicode. I'll try to start from here.
> Besides, extracting words in a multi-byte locale is pretty hard,
> especially for terms containing both English (or some other language)
> and Unicode.
My first concern is that the code (including the String class) is not
multibyte-clean. In places it assumes that characters are one
byte long and uses char * arithmetic to advance one character at a time.
Given that, I think word parsing is further down the list. My guess is
that we'll want to work on the HtWordType code and make it into a
generalized word parser with appropriate subclasses. I will probably need
to start this work anyway for the new query parser.
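
To illustrate the kind of assumption I mean, here is a minimal sketch. It
is not taken from the ht://Dig source: it assumes the text is UTF-8 (one
possible multibyte encoding), and the names utf8_seq_len, next_char and
count_chars are made up for the example. The point is only that "p++"
moves one byte, not one character.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: length in bytes of the UTF-8 sequence that starts
// with lead byte `b`. Returns 1 for ASCII and also for invalid lead bytes,
// so a scan over malformed input still makes progress.
inline std::size_t utf8_seq_len(unsigned char b) {
    if (b < 0x80) return 1;            // 0xxxxxxx: ASCII
    if ((b & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((b & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((b & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 1;                          // stray continuation or invalid byte
}

// Advance past one character instead of one byte.
// Requires p < end; never steps beyond `end`, even on a truncated sequence.
inline const char *next_char(const char *p, const char *end) {
    std::size_t n = utf8_seq_len(static_cast<unsigned char>(*p));
    std::size_t left = static_cast<std::size_t>(end - p);
    return p + (n < left ? n : left);
}

// Count characters (code points) rather than bytes.
inline std::size_t count_chars(const std::string &s) {
    const char *p = s.data();
    const char *end = p + s.size();
    std::size_t n = 0;
    while (p < end) {
        p = next_char(p, end);
        ++n;
    }
    return n;
}
```

A loop written as "for (p = s; *p; p++)" would visit the three bytes of a
CJK character as three separate "characters", which is where the
one-byte assumption breaks down. Other multibyte encodings would need a
different length rule, so a real fix would probably hide this step
behind the String class.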
> I guess so. But how about rewriting htdig in Perl? Has anyone thought
> about this?
There are *many* Perl search engines. Some good, some not-so-good. I
believe Avi keeps a list at her website: <http://www.searchtools.com/>.
--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/