On Sat, 29 Jul 2000, Chen-hsiu Huang wrote:

> OK. I've also checked libunicode. I'll try to start from here.
> Besides, extracting words in a multi-byte locale is pretty hard,
> especially for terms containing both English (or something else)
> and Unicode.

My first concern is that the code (including the String class) is not
multibyte-clean. In many places it assumes that characters are one byte
long and uses char * arithmetic to advance one character at a time.
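
To make the problem concrete, here is a minimal sketch rather than actual
ht://Dig code. It assumes the text is UTF-8, which is only one of the
encodings we would have to handle, and the helper names (utf8_seq_len,
count_bytes_as_chars, count_utf8_chars) are made up for illustration. The
first counter steps the pointer one byte at a time, as the current code
tends to do; the second steps by whole characters.

```cpp
#include <cstddef>

// Hypothetical helper: length in bytes of the UTF-8 sequence that starts
// with the given lead byte. Assumes UTF-8 input.
static std::size_t utf8_seq_len(unsigned char lead) {
    if (lead < 0x80) return 1;            // plain ASCII
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 1;  // stray continuation or invalid byte: step one byte to resync
}

// The one-byte assumption: every ++p is treated as "next character".
std::size_t count_bytes_as_chars(const char *s) {
    std::size_t n = 0;
    for (const char *p = s; *p; ++p) ++n;
    return n;
}

// Multibyte-aware version: advance by the full sequence length.
std::size_t count_utf8_chars(const char *s) {
    std::size_t n = 0;
    const char *p = s;
    while (*p) {
        std::size_t len = utf8_seq_len(static_cast<unsigned char>(*p));
        // Stop at the terminator if the last sequence is truncated.
        for (std::size_t i = 0; i < len && *p; ++i) ++p;
        ++n;
    }
    return n;
}
```

For a string holding 'a', one three-byte character, and 'b', the first
function reports 5 and the second reports 3. Any code that indexes or
truncates by the first number can cut a character in half.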

Given that, I think word parsing is further down the list. My guess is
that we'll want to work on the HtWordType code and make it into a
generalized word parser with appropriate subclasses. I will probably need
to start this work anyway for the new query parser.
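
Roughly what I have in mind, as a sketch only: none of these names
(WordParser, wordCharLen, AsciiWordParser, Utf8WordParser) exist in the
tree, and this is not the current HtWordType interface. The base class owns
the splitting loop and each subclass decides how many bytes, if any, form a
word character at a given position. The UTF-8 subclass is an assumption
about one encoding we might support, and it naively treats every
well-formed multibyte sequence as a word character.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical interface, not the actual HtWordType API.
class WordParser {
public:
    virtual ~WordParser() {}

    // Bytes occupied by the word character at s[pos], or 0 if s[pos]
    // is a separator.
    virtual std::size_t wordCharLen(const std::string &s,
                                    std::size_t pos) const = 0;

    // Shared splitting loop: collect runs of word characters.
    std::vector<std::string> split(const std::string &s) const {
        std::vector<std::string> words;
        std::string cur;
        std::size_t pos = 0;
        while (pos < s.size()) {
            std::size_t len = wordCharLen(s, pos);
            if (len > 0) {
                cur.append(s, pos, len);
                pos += len;
            } else {
                if (!cur.empty()) { words.push_back(cur); cur.clear(); }
                ++pos;  // skip one separator byte
            }
        }
        if (!cur.empty()) words.push_back(cur);
        return words;
    }
};

// Single-byte behaviour: only ASCII letters and digits are word characters.
class AsciiWordParser : public WordParser {
public:
    std::size_t wordCharLen(const std::string &s,
                            std::size_t pos) const override {
        return std::isalnum(static_cast<unsigned char>(s[pos])) ? 1 : 0;
    }
};

// Assumes UTF-8; treats any well-formed multibyte sequence as a word
// character, which is a simplification.
class Utf8WordParser : public WordParser {
public:
    std::size_t wordCharLen(const std::string &s,
                            std::size_t pos) const override {
        unsigned char c = static_cast<unsigned char>(s[pos]);
        if (c < 0x80) return std::isalnum(c) ? 1 : 0;
        std::size_t len = (c & 0xE0) == 0xC0 ? 2
                        : (c & 0xF0) == 0xE0 ? 3
                        : (c & 0xF8) == 0xF0 ? 4 : 0;
        // Stray continuation bytes and truncated sequences are separators.
        if (len == 0 || pos + len > s.size()) return 0;
        return len;
    }
};
```

With mixed input, the ASCII parser drops the multibyte term entirely while
the UTF-8 parser keeps it as one word. Real segmentation for languages
without spaces would need much more than this.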

> I guess so. But how about rewriting htdig in Perl? Has anyone thought
> about this?

There are *many* Perl search engines. Some good, some not-so-good. I
believe Avi keeps a list at her website: <http://www.searchtools.com/>.

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/

