[htdig-dev] Incremental Index Efficiency, Unicode, & 2GIG limit

Neal Richter Fri, 04 Jan 2002 11:24:02 -0800

Hello again,

        I've got a couple more questions.


1.      Is there any need to rebuild the index from scratch
periodically?  Some commercial search engines use incremental indexing and
recommend that when the incremental portion of the index gets to be a
given size (say 20%) the entire index is rebuilt.

2.      Is it possible to turn stemming off for particular languages
at run time?  We have our own stemming tools (Porter algorithm).

3.      (Unicode) Is the index (the core of the index code) capable of
doing multibyte searching?  For example, if a fully escaped version of a
Japanese or other multibyte document was indexed and then searched with
a properly escaped query, would valid matches occur? (Exclude any UI or
upper-level code in your thinking here.)

4.      (2 Gig Limit)  Some of the archives will be at a million+
documents in size with an average length exceeding 2K.  Other than using
XFS or JFS, is the solution in this case to use multiple index files?

5.      Is there a way to add a 'field' to the index?  I.e., multiple
documents share a source-id and a query is given to return the documents
with that source-id.  This could be accomplished implicitly by modifying the
source-id to be some special alphanumeric string (DJ23KJD823), but
this has a small probability of giving false-positive search results.
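
To make the numbers in questions 1 and 4 concrete, here is a rough Python
sketch of the arithmetic involved.  The function names are hypothetical and
not part of ht://Dig.  The 20% threshold is only the example figure from
question 1, and the sketch assumes the "2 Gig" limit means 2^31 bytes per
file.  How large the index is relative to the raw text is not known here, so
the caller has to supply that estimate.

```python
# Hypothetical helpers for illustration only; none of these names come from ht://Dig.

TWO_GIB = 2**31  # bytes; assumed per-file limit (signed 32-bit file offsets)


def needs_full_rebuild(incremental_docs: int, total_docs: int,
                       threshold: float = 0.20) -> bool:
    """Return True once the incremental share of the index reaches the threshold."""
    if total_docs <= 0:
        return False
    return incremental_docs / total_docs >= threshold


def raw_text_bytes(doc_count: int, avg_doc_bytes: int) -> int:
    """Total size of the source documents, not of the index built from them."""
    return doc_count * avg_doc_bytes


def min_index_files(estimated_index_bytes: int, limit: int = TWO_GIB) -> int:
    """Smallest number of files that keeps every file strictly below the limit."""
    max_file_bytes = limit - 1
    # Ceiling division without floats; always report at least one file.
    return max(1, -(-estimated_index_bytes // max_file_bytes))
```

With exactly one million documents averaging 2,048 bytes, the raw text alone
comes to 2,048,000,000 bytes, just under 2^31 = 2,147,483,648, so an archive
with more documents or a larger average length crosses the limit.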

Thanks for your help!


-- 
Neal Richter 
Knowledge base Developer
Right Now Technologies, Inc.
Customer Service for Every Web Site




_______________________________________________
htdig-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/htdig-dev
