Greetings Geoff,

On Tue, 3 Dec 2002 16:01, Geoff Hutchison wrote:

> > I have also been wondering if it is possible to turn
> > off word-level indexing, to give (much) smaller
> > inverted files if people don't need phrase searching. 
> > Does anybody know?
>
> Not at the moment.
>
> But you lose a lot more than phrase searching. You lose
> field-restricted searching. You lose scoring by proximity
> (like Google). You lose the ability to score "on the
> fly"--not to be discounted since many users wonder why
> they change their scoring factors and the results don't
> change.

Thanks for raising those points.  These are all 
enhancements that came with 3.2.0's database restructure, 
but I think that only phrase searching actually *needs* 
word-level inverted files.

As I said, document-level indexing is a strong motivation
for the word attributes to be pure bitmaps.  For each word
in each document, the index could store the bitwise OR of
the field flags over all occurrences of that word, so you
could still say "If this word occurs in the title AND that
word occurs in a heading".  I agree that on-the-fly scoring
is the way to go, but again I can't see why it couldn't be
done based on the OR of the flags (although I could be
missing something).
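
A minimal sketch of the OR-of-flags idea, in Python purely
for illustration.  This is not ht://Dig code: the flag names,
bit values, and function names are all hypothetical, and the
additive scoring rule is just one simple assumption.

```python
from collections import defaultdict

# Hypothetical field flags; names and bit values are illustrative only.
FLAG_TEXT = 0x01
FLAG_TITLE = 0x02
FLAG_HEADING = 0x04


def build_index(occurrences):
    """Collapse (word, doc_id, flag) occurrences into one bitmap per (word, doc)."""
    index = defaultdict(dict)
    for word, doc_id, flag in occurrences:
        # OR the flag of this occurrence into the document-level bitmap.
        index[word][doc_id] = index[word].get(doc_id, 0) | flag
    return index


def docs_with(index, word, flag):
    """Return the set of documents in which `word` occurs in the given field."""
    return {doc for doc, flags in index.get(word, {}).items() if flags & flag}


def score(index, word, doc_id, weights):
    """Score at query time: sum the weights of the fields containing the word."""
    flags = index.get(word, {}).get(doc_id, 0)
    return sum(w for flag, w in weights.items() if flags & flag)


occurrences = [
    ("linux", 1, FLAG_TITLE),
    ("linux", 1, FLAG_TEXT),
    ("kernel", 1, FLAG_HEADING),
    ("linux", 2, FLAG_TEXT),
    ("kernel", 2, FLAG_TEXT),
]
index = build_index(occurrences)

# "linux" occurs in the title AND "kernel" occurs in a heading.
hits = docs_with(index, "linux", FLAG_TITLE) & docs_with(index, "kernel", FLAG_HEADING)
```

Because the weights are passed to score() at query time,
changing the scoring factors changes the results without
re-indexing.  What is lost is the number of occurrences in
each field, so scoring by frequency within a field would
need extra storage.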

Even (very coarse) proximity searching can be done fairly 
efficiently by, for example, dividing each document into 
eight regions and specifying (in one byte) which regions 
contain the word.
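
The region byte could look roughly like the following Python
sketch.  The function names are hypothetical, and treating
"same or adjacent region" as "near" is an assumption made
for the example; other tests on the two bytes are possible.

```python
REGIONS = 8  # one bit per region, so the mask fits in a single byte


def region_mask(positions, doc_length):
    """Return a byte whose bit r is set if the word occurs in region r."""
    mask = 0
    for pos in positions:
        # Map a word position in [0, doc_length) to a region in [0, 7].
        region = min(pos * REGIONS // doc_length, REGIONS - 1)
        mask |= 1 << region
    return mask


def coarse_near(mask_a, mask_b):
    """True if the two words share a region or occur in adjacent regions."""
    # Widen each set bit of mask_a to its neighbours, staying within one byte.
    spread = (mask_a | (mask_a << 1) | (mask_a >> 1)) & 0xFF
    return bool(spread & mask_b)


# An 800-word document: word A at positions 10 and 420, B at 510, C at 790.
mask_a = region_mask([10, 420], 800)   # regions 0 and 4
mask_b = region_mask([510], 800)       # region 5
mask_c = region_mask([790], 800)       # region 7
```

Here coarse_near(mask_a, mask_b) is true (regions 4 and 5
are adjacent) and coarse_near(mask_a, mask_c) is false.  The
test costs a few shifts and an AND per document.  It can
report two words as near when they sit at opposite ends of
neighbouring regions, which is the price of the coarseness.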

I'm trying to avoid the "progress = bloat" phenomenon.
Although I don't want to change ht://Dig's course, my
original interest in it came from wanting every Linux box
to have all of its documentation searchable.  That is one
application which requires a minimal-overhead option,
albeit with reduced performance.

If I get enthusiastic, I'll look at writing a patch...

Cheers,
Lachlan

-- 
Lachlan Andrew  Phone: +613 8344-3816 Fax: +613 8344-6678
Dept of Electrical and Electronic Engg          CRICOS Provider Code
University of Melbourne, Victoria, 3010  AUSTRALIA      00116K


_______________________________________________
htdig-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/htdig-dev
