Ciao guys,

   Again, sorry in advance for the mistakes I will certainly make. I would
love to learn more about this area, which is pretty new to me too, so
please be patient. :-)

On Mon, 2002-12-09 at 02:14, Lachlan Andrew wrote:
> - The format you describe sounds like a "half-inverted"
>   file -- listing locations *within* a document by word, but
>   listing *document* locations by document.  Is that
>   correct?

I think that was a flat representation of the index file, just an
example. Am I right, Neal?

In a simple scenario we would have (please bear in mind this is a very
rough draft!):
- a word index (word id, stemmed/unstemmed flag, maybe language?)
- a document index (document id and information about the document,
  pretty much as now: title, modification date, etc.)
- an inverted index (word id, document id, locations)

Words
-----
ID      Word            S/U     Lang
--      ----            ---     ----
1       traveling       0       en
3       casa            0       it
12      travel          0       en
23      travels         0       en
45      pasta           0       it
60      travel          1       en
...

Documents
---------
ID      URL                     Other info
--      ---                     ----------
1       http://www.pippo.it/    .....
2       http://www.htdig.org/   ...


Index
-----
ID W    ID D    Locations and related info (position and markup)
----    ----    ------------------------------------------------
1       2       1 Value_location 3 Value_location

Each position (1 and 3 above) is followed by Value_location, the value
given to that location of the word (its related info, such as markup).

Am I right?

Of course it's just an example ... :-)
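
To make the three tables above a bit more concrete, here is a minimal
sketch in Python. It is only an illustration, not a proposal for the real
on-disk format: the class names, the lookup() helper, and the reading of
the S/U flag as "1 = stemmed" are all my assumptions, and "Value_location"
is kept as an opaque placeholder string.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Word:
    word_id: int
    text: str
    stemmed: bool  # the S/U flag; assumed here that 1 means "stemmed"
    lang: str


@dataclass(frozen=True)
class Document:
    doc_id: int
    url: str


@dataclass
class Posting:
    word_id: int
    doc_id: int
    # (position, value) pairs; "value" stands in for Value_location
    locations: list = field(default_factory=list)


# Word index: a few rows from the Words table
words = {
    1: Word(1, "traveling", False, "en"),
    12: Word(12, "travel", False, "en"),
    60: Word(60, "travel", True, "en"),
}

# Document index
documents = {
    1: Document(1, "http://www.pippo.it/"),
    2: Document(2, "http://www.htdig.org/"),
}

# Inverted index, keyed by (word id, document id)
index = {
    (1, 2): Posting(1, 2, [(1, "Value_location"), (3, "Value_location")]),
}


def lookup(text, lang, stemmed=False):
    """Return (url, locations) for every document containing the word."""
    ids = {w.word_id for w in words.values()
           if w.text == text and w.lang == lang and w.stemmed == stemmed}
    return [(documents[p.doc_id].url, p.locations)
            for p in index.values() if p.word_id in ids]
```

With the sample rows, lookup("traveling", "en") finds word id 1 and
returns the single posting for document 2 with its two locations.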

Any comments about storing the language with each word?

> - With stemming in general, what is done about negating
>   affixes?  If I searched for 'mercy', I wouldn't want
>   results about 'merciless' (although I would want results
>   about 'merciful').

Good point. Are there any plans to handle negated words too?
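
Just to make the 'mercy' / 'merciless' case concrete, here is a toy
sketch in Python. It is not how any real stemmer works: the suffix lists,
toy_stem() and matches() are hypothetical names made up for this example.
The idea is only that a stemmer could return a "negated" flag next to the
stem, so that two words match when both the stem and the flag agree.

```python
# Hypothetical suffix lists, for illustration only
NEGATING_SUFFIXES = ("less",)
PLAIN_SUFFIXES = ("ful", "ing", "s")


def toy_stem(word):
    """Return (stem, negated) using naive suffix stripping."""
    stem, negated = word, False
    for suffix in NEGATING_SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            stem, negated = word[:-len(suffix)], True
            break
    else:
        # No negating suffix found: try the ordinary suffixes
        for suffix in PLAIN_SUFFIXES:
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                stem = word[:-len(suffix)]
                break
    # Normalise a final "y" to "i" so "mercy" and "merci-" agree
    if stem.endswith("y"):
        stem = stem[:-1] + "i"
    return stem, negated


def matches(query, candidate):
    """True when both words share a stem and neither flips the meaning."""
    return toy_stem(query) == toy_stem(candidate)
```

Under these assumptions matches("mercy", "merciful") is true, while
matches("mercy", "merciless") is false because only the second word
carries the negated flag.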

Ciao ciao
-Gabriele
-- 
Gabriele Bartolini - Web Programmer
Comune di Prato - Prato - Tuscany - Italy
[EMAIL PROTECTED] | http://www.comune.prato.it
> find bin/laden -name osama -exec rm {} ;
