yayaivan wrote:
> 
> Hi,
> 
> I don't use citations. Because they take a lot of disk space, I deleted everything
> from the citation table and set "IncrementalCitations no" in aspseek.conf and
> searchd.conf.

I wonder who told you that you could do that.

> But now the indexer is running in a strange manner. After it finishes indexing
> sites, I notice that some processes are still working, and they are inserting
> data into the citation table :(
> I looked in the conf files and only now noticed this: "You MUST NOT
> change value of this parameter for not empty database". But I already did it :(
> How can I now correctly stop the indexer from working with citations?

There is no way. A cached copy of each page is needed for correct reindexing
of that page. Let's assume that you have a page with two words in
it: "memory" and "penny". Upon the first indexing, its compressed
cached copy is saved in the database, and a URL_ID is assigned
to the page (let's assume it is 101).

Next, the words are saved into an inverted index: word -> urls. So we have two
records in the wordurl table:

....
memory -> 101
....
penny -> 101

(Actually, the word positions and some other info are saved together with
the URL_ID, but I will skip that here for clarity.)
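To make the word -> urls mapping concrete, here is a toy in-memory model of such an inverted index. It is only a sketch: aspseek keeps this data in the wordurl table of a database, not in a Python dict, and the names `tokenize` and `index_page`, the naive whitespace tokenizer, and the second page 102 are all hypothetical. Positions are kept per URL_ID to mirror the "word position" detail mentioned above.

```python
from collections import defaultdict

def tokenize(text):
    # Hypothetical naive tokenizer: lowercase and split on whitespace.
    return text.lower().split()

def index_page(index, url_id, text):
    """Add every word of a page to the inverted index: word -> {url_id: [positions]}."""
    for pos, word in enumerate(tokenize(text)):
        index[word].setdefault(url_id, []).append(pos)

index = defaultdict(dict)
index_page(index, 101, "memory penny")   # the example page from the text
index_page(index, 102, "penny wise")     # a hypothetical second page sharing "penny"
print(dict(index))
# {'memory': {101: [0]}, 'penny': {101: [1], 102: [0]}, 'wise': {102: [1]}}
```

Note that a popular word like "penny" accumulates one entry per page it appears on, which is why the real table grows so large.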

Now note that the words "memory" and "penny" can appear not only
on this page, but on many other pages as well. And there is
a countless number of words. So we actually end up with a very
big table.

During the next reindexing, if the document has changed, we need to
remove the words that are no longer in the document and add the new words.
This can be done in two ways:

1. Remove URL_ID 101 from all tables, and add all words again.
   This is very inefficient, because finding all occurrences of 101
   in the whole wordurl table can take several minutes.

2. Find out which words have disappeared from the page and need
   to be deleted, and which new words have appeared on the page and
   need to be inserted.
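The difference between the two methods can be sketched on a toy index where each word maps to a set of URL_IDs (positions dropped, as in the text). This is an illustration under stated assumptions, not aspseek code: `reindex_full` and `reindex_delta` are hypothetical names, the 1000 filler words merely stand in for the rest of a huge wordurl table, and page 7 is a made-up second page containing "memory". Method 1 has to visit every posting list; method 2 only touches the words that changed, but it needs the old word set as input.

```python
import copy

def reindex_full(index, url_id, new_words):
    """Method 1: strip url_id from every posting list, then add all words again.
    Returns the number of posting lists scanned."""
    visited = 0
    for word in list(index):          # the whole index must be scanned
        visited += 1
        index[word].discard(url_id)
        if not index[word]:
            del index[word]
    for word in new_words:
        index.setdefault(word, set()).add(url_id)
    return visited

def reindex_delta(index, url_id, old_words, new_words):
    """Method 2: touch only the words that disappeared or appeared.
    Returns the number of posting lists changed."""
    removed = old_words - new_words
    added = new_words - old_words
    for word in removed:
        index[word].discard(url_id)
        if not index[word]:
            del index[word]
    for word in added:
        index.setdefault(word, set()).add(url_id)
    return len(removed) + len(added)

# Page 101 holds "memory penny"; "memory" also occurs on hypothetical page 7,
# and 1000 filler words stand in for the rest of the table.
index = {"memory": {7, 101}, "penny": {101}}
index.update({f"word{i}": {1000 + i} for i in range(1000)})
a, b = copy.deepcopy(index), copy.deepcopy(index)

# The page changes from "memory penny" to "memory pound".
visited_full = reindex_full(a, 101, {"memory", "pound"})
visited_delta = reindex_delta(b, 101, {"memory", "penny"}, {"memory", "pound"})
print(a == b)                       # True: both methods give the same index
print(visited_full, visited_delta)  # 1002 2
```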

Method number 2 is more practical, but we need to know which words
were in the document when it was indexed the previous time. Again,
scanning all wordurl records takes way too long.

That's why aspseek saves a copy of each indexed page, and uses
it upon reindexing to create a "delta" (the changes) between
the two versions of the page. If you have deleted this copy,
the indexer is simply not able to work any more.
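The role of the cached copy can be sketched like this. It is a minimal illustration, assuming zlib compression and an in-memory dict in place of the database table; aspseek's actual storage format is not described here, and `save_cached_copy`, `words_of` and `reindex` are hypothetical names. The point is that the old word set is recovered from the cached copy alone, so once that copy is gone, the delta cannot be computed.

```python
import zlib

# Hypothetical stand-in for the table holding cached copies:
# URL_ID -> zlib-compressed page text (zlib is an assumption).
cache = {}

def words_of(text):
    # Naive tokenizer, for illustration only.
    return set(text.lower().split())

def save_cached_copy(url_id, text):
    cache[url_id] = zlib.compress(text.encode("utf-8"))

def reindex(url_id, new_text):
    """Return (removed, added) word sets, then replace the cached copy."""
    blob = cache.get(url_id)
    if blob is None:
        # Without the old copy there is no way to know which words to delete.
        raise LookupError(f"no cached copy for URL_ID {url_id}")
    old_words = words_of(zlib.decompress(blob).decode("utf-8"))
    new_words = words_of(new_text)
    save_cached_copy(url_id, new_text)  # the new version becomes the baseline
    return old_words - new_words, new_words - old_words

save_cached_copy(101, "memory penny")          # first indexing
removed, added = reindex(101, "memory pound")
print(sorted(removed), sorted(added))          # ['penny'] ['pound']

del cache[101]                                 # what deleting the cached copies amounts to
try:
    reindex(101, "memory shilling")
except LookupError as e:
    print(e)                                   # no cached copy for URL_ID 101
```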

And last, but not least: the option "IncrementalCitations" does not
turn off saving a cached copy of the document. It just turns on
a special enhanced mode of reindexing, which is faster and requires
less memory but is not compatible with the aspseek-1.0 format.
So it is here just for backward compatibility, and will probably
be removed in aspseek-1.3.

-- [EMAIL PROTECTED] ICQ UIN 7551596 Phone +7 903 6722750 --
   Guinness a Day Keeps a Doctor Away (people's wisdom)
