yayaivan wrote:
>
> Hi,
>
> I don't use citation. Because it take a lot of disk space, I delete everything
> from citation table and set "IncrementalCitations no" in aspseek.conf and
> searchd.conf

I wonder who told you that you can do so?

> But now, indexer is runing in strange manner. After finishing indexing sites,
> I notice that some processes still work, and they are inserting
> data in citation table:(
> I look in conf files, and notice only now this : "You MUST NOT
> change value of this parameter for not empty database". but I already did it:(
> How can I now correctly stop indexer work with citations?

No way. A cached copy of each file is needed for correct reindexing of the page.

Let's assume that you have a page with two words in it: "memory" and "penny". Upon the first indexing, a compressed cached copy of the page is saved in the database, and a URL_ID is assigned to the page (let's assume it is 101). Next, the words are saved into the inverted index: word -> urls. So we have two records in the wordurl table:

  ....
  memory -> 101
  ....
  penny -> 101

(Actually the word position and some other info are saved together with the URL_ID, but I will skip that here for clarity.)

Now note that the words "memory" and "penny" can appear not only on this page but on many other pages as well, and there are countless words. So we do end up with a very big table.

During the next reindexing, if the document has changed, we need to delete the words that are no longer in the document and add the new ones. This can be done in two ways:

1. Remove URL_ID 101 from all tables and add all words again. This is very inefficient, because finding all occurrences of 101 in all wordurl records can take several minutes.
2. Find out which words have disappeared from the page and must be deleted, and which new words have appeared on the page and must be inserted.

Method 2 is more practical, but we need to know which words were in the document when it was indexed the previous time. Again, scanning all wordurl records takes far too long. That's why aspseek saves a copy of each indexed page and uses it upon reindexing to compute a "delta" (the changes) between the two versions of the page.
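
To make method 2 concrete, here is a minimal Python sketch, assuming a page is reduced to a plain set of words (positions and other per-word info are skipped, as above). The dict-of-sets `wordurl`, the `cached_words` map standing in for the saved cached copy, and the `index_page` function are all hypothetical illustrations, not aspseek's actual tables or code:

```python
from collections import defaultdict

# Hypothetical in-memory inverted index: word -> set of URL_IDs.
# (Real aspseek also stores word positions etc.; omitted for clarity.)
wordurl: defaultdict[str, set[int]] = defaultdict(set)

# Hypothetical stand-in for the saved cached copy:
# URL_ID -> set of words seen when the page was last indexed.
cached_words: dict[int, set[str]] = {}


def index_page(url_id: int, text: str) -> tuple[set[str], set[str]]:
    """Index or reindex one page, touching only the words that changed.

    Returns (removed, added), i.e. the delta between the two versions.
    """
    new_words = set(text.lower().split())
    old_words = cached_words.get(url_id, set())

    removed = old_words - new_words  # words that disappeared from the page
    added = new_words - old_words    # words that are new on the page

    for word in removed:
        wordurl[word].discard(url_id)
        if not wordurl[word]:
            del wordurl[word]  # drop postings lists that became empty
    for word in added:
        wordurl[word].add(url_id)

    # Keep the new version for the next reindexing; without it,
    # the only way to find old words would be scanning all of wordurl.
    cached_words[url_id] = new_words
    return removed, added
```

For example, indexing URL_ID 101 as "memory penny" and later reindexing it as "memory dollar" yields removed == {"penny"} and added == {"dollar"}; the "memory" entry is never touched, so other pages sharing that word cost nothing.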

If you have deleted this copy, the indexer is simply not able to work any more.

And last, but not least: the option "IncrementalCitations" does not control saving a cached copy of the document. It just turns on a special enhanced mode of reindexing, which is faster and requires less memory but is not compatible with the aspseek-1.0 format. So it is there just for backward compatibility, and will probably be removed in aspseek-1.3.

--
[EMAIL PROTECTED]  ICQ UIN 7551596  Phone +7 903 6722750
--
Guinness a Day Keeps a Doctor Away (people's wisdom)
