Toivo Pedaste writes:
 > 
 > I was able to index about 100,000 pages in less than a day on a
 > machine with 512 MB of memory; on a 256 MB machine it had only
 > done 50,000 pages after two days. The indexing process does
 > seem very memory intensive if you want decent performance. I'm
 > not sure what can be done about it, though; it seems to be
 > just a lack of locality of reference into the db.words.db file.

 No locality of reference, indeed.

 > I believe there are plans to checksum pages so as to reject
 > aliases (duplicates). How is that going? It is really something
 > of an administrative nightmare to deal with a large site without it.
 > 
 > I'm also getting close to the 2 GB file size limit on my
 > words.db file. Is there any structural reason that it
 > couldn't be split into multiple files?

 Four possible solutions:

 1. activate compression in WordList.cc;
 2. db_dump + db_load, which would reduce the size of the file by half;
 3. implement a dynamic repacker in Berkeley DB;
 4. implement autosplit files in WordList.cc, based on a key calculated
    from the word (see the sketch below).

 Of all these we are working on 1 and 2.
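
 For whoever wants to experiment with option 4, here is a minimal sketch
 of the idea, assuming a fixed number of segments. All names in it
 (kSegments, WordHash, SegmentFileFor, the db.words.N.db naming) are
 invented for illustration and are not the existing WordList.cc interface:

    #include <cstddef>
    #include <string>

    // Hypothetical sketch only: split the word index across several
    // Berkeley DB files, choosing the file from a key computed on the word.

    // Number of index segments; each one stays well below the 2 GB limit.
    static const std::size_t kSegments = 8;

    // Stable hash of the word (djb2-style); any deterministic hash works,
    // as long as indexer and searcher agree on it.
    static std::size_t WordHash(const std::string& word)
    {
        std::size_t h = 5381;
        for (std::size_t i = 0; i < word.size(); i++)
            h = h * 33 + static_cast<unsigned char>(word[i]);
        return h;
    }

    // Map a word to the (hypothetical) db file that would hold its
    // entries, e.g. "db.words.3.db" for segment 3.
    std::string SegmentFileFor(const std::string& word)
    {
        return "db.words." + std::to_string(WordHash(word) % kSegments) + ".db";
    }

 A search would hash the word the same way and open only the segment it
 falls in, so no single file ever has to grow past the 2 GB limit.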

 What is the size of your original data?

-- 
                Loic Dachary

                ECILA
                100 av. du Gal Leclerc
                93500 Pantin - France
                Tel: 33 1 56 96 10 85
                e-mail: [EMAIL PROTECTED]
                URL: http://www.senga.org/

