Iosif Fettich wrote:
> 
> what's the total size of what you're indexing ?

292MB, I believe. I can't remember whether the "du" command works exactly
right on Linux (it seems there used to be a problem), but "du -cks"
reports "292359  total", so I'll assume that figure is correct.

> > This seems larger than it used to be.
> 
> Significantly different ? I'm not sure anymore: did you say in the last
> message that you're using 3.1.0b2 ? 

I can't remember, but it seems larger by 50MB or so (could be we just
keep adding so much). I am, however, running htdig-3.1.0b2, installed
Nov. 6.

> If that gives a clue: indexing here about 5000 html documents
> (approx. 25 MB) generates something like
> -rw-r--r--   1 root     root      7284736 Nov 16 03:05 db.docdb
> -rw-r--r--   1 root     root       550912 Nov 16 03:05 db.docs.index
> -rw-r--r--   1 root     root      9905263 Nov 16 03:05 db.wordlist
> -rw-r--r--   1 root     root      9511936 Nov 16 03:05 db.words.db

So, your dbs are actually slightly larger than your document base? Well,
if htdig didn't fail, I suppose mine might be slightly larger too,
although it should still have enough space. 
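
For a rough back-of-the-envelope check, the figures quoted above can be scaled up to my document set. This is only a sketch: it assumes "approx. 25 MB" means 25 * 1024 * 1024 bytes, that "du -cks" counts 1024-byte blocks, and that database size grows roughly linearly with document size, none of which is confirmed here.

```python
# Sizes in bytes, copied from the "ls -l" listing quoted above.
db_sizes = {
    "db.docdb": 7284736,
    "db.docs.index": 550912,
    "db.wordlist": 9905263,
    "db.words.db": 9511936,
}

# Assumption: "approx. 25 MB" means 25 * 1024 * 1024 bytes.
sample_docs_bytes = 25 * 1024 * 1024

# Assumption: "du -cks" counts 1024-byte blocks.
my_docs_kb = 292359

db_total_bytes = sum(db_sizes.values())
ratio = db_total_bytes / sample_docs_bytes

# Assumption: database size scales linearly with document size.
estimated_db_kb = my_docs_kb * ratio

print(f"sample databases: {db_total_bytes} bytes")
print(f"database/document ratio: {ratio:.2f}")
print(f"estimated databases for my documents: {estimated_db_kb / 1024:.0f} MB")
```

The ratio comes out only slightly above 1, which matches the "slightly larger" reading of the listing.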

Am I right in assuming that running "htdig -i -v -s" isn't creating a
temporary set of databases and then writing them to the db directory?
Because if it did, I'd need over 500MB free on the hard disk, and I
wouldn't have that much space free.
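
A quick way to check the free space before a run, as a minimal sketch: the `has_room` helper and the "." path are hypothetical stand-ins (substitute the real database directory), and the 500MB figure is just the worst case guessed above, not something htdig documents.

```python
import shutil

def has_room(path, needed_bytes):
    """Return True if the filesystem holding `path` has at least needed_bytes free."""
    return shutil.disk_usage(path).free >= needed_bytes

# Hypothetical database directory; replace with the real db directory.
db_dir = "."

# Worst case guessed above: a temporary copy plus the final databases.
needed = 500 * 1024 * 1024

if has_room(db_dir, needed):
    print("enough free space for the worst case")
else:
    print("not enough free space for the worst case")
```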

Any ideas appreciated.

> It's true, with a badwords list where I put in all meaningless words I was
> able to spot using contrib/wordfreq/. That almost halved database size.
 
I'll have to take a look at that, thanks.

Jeff H.


*********   HR On-Line:  The Network for Workplace Issues   ********
** Ph:416-604-7251 -- Fax:416-604-4708 ** http://www.hronline.com **
