On 10 Nov 2002 at 23:11, Geoff Hutchison wrote:

> 
> On Saturday, November 9, 2002, at 04:27  PM, Dan Langille wrote:
> 
> > years of data.  I'm seeking comments on my approach.
> 
> You're doing some unnecessary moving around, but I'd need some 
> clarifications. I *think* what you're doing is indexing into the -merge 
> databases, which are *only* used for indexing and the eventual merging 
> into the old database. And the main database is only used for searching 
> and the merging.
> 
> Right?

Correct.

> > cp adsl-merge.docdb.work      adsl-merge.docdb
> > cp adsl-merge.docs.index.work adsl-merge.docs.index
> > cp adsl-merge.wordlist.work   adsl-merge.wordlist
> > cp adsl-merge.words.db.work   adsl-merge.words.db
> 
> OK, if you're only ever using this for merging, then this is completely 
> useless. Only the .work files are ever touched. So you have a lot of 
> duplicate data.

I tried the procedure without this step.  It failed to include the 
new documents in the final index.  When I tried again, this time doing 
the cp, it found the new document.  I don't yet have an explanation for 
this, but I'm calling it a day for now.  I'll look into it later.

FWIW, the full script is at 
http://www.unixathome.org/index-merge-rundig-test.sh.txt

> > After the merge, this moves the new search data into production:
> > mv adsl.docdb.work      adsl.docdb
> > mv adsl.docs.index.work adsl.docs.index
> > mv adsl.wordlist.work   adsl.wordlist
> > mv adsl.words.db.work   adsl.words.db
> 
> OK, but the .wordlist file is never used by htsearch. So you might as 
> well leave it as a .work file (where it's used by the merge) and never 
> copy it either.
> 
> Also, if you have the disk space, you can change those "mv" commands to 
> "cp" and leave the .work files in place--it'll save time, though 
> admittedly at the expense of disk space.

Good idea.  But I will do a mv, then cp them back.  This will reduce 
the chance that htsearch is invoked on a file which is being copied 
to.
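
A minimal sketch of that mv-then-cp sequence, for illustration only.  
The function name promote_db and its two arguments (database directory 
and file prefix) are hypothetical and not part of ht://Dig.  It assumes 
the .work files and the production files live on the same filesystem, 
and it leaves the .wordlist file alone, following the suggestion quoted 
above that htsearch never reads it.

```shell
#!/bin/sh
# Hypothetical helper: promote_db DIR PREFIX
# Moves each PREFIX.<suffix>.work into place as PREFIX.<suffix>, then
# copies it back to PREFIX.<suffix>.work so the next merge has its input.
promote_db() {
    dir=$1
    prefix=$2
    # .wordlist is deliberately skipped: it stays as a .work file.
    for suffix in docdb docs.index words.db; do
        live="$dir/$prefix.$suffix"
        work="$live.work"
        # On a single filesystem mv is a rename, so each production
        # file is replaced in one step rather than written in place.
        mv "$work" "$live" || return 1
        # Recreate the .work copy; only the merge step reads this one.
        cp "$live" "$work" || return 1
    done
}
```

Called as "promote_db /path/to/db adsl", this replaces each production 
file individually.  The three files are still swapped one after 
another, so a search that runs mid-loop could see a mix of old and new 
files.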

-- 
Dan Langille : http://www.langille.org/



_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
