At 12:35 22/11/2000 -0600, Gilles Detillieux wrote:
4. After all the sites have been htdigged, I run htmerge in sequence in
order to merge all the small databases into one.
The first call is "htmerge -c site1.conf"; subsequent calls are "htmerge -c
site1.conf -m site2.conf", "htmerge -c site1.conf -m site3.conf", and
so on.
...
2. Now let's hear the amazing part of my story. If I do a "htmerge -c
site5.conf" (notice there is no -m this time) and then run htsearch -c
site5.conf with "rénovation tourisme", my document is found!
In other words, the document was indexed but must have been dropped
when merging with another database.
I think after each separate htdig -i -c site#.conf you should run a
separate htmerge -c site#.conf, not just on the first site, before you
merge everything together. Try that and see if it solves the problem.
I think the intention was that these extra merges should not have been
necessary, but this has come up before, and I think there's a problem
with merging multiple DBs when they haven't already been cleaned up by
a simple htmerge.
I tried it and it didn't solve the problem. BTW, I don't think that these
extra merges are necessary either.
Now I run:
htmerge -c site#.conf (for each site)
then
htmerge -c site1.conf -m site#.conf (for every # other than 1)
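
Spelled out, and including the htdig -i step suggested above, the sequence looks roughly like the sketch below. It assumes five sites whose config files are named site1.conf through site5.conf (the count is only an example), and the merge_plan function is a hypothetical helper that only prints the commands, so nothing is executed unless its output is piped to sh.

```shell
#!/bin/sh
# Dry-run sketch: prints the dig/merge sequence for sites 1..N.
# Assumption: config files are named site1.conf ... siteN.conf.
merge_plan() {
    n=$1

    # Step 1: dig each site from scratch, then clean up its own database.
    i=1
    while [ "$i" -le "$n" ]; do
        echo "htdig -i -c site$i.conf"
        echo "htmerge -c site$i.conf"
        i=$((i + 1))
    done

    # Step 2: merge every other site's database into site1's.
    i=2
    while [ "$i" -le "$n" ]; do
        echo "htmerge -c site1.conf -m site$i.conf"
        i=$((i + 1))
    done
}

# Example: five sites (an assumed count).
merge_plan 5
```

To actually run the commands rather than just list them, the output can be piped to the shell, e.g. "merge_plan 5 | sh -e".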
If I then run
htsearch -c site5.conf with words="rénovation tourisme", it finds the
document (ranked first).
But if I do
htsearch -c site1.conf with the same words, it returns the "nomatch" document.
Some of the web hosts are case-sensitive and some are not. Could that be the
source of my problem?
What are the rules for htmerge? When does it actually remove URLs from the
database?
--
Olivier Korn
Strasbourg, France.