At 12:35 22/11/2000 -0600, Gilles Detillieux wrote:
> > 4. After all the sites have been htdigged, I run htmerge in sequence in
> > order to merge all the small databases into one.
> > First call is "htmerge -c site1.conf"; subsequent calls are "htmerge -c
> > site1.conf -m site2.conf", "htmerge -c site1.conf -m site3.conf" (and
> > so on).
>...
> > 2. Now let's hear the amazing part of my story. If I do a "htmerge -c
> > site5.conf" (notice there is no -m this time.) and if I htsearch -c
> > site5.conf with "rénovation tourisme" my document is said to be found !
> > Said in another way, the document was indexed but was certainly ripped out
> > when merging with another database.
>
>I think after each separate htdig -i -c site#.conf you should run a
>separate htmerge -c site#.conf, not just on the first site, before you
>merge everything together.  Try that and see if it solves the problem.
>I think the intention was that these extra merges should not have been
>necessary, but this has come up before, and I think there's a problem
>with merging multiple DBs when they haven't already been cleaned up by
>a simple htmerge.

I tried it, and it didn't solve the problem. BTW, I don't think these 
extra merges should be necessary either.

Now I run, for each site:
htmerge -c site#.conf
then
htmerge -c site1.conf -m site#.conf (with # > 1)

If I then run
htsearch -c site5.conf with words="rénovation tourisme", it finds the 
document (ranked first).
But if I run
htsearch -c site1.conf with the same words, it returns the "nomatch" document.
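
To make the sequence above unambiguous, here is a rough sketch of it as a small shell function. It is a dry run that only prints the htmerge commands; the function name merge_plan and the siteN.conf naming are assumptions made for illustration, and running all the stand-alone merges before the -m merges is just one reading of the order described above.

```shell
#!/bin/sh
# Dry run: prints the htmerge commands instead of executing them.
# Assumption: the config files are named site1.conf ... siteN.conf.
# merge_plan is a hypothetical helper, not part of ht://Dig.
merge_plan() {
    count=$1

    # Step 1: a stand-alone htmerge for every site.
    i=1
    while [ "$i" -le "$count" ]; do
        echo "htmerge -c site$i.conf"
        i=$((i + 1))
    done

    # Step 2: merge sites 2..N into site1's database, one at a time.
    i=2
    while [ "$i" -le "$count" ]; do
        echo "htmerge -c site1.conf -m site$i.conf"
        i=$((i + 1))
    done
}

# Print the plan for three sites.
merge_plan 3
```

Piping the output to sh (for example "merge_plan 5 | sh") would actually run the commands, assuming htmerge is on the PATH and the config files exist.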

Some of the web hosts are case-sensitive and some are not. Could that be the 
source of my problem?

What are the rules for htmerge? When exactly does it remove URLs from the 
database?

--
Olivier Korn
Strasbourg, France.


