Hi,

We have been using ht://Dig for many months now and have had nothing to 
complain about, but there is something strange that I don't understand.

Here is how we use ht://Dig:

1. We have 20 or so web sites named, say, http://www.site1.fr/a-path/, 
http://www.site2.fr/a-path-which-does-not-read-the-same-as-site1/, and so 
on. Some are MS-IIS, some are Linux/Apache hosted.

2. For each of these sites, I wrote a site1.conf, site2.conf, (and so on) 
containing start_url, the URL restriction settings, (and so on.) Each of 
these .conf files includes a file named "_commun_include". Of course, I 
set a different database prefix for each site.

3. Once a week, htdig is called on each site with "htdig -i -c site1.conf" 
then "htdig -i -c site2.conf", (and so on.)

4. After all the sites have been htdigged, I run htmerge in sequence in 
order to merge all the small databases into one.
The first call is "htmerge -c site1.conf"; subsequent calls are 
"htmerge -c site1.conf -m site2.conf", 
"htmerge -c site1.conf -m site3.conf", (and so on.)

5. Everything seems to work perfectly. Using htsearch, I can find documents 
which are on any of the sites. Note for later (because of the example I 
give below) that my locale is correctly set, so I don't have any problem 
with accents. I also use the accents patch, which works fine, and 
"htfuzzy accents" is run after all the htmerge calls.
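
For reference, steps 3 and 4 above (plus the final htfuzzy run) can be 
sketched as a small shell script. This is only a sketch under assumptions, 
not our exact cron job: the function names, the DRY_RUN switch and the idea 
that the .conf files sit in the current directory are hypothetical, and I 
assume htfuzzy is pointed at the merged database through site1.conf. The 
htdig and htmerge options are the ones quoted above.

```shell
#!/bin/sh
# Sketch of the weekly index-and-merge cycle (steps 3 and 4 above).
# Assumption: site1.conf, site2.conf, ... are in the current directory.
# DRY_RUN, run() and weekly_cycle() are illustrative names only.

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = really run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: weekly_cycle site1 site2 site3 ...
# The first name is the database that all the others are merged into.
weekly_cycle() {
    first="$1"

    # Step 3: rebuild each site's database from scratch (-i).
    for site in "$@"; do
        run htdig -i -c "$site.conf" || return 1
    done

    # Step 4: htmerge on the first database alone, then merge in the others.
    run htmerge -c "$first.conf" || return 1
    shift
    for site in "$@"; do
        run htmerge -c "$first.conf" -m "$site.conf" || return 1
    done

    # After all merges: rebuild the accents fuzzy index
    # (assumed here to use the merged database's configuration).
    run htfuzzy -c "$first.conf" accents
}

# Run only when site names are given on the command line.
if [ "$#" -gt 0 ]; then
    weekly_cycle "$@"
fi
```

With the default DRY_RUN=1, calling the script with "site1 site2 site3" 
only prints the seven commands in the order they would run, which makes it 
easy to check the merge sequence before touching the real databases.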


Here is the problem:

1. On site5, there is an HTML document named "Rénovation du BTS tourisme".
When searching for "rénovation tourisme" (method=and), the document is not 
found (ht://Dig even says that no document contains these words.) 
Adding the "restrict=http://www.site5.fr/site5-path-to-docs/" parameter 
doesn't change anything (not a surprise, but I wanted to be sure.)

2. Now for the amazing part of my story. If I run "htmerge -c site5.conf" 
(notice there is no -m this time) and then run htsearch -c site5.conf 
with "rénovation tourisme", my document is found! In other words, the 
document was indexed but must have been dropped when merging with 
another database.


I'd like to know whether anybody has already run into this particular 
problem, or whether it is a "feature" of htmerge (deleting entries when 
merging two databases together.) What can I do about it?

I'm really confused about all of this (and this state of mind doesn't help 
me write correct English. Sorry about that.)

--
Olivier Korn
Strasbourg, France.

