On Mon, 13 Jun 2005, Edward Chase wrote:
> I'm hosting a site, http://www.op-stjoseph.org/. htdig had been indexing
> this site for a number of years on a machine here. This machine is indexing
> 277 documents.
> htdig: 1 server seen:
> htdig: www.op-stjoseph.org:80 277 documents
> I've been working on migrating indexing to a new machine. The new machine
> sees 50 documents.
> htdig: 1 server seen:
> htdig: www.op-stjoseph.org:80 50 documents
> The configs for the new machine are copied from the old machine.
Do you know whether the old machine is configured to index from scratch
each time? If it does not delete the existing databases before indexing,
there may be a number of documents that are no longer linked from the
starting URL but still exist on the server and have never been purged
from the database. In that case the old machine would keep updating
those pages, while the new setup, starting from an empty database, would
never see them.
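To make that explanation concrete, here is a toy model in Python. It is
purely illustrative: the link graph, the page names and both indexing
functions are hypothetical and say nothing about how htdig is actually
implemented. A fresh index only contains pages reachable from the start
URL, while an index that is updated in place also keeps (and revisits)
old entries that nobody links to any more.

```python
from collections import deque


def reachable(links, seeds):
    """Breadth-first search: every URL reachable from any URL in `seeds`."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        url = queue.popleft()
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def index_from_scratch(links, start):
    # Hypothetical fresh run: empty database, so only pages linked
    # (directly or indirectly) from the start URL end up indexed.
    return reachable(links, [start])


def index_incrementally(old_db, links, start):
    # Hypothetical update run: keep old entries whose page still exists on
    # the server (modeled as "has an entry in links"), revisit them, and
    # also crawl from the start URL.
    still_there = [url for url in old_db if url in links]
    return reachable(links, [start] + still_there)


# Toy site: "/old" is still on the server, but nothing links to it any more.
links = {
    "/": ["/a", "/b"],
    "/a": [],
    "/b": [],
    "/old": ["/old2"],
    "/old2": [],
}
old_db = {"/", "/a", "/b", "/old", "/old2"}  # built back when "/old" was linked

print(len(index_from_scratch(links, "/")))           # 3
print(len(index_incrementally(old_db, links, "/")))  # 5
```

In this model the two runs differ only because one of them starts from a
database that was never purged, which is the situation described above.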
> Now just to make things a little stranger... I set up a machine at home to
> look at this site. The home machine only saw 3 documents.
You might want to try indexing with increased verbosity (e.g. -vv or
-vvv). At a high enough level you should be able to trace which links
are being rejected (and why) and which are followed. Comparing the output
for the three different cases should tell you something about why you
are seeing different numbers of documents indexed.
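One rough way to do that comparison, once the verbose output from each
machine has been saved to a file, is to pull every URL out of each log and
diff the two sets. The sketch below is a minimal Python example under one
assumption: that the URLs you care about appear somewhere on the log
lines as plain http(s) URLs. It does not try to parse htdig's actual
output format, and the sample log lines in the usage note are made up.

```python
import re

# Matches an http(s) URL up to whitespace, quotes or angle brackets.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def urls_in_log(text):
    """Collect every http(s) URL mentioned anywhere in a log text."""
    # Strip trailing punctuation that often follows a URL in log lines.
    return {u.rstrip(".,;:)]") for u in URL_RE.findall(text)}


def compare_logs(old_log, new_log):
    """Return the URLs seen in only one of the two logs, sorted."""
    old_urls = urls_in_log(old_log)
    new_urls = urls_in_log(new_log)
    return {
        "only_old": sorted(old_urls - new_urls),
        "only_new": sorted(new_urls - old_urls),
    }


# Made-up sample lines, NOT real htdig output.
old_log = "visit http://www.op-stjoseph.org/a.html\nvisit http://www.op-stjoseph.org/b.html\n"
new_log = "visit http://www.op-stjoseph.org/a.html\n"
print(compare_logs(old_log, new_log))
```

With real logs you would read each file (for example with
`pathlib.Path(...).read_text()`) and pass the contents in; the "only_old"
list then points at the pages worth tracing in the old machine's output.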
Jim
_______________________________________________
ht://Dig general mailing list: <[email protected]>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-general