On Tue, 3 May 2005, Louis Ballezzi wrote:
I think I've traced the problem to the speed with which htdig downloads data while digging. When I monitor network activity, I find that htdig is only downloading data at speeds averaging around 20KB/s. My network should allow me to achieve speeds well in excess of that. Other processes running on the server achieve connection speeds in the hundreds of KB/s. Is there any way for me to coax faster digging out of htdig so that it can utilize more of my bandwidth?
Do you have a server_wait_time attribute set? This could significantly slow the dig since it adds a delay between each request.
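If it helps to check quickly, here is a rough Python sketch that looks for a server_wait_time setting in the text of a config file. It assumes the simple "name: value" form on a single line and ignores line continuations, include directives, and any per-server blocks, so treat it as a starting point only. The function name and the sample config are made up for illustration.

```python
def find_server_wait_time(config_text):
    """Return the last server_wait_time value found as an int, or None if unset."""
    value = None
    for raw in config_text.splitlines():
        # Drop anything after '#' (treated here as a comment) and trim whitespace.
        line = raw.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        name, _, rest = line.partition(":")
        if name.strip() == "server_wait_time":
            value = int(rest.strip())
    return value


# Hypothetical excerpt from a config file, for illustration only.
sample = """\
# example settings
start_url: http://www.example.com/
server_wait_time: 2
"""

print(find_server_wait_time(sample))
```

If this reports a nonzero value, the dig pauses that many seconds between requests to the same server.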
I am not sure that the numbers you are seeing are necessarily a sign of a problem. The dig is doing a lot more than just pulling data across a network. Roughly speaking, the process involves pulling a generally small document from a remote site, parsing out metadata, parsing out text, storing this information in databases, parsing out any URLs in the document, and determining which URLs have already been seen, which are to be discarded for other reasons, and which are to be requested at a later time. After all of this is done, it selects the next URL in the list and starts the process again.
With all of this going on, the average speeds are not likely to approach those of applications that are just sucking up large amounts of data as fast as they can. Whether the numbers you are seeing are actually good or bad is hard to say because there are so many variables. Of the digs that I perform on a regular basis, a couple report an average transfer speed in the neighborhood of what you are seeing. A couple of others that involve different servers and types of sites report significantly less.
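To put rough numbers on that, here is a back-of-the-envelope Python sketch of a loop that fetches one small document, processes it, and only then moves on to the next. Every figure in it (document size, link speed, latency, processing time) is an assumption picked for illustration, not a measurement from any real dig.

```python
def effective_rate_kb_s(doc_kb, link_kb_s, latency_s, processing_s, wait_s=0.0):
    """Average KB/s when documents are fetched and processed one at a time."""
    # Time per document: request latency + transfer + parsing/storing + optional delay.
    per_doc_s = latency_s + doc_kb / link_kb_s + processing_s + wait_s
    return doc_kb / per_doc_s


# Illustrative values only: 10 KB pages on a 500 KB/s link.
no_wait = effective_rate_kb_s(doc_kb=10, link_kb_s=500, latency_s=0.1, processing_s=0.3)
with_wait = effective_rate_kb_s(doc_kb=10, link_kb_s=500, latency_s=0.1,
                                processing_s=0.3, wait_s=1.0)

print(f"no wait:  {no_wait:.1f} KB/s")
print(f"1 s wait: {with_wait:.1f} KB/s")
```

With these made-up inputs the average comes out to roughly 24 KB/s with no delay and roughly 7 KB/s with a one-second delay, even though the link itself could carry 500 KB/s. The per-document overhead dominates, not the bandwidth.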
Jim

