Hi,

Last night I tried out some patches from Randy to support the libghttp HTTP/1.1 implementation. I was hoping they'd greatly speed up running htdig, since there could be one persistent connection to the server.

I was disappointed. Don't get me wrong, I was very impressed looking at the server logs as it dug. It was transferring several hundred K per second at the beginning of the dig and seemed to be running faster than the HTTP/1.0 version. Cool. But after an hour or so, the hits were coming slower and slower.

It hit me--digging is *NOT* O(n) where n is the number of documents. Since it has to do a lookup on each URL to see if it's unique, it's maybe something more like O(n log n). :-(

The biggest problem is that the "log n" term is the cost of a database lookup for exists(). Since the document database gets big, that term comes with a pretty big constant factor.

So a move towards the structure I outlined earlier, storing docs by DocID and using a URL->DocID list, should speed things up in htdig too. After all, the URL->DocID list should be smaller, so a lookup will be faster, right?

Thoughts?

-Geoff
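P.S. To make the idea concrete, here's a minimal sketch of the split I have in mind: a small URL->DocID index that handles the exists() check, with the full document records keyed by DocID in a separate database. The in-memory std::map below just stands in for the on-disk databases (its O(log n) lookups roughly mirror a B-tree), and names like Crawler, DocRecord, seen(), and add() are made up for illustration; none of this is htdig's actual code.

#include <cstdint>
#include <iostream>
#include <map>
#include <string>

// Full per-document data (hypothetical fields).
struct DocRecord {
    std::string url;
    std::string title;
    // ... headers, excerpt, modification time, etc.
};

class Crawler {
    // Compact index: just URL -> DocID, used for the uniqueness test.
    std::map<std::string, std::uint32_t> url_to_id;
    // Big database: DocID -> full record, only touched on insert/retrieval.
    std::map<std::uint32_t, DocRecord> docs;
    std::uint32_t next_id = 1;

public:
    // The per-URL exists() check only consults the small index, so its
    // cost doesn't grow with the size of the full document records.
    bool seen(const std::string& url) const {
        return url_to_id.count(url) != 0;
    }

    // Assign a fresh DocID and file the record in the big database.
    std::uint32_t add(const std::string& url) {
        std::uint32_t id = next_id++;
        url_to_id[url] = id;
        docs[id] = DocRecord{url, ""};
        return id;
    }
};

int main() {
    Crawler c;
    for (const char* u : {"http://a/", "http://b/", "http://a/"}) {
        std::string url(u);
        if (c.seen(url)) {
            std::cout << "skip  " << url << "\n";  // duplicate URL
        } else {
            std::cout << "fetch " << url << " -> DocID "
                      << c.add(url) << "\n";
        }
    }
    return 0;
}

Each of the n URLs still pays one lookup, so the dig as a whole stays O(n log n), but the log is taken over a much smaller, fixed-size-per-entry index instead of the whole document database.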
