Hi,

Last night I tried out some patches from Randy to support the libghttp HTTP/1.1 implementation. I was hoping they'd greatly speed up running htdig, since there could be one persistent connection to the server.

I was disappointed. Don't get me wrong, I was very impressed looking at the server logs as it dug. It was transferring several hundred K per second at the beginning of the dig and seemed to be running faster than the HTTP/1.0 version. Cool. But after an hour or so, the hits were coming slower and slower.

It hit me--digging is *NOT* O(n) where n is the number of documents. Since it has to do a lookup on each URL to see if it's unique, it's maybe something more like O(n log n). :-(

The biggest problem is that the "log n" term is the cost of a database lookup for exists(). Since the document database gets big, that term comes with a pretty big constant factor.

So a move towards the structure I outlined earlier, storing docs by DocID and using a URL->DocID list, should speed things up in htdig too. After all, the URL->DocID list should be smaller, so a lookup will be faster, right?

Thoughts?

-Geoff
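P.S. To make the idea concrete, here's a minimal sketch of the split I have in mind: a small URL->DocID index that handles the exists() check, with the full document records keyed by DocID in a separate database. The in-memory std::map below just stands in for the on-disk databases (its O(log n) lookups roughly mirror a B-tree), and names like Crawler, DocRecord, seen(), and add() are made up for illustration; none of this is htdig's actual code.

#include <cstdint>
#include <iostream>
#include <map>
#include <string>

// Full per-document data (hypothetical fields).
struct DocRecord {
    std::string url;
    std::string title;
    // ... headers, excerpt, modification time, etc.
};

class Crawler {
    // Compact index: just URL -> DocID, used for the uniqueness test.
    std::map<std::string, std::uint32_t> url_to_id;
    // Big database: DocID -> full record, only touched on insert/retrieval.
    std::map<std::uint32_t, DocRecord> docs;
    std::uint32_t next_id = 1;

public:
    // The per-URL exists() check only consults the small index, so its
    // cost doesn't grow with the size of the full document records.
    bool seen(const std::string& url) const {
        return url_to_id.count(url) != 0;
    }

    // Assign a fresh DocID and file the record in the big database.
    std::uint32_t add(const std::string& url) {
        std::uint32_t id = next_id++;
        url_to_id[url] = id;
        docs[id] = DocRecord{url, ""};
        return id;
    }
};

int main() {
    Crawler c;
    for (const char* u : {"http://a/", "http://b/", "http://a/"}) {
        std::string url(u);
        if (c.seen(url)) {
            std::cout << "skip  " << url << "\n";  // duplicate URL
        } else {
            std::cout << "fetch " << url << " -> DocID "
                      << c.add(url) << "\n";
        }
    }
    return 0;
}

Each of the n URLs still pays one lookup, so the dig as a whole stays O(n log n), but the log is taken over a much smaller, fixed-size-per-entry index instead of the whole document database.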
