Again, me being stupid,

After tracing the code, it seems that htdig is all right.
The pages it was indexing are HTML versions of the java
API documentation and these have *a lot* of <A NAME=...>
tags in them.

So htdig spends *a lot* of time going through lists,
notably around line 346 of htcommon/DocumentRef.cc:

    addlist(DOC_ANCHORS, s, docAnchors);
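To get a feel for how many anchors one of these javadoc pages carries, a quick count helps. The sketch below is NOT htdig code: `collectAnchorNames` is a hypothetical helper made up for illustration. It only recognises the simple `<A NAME=...>` form (case-insensitive, NAME as the first attribute, a single space after `<A`).

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of htdig): collect the names of all
// <A NAME=...> anchors in an HTML string. Only the simple form with
// NAME as the first attribute is recognised.
std::vector<std::string> collectAnchorNames(const std::string& html) {
    std::vector<std::string> names;
    // Lower-case copy (same length) so <A NAME=...> and <a name=...> both match.
    std::string lower(html);
    for (char& c : lower) {
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    const std::string tag = "<a name=";
    std::size_t pos = 0;
    while ((pos = lower.find(tag, pos)) != std::string::npos) {
        std::size_t start = pos + tag.size();
        // Skip an optional opening quote and remember which one it was.
        char quote = '\0';
        if (start < html.size() && (html[start] == '"' || html[start] == '\'')) {
            quote = html[start];
            ++start;
        }
        // The name ends at the matching quote, or at '>' / whitespace if unquoted.
        std::size_t end = start;
        while (end < html.size()) {
            char c = html[end];
            bool stop = (quote != '\0')
                ? (c == quote)
                : (c == '>' || std::isspace(static_cast<unsigned char>(c)));
            if (stop) {
                break;
            }
            ++end;
        }
        // Keep the original spelling of the name, not the lower-cased one.
        names.push_back(html.substr(start, end - start));
        pos = end;
    }
    return names;
}
```

Feeding one of the API pages through it and printing `collectAnchorNames(page).size()` should give a rough idea of how many entries end up in docAnchors for that document (assuming every NAME anchor is stored there).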

Which brings me to a question:
Do these tags really serve a useful function
(for a search engine, that is)?


> Hmph. Sounds like there are some bugs to squash in the connection 
> code. Can you find the connection for that particular document in the 
> server log? Was the server heavily loaded at that point?
> 
> Gabriele and I are in the middle of a higher-level rewrite 
> (HtHTTP/Transport), but perhaps we want to revisit all the networking 
> code. Loic's suggestion on a test suite would help, but I'd be at a 
> bit of a loss for the base cases. Would we need to write/copy a TCP 
> sniffer, or am I missing something?
> 
> Any suggestions? Should we break the networking code out into a 
> separate shared library (htnet)?

--jesse
--------------------------------------------------------------------
J. op den Brouw                           Johanna Westerdijkplein 75
Haagse Hogeschool                                  2521 EN  DEN HAAG
Sector Techniek                                          Netherlands
Afdeling Elektrotechniek                              +31 70 4458936
-------------------- [EMAIL PROTECTED] --------------------

Linux - because reboots are for hardware changes


------------------------------------
To unsubscribe from the htdig3-dev mailing list, send a message to
[EMAIL PROTECTED] containing the single word "unsubscribe" in
the SUBJECT of the message.
