According to Bodo Bauer:
> Gilles Detillieux ([EMAIL PROTECTED]) wrote:
> > According to Bodo Bauer:
> > > I'm trying to set up htdig for our website to index our mailing list
> > > archives. Unfortunately, it seems to ignore exactly these links.
> > >
> > > The archives are stored in directories whose names contain a colon (like
> > > 1999:Feb for February 1999). If I start within such a subdirectory, it works:
> > >
> > > start_url: http://www.suse.com/Mailinglists/suse-informix/1999:Feb/
> > >
> > > but
> > >
> > > start_url: http://www.suse.com/Mailinglists/suse-informix
> > >
> > > doesn't see this subdirectory, even though the index file there contains
> > > all the links...
> > >
> > > Any idea?
> >
> > It contains all the links, but the links are not complete. They're all
> > missing their closing </a> tag. htdig doesn't process <a href=...> tags
> > until it finds the closing </a> tag, so these are just getting ignored.
>
> Thanks a lot for finding this bug. How embarrassing; I could have seen this myself.
> I looked at the HTTP code about a hundred times yesterday, searching for some
> kind of error. I fixed the script that generates these pages, and now it works!
>
> Sorry for bothering you...
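Since the root cause was the script generating the archive index pages, a quick check of its output for anchors that never get a closing </a> would have flagged the problem before htdig silently skipped the links. The following is only a rough sketch: count_unclosed_anchors is a hypothetical helper, not part of htdig, and its tag matching is deliberately naive (it ignores comments and scripts and assumes every '>' ends a tag).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical checker (not part of htdig): counts <a ...> tags that are not
// closed by </a> before the next <a ...> tag or the end of the document.
std::size_t count_unclosed_anchors(const std::string& html) {
    std::size_t unclosed = 0;
    bool open = false;
    std::size_t i = 0;
    while ((i = html.find('<', i)) != std::string::npos) {
        std::size_t end = html.find('>', i);
        if (end == std::string::npos) break;  // truncated tag: stop scanning
        std::string tag = html.substr(i + 1, end - i - 1);
        // Lower-case the tag text so <A HREF=...> and </A> match too.
        for (char& c : tag)
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        if (tag.compare(0, 2, "a ") == 0) {
            if (open) ++unclosed;  // previous anchor never got its </a>
            open = true;
        } else if (tag == "/a") {
            open = false;
        }
        i = end + 1;
    }
    if (open) ++unclosed;  // anchor still open at end of document
    return unclosed;
}
```

Run over a page like the archive index described above, where each link lacks its </a>, it reports one unclosed anchor per link; a correctly generated page reports zero.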
Not at all. It was a hard one to spot, and htdig didn't give any
error messages to point the way. Here's a patch to htdig/HTML.cc that
should make it handle this situation better in the future...
--- htdig/HTML.cc.hrefunterm Wed Mar 17 11:01:08 1999
+++ htdig/HTML.cc Wed Mar 17 14:06:37 1999
@@ -465,6 +465,16 @@ HTML::do_tag(Retriever &retriever, Strin
q++;
*q = '\0';
}
+ if (in_ref)
+ {
+ if (debug > 1)
+ cout << "Terminating previous <a href=...> tag,"
+ << " which didn't have a closing </a> tag."
+ << endl;
+ if (dofollow)
+ retriever.got_href(*href, description);
+ in_ref = 0;
+ }
delete href;
href = new URL(position, *base);
in_ref = 1;
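The effect of this hunk can be modelled outside htdig. The sketch below is a simplified, hypothetical scanner (collect_links, href_of and lower are illustrative names, not htdig functions, and only double-quoted href values are recognised) that mirrors the patched logic: when a new <a href=...> arrives while in_ref is still set, the pending link is recorded with the description gathered so far instead of being discarded.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified model of the patched behaviour; not htdig's parser.
using Link = std::pair<std::string, std::string>;  // (href, description)

// Lower-case copy for case-insensitive tag matching.
static std::string lower(const std::string& s) {
    std::string out(s);
    for (char& c : out)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return out;
}

// Value of href="..." inside an <a ...> tag (double quotes only), or "".
static std::string href_of(const std::string& tag) {
    std::size_t p = lower(tag).find("href=\"");
    if (p == std::string::npos) return "";
    p += 6;
    std::size_t e = tag.find('"', p);
    return tag.substr(p, e == std::string::npos ? std::string::npos : e - p);
}

std::vector<Link> collect_links(const std::string& html) {
    std::vector<Link> links;
    bool in_ref = false;
    std::string href, description;
    std::size_t i = 0;
    while (i < html.size()) {
        if (html[i] == '<') {
            std::size_t end = html.find('>', i);
            if (end == std::string::npos) break;
            std::string tag = html.substr(i + 1, end - i - 1);
            std::string ltag = lower(tag);
            if (ltag.compare(0, 2, "a ") == 0 && !href_of(tag).empty()) {
                // As in the patch: a new <a href> terminates an open one.
                if (in_ref) links.emplace_back(href, description);
                href = href_of(tag);
                description.clear();
                in_ref = true;
            } else if (ltag == "/a") {
                if (in_ref) links.emplace_back(href, description);
                in_ref = false;
            }
            i = end + 1;
        } else {
            if (in_ref) description += html[i];  // gather link text
            ++i;
        }
    }
    // Like the hunk shown, a final anchor with no </a> is not flushed here.
    return links;
}
```

As in the hunk itself, only a following <a href=...> or a </a> flushes a pending link in this sketch, so the last anchor on a page with no closing tag would still be dropped.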
--
Gilles R. Detillieux E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930