Hi,

I think there is a bug in htdig (3.0.8b2, Solaris 2.6). When it parses a
document containing invalid HTML - in my example an unclosed comment - it
stores the beginning of the content of the previously parsed document.
Of course, invalid HTML is a bad thing, but I think htdig should store no
content (or issue a warning) for the broken page instead of another page's
content.

Example:

start_url: http://www.tu-chemnitz.de/~fri/htdigtest/t1.html

It contains a link to .../t2.html, which has the invalid HTML.
The resulting db.docs is:

0  u:http://www.tu-chemnitz.de/~fri/htdigtest/t1.html  t:Title 1  a:0  m:902509994  s:130
   h: HEAD 1 Link to t2 some text  l:902509999  L:1  I:130  d:  A:
1  u:http://www.tu-chemnitz.de/~fri/htdigtest/t2.html  t:Title 2  a:0  m:902509903  s:183
   h: HEAD 1 Link to t2 some text  l:902509999  L:0  I:183  d:Link to t2  A:
      ^^^^^^^^^^^^^^^^^^^^^^^ that's wrong!

As you can see, the second entry (for t2.html) contains the content of
t1.html.

Does anyone have a fix, or a suggestion where to look in the code?

Thanks,
- Frank

--
Email: [EMAIL PROTECTED]   http://www.tu-chemnitz.de/~fri/
Work:  Computing Services, Technical University, 09107 Chemnitz, Germany
