According to Gilles Detillieux:
> OK, I found a bug in htdig/HTML.cc, which I think causes it to think the
> "&amp;" isn't translated to "&", so it copies the whole entity through.
> I'll see if I can find a sensible fix.  Thanks for persisting.

Could you please give this patch a try and tell me if it fixes this
problem, without breaking anything else?  It should do for 3.2.0b1,
3.2.0b2, and any recent 3.2.0bx snapshot.

--- htdig/HTML.cc.orig  Wed May 24 07:42:43 2000
+++ htdig/HTML.cc       Fri Aug 11 13:41:32 2000
@@ -259,8 +259,8 @@ HTML::parse(Retriever &retriever, URL &b
                scratch = 0;
                scratch.append((char*)position, q+1 - position);
                textified = HtSGMLCodec::instance()->encode(scratch);
-               if (textified[0] != '&')        // it was decoded, copy it
-                 {
+               if (textified[0] != '&' || textified.length() == 1)
+                 {     // it was decoded, copy it
                    position = (unsigned char *)textified.get();
                    while (*position)
                      {
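
For anyone wondering why the extra length check matters, here is a
minimal stand-alone sketch of the test, not the actual ht://Dig code.
It assumes std::string in place of htdig's String class, and the names
decodeEntity, wasDecodedOld and wasDecodedNew are made up purely for
illustration.

```cpp
#include <string>

// Hypothetical stand-in for the codec call in the patch: it decodes only
// "&amp;" and hands back any other input unchanged.
static std::string decodeEntity(const std::string &entity)
{
    return entity == "&amp;" ? "&" : entity;
}

// Old test: any result that starts with '&' is treated as "not decoded",
// which wrongly includes the lone "&" produced from "&amp;".
static bool wasDecodedOld(const std::string &textified)
{
    return textified[0] != '&';
}

// Patched test: a result that is exactly one character long is accepted
// as decoded even when that character is '&'.
static bool wasDecodedNew(const std::string &textified)
{
    return textified[0] != '&' || textified.length() == 1;
}
```

With these stand-ins, decoding "&amp;" gives "&", which the old test
rejects and the patched test accepts, while an unknown entity that
comes back unchanged is still rejected by both.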


-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
