Re: [htdig] PATCH: untranslated entity presentation (3.2.x)

Gilles Detillieux Thu, 18 Oct 2001 13:16:24 -0700

According to Jamie Anstice:
> Here's a quickie that someone else might like to verify if they've 
> run into the same problem.  When htdig encounters an entity that it 
> doesn't know about (say &#146; - which should really be &#8217; but 
> that's another issue) it copies it verbatim to the extract - so far
> so good.  When the extract is sent out in Display::hilight, the
> extract is decoded with HtSGMLCodec to transform the unsigned char
> characters to entities, and as well as the characters above 160 it
> translates & to &amp;, which is fine except when & is the start of 
> an entity.  This is what leaves things like &amp;#146; in extracts.
> Here's a patch to HtSGMLCodec::decode to make sure that it doesn't break
> real entities.


The problem with this is that it doesn't distinguish between an "&" in
the excerpt that came from a literal "&" and one that came from "&amp;"
in the source document.  For example, if a document gives an example of
SGML encoding, and therefore contains something like "&amp;lt;" in the
source HTML, it goes into the excerpt in the database as "&lt;".
With your patch, that "&lt;" in the excerpt doesn't get expanded back to
"&amp;lt;" in the resulting HTML output, so the browser shows "<" where
the author meant to show "&lt;".  I guess it comes down to which
is the lesser of two evils, and your patch is probably the better choice.
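
To make the trade-off concrete, here is a minimal Python sketch of the two
behaviours.  It is not the HtSGMLCodec code or the actual patch: the function
names and the regular expression deciding what "looks like an entity" are
assumptions made for illustration only.

```python
import re

# Assumption: "looks like an entity" means &name; or &#123; or &#x1F;.
# The real patch may use a different test.
NOT_AN_ENTITY = re.compile(
    r"&(?!(?:#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);)"
)

def escape_naive(excerpt):
    """Unpatched behaviour as described: every "&" becomes "&amp;"."""
    return excerpt.replace("&", "&amp;")

def escape_keep_entities(excerpt):
    """Patched behaviour as described: leave "&" alone when it starts
    something that looks like an entity, escape it otherwise."""
    return NOT_AN_ENTITY.sub("&amp;", excerpt)

# Case 1: an unknown entity copied verbatim into the excerpt.
print(escape_naive("it&#146;s"))          # it&amp;#146;s  (broken)
print(escape_keep_entities("it&#146;s"))  # it&#146;s      (fixed)

# Case 2: the source said "&amp;lt;", stored in the excerpt as "&lt;".
print(escape_naive("use &lt; here"))          # use &amp;lt; here  (correct)
print(escape_keep_entities("use &lt; here"))  # use &lt; here      (renders "<")
```

Neither function can get both cases right, because the excerpt no longer
records whether an "&" was literal in the source or decoded from "&amp;".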

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
