According to Geoff Hutchison:
> On Friday, March 8, 2002, at 05:20  PM, Jim Cole wrote:
> > It does look like there is a problem with the parser. If a '<'
> > occurs in a script element, it appears that the parser becomes
> > somewhat confused with regard to the remaining document content.
> > For example
> 
> Yes, this sounds like a bug to me. Actually, the <script> sections and 
> probably other sections as well should be simply skipped by the parser. 
> Right now the code does this:
> 
> >         case 29:        // "script"
> >             noindex |= TAGscript;
> >             nofollow |= TAGscript;
> >             break;
> 
> In short, the parser doesn't *index* the bits inside <script></script> 
> tags, but it does *look* at them. So it hit that "<" character and 
> figured it was a new tag.
> 
> I would think that we want to treat <script> and probably <style> 
> sections like comments--find the ending tag and completely ignore 
> everything inside.

I think your assessment of the problem and your proposed solution are
both bang-on.  The stuff between the <script> and </script> tags should
be stripped out entirely and not parsed for HTML tags.
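
To make that concrete, here is a rough sketch of the "find the ending
tag and ignore everything inside" approach. It is standalone C++, not a
patch against the htdig parser: find_ci and strip_element_body are
names I made up for illustration, and it assumes the opening tag has no
'>' inside an attribute value. The real fix would presumably live in
the parser's tag handling rather than in a pre-pass like this.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: case-insensitive search for `needle` in `hay`,
// starting at offset `from`. `needle` must already be lowercase.
static std::size_t find_ci(const std::string& hay, const std::string& needle,
                           std::size_t from) {
    if (needle.empty()) return std::string::npos;
    for (std::size_t i = from; i + needle.size() <= hay.size(); ++i) {
        std::size_t j = 0;
        while (j < needle.size() &&
               std::tolower(static_cast<unsigned char>(hay[i + j])) ==
                   static_cast<unsigned char>(needle[j]))
            ++j;
        if (j == needle.size()) return i;
    }
    return std::string::npos;
}

// Hypothetical helper: return a copy of `html` with everything between
// <tag ...> and </tag> dropped. The tags themselves are kept.
// `tag` must be lowercase, e.g. "script" or "style".
std::string strip_element_body(const std::string& html, const std::string& tag) {
    const std::string open = "<" + tag;
    const std::string close = "</" + tag;
    std::string out;
    std::size_t pos = 0;
    for (;;) {
        std::size_t start = find_ci(html, open, pos);
        if (start == std::string::npos) {        // no more opening tags
            out.append(html, pos, std::string::npos);
            break;
        }
        std::size_t after = start + open.size();
        // Reject longer names such as <scripted>: the tag name must end here.
        if (after < html.size() && html[after] != '>' && html[after] != '/' &&
            !std::isspace(static_cast<unsigned char>(html[after]))) {
            out.append(html, pos, after - pos);
            pos = after;
            continue;
        }
        std::size_t open_end = html.find('>', after);
        if (open_end == std::string::npos) {     // truncated opening tag
            out.append(html, pos, std::string::npos);
            break;
        }
        out.append(html, pos, open_end + 1 - pos);   // keep text + opening tag
        std::size_t end = find_ci(html, close, open_end + 1);
        if (end == std::string::npos) break;     // unterminated: drop the rest
        pos = end;                               // resume at the closing tag
    }
    return out;
}
```

Calling strip_element_body(strip_element_body(page, "script"), "style")
would cover both of the cases Geoff mentions, so a stray '<' inside the
script body never reaches the tag scanner.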

Of course, you can avoid this problem in your HTML if you properly put
inline JavaScript code inside an HTML comment.  E.g.:

<script>
<!--

JavaScript code here

// -->
</script>

I'm amazed at how frequently people/programs fail to do this.  It's
what you're supposed to do to avoid problems with non-JavaScript-aware
web clients.

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

_______________________________________________
htdig-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/htdig-dev
