Re: possible bug in HTML::Parser comment handler

2001-01-12 Thread Gisle Aas

Dave <[EMAIL PROTECTED]> writes:

> I have solved the problem I was having thanks to the info here.  All I had
> to do was pass is_cdata as an arg to the handler and only print if it was
> false.

The downside of this approach is that the content of  elements is
not printed.  But this tag is anyway not official HTML any more.
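
For readers who want to try the approach Dave describes, here is a
minimal sketch. It assumes the version 3 handler API of HTML::Parser
with the "dtext" and "is_cdata" argspec identifiers; the sample
document and the $out variable are made up for illustration.

```perl
use strict;
use warnings;
use HTML::Parser ();

my $out = '';    # collected text (illustrative name)

my $p = HTML::Parser->new(
    api_version => 3,
    # Ask for the decoded text plus the is_cdata flag, and keep the
    # text only when it is not inside a CDATA-style element.
    text_h => [
        sub {
            my ($dtext, $is_cdata) = @_;
            $out .= $dtext unless $is_cdata;
        },
        'dtext, is_cdata',
    ],
);

# Made-up sample: a "comment" inside <style>, then ordinary text.
$p->parse("<style><!--\nh1 { color: red }\n--></style><p>Hello &amp; bye</p>");
$p->eof;

print $out, "\n";
```

With this filter the style sheet text is dropped, and so is the text
of any other element that the parser reports as CDATA.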

Regards,
Gisle



Re: possible bug in HTML::Parser comment handler

2001-01-12 Thread Gisle Aas

Bjoern Hoehrmann <[EMAIL PROTECTED]> writes:

> "Although the STYLE and SCRIPT elements use CDATA for their data
> model, for these elements, CDATA must be handled differently by user
> agents. Markup and entities must be treated as raw text and passed to
> the application as is. The first occurrence of the character sequence
> "</" (end-tag open delimiter) is treated as terminating the end of the
> element's content. In valid documents, this would be the end tag for
> the element."

Note that HTML::Parser does in fact allow "</" inside the content of
these elements, as in:

  <script>
  print "Hello</h1>\n";
  print "Bla, bla,";
  </script>

To make this correct the first print statement has to be written
something like:

  print "Hello<" . "/h1\n";
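
To see why the split string helps, here is a small sketch of the
strict rule, under which the first "</" ends the element content. The
helper strict_cdata_content is hypothetical and is not part of
HTML::Parser; it only models that one rule on the text that follows
the start tag.

```perl
use strict;
use warnings;

# Hypothetical helper: given the text that follows a <script> or
# <style> start tag, return the content as a strict parser would see
# it, i.e. everything before the first "</".
sub strict_cdata_content {
    my ($text) = @_;
    my $end = index($text, '</');
    return $end < 0 ? $text : substr($text, 0, $end);
}

# Script body with a literal "</h1>" in a string.
my $naive = <<'END';
print "Hello</h1>\n";
print "Bla, bla,";
</script>
END

# Same script with the "</" sequence split in two.
my $split = <<'END';
print "Hello<" . "/h1>\n";
print "Bla, bla,";
</script>
END

my $naive_content = strict_cdata_content($naive);
my $split_content = strict_cdata_content($split);

print "naive: [$naive_content]\n";
print "split: [$split_content]\n";
```

The naive version is cut off in the middle of the first string, while
the split version keeps both print statements.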

Regards,
Gisle



Re: possible bug in HTML::Parser comment handler

2001-01-12 Thread Gisle Aas

"Sean M. Burke" <[EMAIL PROTECTED]> writes:

> At 11:21 PM 2001-01-11 +0100, Bjoern Hoehrmann wrote:
> >At 15:28 11.01.01 -0500, you wrote:
> >>It seems that the parser is not properly detecting multi-line HTML
> >>comments.  I was trying to print out the dtext of a html document and
> >>noticed that comments kept showing up in the output.  Upon further
> >>examination, the single line comments were being ignored but ones like
> >>this:
> >>
> >>
> >
> >Well, the content model of the style element is CDATA, your "comments"
> >may look like comments but they are no comments in HTML and SGML
> >terms. That's not a bug.
> 
> I don't see what's wrong with that comment.

From the shape of the text we can guess that the original poster has
left out the fact that the context for this "comment" was a <style>
element.  The fact that he says that comment handlers do not work is
also an indication of this.
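
A minimal sketch of that behaviour, assuming the version 3 handler API
and the "text" and "is_cdata" argspec identifiers. The input document
and the variable names are made up for illustration and are not the
original poster's data.

```perl
use strict;
use warnings;
use HTML::Parser ();

my (@comments, @cdata);    # illustrative names

my $p = HTML::Parser->new(
    api_version => 3,
    # Record every comment event the parser reports.
    comment_h => [ sub { push @comments, shift }, 'text' ],
    # Record text that arrives flagged as CDATA.
    text_h => [
        sub {
            my ($text, $is_cdata) = @_;
            push @cdata, $text if $is_cdata;
        },
        'text, is_cdata',
    ],
);

# One ordinary comment, then comment-like text inside <style>.
$p->parse("<!-- real comment --><style><!--\nh1 { color: red }\n--></style>");
$p->eof;

print "comment events: ", scalar(@comments), "\n";
print "cdata text: ", join('', @cdata), "\n";
```

Only the comment outside the element should reach the comment handler;
the comment-like text inside it arrives through the text handler.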

This is probably what he parsed: