I have an HTML page that is updated automatically each day. I am using 
HTML::TreeBuilder to create and insert the new content.

Most of the time this works fine, but I've hit a snag when existing text nodes 
on the page contain a < or > character.


For example, I might have an existing element on the page that looks like this:

<td>&lt;B</td>

When the page is updated, depending on how I print the output, this may cause 
problems.

Some techniques I use to print the output work OK for the new part but affect 
the existing content adversely. Other techniques work well with the existing 
content but cause problems with the new content.

Here are some of the output approaches I have tried:

I.

print OUT $root->as_HTML('', '', {});


Results: new content looks good, but the existing content is affected:

<td><B</td>

(The browser won't render this and generally just blanks out the text node.)


II.

print OUT $root->as_HTML('<>&', '', {});

Results: existing content looks good, but in the new content every < and > in 
the HTML source is encoded as an entity reference, so the browser displays the 
raw HTML tags as text instead of rendering them.
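A minimal pure-Perl sketch of why tests I and II behave as mirror images. Once HTML::TreeBuilder has parsed the page, `&lt;B` is stored as the plain text `<B`. If the new content was added as a string rather than as parsed elements (an assumption; the post doesn't show how it is inserted), it is also just a text node. The hypothetical helper `encode_text_node` mimics what `as_HTML('<>&', ...)` does to text nodes. It treats both strings identically, which is why a single setting of that argument can't fix one without breaking the other:

```perl
use strict;
use warnings;

# Hypothetical helper: roughly what as_HTML('<>&', ...) does to the contents
# of each *text* node (the tags the tree generates itself are not encoded).
sub encode_text_node {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;   # first, so the entities added below aren't re-encoded
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# After parsing, the existing cell's text node holds the decoded string "<B".
my $existing_text = '<B';

# ASSUMPTION: the new content was inserted as a plain string, so it is
# also stored as a text node rather than as real <b> elements.
my $new_text = '<b>new</b>';

print encode_text_node($existing_text), "\n";   # &lt;B
print encode_text_node($new_text),      "\n";   # &lt;b&gt;new&lt;/b&gt;
```

With an empty first argument (test I) neither string is encoded, so `<B` goes out raw. With `'<>&'` (test II) both are encoded, so the new markup goes out as visible text.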

III.
use Encode qw(encode decode);
...
my $string_rep = $root->as_HTML('<>&', '', {});
print OUT encode('UTF-8',$string_rep);


Results: same as test II.
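A likely reason test III matches test II is that `Encode::encode` only converts Perl characters into UTF-8 bytes. It does not add or remove entity references, so for ASCII-only markup the output is unchanged. A small sketch follows. The sample string is a hypothetical stand-in for what `as_HTML('<>&', '', {})` returned, since the actual page isn't shown:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical stand-in for $root->as_HTML('<>&', '', {}) output.
my $string_rep = '<td>&lt;b&gt;new&lt;/b&gt;</td>';

# encode() turns characters into UTF-8 bytes; entity references are just
# ASCII text to it, so they pass through untouched.
my $bytes = encode('UTF-8', $string_rep);

print $bytes eq $string_rep ? "unchanged\n" : "changed\n";   # unchanged
```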

IV.
use HTML::Entities;
...
my $string_rep = $root->as_HTML('<>&', '', {});
print OUT encode_entities($string_rep);


Results: the entire page is output with every < and > in the HTML source encoded 
as an entity reference, so the browser displays the whole page's raw HTML as text.
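Test IV encodes a second time, after `as_HTML` has already produced finished markup, so the tag delimiters themselves get escaped and existing entities are double-encoded. The sketch below mimics this with a hypothetical helper, `encode_lt_gt_amp`, standing in for `encode_entities($string, '<>&')`. Called without a second argument, the real function also encodes quotes and control/non-ASCII characters. The input is a hypothetical one-cell sample:

```perl
use strict;
use warnings;

# Hypothetical helper approximating HTML::Entities::encode_entities($s, '<>&').
sub encode_lt_gt_amp {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

# Stand-in for as_HTML('<>&', '', {}) output: real tags, and the text
# "<B" already encoded once as "&lt;B".
my $string_rep = '<td>&lt;B</td>';

# Encoding the finished page escapes the tags and double-encodes "&lt;".
print encode_lt_gt_amp($string_rep), "\n";   # &lt;td&gt;&amp;lt;B&lt;/td&gt;
```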


V.
Various combinations of the above approaches, each followed by a pass through 
HTML Tidy to try to clean up the HTML.


Any ideas appreciated.
Thanks,
Webley

--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/