I have an HTML page that is updated automatically each day. I am using HTML::TreeBuilder to create and insert the new content.
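For context, here is a minimal sketch of the kind of update step involved. The page content, the `id="new"` insertion point, and the inserted paragraph are all hypothetical stand-ins, since the real page and script are not shown here. The new content is added as element nodes via an array reference, which is an assumption; the actual script may build it differently.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical stand-in for the existing page (the real file is not shown).
my $html = '<html><body><table><tr><td>&lt;B</td></tr></table>'
         . '<div id="new"></div></body></html>';

# Parse the existing page into a tree.
my $root = HTML::TreeBuilder->new_from_content($html);

# Find the (assumed) insertion point by its id attribute.
my $slot = $root->look_down(id => 'new');

# Add the new content as element nodes; push_content turns an
# array reference into elements (same structure as new_from_lol).
$slot->push_content(['p', 'Updated ', ['b', 'today']]);

# Serialize: encode < > & in text nodes, no indenting, emit all end tags.
my $out = $root->as_HTML('<>&', '', {});
print $out;

# Free the tree once the output string has been produced.
$root->delete;
```

The first argument to as_HTML is the set of characters to entity-encode in text nodes, which is the part the output approaches below vary.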
Most of the time this works fine, but I've hit a snag when existing text nodes on the page include a > or < symbol. For example, I might have an existing element on the page that looks like this:

    <td>&lt;B</td>

When the page is updated, depending on how I print the output, this may cause problems. Some of the techniques I use to print the output work for the new content but damage the existing content. Others work well with the existing content but cause problems with the new content.

Here are the output approaches I have tried:

I.   print OUT $root->as_HTML('', '', {});

     Results: new content looks good, but the existing content is affected:

         <td><B</td>

     The browser won't render this and generally just blanks out the text node.

II.  print OUT $root->as_HTML('<>&', '', {});

     Results: existing content looks good; new content is output with all of the < > in the HTML source encoded as entity references (i.e. the browser displays the raw HTML markup as text).

III. use Encode qw(encode decode);
     ...
     my $string_rep = $root->as_HTML('<>&', '', {});
     print OUT encode('UTF-8', $string_rep);

     Results: same as test II.

IV.  use HTML::Entities;
     ...
     my $string_rep = $root->as_HTML('<>&', '', {});
     print OUT encode_entities($string_rep);

     Results: the entire page is output with all of the < > in the HTML source encoded as entity references (i.e. the browser displays the raw HTML markup as text).

V.   Various iterations of the above approaches, followed by a call to HTML Tidy to attempt to clean up the HTML.

Any ideas appreciated.

Thanks,
Webley