On 23/02/2012 00:59, Webley Silvernail wrote:

I have an HTML page that is updated automatically each day. I am
using HTML::TreeBuilder to create and insert the new content.

Most of the time this works fine, but I've hit a snag when existing
text nodes on the page include a gt or lt symbol.


For example, I might have an existing element on the page that looks
like this:

<td>&lt;B</td>

When the page is updated, depending on how I print the output, this
may cause problems.

Some techniques I use to print the output work OK for the new part
but affect the existing content adversely. Other techniques work well
with the existing content but cause problems with the new content.

Here are some of the output approaches I have tried:

I.

print OUT $root->as_HTML('', '', {});


Results: new content looks good, but the existing content is affected:

<td><B</td>     # The browser won't render this and generally just blanks
                # out the text node.


II.
print OUT $root->as_HTML('<>&', '', {});

Results: existing content looks good; new content is output with all
of the < > in the HTML source encoded as entity references (i.e. raw
HTML is rendered by the browser).

III.
use Encode qw(encode decode);
...
my $string_rep = $root->as_HTML('<>&', '', {});
print OUT encode('UTF-8',$string_rep);


Results: same as test II.

IV.
use HTML::Entities;
...
my $string_rep = $root->as_HTML('<>&', '', {});
print OUT encode_entities($string_rep);


Results: entire page is output with all of the < > in the HTML source
encoded as entity references (i.e. raw HTML is rendered by the browser).


V.
Various iterations of the above approaches using a subsequent call to
HTML Tidy to attempt to clean up the HTML.

Hey Webley

Approach II is the correct one. The problem is with the way you are
adding your new content, which is presumably as a text string (in which
case HTML::Element is correct to render it as text!).
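To make the difference concrete, here is a minimal sketch. The sample markup and the variable names are made up for illustration and are not taken from your page. It pushes the same bold text once as a plain string and once as an element, then prints with the Approach II settings:

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use HTML::Element;

# Hypothetical stand-in for the existing page
my $tree  = HTML::TreeBuilder->new_from_content('<div id="insertion"></div>');
my $place = $tree->look_down(_tag => 'div', id => 'insertion');

# A plain string becomes a text node, so its < and > are escaped on output
$place->push_content('<b>as a string</b>');

# An element is emitted as real markup
my $bold = HTML::Element->new('b');
$bold->push_content('as an element');
$place->push_content($bold);

my $out = $tree->as_HTML('<>&', '', {});
print $out, "\n";

$tree->delete;
```

The string should come out as entity references while the element comes out as a real <b> tag, which matches what you saw in test II.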

The correct way is to build an HTML::Element tree with calls like

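  use HTML::TreeBuilder;

  # $content holds the HTML source of the existing page
  my $tree = HTML::TreeBuilder->new_from_content($content);

  # Build the new content as an element, not as a string of markup
  my $new = HTML::Element->new('b');
  $new->push_content('This text in BOLD');

  # Find the insertion point and append the new element to it
  my $place = $tree->look_down(_tag => 'div', id => 'insertion');
  $place->push_content($new);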

all depending on what you want to insert and how you locate the place in
the document to insert it. The above will build content like

  <b>This text in BOLD</b>

and insert it under an element marked

  <div id="insertion">

An alternative is to pass your new string to HTML::TreeBuilder to build
a new HTML fragment from your string and then insert that into your
document.
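A rough sketch of that alternative follows. It assumes the new content arrives as a string of markup, and the sample page, the $new_html string and the variable names are invented for the example. Because the parser wraps a fragment in implicit html/head/body elements, one way to get at the parsed nodes is to detach the content of the fragment's body:

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical stand-in for the existing page, including a "<B" text node
my $tree = HTML::TreeBuilder->new_from_content(
    '<html><body><table><tr><td>&lt;B</td></tr></table>'
  . '<div id="insertion"></div></body></html>'
);

# Hypothetical new content, supplied as a string of markup
my $new_html = '<p>Updated <b>today</b></p>';

# Parse the string into its own tree, then lift the nodes out of its body
my $fragment = HTML::TreeBuilder->new_from_content($new_html);
my @nodes    = $fragment->look_down(_tag => 'body')->detach_content;

# Insert the parsed nodes at the chosen place in the main document
my $place = $tree->look_down(_tag => 'div', id => 'insertion');
$place->push_content(@nodes);

my $html = $tree->as_HTML('<>&', '', {});
print $html, "\n";

$fragment->delete;
$tree->delete;
```

With this the existing <td> text should stay encoded as &lt;B while the new content is written as real tags, so Approach II can be used for the whole page.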

HTH,

Rob
