On 23/02/2012 00:59, Webley Silvernail wrote:
>>
>> I have an HTML page that is updated automatically each day. I am
>> using HTML::TreeBuilder to create and insert the new content.
>>
>> Most of the time, this works fine, but I've hit a snag when existing
>> text nodes on the page include a > or < character.
>>
>>
>> For example, I might have an existing element on the page that looks
>> like this:
>>
>> <td>&lt;B</td>
>>
>> When the page is updated, depending on how I print the output, this
>> may cause problems.
>>
>> Some techniques I use to print the output work OK for the new part
>> but affect the existing content adversely. Other techniques work well
>> with the existing content but cause problems with the new content.
>>
>> Here are some of the output approaches I have tried:
>>
>> I.
>>
>> print OUT $root->as_HTML('', '', {});
>>
>>
>> Results: new content looks good, but the existing content is affected:
>>
>> <td><B</td>
>>
>> The browser won't render this and generally just blanks out the text node.
>>
>>
>> II.
>> print OUT $root->as_HTML('<>&', '', {});
>>
>> Results: existing content looks good, but the new content is output with
>> all of its < and > characters encoded as entity references (i.e., the
>> browser displays the raw HTML as text).
>>
>> III.
>> use Encode qw(encode decode);
>> ...
>> my $string_rep = $root->as_HTML('<>&', '', {});
>> print OUT encode('UTF-8', $string_rep);
>>
>>
>> Results: same as test II.
>>
>> IV.
>> use HTML::Entities;
>> ...
>> my $string_rep = $root->as_HTML('<>&', '', {});
>> print OUT encode_entities($string_rep);
>>
>>
>> Results: the entire page is output with all of its < and > characters
>> encoded as entity references (i.e., the browser displays the raw HTML as text).
>>
>>
>> V.
>> Various iterations of the above approaches using a subsequent call to
>> HTML Tidy to attempt to clean up the HTML.
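Here is a minimal sketch of why approach I breaks the existing cells while approach II preserves them. It assumes the existing page stores the code as the entity &lt;B; the one-cell table is made-up sample markup. HTML::TreeBuilder decodes the entity to the text <B when parsing. The first argument to as_HTML() then decides whether that text is re-encoded on output: an empty string encodes nothing, while '<>&' encodes those three characters.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Made-up stand-in for the existing page: the code is stored as &lt;B
my $tree = HTML::TreeBuilder->new_from_content(
    '<table><tr><td>&lt;B</td></tr></table>'
);

# After parsing, the text node holds the decoded string "<B".
my $cell_text = $tree->look_down(_tag => 'td')->as_text;

# Approach I: an empty $entities string means "encode nothing",
# so "<B" is written out raw and the browser sees the start of a tag.
my $approach_1 = $tree->as_HTML('', '', {});

# Approach II: '<>&' re-encodes those characters in text nodes,
# so the cell comes back out as &lt;B.
my $approach_2 = $tree->as_HTML('<>&', '', {});

print "text:        $cell_text\n";
print "approach I:  $approach_1\n";
print "approach II: $approach_2\n";

$tree->delete;    # release the tree when done
```

In other words, approach II is right for the parsed content. Whatever goes wrong with the new content has to come from how that content was put into the tree.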
>Hey Webley
>
>Approach II is the correct one. The problem is with the way you are
>adding your new content, which is presumably as a text string (in which
>case HTML::Element is correct to render it as text!).
>
>The correct way is to build an HTML::Element tree with calls like
>
> my $tree = HTML::TreeBuilder->new_from_content($content);
>
> my $new = HTML::Element->new('b');
> $new->push_content('This text in BOLD');
>
> my $place = $tree->look_down(_tag => 'div', id => 'insertion');
> $place->push_content($new);
>
>all depending on what you want to insert and how you locate the place in
>the document to insert it. The above will build content like
>
> <b>This text in BOLD</b>
>
>and insert it under an element marked
>
> <div id="insertion">
>
>An alternative is to pass your new string to HTML::TreeBuilder to build
>a new HTML fragment from it, and then insert that into your
>document.
>
>HTH,
>
>Rob
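Both of Rob's suggestions can be sketched as one runnable script. The sample page, the insertion point and the <i> fragment are all made up for illustration. For the fragment route, this sketch parses the string into its own HTML::TreeBuilder tree, looks up the node, and moves it into the page. push_content() detaches a node from its previous parent before adding it.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use HTML::Element;

# Made-up stand-in for the daily page.
my $tree = HTML::TreeBuilder->new_from_content(
    '<div id="insertion"></div><p>&lt;B</p>'
);
my $place = $tree->look_down(_tag => 'div', id => 'insertion')
    or die "insertion point not found\n";

# 1. Build the new content as elements, not as a string of markup.
my $new = HTML::Element->new('b');
$new->push_content('This text in BOLD');
$place->push_content($new);

# 2. Alternative: parse a markup string into its own tree, then move
#    the wanted node out of that tree and into the page.
my $fragment = HTML::TreeBuilder->new_from_content('<i>parsed fragment</i>');
my $node = $fragment->look_down(_tag => 'i');
$place->push_content($node);    # detaches $node from $fragment first
$fragment->delete;              # the rest of the fragment tree is discarded

my $html = $tree->as_HTML('<>&', '', {});
print $html, "\n";
$tree->delete;
```

Both inserted nodes come out as real tags, and the existing &lt;B text stays escaped, because all of it is printed with the same as_HTML('<>&', ...) call.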

Hi, Rob -

Thanks for the response. I *think* I'm already inserting my content the way
you describe, but perhaps I'm not. I should have been more specific in my
original message.

My script connects to a database to retrieve the current day's updates. It
uses the results to update the HTML page with either a table summarizing
the new data or a message indicating that no new records were added.

I am using HTML::Element->new() to create new elements and then either
push_content() or unshift_content() to insert the new content into my tree
object.

Here's my tree object:

    my $root = HTML::TreeBuilder->new;  # Is new_from_content different? It doesn't
                                        # seem so from perldoc, but I could be wrong.

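On the new versus new_from_content question: as far as I know, new() returns an empty tree that you still have to feed with parse() and eof() (or parse_file()). new_from_content() does all of that in one call, and new_from_file() is the equivalent shortcut for a page on disk. Here is a minimal check with a made-up sample page:

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

my $source = '<div id="fmsbody"><p>&lt;B</p></div>';    # made-up sample page

# new() gives an empty tree; content still has to be fed in and eof() called.
my $t1 = HTML::TreeBuilder->new;
$t1->parse($source);
$t1->eof;

# new_from_content() does new + parse + eof in one step.
my $t2 = HTML::TreeBuilder->new_from_content($source);

my $same = $t1->as_HTML('<>&', '', {}) eq $t2->as_HTML('<>&', '', {});
print $same ? "same tree\n" : "different trees\n";
$_->delete for $t1, $t2;
```

So the choice of constructor doesn't affect the escaping problem. Both routes decode &lt;B to <B in the tree.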
And a fragment within the tree:

    my $content = $root->look_down('id', 'fmsbody');

Here's an example of new content being inserted:

    my $div_date = HTML::Element->new('div', 'class' => 'date');
    $content->unshift_content($div_date);

The table is built by a couple of subroutines: one creates the header row,
and the other cycles through the result set to create the data rows.

Here's the part where the table is created:

    my $scn7_table = HTML::Element->new('table', 'class' => 'fmtable');
    my @scn7_col_heads = qw(EX FC TYPE DESCRIPTION);
    my $scn7_table_head = create_heading_row(\@scn7_col_heads);
    $scn7_table->push_content($scn7_table_head);

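create_heading_row() isn't shown, so this is a hypothetical version that follows the same element-based approach. It uses HTML::Element->new_from_lol(), which builds elements from nested array refs of the form [tag, children...]. Each heading ends up as a text node inside its own <th>, so it would be escaped on output even if it contained < or &.

```perl
use strict;
use warnings;
use HTML::Element;

# Hypothetical create_heading_row: one <th> element per heading string.
sub create_heading_row {
    my ($heads) = @_;
    return HTML::Element->new_from_lol([ 'tr', map { [ 'th', $_ ] } @$heads ]);
}

my $scn7_table     = HTML::Element->new('table', class => 'fmtable');
my @scn7_col_heads = qw(EX FC TYPE DESCRIPTION);
$scn7_table->push_content(create_heading_row(\@scn7_col_heads));

my $html = $scn7_table->as_HTML('<>&', '', {});
print $html, "\n";
```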
And here's the subroutine that creates the detail rows, which is where the
only bad content will wind up:

    sub create_detail_row {
        my $data = shift;
        my @recs = @$data;
        my $row  = HTML::Element->new('tr');
        my $cell = HTML::Element->new('td', 'class' => 'fmdata');
        for (my $i = 0; $i <= $#recs; $i++) {
            $row->push_content($cell->starttag() . $recs[$i] . $cell->endtag());
        }
        return $row;
    }

It is called like this:

    # Create detail rows and insert them into the table
    while (my @recs = $sth7->fetchrow_array) {
        my $row = create_detail_row(\@recs);
        $scn7_table->push_content($row);
    }

    # Insert the table into the document tree
    $div_date->push_content($scn7_table);

Since the detail rows are where the problem lies, maybe that's the spot I
need to check, specifically this line:

    $row->push_content($cell->starttag().$recs[$i].$cell->endtag());

I was thinking that by the time I get to the print OUT $root->as_HTML(...)
call, everything should be good to go, but alas, that is not the case.

The problem data is a two-character code that can be any combination of
alphanumerics plus a handful of special characters. In my original message
I used '<B' as an example, but it could just as well have been '<3', '<%'
or '<I'. I guess HTML rendering agents would be more prone to choking on
<B, <I, <A, etc., though.

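To check that every code survives output whatever its characters, a small round-trip test can write each sample code through as_HTML('<>&', ...) and parse it back with HTML::TreeBuilder. The list of codes is illustrative only.

```perl
use strict;
use warnings;
use HTML::Element;
use HTML::TreeBuilder;

# Illustrative two-character codes, including ones that look like tag starts.
my @codes = ('<B', '<3', '<%', '<I', '<A', 'A>', '&&');
my @bad;

for my $code (@codes) {
    my $td = HTML::Element->new('td');
    $td->push_content($code);
    my $html = $td->as_HTML('<>&', '', {});

    # Parse the written markup back and compare the recovered cell text.
    my $tree = HTML::TreeBuilder->new_from_content("<table><tr>$html</tr></table>");
    my $back = $tree->look_down(_tag => 'td')->as_text;
    $tree->delete;

    push @bad, $code unless $back eq $code;
}

print @bad ? "lost: @bad\n" : "all codes round-trip\n";
```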
Thanks again,
Webley