Andrew Gaffney wrote:
Randy W. Sims wrote:
On 7/21/2004 11:24 PM, Andrew Gaffney wrote:
Randy W. Sims wrote:
On 7/21/2004 10:42 PM, Andrew Gaffney wrote:
I am trying to build an HTML editor for use with my HTML::Mason site. I intend for it to support nested tables, SPANs, and anchors. I am looking for a module that can help me parse existing HTML (custom or generated by my scripts) into a tree structure similar to:
my $html = [
  { tag => 'table', id => 'maintable', width => 300, content =>
    [ { tag => 'tr', content =>
        [
          { tag => 'td', width => 200, content => "some content" },
          { tag => 'td', width => 100, content => "more content" }
        ]
    } ]
  }
]; # Not tested, but you get the idea
[snip]
I'd rather generate a structure similar to what I have above instead of having a large tree of class objects that takes up more RAM and is probably slower. How would I go about generating a structure such as that above using HTML::Parser?
Parsers like HTML::Parser scan a document and fire off events when they encounter certain tokens. HTML::Parser fires an event for each start tag, for the text between tags, and for each end tag. Because a document like HTML can nest arbitrarily deep, you can keep track of the structure using a stack:
<SNIP>
Thanks. In the time it took you to put that together, I came up with the following to figure out how HTML::Parser works. I'll use your code to expand upon it.
<SNIP>
Here is my current working code. Please take a look at it and see if there are any obvious (or not so obvious) problems. I thought this would end up being far more difficult.
parsehtml.pl
============
#!/usr/bin/perl

use strict;
use warnings;

use HTML::Parser ();

my $htmltree = [ { tag => 'document', content => [] } ];
my $node = $htmltree->[0]->{content};
my @prevnodes = ($htmltree);

sub start {
    my $tagname = shift;
    my $attr = shift;
    my $newnode = {};

    $newnode->{tag} = $tagname;
    foreach my $key (keys %{$attr}) {
        $newnode->{$key} = $attr->{$key};
    }
    $newnode->{content} = [];
    push @prevnodes, $node;
    push @{$node}, $newnode;
    $node = $newnode->{content};
}

sub end {
    my $tagname = shift;

    $node = pop @prevnodes;
}

sub text {
    my $text = shift;

    chomp $text;
    if ($text ne '') {
        push @{$node}, $text;
    }
}

my $p = HTML::Parser->new(
    api_version => 3,
    start_h => [\&start, "tagname, attr"],
    end_h   => [\&end, "tagname"],
    text_h  => [\&text, "dtext"]
);

$p->parse_file("test.html");

use Data::Dumper;
print Dumper $htmltree;
test.html
=========
<table id="maintable" width="300">
  <tr>
    <td width="200">some content</td>
    <td width="100">more content</td>
  </tr>
</table>
Now for the next challenge. As I walk the tree, I need to know where each node sits in the structure. I will pass along a value via CGI in the form of '0.0.2.1.2' which another script will translate as '$htmltree->[0]->{content}->[0]->{content}->[2]->{content}->[1]->{content}->[2]'. Using the above code, and the following code I wrote for walking the tree and generating HTML from it, how can I mark each output HTML tag with its position in the tree?
my $htmloutput = ''; # accumulates the generated HTML; declared here so the snippet compiles under strict

sub descend_htmltree {
    my $node = shift;
    my $withclickiness = shift || 0;

    foreach my $tmpnode (@{$node}) {
        if (ref($tmpnode) eq 'HASH') {
            my $nodeid = ""; # Magic code to generate node's position in tree
            $htmloutput .= "<div style='border: thin solid #bbbbbb' onDblClick=\"alert('you clicked $nodeid')\">" if $withclickiness;
            $htmloutput .= "<$tmpnode->{tag}";
            foreach (keys %{$tmpnode}) {
                $htmloutput .= " $_=\"$tmpnode->{$_}\"" if $_ ne 'tag' && $_ ne 'content';
            }
            $htmloutput .= ">";
            # Pass the flag down, otherwise only top-level tags get the clickable wrapper
            descend_htmltree($tmpnode->{content}, $withclickiness);
            $htmloutput .= "</$tmpnode->{tag}>";
            $htmloutput .= "</div>" if $withclickiness;
        } else {
            # Plain text node
            $htmloutput .= $tmpnode;
        }
    }
}

sub htmltree_to_html {
    my $filename = shift || '';
    my $withclickiness = shift || 0;

    descend_htmltree($htmltree->[0]->{content}, $withclickiness);
    if ($filename ne '') {
        # Three-argument open with a lexical filehandle; report the OS error on failure
        open my $fh, '>', $filename or die "Can't open $filename for HTML output: $!";
        print $fh $htmloutput;
        close $fh;
    }

    return $htmloutput;
}
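
One possible approach to the position question, as a rough sketch rather than a tested answer: pass the dotted position of the owning node down through the recursion, so each call only has to append the index of the child it is visiting. The names render_tree and resolve_path and the nodeid attribute are made up for illustration. The tree layout is assumed to be the one parsehtml.pl builds, and this version returns the HTML as a string instead of appending to a global. Attribute values are not escaped here.

```perl
use strict;
use warnings;

# Walk a list of nodes. $prefix is the dotted position of the node that
# owns this list, so child number $i gets the position "$prefix.$i".
sub render_tree {
    my ($nodes, $prefix) = @_;
    my $html = '';
    for my $i (0 .. $#{$nodes}) {
        my $child = $nodes->[$i];
        if (ref($child) eq 'HASH') {
            my $nodeid = "$prefix.$i";
            # "nodeid" is a hypothetical attribute name used to carry the position
            $html .= "<$child->{tag} nodeid=\"$nodeid\"";
            # Sort the remaining attributes so the output order is stable
            for my $key (sort grep { $_ ne 'tag' && $_ ne 'content' } keys %{$child}) {
                $html .= " $key=\"$child->{$key}\"";
            }
            $html .= ">";
            $html .= render_tree($child->{content}, $nodeid);
            $html .= "</$child->{tag}>";
        } else {
            # Plain text node
            $html .= $child;
        }
    }
    return $html;
}

# Translate a dotted position such as '0.0.0.1' back into the node it names.
# The first index selects an element of the top-level array; every later
# index selects an element of the current node's content list.
sub resolve_path {
    my ($tree, $path) = @_;
    my ($first, @rest) = split /\./, $path;
    my $node = $tree->[$first];
    for my $i (@rest) {
        return unless ref($node) eq 'HASH' && ref($node->{content}) eq 'ARRAY';
        $node = $node->{content}->[$i];
    }
    return $node;
}

# Small hand-built tree in the same shape as $htmltree
my $tree = [ { tag => 'document', content => [
    { tag => 'table', id => 'maintable', content => [
        { tag => 'tr', content => [
            { tag => 'td', content => ['some content'] },
            { tag => 'td', content => ['more content'] },
        ] },
    ] },
] } ];

# Start at the document node's content, whose owner sits at position '0'
my $html = render_tree($tree->[0]->{content}, '0');
my $cell = resolve_path($tree, '0.0.0.1');   # the second td
```

Because the walk starts at $htmltree->[0]->{content}, the initial prefix is '0', which keeps the generated positions in line with the '0.0.2.1.2' form described above. The same $nodeid could be dropped into the onDblClick handler in place of a separate attribute.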
-- Andrew Gaffney Network Administrator Skyline Aeronautics, LLC. 636-357-1548