On 7/21/2004 11:24 PM, Andrew Gaffney wrote:
Randy W. Sims wrote:
On 7/21/2004 10:42 PM, Andrew Gaffney wrote:
I am trying to build an HTML editor for use with my HTML::Mason site. I intend for it to support nested tables, SPANs, and anchors. I am looking for a module that can help me parse existing HTML (custom or generated by my scripts) into a tree structure similar to:
my $html = [ { tag => 'table', id => 'maintable', width => 300, content =>
  [ { tag => 'tr', content =>
    [
      { tag => 'td', width => 200, content => "some content" },
      { tag => 'td', width => 100, content => "more content" }
    ]
  } ]
} ]; # Not tested, but you get the idea
[snip]
I'd rather generate a structure similar to what I have above instead of having a large tree of class objects that takes up more RAM and is probably slower. How would I go about generating a structure such as that above using HTML::Parser?
Parsers like HTML::Parser scan a document and fire off events when they encounter certain tokens. HTML::Parser fires an event for each start tag, for the text between tags, and for each end tag. Because HTML can nest arbitrarily deep, you can track the structure with a stack:
#!/usr/bin/perl

package SampleParser;

use strict;

use HTML::Parser;
use base qw(HTML::Parser);

# Called for each start tag: print it indented by the current depth,
# then push a marker so that nested tags indent one level deeper.
sub start {
    my($self, $tagname, $attr, $attrseq, $origtext) = @_;
    my $stack = $self->{_stack};
    my $depth = $stack ? @$stack : 0;
    print ' ' x $depth, "<$tagname>\n";
    push @{$self->{_stack}}, ' ';
}

# Called for each end tag: pop the marker first so that the end tag
# lines up with its start tag.
sub end {
    my($self, $tagname, $origtext) = @_;
    pop @{$self->{_stack}};
    my $stack = $self->{_stack};
    my $depth = $stack ? @$stack : 0;
    print ' ' x $depth, "</$tagname>\n";
}

1;

package main;

use strict;
use warnings;

my $p = SampleParser->new();
$p->parse_file(\*DATA);

__DATA__
<html>
<head>
<title>Title</title>
</head>
<body>
The body.
</body>
</html>
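
The same stack idea can be pushed a little further to build the nested hash/array structure asked about, rather than just printing indentation. What follows is only a rough sketch, not a tested solution for real-world HTML. The package name StructBuilder, the tree() and _append() methods, and the short %void list are made up for illustration and are not part of HTML::Parser. It also assumes that "content" is always an array ref (text becomes a plain string inside it), which differs slightly from the structure in the question.

```perl
#!/usr/bin/perl

package StructBuilder;    # hypothetical name, not part of HTML::Parser

use strict;
use warnings;
use base qw(HTML::Parser);

# Assumption: a short, incomplete list of elements that never have an
# end tag, so they must not stay on the stack.
my %void = map { $_ => 1 } qw(br hr img input meta link);

sub new {
    my $class = shift;
    my $self  = $class->SUPER::new(@_);   # no args: start/end/text methods are called
    $self->{_tree}  = [];                 # top-level nodes
    $self->{_stack} = [];                 # elements that are currently open
    return $self;
}

# Add a node or a text string to the innermost open element,
# or to the top level if nothing is open.
sub _append {
    my ($self, $item) = @_;
    my $stack  = $self->{_stack};
    my $target = @$stack ? $stack->[-1]{content} : $self->{_tree};
    push @$target, $item;
}

sub start {
    my ($self, $tagname, $attr) = @_;
    # Attributes are copied first, so an attribute that happens to be
    # named "tag" or "content" is overwritten by our own keys.
    my $node = { %$attr, tag => $tagname, content => [] };
    $self->_append($node);
    push @{ $self->{_stack} }, $node unless $void{$tagname};
}

sub text {
    my ($self, $text) = @_;
    return unless $text =~ /\S/;          # skip whitespace-only text
    $text =~ s/^\s+|\s+$//g;
    $self->_append($text);
}

sub end {
    my ($self, $tagname) = @_;
    my $stack = $self->{_stack};
    # Close everything back to the matching start tag, so an omitted
    # end tag does not skew the nesting. Stray end tags are ignored.
    for my $i (reverse 0 .. $#$stack) {
        if ($stack->[$i]{tag} eq $tagname) {
            splice @$stack, $i;
            last;
        }
    }
}

sub tree { $_[0]{_tree} }

package main;

use strict;
use warnings;
use Data::Dumper;

my $p = StructBuilder->new();
$p->parse('<table id="maintable" width="300"><tr>'
        . '<td width="200">some content</td>'
        . '<td width="100">more content</td>'
        . '</tr></table>');
$p->eof;

$Data::Dumper::Sortkeys = 1;
print Dumper($p->tree);
```

The dump should show a single table node whose content holds one tr, which in turn holds the two td nodes. Because the stack stores references to the same hashes that sit in the tree, appending to the top of the stack fills in the tree as parsing proceeds.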