Andrew Gaffney wrote:
Andrew Gaffney wrote:

Randy W. Sims wrote:

On 7/21/2004 11:24 PM, Andrew Gaffney wrote:

Randy W. Sims wrote:

On 7/21/2004 10:42 PM, Andrew Gaffney wrote:

I am trying to build an HTML editor for use with my HTML::Mason site. I intend for it to support nested tables, SPANs, and anchors. I am looking for a module that can help me parse existing HTML (custom or generated by my scripts) into a tree structure similar to:

my $html = [ { tag => 'table', id => 'maintable', width => 300, content =>
  [ { tag => 'tr', content =>
    [
      { tag => 'td', width => 200, content => "some content" },
      { tag => 'td', width => 100, content => "more content" }
    ]
  } ]
} ]; # Not tested, but you get the idea



[snip]

I'd rather generate a structure similar to what I have above instead of having a large tree of class objects that takes up more RAM and is probably slower. How would I go about generating a structure such as that above using HTML::Parser?


Parsers like HTML::Parser scan a document and upon encountering certain tokens fire off events. In the case of HTML::Parser, events are fired when encountering a start tag, the text between tags, and at the end tag. If you have an arbitrarily deep document structure like HTML, you can store the structure using a stack:
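To make the stack idea concrete, here is a minimal sketch (it is not the code that was snipped from this message). It assumes a hand-written @events list standing in for the start/text/end events that HTML::Parser would fire; $root, @stack, and the event layout are illustrative names only, not part of any module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hand-written stand-ins for parser events (an assumption for illustration);
# a real run would receive these through HTML::Parser handlers instead.
my @events = (
    [ start => 'table', { id => 'maintable' } ],
    [ start => 'tr',    {} ],
    [ start => 'td',    { width => 200 } ],
    [ text  => 'some content' ],
    [ end   => 'td' ],
    [ end   => 'tr' ],
    [ end   => 'table' ],
);

my $root  = { tag => 'document', content => [] };
my @stack = ($root);    # the top of the stack is the element currently open

for my $event (@events) {
    my ($type, $value, $attr) = @{$event};
    if ($type eq 'start') {
        my $node = { %{$attr}, tag => $value, content => [] };
        push @{ $stack[-1]{content} }, $node;    # attach to the open parent
        push @stack, $node;                      # descend into the new node
    }
    elsif ($type eq 'end') {
        pop @stack if @stack > 1;                # never pop the root
    }
    else {
        push @{ $stack[-1]{content} }, $value;   # text belongs to the open node
    }
}
```

When the loop finishes, @stack should be back down to just $root, and $root->{content} holds the nested structure.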


<SNIP>

Thanks. In the time it took you to put that together, I came up with the following to figure out how HTML::Parser works. I'll use your code to expand upon it.


<SNIP>

Here is my current working code. Please take a look at it and see if there are any obvious (or not so obvious) problems. I thought this would end up being far more difficult.

parsehtml.pl
============
#!/usr/bin/perl

use strict;
use warnings;

use HTML::Parser ();

my $htmltree = [ { tag => 'document', content => [] } ];
my $node = $htmltree->[0]->{content};
my @prevnodes = ($htmltree);

sub start {
  my $tagname = shift;
  my $attr = shift;
  my $newnode = {};

  $newnode->{tag} = $tagname;
  foreach my $key(keys %{$attr}) {
    $newnode->{$key} = $attr->{$key};
  }
  $newnode->{content} = [];
  push @prevnodes, $node;
  push @{$node}, $newnode;
  $node = $newnode->{content};
}

sub end {
  my $tagname = shift;

  # Guard against stray closing tags: never pop past the document root
  $node = pop @prevnodes if @prevnodes > 1;
}

sub text {
  my $text = shift;

  chomp $text;
  if($text ne '') {
    push @{$node}, $text;
  }
}

my $p = HTML::Parser->new( api_version => 3,
                           start_h => [\&start, "tagname, attr"],
                           end_h   => [\&end,   "tagname"],
                           text_h  => [\&text,  "dtext"] );

$p->parse_file("test.html");

use Data::Dumper;
print Dumper $htmltree;

test.html
=========
<table id="maintable" width="300">
<tr>
<td width="200">some content</td>
<td width="100">more content</td>
</tr>
</table>

Now for the next challenge. While walking the tree, I need to know the position of whatever node I am currently on. I will pass along a value via CGI in the form of '0.0.2.1.2', which another script will translate as '$htmltree->[0]->{content}->[0]->{content}->[2]->{content}->[1]->{content}->[2]'. Using the above code, and the following code I wrote for walking the tree and generating HTML from it, how can I mark each outputted HTML tag with its position in the tree?
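For the lookup side, a minimal sketch of how the other script might follow such a dotted string through the tree. The name node_at_path is hypothetical, and the sample tree is made up for the example. Since the value arrives via CGI, the sketch rejects anything that is not digits separated by dots.

```perl
use strict;
use warnings;

# Hypothetical helper: follow a dotted path such as '0.0.2' through the
# array-of-hashes tree and return the node (or text string) found there.
sub node_at_path {
    my ($tree, $path) = @_;
    die "malformed path '$path'\n" unless $path =~ /\A\d+(?:\.\d+)*\z/;

    my @indexes = split /\./, $path;
    my $node    = $tree->[ shift @indexes ];     # first index is into the top-level array
    for my $i (@indexes) {
        return unless ref($node) eq 'HASH';      # text nodes have no children
        $node = $node->{content}[$i];            # each later index is into {content}
    }
    return $node;
}

# Made-up sample tree in the same shape the parsing script builds
my $htmltree = [ { tag => 'document', content => [
    { tag => 'table', id => 'maintable', content => [
        { tag => 'tr', content => [
            { tag => 'td', width => 200, content => ['some content'] },
            { tag => 'td', width => 100, content => ['more content'] },
        ] },
    ] },
] } ];

my $cell = node_at_path($htmltree, '0.0.0.1');
print "$cell->{tag} width=$cell->{width}\n";     # prints "td width=100"
```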


sub descend_htmltree {
  my $node = shift;
  my $withclickiness = shift || 0;

  foreach my $tmpnode (@{$node}) {
    if(ref($tmpnode) eq 'HASH') {
      my $nodeid = ""; # Magic code to generate node's position in tree
      $htmloutput .= "<div style='border: thin solid #bbbbbb' onDblClick=\"alert('you clicked $nodeid')\">" if($withclickiness);
      $htmloutput .= "<$tmpnode->{tag}";
      foreach(keys %{$tmpnode}) {
        $htmloutput .= " $_=\"$tmpnode->{$_}\"" if($_ ne 'tag' && $_ ne 'content');
      }
      $htmloutput .= ">";
      # Pass the flag down so nested tags get wrapped as well
      descend_htmltree($tmpnode->{content}, $withclickiness);
      $htmloutput .= "</$tmpnode->{tag}>";
      $htmloutput .= "</div>" if($withclickiness);
    } else {
      $htmloutput .= "$tmpnode";
    }
  }
}


sub htmltree_to_html {
  my $filename = shift || '';
  my $withclickiness = shift || 0;

  descend_htmltree($htmltree->[0]->{content}, $withclickiness);
  if($filename ne '') {
    open my $fh, '>', $filename or die "Can't open $filename for HTML output: $!";
    print {$fh} $htmloutput;
    close $fh;
  }

  return $htmloutput;
}
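
One possible way to fill in the "magic code", sketched under some assumptions: each recursive call is handed the dotted path of its parent and appends the child's array index to it. The name render_with_paths is hypothetical. Unlike the code above, it returns the HTML as a string instead of appending to a global, and it leaves out the attribute loop and the $withclickiness switch to stay short.

```perl
use strict;
use warnings;

# Sketch: same walk as descend_htmltree, but the parent's dotted path is
# passed down and each child's index is appended to form its own id.
sub render_with_paths {
    my ($nodes, $parent_path) = @_;
    my $html = '';

    for my $i (0 .. $#{$nodes}) {
        my $child = $nodes->[$i];
        if (ref($child) eq 'HASH') {
            my $nodeid = "$parent_path.$i";      # e.g. '0.0' then '0.0.1'
            $html .= qq{<div onDblClick="alert('you clicked $nodeid')">};
            $html .= "<$child->{tag}>";
            $html .= render_with_paths($child->{content}, $nodeid);
            $html .= "</$child->{tag}></div>";
        }
        else {
            $html .= $child;                     # plain text node
        }
    }
    return $html;
}

# Made-up tree; '0' is the index of the document node in the top-level array
my $htmltree = [ { tag => 'document', content => [
    { tag => 'p', content => [ 'hello ', { tag => 'b', content => ['world'] } ] },
] } ];

print render_with_paths($htmltree->[0]{content}, '0'), "\n";
```

Starting the path at '0' keeps the ids in step with the '$htmltree->[0]->{content}->[...]' form described above, because text nodes still occupy an index even though they get no wrapper of their own.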

--
Andrew Gaffney
Network Administrator
Skyline Aeronautics, LLC.
636-357-1548

