Template::Extract is really very shiny. For people who haven't seen it yet - it's kind of like Template Toolkit backwards. You can use it to make screenscraping code less ugly.
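To make "backwards" a bit more concrete, here's a rough sketch of the simplest case I can think of. The markup and the `name` / `age` variable names are made up purely for illustration; the only calls it uses are `new` and `extract`.

```perl
use strict;
use warnings;
use Template::Extract;
use Data::Dumper;

my $extractor = Template::Extract->new;

# The template is what you would hand to Template Toolkit to *generate*
# the page. (Hypothetical markup, just for illustration.)
my $template = '<li>[% name %] is [% age %]</li>';

# The document is the already-rendered page that we want to pull apart.
my $document = '<li>Alice is 30</li>';

# extract() runs the template in reverse and returns a hash reference
# keyed by the template's variable names (undef if nothing matches).
my $data = $extractor->extract( $template, $document );

print Dumper $data;
```

So instead of filling the template in from a data structure, you get the data structure back out of the filled-in page.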
For people who *have* played with it, I have a question. Here is my code:

----------------------------------------------------------------------
use strict;
use Template::Extract;

my $extractor = Template::Extract->new;

my $template = << '.';
[% FOREACH entry %]
[% ... %]
<div>[% FOREACH title %]<i>[% title_text %]</i>[% END %]<br>[% content %]</div>
([% FOREACH comment %]<b>[% comment_text %]</b> |[% END %]Comment on this)
[% END %]
.

my $document = << '.';
<div><i>Title 1</i><br>xxx</div>
(<b>1 Comment</b> |Comment on this)
<div><i>Title 2</i><br>foo</div>
(Comment on this)
.

my $data = $extractor->extract( $template, $document );

use Data::Dumper;
print Dumper $data;
----------------------------------------------------------------------

and here is my output:

----------------------------------------------------------------------
$VAR1 = {
          'entry' => [
                       {
                         'comment' => [
                                        {
                                          'comment_text' => '1 Comment'
                                        }
                                      ],
                         'title' => [
                                      {
                                        'title_text' => 'Title 1'
                                      },
                                      {
                                        'title_text' => 'Title 2'
                                      }
                                    ],
                         'content' => 'xxx'
                       },
                       {
                         'content' => 'foo'
                       }
                     ]
        };
----------------------------------------------------------------------

Now, why is it putting both my titles into the first entry? If I change the template to remove the *third* FOREACH (ie, not the one that's iterating over titles) like so:

----------------------------------------------------------------------
my $template = << '.';
[% FOREACH entry %]
[% ... %]
<div>[% FOREACH title %]<i>[% title_text %]</i>[% END %]<br>[% content %]</div>
(Comment on this)
[% END %]
.

my $document = << '.';
<div><i>Title 1</i><br>xxx</div>
(Comment on this)
<div><i>Title 2</i><br>foo</div>
(Comment on this)
.
----------------------------------------------------------------------

then I get the output I expect:

----------------------------------------------------------------------
$VAR1 = {
          'entry' => [
                       {
                         'title' => [
                                      {
                                        'title_text' => 'Title 1'
                                      }
                                    ],
                         'content' => 'xxx'
                       },
                       {
                         'title' => [
                                      {
                                        'title_text' => 'Title 2'
                                      }
                                    ],
                         'content' => 'foo'
                       }
                     ]
        };
----------------------------------------------------------------------

Any ideas? I need to iterate over title and comments since some entries don't have one or the other or both.

Kake