Template::Extract is really very shiny.  For people who haven't seen
it yet, it's kind of like Template Toolkit backwards: you give it a
template and an already-rendered document, and it hands you back the
data structure.  You can use it to make screen-scraping code less ugly.
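
For anyone who wants to see the "backwards" idea in its smallest form,
here is a minimal sketch.  The variable name `name` and the one-line
document are made up purely for illustration; only `new` and `extract`
are the module's own calls, used the same way as in my real code below.

```perl
use strict;
use warnings;
use Template::Extract;

my $extractor = Template::Extract->new;

# Hypothetical one-variable template and a document that matches it.
my $template = '<b>[% name %]</b>';
my $document = '<b>Kake</b>';

# extract() returns a hash reference keyed by the template's variable names.
my $data = $extractor->extract( $template, $document );

print "name: $data->{name}\n";
```

Anything inside a FOREACH block comes back as an array of hashes
instead, which is what the rest of this post is about.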

For people who *have* played with it, I have a question.  Here is my code:

----------------------------------------------------------------------
use strict;
use Template::Extract;

my $extractor = Template::Extract->new;

my $template = << '.';
[% FOREACH entry %]
[% ... %]
<div>[% FOREACH title %]<i>[% title_text %]</i>[% END %]<br>[% content %]</div>
  ([% FOREACH comment %]<b>[% comment_text %]</b> |[% END %]Comment on this)
[% END %]
.

my $document = << '.';
<div><i>Title 1</i><br>xxx</div>
  (<b>1 Comment</b> |Comment on this)
<div><i>Title 2</i><br>foo</div>
  (Comment on this)
.

my $data = $extractor->extract( $template, $document );
use Data::Dumper; print Dumper $data;
----------------------------------------------------------------------

and here is my output:

----------------------------------------------------------------------
$VAR1 = {
          'entry' => [
                       {
                         'comment' => [
                                        {
                                          'comment_text' => '1 Comment'
                                        }
                                      ],
                         'title' => [
                                      {
                                        'title_text' => 'Title 1'
                                      },
                                      {
                                        'title_text' => 'Title 2'
                                      }
                                    ],
                         'content' => 'xxx'
                       },
                       {
                         'content' => 'foo'
                       }
                     ]
        };
----------------------------------------------------------------------

Now, why is it putting both my titles into the first entry?  If I
change the template to remove the *third* FOREACH (i.e. the one
iterating over comments, not the one iterating over titles), and drop
the comment markup from the document to match, like so:

----------------------------------------------------------------------
my $template = << '.';
[% FOREACH entry %]
[% ... %]
<div>[% FOREACH title %]<i>[% title_text %]</i>[% END %]<br>[% content %]</div>
  (Comment on this)
[% END %]
.

my $document = << '.';
<div><i>Title 1</i><br>xxx</div>
  (Comment on this)
<div><i>Title 2</i><br>foo</div>
  (Comment on this)
.
----------------------------------------------------------------------

then I get the output I expect:

----------------------------------------------------------------------
$VAR1 = { 
          'entry' => [ 
                       { 
                         'title' => [ 
                                      { 
                                        'title_text' => 'Title 1'
                                      }
                                    ],
                         'content' => 'xxx'
                       },
                       { 
                         'title' => [ 
                                      { 
                                        'title_text' => 'Title 2'
                                      }
                                    ],
                         'content' => 'foo'
                       }
                     ]
        };
----------------------------------------------------------------------

Any ideas?  I need the FOREACH around both the title and the comment,
since some entries are missing one, the other, or both.

Kake
