Hi Peter,

The code, which is quite straightforward, is below:

#!/usr/bin/perl
use warnings;
use strict;
use XML::Twig;

my $root_dir = "F:/httpserv";
my $twig = XML::Twig->new;
$twig->parsefile("./content.example.txt");   # open XML file (loads the whole tree)
my $root = $twig->root;                        # set root
chdir $root_dir or die "Cannot chdir to $root_dir: $!";   # set initial directory
foreach my $topic ($root->children('Topic')) {
    # if element <link/> is a child of <Topic/>, change directory for index writing
    if ($topic->children('link')) {
        my $dir = $topic->att('r:id');
        chdir $dir or die "Cannot chdir to $dir: $!";
        foreach my $link ($topic->children('link')) {
            foreach my $extpage ($root->children('ExternalPage')) {
                if ($link->att('r:resource') eq $extpage->att('about')) {
                    print $extpage->first_child_text('d:Title'), "\n";
                    print $extpage->first_child_text('d:Description'), "\n";
                }
            }
        }
        # No purge() in these loops: purge() frees memory only when called from
        # a twig handler during parsing. Here parsefile() has already built the
        # whole tree, and purging could delete elements the loops still need.
        chdir $root_dir or die "Cannot chdir to $root_dir: $!";   # reset to local root directory
    }
}


I don't think <Topic/> and <ExternalPage/> are randomly intermixed. The <Topic/> nodes are generated category by category, e.g. <Arts/> -> <Arts/Movie> -> <Arts/Movie/Title>. If a <Topic/> has <link/> children (which means it is a final category), its <ExternalPage/> nodes appear immediately below that <Topic/>, in the same order as the <link/> elements.
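Given that ordering, the two-handler approach Peter describes could look roughly like the Perl sketch below. It assumes the element and attribute names used in the code above (Topic, link with r:resource, ExternalPage with about, d:Title, d:Description); the helper name build_twig() is hypothetical. Each handler calls purge() after copying what it needs, so only about one <Topic/> or <ExternalPage/> is held in memory at a time. %wanted is reset at every <Topic/>, which is only correct if the pages really follow their Topic as described.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: returns a twig whose handlers push
# [topic id, title, description] onto @$found for every <ExternalPage/>
# that is listed as a <link/> of the most recent <Topic/>.
# ASSUMPTION: the ExternalPages of a Topic come right after that Topic.
sub build_twig {
    my ($found) = @_;
    my %wanted;          # link URLs of the current <Topic/>
    my $current_topic;   # r:id of the current <Topic/>

    return XML::Twig->new(
        twig_handlers => {
            # Called once each <Topic/> has been completely parsed.
            Topic => sub {
                my ($t, $topic) = @_;
                $current_topic = $topic->att('r:id');
                %wanted = ();
                $wanted{ $_->att('r:resource') } = 1 for $topic->children('link');
                $t->purge;   # drop everything parsed so far to keep memory flat
            },
            # Called once each <ExternalPage/> has been completely parsed.
            ExternalPage => sub {
                my ($t, $page) = @_;
                my $about = $page->att('about');
                if (defined $about && $wanted{$about}) {
                    push @$found, [
                        $current_topic,
                        $page->first_child_text('d:Title'),
                        $page->first_child_text('d:Description'),
                    ];
                }
                $t->purge;
            },
        },
    );
}
```

For the real file, something like my @found; build_twig(\@found)->parsefile("./content.example.txt"); would fill @found, and the chdir/index-writing step can then run once per collected row.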

Thanks again,

Nan

From: Peter Rabbitson <[EMAIL PROTECTED]>
To: beginners@perl.org
Subject: Re: Errors on processing 2GB XML file by using XML:Simple
Date: Mon, 16 May 2005 08:25:03 -0500

> Basically, the XML file has two key parallel nodes: <Topic/> and
> <ExternalPage/>. If there is a <link/> child in a <Topic/>, an
> <ExternalPage/> node will exist to show more detailed information
> about the content of this <link/>, such as <d:Title/> and <d:Description/>.
>
> However, not every <Topic/> node has one or more <link/> children, so I need
> to write a loop to find out whether <link/> is a child of each <Topic/> node.
> If there are <link/> nodes, I will check each <ExternalPage/> to output more
> information.
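For the check described above, the inner loop over every <ExternalPage/> can be replaced by a hash keyed on the about attribute, built once. A minimal sketch, assuming the same element names as the code earlier in this thread; link_details() is a hypothetical helper. It still needs the whole tree in memory, so it only suits a small sample file, not the 2GB dump.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: index every <ExternalPage/> by its 'about' URL once,
# so each <link/> costs one hash lookup instead of a scan over all pages.
# Requires a fully parsed tree (parse() or parsefile() already called).
sub link_details {
    my ($twig) = @_;
    my $root = $twig->root;

    # about URL -> ExternalPage element (pages without 'about' are skipped)
    my %page_for = map {
        my $about = $_->att('about');
        defined $about ? ($about => $_) : ();
    } $root->children('ExternalPage');

    my @rows;
    for my $topic ($root->children('Topic')) {
        for my $link ($topic->children('link')) {
            my $url  = $link->att('r:resource');
            my $page = defined $url ? $page_for{$url} : undef;
            next unless $page;   # link with no matching ExternalPage
            push @rows, [ $topic->att('r:id'),
                          $page->first_child_text('d:Title'),
                          $page->first_child_text('d:Description') ];
        }
    }
    return @rows;
}
```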


Can you provide some relevant code? Looking at the sample XML, two handlers
immediately come to mind: one for RDF/Topic and another for RDF/ExternalPage,
both calling different subroutines that use some shared variables as flags
indicating which <Topic/> we are currently in and which links we are looking
for when parsing through the ExternalPages.


That works unless your <Topic/> and <ExternalPage/> nodes are randomly
intermixed (very, very, very unlikely), and if they are, you are really screwed :)
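If the nodes ever were intermixed, one hedged way out is to read the file twice: the first pass keeps only a map from link URL to Topic id, and the second streams the <ExternalPage/> nodes and looks each one up. A sketch under the same naming assumptions as above; two_pass() is a hypothetical helper. The map still holds every linked URL, so memory grows with the number of links rather than with the file size.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper for the intermixed case. $file must be a real file,
# because the document is read twice.
sub two_pass {
    my ($file) = @_;
    my %topic_of;   # link URL -> r:id of a <Topic/> listing it (last one wins)

    # Pass 1: twig_roots builds subtrees for <Topic/> only; the rest is skipped.
    XML::Twig->new(
        twig_roots => {
            Topic => sub {
                my ($t, $topic) = @_;
                $topic_of{ $_->att('r:resource') } = $topic->att('r:id')
                    for $topic->children('link');
                $t->purge;
            },
        },
    )->parsefile($file);

    # Pass 2: build subtrees for <ExternalPage/> only and resolve each one.
    my @found;
    XML::Twig->new(
        twig_roots => {
            ExternalPage => sub {
                my ($t, $page) = @_;
                my $about = $page->att('about');
                my $topic = defined $about ? $topic_of{$about} : undef;
                push @found, [ $topic,
                               $page->first_child_text('d:Title'),
                               $page->first_child_text('d:Description') ]
                    if defined $topic;
                $t->purge;
            },
        },
    )->parsefile($file);

    return @found;
}
```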

Peter

--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
<http://learn.perl.org/> <http://learn.perl.org/first-response>