Rob,
----- Original Message -----
From: "Rob Dixon" <[EMAIL PROTECTED]>
To: "Perl List" <beginners@perl.org>
Cc: "Mike Blezien" <[EMAIL PROTECTED]>
Sent: Sunday, July 15, 2007 7:49 PM
Subject: Re: Parsing large XML file
Mike Blezien wrote:
We need to parse some very large XML files, approximately 900-1000 KB each. A
sample of a typical file to be parsed can be viewed here:
http://projects.thunder-rain.com/uploads/000001.xml
I was planning to use the XML::Twig module, with the following code snippet
to loop through each of the <product> .... </product> elements. Not every
element is needed, but most of the ones within each <product></product> are.
# Code snip:
####################################################################
use strict;
use warnings;
use CGI;
use XML::Twig;

my $xmlfile = '/path/to/upload/000001.xml';
my $cgi = CGI->new();    # not used in this snippet

# get_products() is called each time a complete <product> has been parsed.
my $twig = XML::Twig->new(twig_handlers => {
    product => \&get_products,
});
$twig->parsefile($xmlfile);

sub get_products {
    my ($t, $elt) = @_;    # the twig and the current <product> element

    # Loop through each product.
    my $article_number     = $elt->first_child_text('article_number');
    my $ean_upc            = $elt->first_child_text('ean_upc');
    my $distributor_number = $elt->first_child_text('distributor_number');
    my $distributor_name   = $elt->first_child_text('distributor_name');
    my $artist             = $elt->first_child_text('artist');

    # Now loop through each
    # <tracks><number_of_tracks></number_of_tracks><playtime></playtime>
    # <track> <sound> </sound> </track></tracks> for each product.
    # The <number_of_tracks> element gives the total number of
    # <track> <sound> </sound> </track> entries inside <tracks> .. </tracks>.

    # Free the memory used by everything parsed so far.
    $t->purge();
}
exit();
#################################################################
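A minimal sketch of the same streaming pattern, run against a small inline document instead of the real upload file (the sample products below are hypothetical). The handler reads a couple of fields from each <product> and then calls purge, so memory use stays roughly flat no matter how many products the file holds. It assumes XML::Twig is installed from CPAN; parse() takes a string where parsefile() takes a path.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical two-product sample; the real file would go to parsefile().
my $xml = <<'XML';
<products>
  <product>
    <article_number>1001</article_number>
    <artist>Artist One</artist>
  </product>
  <product>
    <article_number>1002</article_number>
    <artist>Artist Two</artist>
  </product>
</products>
XML

my @products;    # one hash ref per <product>

my $twig = XML::Twig->new(
    twig_handlers => {
        product => sub {
            my ($t, $elt) = @_;
            push @products, {
                article_number => $elt->first_child_text('article_number'),
                artist         => $elt->first_child_text('artist'),
            };
            # Discard everything parsed so far, including this <product>.
            $t->purge;
            return 1;
        },
    },
);
$twig->parse($xml);

printf "%s: %s\n", $_->{article_number}, $_->{artist} for @products;
```

Because the data is copied out before purge() runs, the hash refs in @products stay valid after the elements themselves are gone.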
Where I'm having a lot of trouble is with the elements within each product:
the <tracks> .... </tracks> element, looping through each of its <track> child
elements, and the <sound></sound> element inside each track.
---------
<product>
  .......
  <tracks>
    <number_of_tracks></number_of_tracks><playtime></playtime>
    <track> ....
      <sound> ..
      </sound>
    </track>
  </tracks>
  ........
</product>
--------
Is there a better way to obtain all the data within each of the
<product> ... </product> elements? I've never really worked with XML files
this large, or with such a complex tree. Any help or suggestions would be much
appreciated.
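One way to walk the <tracks> structure, sketched under the assumption that each <track> has a <title> child (as in Rob's reply) and zero or more <sound> children: children() returns every matching child element, so each level of nesting is just another foreach. The sample values are hypothetical. The sketch treats <number_of_tracks> as a declared count to check against the <track> elements actually found, rather than using it to drive the loop.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample mirroring the <tracks> layout above.
my $xml = <<'XML';
<product>
  <tracks>
    <number_of_tracks>2</number_of_tracks>
    <playtime>00:07:30</playtime>
    <track>
      <title>First Song</title>
      <sound><sound_type>sample</sound_type><codec>mp3</codec></sound>
    </track>
    <track>
      <title>Second Song</title>
      <sound><sound_type>sample</sound_type><codec>mp3</codec></sound>
      <sound><sound_type>full</sound_type><codec>wma</codec></sound>
    </track>
  </tracks>
</product>
XML

my $twig = XML::Twig->new;
$twig->parse($xml);
my $tracks = $twig->root->first_child('tracks');

my @found;    # [ title, [ "type/codec", ... ] ] for each track
foreach my $track ($tracks->children('track')) {
    my @sounds;
    foreach my $sound ($track->children('sound')) {
        push @sounds, $sound->first_child_text('sound_type') . '/'
                    . $sound->first_child_text('codec');
    }
    push @found, [ $track->first_child_text('title'), \@sounds ];
}

# <number_of_tracks> is only a declared count; compare it with the data.
my $declared = $tracks->first_child_text('number_of_tracks');
warn "number_of_tracks says $declared, found ", scalar(@found), "\n"
    if $declared != @found;

print "$_->[0]: @{ $_->[1] }\n" for @found;
```

Inside the real handler, $tracks would come from $product->first_child('tracks') instead of the root of a separately parsed document.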
Hi Mike
Your application of XML::Twig seems exactly right. I'm not sure what it is you
don't understand, but if you use this as your 'get_products' subroutine I hope
it answers some questions. All it does is print the title of the product and
the title of all the tracks in that product. Post again if you have any
trouble
understanding what I've written.
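sub get_products {
    my ($twig, $product) = @_;    # XML::Twig also sets $_ to $product

    # Print the product title, if there is one.
    my $product_title = $product->first_child('title');
    print $product_title->trimmed_text, "\n" if $product_title;

    # Print the title of every <track> inside <tracks>.
    if (my $tracks = $product->first_child('tracks')) {
        foreach my $track ($tracks->children('track')) {
            my $track_title = $track->first_child('title');
            print '  ', $track_title->trimmed_text, "\n" if $track_title;
        }
    }
    print "\n";

    # As in the original handler, free the memory for this product.
    $twig->purge;
}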
HTH,
Rob
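To try that handler in isolation, here is a minimal self-contained driver with hypothetical sample data, including a product that has no <tracks> at all. XML::Twig passes the twig and the element as arguments (Rob's version instead reads the element from $_, which XML::Twig also sets). This sketch collects the lines into @lines rather than printing them directly, so the result is easy to inspect.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample: one product with tracks, one without.
my $xml = <<'XML';
<products>
  <product>
    <title>Some Album</title>
    <tracks>
      <track><title> Track A </title></track>
      <track><title>Track B</title></track>
    </tracks>
  </product>
  <product>
    <title>Single With No Tracks</title>
  </product>
</products>
XML

my @lines;    # what the handler would otherwise print, one entry per line

sub get_products {
    my ($twig, $product) = @_;

    if (my $title = $product->first_child('title')) {
        push @lines, $title->trimmed_text;
    }
    if (my $tracks = $product->first_child('tracks')) {
        foreach my $track ($tracks->children('track')) {
            my $track_title = $track->first_child('title') or next;
            push @lines, '  ' . $track_title->trimmed_text;
        }
    }
    $twig->purge;    # free memory before the next <product>
    return 1;
}

my $twig = XML::Twig->new(twig_handlers => { product => \&get_products });
$twig->parse($xml);
print "$_\n" for @lines;
```

trimmed_text strips the leading and trailing spaces around " Track A ", and the second product simply produces its title with no track lines.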
OK, this gets me going in the right direction; I much appreciate the help.
The only question I have now is that, while looping through each
<track> </track>, we need another loop inside each track for the <sound>
element, which looks like this inside the <track>:
  .....
  .....
  <sound>
    <file> ... </file>
    <sound_type> ... </sound_type>
    <codec> ... </codec>
    <bitrate> ... </bitrate>
    <channels>mono</channels>
  </sound>
  ......
  .......
</track>
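Can one do something like this:
foreach my $track ($tracks->children('track'))
{
    # children('sound') returns every <sound> child of the track (an empty
    # list if there is none); first_child('sound') returns at most one.
    foreach my $sound ($track->children('sound'))
    {
        my $soundtype = $sound->first_child_text('sound_type');
        my $codec     = $sound->first_child_text('codec');
        # ... use $soundtype and $codec here, before they go out of scope
    }
    my $track_title = $track->first_child('title');
    print '  ', $track_title->trimmed_text, "\n" if $track_title;
}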
Would this work, or is there a better way to do this?
Mike
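For the inner loop, children('sound') is a safer choice than first_child('sound'): first_child returns at most one element, so a second <sound> would never be visited, and a track with no <sound> could leave you calling methods on an undefined value. children('sound') returns every <sound> child, or an empty list when there are none. The sketch below uses it and keeps each sound's fields in a hash so they survive the loop; the sample data and the %sounds_by_title layout are hypothetical choices.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical tracks: one with two <sound> children, one with none.
my $xml = <<'XML';
<tracks>
  <track>
    <title>With Sounds</title>
    <sound><file>a.mp3</file><sound_type>sample</sound_type><codec>mp3</codec><bitrate>128</bitrate><channels>stereo</channels></sound>
    <sound><file>a.wma</file><sound_type>full</sound_type><codec>wma</codec><bitrate>64</bitrate><channels>mono</channels></sound>
  </track>
  <track>
    <title>No Sounds</title>
  </track>
</tracks>
XML

my $twig = XML::Twig->new;
$twig->parse($xml);
my $tracks = $twig->root;

my %sounds_by_title;    # track title => array ref of per-sound hashes
foreach my $track ($tracks->children('track')) {
    my @sounds;
    # Yields every <sound>, or nothing at all if the track has none.
    foreach my $sound ($track->children('sound')) {
        my %info;
        $info{$_} = $sound->first_child_text($_)
            for qw(file sound_type codec bitrate channels);
        push @sounds, \%info;
    }
    $sounds_by_title{ $track->first_child_text('title') } = \@sounds;
}

foreach my $title (sort keys %sounds_by_title) {
    my @codecs = map { $_->{codec} } @{ $sounds_by_title{$title} };
    print "$title: ", (@codecs ? "@codecs" : 'no sound'), "\n";
}
```

first_child_text returns an empty string for a missing child, so a <sound> that lacks, say, <bitrate> still yields a complete hash rather than an error.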