Rob,

----- Original Message -----
From: "Rob Dixon" <[EMAIL PROTECTED]>
To: "Perl List" <beginners@perl.org>
Cc: "Mike Blezien" <[EMAIL PROTECTED]>
Sent: Sunday, July 15, 2007 7:49 PM
Subject: Re: Parsing large XML file


Mike Blezien wrote:

We need to parse some very large XML files, approximately 900-1000 KB each. A sample of a typical XML file to be parsed can be viewed here: http://projects.thunder-rain.com/uploads/000001.xml

I was planning to use the XML::Twig module for this, with the following code snippet to loop through each of the <product> ... </product> elements. Not every element is needed, but most of those within each <product> are.

# Code snip:
####################################################################
my $xmlfile = '/path/to/upload/000001.xml';
my $cgi     = new CGI();
my $twig = new XML::Twig(twig_handlers => {
                                          product => \&get_products,
                                         });
$twig->parsefile("$xmlfile");

sub get_products {
my($t,$elt) = @_;
# loop through each product.

 my $article_number     = $elt->first_child_text('article_number');
 my $ean_upc            = $elt->first_child_text('ean_upc');
 my $distributor_number = $elt->first_child_text('distributor_number');
 my $distributor_name   = $elt->first_child_text('distributor_name');
 my $artist             = $elt->first_child_text('artist');

 # now loop through the <tracks> element of each product:
 #   <tracks><number_of_tracks></number_of_tracks><playtime></playtime>
 #     <track> <sound> </sound> </track>
 #   </tracks>
 # <number_of_tracks> determines the total number of <track> elements in the loop.

$t->purge();
}

exit();
#################################################################

Now, the area I'm having a lot of problems with is the elements within each product: the <tracks> ... </tracks> element, and looping through each track's child elements and <sound></sound>.
---------
<product>
.......
<tracks>
<number_of_tracks></number_of_tracks><playtime></playtime>
  <track> ....
     <sound> ..
     </sound>
  </track>
</tracks>
........
</product>
--------

Is there a better way to obtain all the data within each of the <product> ... </product> elements? I've never really worked with XML files this large or with such a complex tree. Any help or suggestions would be much appreciated.

Hi Mike

Your application of XML::Twig seems exactly right. I'm not sure what it is you
don't understand, but if you use this as your 'get_products' subroutine I hope
it answers some questions. All it does is print the title of the product and
the title of all the tracks in that product. Post again if you have any trouble
understanding what I've written.

 sub get_products {

   my $product = $_;

   my $product_title = $product->first_child('title');
   print $product_title->trimmed_text, "\n";

   my $tracks = $product->first_child('tracks');
   return unless $tracks;

   foreach my $track ($tracks->children('track')) {
     my $track_title = $track->first_child('title');
     print '  ', $track_title->trimmed_text, "\n";
   }

   print "\n";
 }

HTH,

Rob

OK, this helps get me going in the right direction; I much appreciate the help.

The only question I have now is that, while looping through each <track> ... </track>, there is another element to loop over inside each one, <sound>:
<track>
.....
.....
  <sound>
    <file> ... </file>
    <sound_type> ... </sound_type>
    <codec> ... </codec>
    <bitrate> ... </bitrate>
    <channels>mono</channels>
</sound>
......
.......
</track>
Can one do something like this:

foreach my $track ($tracks->children('track'))
{
  for my $sound ($track->first_child('sound'))
  {
    my $soundtype = $sound->first_child_text('sound_type');
    my $codec     = $sound->first_child_text('codec');
  }
  my $track_title = $track->first_child('title');
  print '  ', $track_title->trimmed_text, "\n";
}

Would this work, or is there a better way to do this?

Mike
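
For what it's worth, here is a minimal sketch of the nested <track>/<sound> loop asked about above. It is not taken from the thread. As far as I know, first_child returns a single element (or undef when there is no match), so this sketch iterates with children('sound') instead, which returns a list and skips tracks that have no <sound>. The inline sample XML values and the @rows array are made up for illustration; the element names follow the snippets in the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample data: element names follow the thread, values are invented.
my $sample_xml = <<'XML';
<products>
  <product>
    <title>Example Album</title>
    <tracks>
      <number_of_tracks>2</number_of_tracks>
      <track>
        <title>First Track</title>
        <sound>
          <file>one.mp3</file>
          <sound_type>preview</sound_type>
          <codec>mp3</codec>
          <bitrate>128</bitrate>
          <channels>mono</channels>
        </sound>
      </track>
      <track>
        <title>Second Track</title>
      </track>
    </tracks>
  </product>
</products>
XML

my @rows;    # hypothetical collector: one "title|sound_type|codec" string per <sound>

sub get_products {
    my ($twig, $product) = @_;

    my $tracks = $product->first_child('tracks');
    if ($tracks) {
        foreach my $track ($tracks->children('track')) {
            my $track_title = $track->first_child_text('title');

            # children() returns a (possibly empty) list, so a track
            # without any <sound> simply contributes nothing here.
            foreach my $sound ($track->children('sound')) {
                my $sound_type = $sound->first_child_text('sound_type');
                my $codec      = $sound->first_child_text('codec');
                push @rows, join '|', $track_title, $sound_type, $codec;
            }
        }
    }

    $twig->purge;    # release the memory used by this <product>
}

my $twig = XML::Twig->new(twig_handlers => { product => \&get_products });
$twig->parse($sample_xml);    # for a real file, parsefile($xmlfile) as in the thread

print "$_\n" for @rows;
```

If each <track> is known to hold at most one <sound>, a plain `if (my $sound = $track->first_child('sound')) { ... }` would do the same job without the inner loop.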