Rob,

----- Original Message -----
From: "Rob Dixon" <[EMAIL PROTECTED]>
To: "Perl List" <beginners@perl.org>
Cc: "Mike Blezien" <[EMAIL PROTECTED]>
Sent: Sunday, July 15, 2007 7:49 PM
Subject: Re: Parsing large XML file


Mike Blezien wrote:

We need to parse some very large XML files, approximately 900-1000 KB each. A sample of a typical file we would be parsing can be viewed here: http://projects.thunder-rain.com/uploads/000001.xml

I was planning to use the XML::Twig module for this, with the following code snippet looping through each of the <product> ... </product> elements. Not every element is needed, but most of the ones inside each <product></product> are.

# Code snippet:
####################################################################
use strict;
use warnings;
use CGI;
use XML::Twig;

my $xmlfile = '/path/to/upload/000001.xml';
my $cgi     = CGI->new();
my $twig    = XML::Twig->new(
    twig_handlers => { product => \&get_products },
);
$twig->parsefile($xmlfile);

# Called once for each <product> element, as soon as it has been parsed.
sub get_products {
    my ($t, $elt) = @_;

    my $article_number     = $elt->first_child_text('article_number');
    my $ean_upc            = $elt->first_child_text('ean_upc');
    my $distributor_number = $elt->first_child_text('distributor_number');
    my $distributor_name   = $elt->first_child_text('distributor_name');
    my $artist             = $elt->first_child_text('artist');

    # Now loop through each <tracks><number_of_tracks></number_of_tracks><playtime></playtime>
    # <track> <sound> </sound> </track></tracks> for each product.
    # <number_of_tracks> gives the total number of <track> ... </track>
    # elements inside <tracks>, i.e. how many times the loop runs.

    # Free the memory used by everything parsed so far.
    $t->purge();
}

exit();
#################################################################

The part I'm having a lot of trouble with is the elements inside each product: the <tracks> ... </tracks> block, and looping through each of its <track> child elements and their <sound></sound> elements.
---------
<product>
.......
<tracks>
<number_of_tracks></number_of_tracks><playtime></playtime>
  <track> ....
     <sound> ..
     </sound>
  </track>
</tracks>
........
</product>
--------

Is there a better way to do this and obtain all the data within each <product> ... </product> element? I've never worked with XML files this large or with such a complex tree. Any help or suggestions would be much appreciated.

Hi Mike

Your application of XML::Twig seems exactly right. I'm not sure what it is you
don't understand, but if you use this as your 'get_products' subroutine I hope
it answers some questions. All it does is print the title of the product and
the title of all the tracks in that product. Post again if you have any trouble
understanding what I've written.

 sub get_products {

   my $product = $_;

   my $product_title = $product->first_child('title');
   print $product_title->trimmed_text, "\n";

   my $tracks = $product->first_child('tracks');
   return unless $tracks;

   foreach my $track ($tracks->children('track')) {
     my $track_title = $track->first_child('title');
     print '  ', $track_title->trimmed_text, "\n";
   }

   print "\n";
 }
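To try a handler like the one above outside the CGI script, here is a small self-contained sketch. It assumes XML::Twig is installed and uses a made-up two-product document instead of the real 000001.xml; only the element names title, tracks and track come from the thread. It takes the element from @_ (XML::Twig also sets $_ to the same element, which the version above relies on), collects lines in an array so the result is easy to check, and adds a purge call that the version above leaves out.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample data: element names follow the thread,
# the values are invented for illustration.
my $xml = <<'XML';
<products>
  <product>
    <title>Album One</title>
    <tracks>
      <number_of_tracks>2</number_of_tracks>
      <track><title>First Song</title></track>
      <track><title>Second Song</title></track>
    </tracks>
  </product>
  <product>
    <title>Single Without Tracks</title>
  </product>
</products>
XML

my @lines;    # collected output, one entry per printed line

sub get_products {
    my ($t, $product) = @_;    # the twig and the <product> element

    push @lines, $product->first_child('title')->trimmed_text;

    # Only walk the tracks if this product actually has a <tracks> child.
    my $tracks = $product->first_child('tracks');
    if ($tracks) {
        for my $track ($tracks->children('track')) {
            push @lines, '  ' . $track->first_child('title')->trimmed_text;
        }
    }

    $t->purge;    # release everything parsed so far
}

my $twig = XML::Twig->new(twig_handlers => { product => \&get_products });
$twig->parse($xml);
print "$_\n" for @lines;
```

Running it should print each product title followed by its indented track titles; the second product simply has none.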

We've run a few tests and everything seems to be working as expected, but there is one small problem we haven't been able to figure out. We keep getting this error (code snippet below):
----
Can't call method "first_child_text" on an undefined value at .. /sample.cgi line 56

Line 56 is: my $tracknums = $tracks->first_child_text('number_of_tracks');
----
A value for $tracknums is returned, and all the other values are displayed once the XML file has been parsed. We haven't been able to figure out why we keep getting this error.

############################################################################
my $twig = XML::Twig->new(twig_handlers => { product => \&get_products });
$twig->parsefile($xmlfile);
$twig->purge();
############################################################################
sub get_products {
    my ($t, $elt) = @_;
    my ($track_title, $trackno, $setno, $soundtype, $codec, $file);

    # Process each product.
    my $article_number = $elt->first_child_text('article_number');
    my $dist_number    = $elt->first_child_text('distributor_number');
    my $dist_name      = $elt->first_child_text('distributor_name');
    my $artist         = $elt->first_child_text('artist');
    my $ean_upc        = $elt->first_child_text('ean_upc');
    my $set_total      = $elt->first_child_text('set_total');

    my $tracks    = $elt->first_child('tracks');
    # LINE 56 here
    my $tracknums = $tracks->first_child_text('number_of_tracks');

    return unless $tracks;

    for my $track ($tracks->children('track')) {
        $track_title = $track->first_child_text('title');
        $trackno     = $track->first_child_text('trackno');
        $setno       = $track->first_child_text('setno');

        for my $sound ($track->children('sound')) {
            $soundtype = $sound->first_child_text('sound_type');
            $codec     = $sound->first_child_text('codec');
            $file      = $sound->first_child_text('file');
        }
    }    # close for $track loop

    # Free up memory.
    $t->purge();
}
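The error is consistent with at least one <product> in the file having no <tracks> element. For such a product, first_child('tracks') returns undef, and line 56 calls first_child_text on it before the "return unless $tracks;" check gets a chance to run. Products handled earlier have already been processed by then, which would fit the other values still appearing. Below is a minimal sketch with the check moved ahead of the first method call on $tracks. It assumes XML::Twig is installed and uses an invented sample in which the second product deliberately lacks <tracks>; it also purges on the early-return path so memory is still released for that product.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample: the second <product> has no <tracks> element,
# which is the situation that triggers the error above.
my $xml = <<'XML';
<products>
  <product>
    <article_number>A1</article_number>
    <tracks>
      <number_of_tracks>1</number_of_tracks>
      <track>
        <title>Intro</title>
        <sound><sound_type>preview</sound_type><codec>mp3</codec><file>a1_1.mp3</file></sound>
      </track>
    </tracks>
  </product>
  <product>
    <article_number>A2</article_number>
  </product>
</products>
XML

my @seen;    # one summary string per <sound> (or per trackless product)

sub get_products {
    my ($t, $elt) = @_;
    my $article_number = $elt->first_child_text('article_number');

    # first_child returns undef when the product has no <tracks> child,
    # so test it BEFORE calling any method on it.
    my $tracks = $elt->first_child('tracks');
    if (!$tracks) {
        push @seen, "$article_number: no tracks";
        $t->purge;    # still free memory on the early exit
        return;
    }

    # Safe from here on: $tracks is an element.
    my $tracknums = $tracks->first_child_text('number_of_tracks');
    for my $track ($tracks->children('track')) {
        my $title = $track->first_child_text('title');
        for my $sound ($track->children('sound')) {
            my $file = $sound->first_child_text('file');
            push @seen, "$article_number: $tracknums track(s), $title -> $file";
        }
    }

    $t->purge;
}

XML::Twig->new(twig_handlers => { product => \&get_products })->parse($xml);
print "$_\n" for @seen;
```

Note that first_child_text itself is safe to call on an element that lacks the named child (it returns an empty string); the failure only happens when the object the method is called on is undef.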

Mike
