Mike Blezien wrote:

Rob Dixon wrote:

Mike Blezien wrote:

We need to parse some very large XML files, approximately 900-1000 KB each. A sample of a typical file to be parsed can be viewed here: http://projects.thunder-rain.com/uploads/000001.xml

I was planning on using the XML::Twig module to do this, using the following code snippet to loop through each of the <product> .... </product> elements. Not every element is needed, but most of those inside each <product></product> are.

[snip old code]

Hi Mike

Your application of XML::Twig seems exactly right. I'm not sure
what it is you don't understand, but if you use this as your
'get_products' subroutine I hope it answers some questions. All it
does is print the title of the product and the title of all the
tracks in that product. Post again if you have any trouble understanding what I've written.

 sub get_products {

   my $product = $_;

   my $product_title = $product->first_child('title');
   print $product_title->trimmed_text, "\n";

   my $tracks = $product->first_child('tracks');
   return unless $tracks;

   foreach my $track ($tracks->children('track')) {
     my $track_title = $track->first_child('title');
     print '  ', $track_title->trimmed_text, "\n";
   }

   print "\n";
 }

We've run a few tests and everything seems to be working as expected,
but there is one little problem I haven't been able to figure out. We keep
getting this error (code snippet below):
----
Can't call method "first_child_text" on an undefined value
at .. /sample.cgi line 56
----
Line 56 is this line:

 my $tracknums = $tracks->first_child_text('number_of_tracks');

A value for $tracknums is returned, and all the other values are presented after it parses the XML file. I haven't been able to figure out why I keep getting this error.

############################################################################
my $twig = new XML::Twig(twig_handlers => { product => \&get_products });
$twig->parsefile("$xmlfile");
$twig->purge();
############################################################################
sub get_products {
my($t,$elt) = @_;
my($track_title,$trackno,$setno,$soundtype,$codec,$file);

# process each product loop.
 my $article_number = $elt->first_child_text('article_number');
 my $dist_number    = $elt->first_child_text('distributor_number');
 my $dist_name      = $elt->first_child_text('distributor_name');
 my $artist         = $elt->first_child_text('artist');
 my $ean_upc        = $elt->first_child_text('ean_upc');
 my $set_total      = $elt->first_child_text('set_total');

 my $tracks    = $elt->first_child('tracks');
 # LINE 56 here
 my $tracknums = $tracks->first_child_text('number_of_tracks');

 return unless $tracks;

 for my $track ($tracks->children('track'))
  {
    $track_title = $track->first_child_text('title');
    $trackno     = $track->first_child_text('trackno');
    $setno       = $track->first_child_text('setno');

   for my $sound ($track->children('sound'))
    {
      $soundtype =  $sound->first_child_text('sound_type');
      $codec     =  $sound->first_child_text('codec');
      $file      =  $sound->first_child_text('file');
    }

  } # close for $track loop
# free up memory
$t->purge();
}

Hello Mike

First of all, your call

 $twig->parsefile("$xmlfile");

should properly be

 $twig->parsefile($xmlfile);

as there is no point in forcing Perl to interpolate a string when the result
is simply the string itself.
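
A small sketch of why the quotes are redundant here, and of one case where they do change behaviour. It uses core Perl only; the file name and the variable names other than $xmlfile are made up for the illustration.

```perl
use strict;
use warnings;

# Hypothetical file name, for illustration only.
my $xmlfile = '000001.xml';

# Interpolating a lone scalar just produces a copy of the same string.
my $quoted = "$xmlfile";
print "same string\n" if $quoted eq $xmlfile;

# The quotes do matter when the variable holds a reference:
# interpolation stringifies it, so the copy can no longer be dereferenced.
my $list_ref    = [ 1, 2, 3 ];
my $stringified = "$list_ref";    # something like "ARRAY(0x...)"
print "still a reference\n"     if ref $list_ref;
print "no longer a reference\n" unless ref $stringified;
```

So for a plain file name the quotes are harmless but pointless, and the habit can bite when the variable is a reference or an object.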

Your code works fine on the sample 000001.xml file that you posted. What must
be happening is that there is a product in your live data that has no <tracks>
element. The line

 return unless $tracks;

is meant to protect against this, but you have used the value of $tracks before
the check. If you change your code to:

 my $tracks = $elt->first_child('tracks');
 return unless $tracks;

 my $numtracks = $tracks->first_child_text('number_of_tracks');

 for my $track ($tracks->children('track')) {
   :
 }

then the error should go away, although you may want to do more than just
ignore any products without a <tracks> tag.
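
As one possible way of doing more than ignoring them, here is a minimal sketch that records the article number of every product lacking <tracks> and reports it afterwards. The inline sample data, the <products> root element and the @without_tracks and @track_titles arrays are assumptions made for the example; the other element names are the ones used in your code.

```perl
use strict;
use warnings;
use XML::Twig;

# Made-up sample data: the second product has no <tracks> element.
my $xml = <<'XML';
<products>
  <product>
    <article_number>A1</article_number>
    <tracks>
      <number_of_tracks>1</number_of_tracks>
      <track><title>First song</title></track>
    </tracks>
  </product>
  <product>
    <article_number>A2</article_number>
  </product>
</products>
XML

my @without_tracks;    # article numbers of products that lack <tracks>
my @track_titles;      # titles collected from products that have tracks

sub get_products {
    my ($t, $elt) = @_;

    my $tracks = $elt->first_child('tracks');
    unless ($tracks) {
        # Record the problem instead of skipping the product silently.
        push @without_tracks, $elt->first_child_text('article_number');
        $t->purge;
        return;
    }

    for my $track ($tracks->children('track')) {
        push @track_titles, $track->first_child_text('title');
    }

    $t->purge;    # free the memory used by this product
}

my $twig = XML::Twig->new(twig_handlers => { product => \&get_products });
$twig->parse($xml);    # parse() takes a string; parsefile() takes a file name

warn "No <tracks> in product $_\n" for @without_tracks;
print "$_\n" for @track_titles;
```

Whether a missing <tracks> element should be a warning, a log entry or a fatal error depends on what the data supplier has promised you.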

An alternative method, which checks the actual number of tracks instead of
relying on the accuracy of the <number_of_tracks> value, is to put the <track>
elements into an array and measure its size before iterating over it:

 my $tracks = $elt->first_child('tracks');
 return unless $tracks;
 my @tracks = $tracks->children('track');

 my $numtracks = @tracks;

 for my $track (@tracks) {
   :
 }

Which of these techniques you choose is up to you.
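
If you want to see whether the two ever disagree, the techniques can be combined. This is only a sketch: the inline sample data is invented so that the declared and actual counts differ, and $declared and $actual are names introduced for the example.

```perl
use strict;
use warnings;
use XML::Twig;

# Made-up sample: <number_of_tracks> claims 3, but only 2 <track> elements exist.
my $xml = <<'XML';
<product>
  <tracks>
    <number_of_tracks>3</number_of_tracks>
    <track><title>One</title></track>
    <track><title>Two</title></track>
  </tracks>
</product>
XML

my ($declared, $actual);

my $twig = XML::Twig->new(
    twig_handlers => {
        product => sub {
            my ($t, $elt) = @_;
            my $tracks = $elt->first_child('tracks') or return;

            # What the file says about itself.
            $declared = $tracks->first_child_text('number_of_tracks');

            # What is really there: an array in scalar context gives its size.
            my @tracks = $tracks->children('track');
            $actual = @tracks;
        },
    },
);
$twig->parse($xml);

print "declared $declared, found $actual\n";
warn "track count mismatch\n" if $declared != $actual;
```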

HTH,

Rob
