Re: Parsing large XML file - Revisited

Mike Blezien Wed, 25 Jul 2007 09:11:57 -0700

Rob,

----- Original Message -----
From: "Rob Dixon" <[EMAIL PROTECTED]>
To: <[email protected]>
Cc: "Mike Blezien" <[EMAIL PROTECTED]>
Sent: Wednesday, July 25, 2007 10:57 AM
Subject: Re: Parsing large XML file - Revisited

Mike Blezien wrote:
Mirod wrote:
On Jul 22, 3:33 am, [EMAIL PROTECTED] (Dr.Ruud) wrote:
"Mike Blezien" schreef:

>   my $article_number = $elt->first_child_text('article_number');
>   my $dist_number    = $elt->first_child_text('distributor_number');
>   my $dist_name      = $elt->first_child_text('distributor_name');
>   my $artist         = $elt->first_child_text('artist');
>   my $ean_upc        = $elt->first_child_text('ean_upc');
>   my $set_total      = $elt->first_child_text('set_total');

That looks awful. Isn't there some way with the module to do it cleaner?

Or do it more like:

  my @text_tags = qw(article_number distributor_number etc);
  my %data;

  for my $tag (@text_tags) {
      $data{_text}{$tag} = $elt->first_child_text($tag);
  }

just a quick note, that first_child_text can also be written field,
which often makes more sense in a data oriented context.

That was an excellent idea :) A lot cleaner and a lot less coding involved.
Still fairly new to working with XML parsing.

Hi Mike

Using a shorter synonym for a method isn't a significant improvement. I prefer
the 'first_child_text' name as it is more descriptive, and if I was using
exactly the code above I would rewrite it as:

  my ($article_number, $dist_number, $dist_name, $artist, $ean_upc, $set_total) = map {
      $elt->first_child_text($_)
  } qw/article_number distributor_number distributor_name artist ean_upc set_total/;

But I made no changes to your code apart from to correct the semantics as it
wasn't at all obvious what you're doing. The code you posted just extracts XML
field text values into a number of lexical variables and then discards them. If
you give us an idea what your final intention is then I'm sure we can help, and
it probably won't involve using 'field' instead of 'first_child_text'; but it is
likely that a hash structure would be more appropriate.
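
A minimal sketch of the hash approach mentioned above, assuming each record is wrapped in a `<record>` element. The wrapper name and the sample values are invented for illustration; only the field names come from the thread. It fills one hash via a slice instead of six separate lexicals:

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample data: the <catalog>/<record> wrappers and all
# values are made up; only the field names appear in the thread.
my $xml = <<'XML';
<catalog>
  <record>
    <article_number>A100</article_number>
    <distributor_number>D7</distributor_number>
    <distributor_name>Example Dist</distributor_name>
    <artist>Some Artist</artist>
    <ean_upc>0123456789012</ean_upc>
    <set_total>2</set_total>
  </record>
</catalog>
XML

my @text_tags = qw/article_number distributor_number distributor_name
                   artist ean_upc set_total/;

my $twig = XML::Twig->new;
$twig->parse($xml);
my $elt = $twig->root->first_child('record');

# Hash slice: one key per tag, all filled in a single statement.
my %data;
@data{@text_tags} = map { $elt->first_child_text($_) } @text_tags;

print "$_ = $data{$_}\n" for @text_tags;
```

Adding a field then only means adding a name to `@text_tags`, and the whole record can be passed around as `\%data`.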

As I mentioned in an earlier post, it's important to separate XML nodes from
their textual content. XML::Twig methods return both types of data and it's best
not to mix them up. More importantly, you can always extract the text data value
from an XML node, but the reverse isn't true.
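
To illustrate the node/text distinction, here is a small sketch. The `id` attribute and the sample values are assumptions made up for the example. `first_child` returns an element object, while `first_child_text` returns only a plain string:

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical one-record document; the id attribute is invented.
my $twig = XML::Twig->new;
$twig->parse('<record><artist id="42">Some Artist</artist></record>');
my $record = $twig->root;

my $node = $record->first_child('artist');       # an element object
my $text = $record->first_child_text('artist');  # a plain string

# The node still knows its tag, its attributes and its text ...
print "tag:  ", $node->tag,       "\n";
print "id:   ", $node->att('id'), "\n";
print "text: ", $node->text,      "\n";

# ... whereas the string carries no link back to the element it came from.
print "node is a ", ref($node), ", text is ",
      (ref($text) ? 'a reference' : 'a plain scalar'), "\n";
```

Keeping `$node` around until the last moment leaves the option of reading attributes or siblings later; once only `$text` is stored, that information is gone.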

Obviously there are several approaches to accomplish this task. With the help of
yourself and others who posted, I have been able to put together a fairly
efficient script, as we need to process & parse approx. 5000+ XML files
averaging 9-1000KB's in size. So far it has been working smoothly :)
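
For anyone reading the archive, a rough sketch of how such a batch run might be organised. This is not the actual script: the `<record>` element name, the file layout and the sample data are all assumptions. A twig handler collects each record into a hash and purges the twig so memory stays flat on large files:

```perl
use strict;
use warnings;
use XML::Twig;
use File::Temp qw/tempdir/;

my @text_tags = qw/article_number artist ean_upc/;
my @records;

# Called once per <record> element (hypothetical element name).
sub handle_record {
    my ($twig, $elt) = @_;
    my %data;
    @data{@text_tags} = map { $elt->first_child_text($_) } @text_tags;
    push @records, \%data;
    $twig->purge;    # release everything parsed so far
}

# Stand-in for the real input directory: two tiny generated files.
my $dir = tempdir(CLEANUP => 1);
for my $n (1 .. 2) {
    open my $fh, '>', "$dir/file$n.xml" or die "Cannot write file$n.xml: $!";
    print {$fh} "<catalog><record><article_number>A$n</article_number>"
              . "<artist>Artist $n</artist><ean_upc>000$n</ean_upc>"
              . "</record></catalog>";
    close $fh or die "Cannot close file$n.xml: $!";
}

# A fresh twig per file keeps the runs independent of each other.
for my $file (sort glob "$dir/*.xml") {
    my $twig = XML::Twig->new(twig_handlers => { record => \&handle_record });
    $twig->parsefile($file);
}

printf "%d records collected\n", scalar @records;
```

In a real run the `push` would probably be replaced by whatever is done with each record (a database insert, a report line), so that `@records` does not grow across 5000+ files.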

Mike

--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
http://learn.perl.org/
