Rob,
----- Original Message -----
From: "Rob Dixon" <[EMAIL PROTECTED]>
To: <beginners@perl.org>
Cc: "Mike Blezien" <[EMAIL PROTECTED]>
Sent: Wednesday, July 25, 2007 10:57 AM
Subject: Re: Parsing large XML file - Revisited


Mike Blezien wrote:

Mirod wrote:

On Jul 22, 3:33 am, [EMAIL PROTECTED] (Dr.Ruud) wrote:

"Mike Blezien" schreef:

>   my $article_number = $elt->first_child_text('article_number');
>   my $dist_number    = $elt->first_child_text('distributor_number');
>   my $dist_name      = $elt->first_child_text('distributor_name');
>   my $artist         = $elt->first_child_text('artist');
>   my $ean_upc        = $elt->first_child_text('ean_upc');
>   my $set_total      = $elt->first_child_text('set_total');

That looks awful. Isn't there some way with the module to do it cleaner?

Or do it more like:

  my @text_tags = qw(article_number distributor_number etc);
  my %data;

  for my $tag (@text_tags) {
      $data{_text}{$tag} = $elt->first_child_text($tag);
  }

Just a quick note: first_child_text can also be written as field,
which often makes more sense in a data-oriented context.

That was an excellent idea :) A lot cleaner and a lot less coding involved. I'm still fairly new to working with XML parsing.

Hi Mike

Using a shorter synonym for a method isn't a significant improvement. I prefer
the 'first_child_text' name as it is more descriptive, and if I was using
exactly the code above I would rewrite it as:

my ($article_number, $dist_number, $dist_name, $artist, $ean_upc, $set_total) = map {
   $elt->first_child_text($_)
} qw/article_number distributor_number distributor_name artist ean_upc set_total/;

But I made no changes to your code apart from correcting the semantics, as it wasn't at all obvious what you were doing. The code you posted just extracts XML field text values into a number of lexical variables and then discards them. If you give us an idea of what your final intention is then I'm sure we can help, and it probably won't involve using 'field' instead of 'first_child_text'; but it is likely that a hash structure would be more appropriate.
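
As a rough sketch of what a hash-based version might look like (this is only a guess at the intent: the <product> wrapper element, the sample values, and the @records array are all invented for illustration, and only the child tag names come from the code quoted above):

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample data: the <products>/<product> wrappers and the
# values are assumptions; only the child tag names are from the thread.
my $xml = <<'XML';
<products>
  <product>
    <article_number>A100</article_number>
    <distributor_number>D7</distributor_number>
    <distributor_name>Example Dist</distributor_name>
    <artist>Some Artist</artist>
    <ean_upc>0123456789012</ean_upc>
    <set_total>2</set_total>
  </product>
</products>
XML

my @fields = qw/article_number distributor_number distributor_name
                artist ean_upc set_total/;
my @records;    # one hash reference per <product>

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called once each <product> element has been fully parsed
        product => sub {
            my ($t, $elt) = @_;
            my %record;
            # Hash slice: one key per tag, filled with that child's text
            @record{@fields} = map { $elt->first_child_text($_) } @fields;
            push @records, \%record;
            $t->purge;    # release the elements parsed so far
        },
    },
);
$twig->parse($xml);

print "$_->{article_number}: $_->{artist}\n" for @records;
```

Keeping each record in a hash means the values survive after the handler returns, and calling purge in the handler keeps memory use flat, which may matter when working through a large batch of files.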

As I mentioned in an earlier post, it's important to separate XML nodes from their textual content. XML::Twig methods return both types of data and it's best not to mix them up. More importantly, you can always extract the text data value from an XML node,
but the reverse isn't true.
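
To illustrate the difference, here is a small sketch using a made-up one-element document (the <product> and <label> names are assumptions for the example only). first_child returns an element object, while first_child_text and its synonym field return a plain string:

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical document, invented for this example
my $twig = XML::Twig->new;
$twig->parse('<product><artist>Some Artist</artist></product>');
my $root = $twig->root;

# A node: an XML::Twig::Elt object that can still be navigated or queried
my $node = $root->first_child('artist');

# Text: plain strings with no link back to the element they came from
my $text  = $root->first_child_text('artist');
my $field = $root->field('artist');    # same method under a different name

# The text can always be recovered from the node
my $from_node = $node->text;

# A missing child gives undef as a node but an empty string as text
my $missing_node = $root->first_child('label');
my $missing_text = $root->first_child_text('label');

print "node text: $from_node\n";
print "missing node is ", (defined $missing_node ? "defined" : "undef"), "\n";
```

Holding on to the node keeps your options open, since you can ask it for its text, attributes, or children later; once you have only the string, that context is gone.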

Obviously there are several approaches to accomplish this task. With help from you and the others who posted, I have been able to put together a fairly efficient script, as we need to process and parse approximately 5000+ XML files averaging 9-1000KB in size. So far it has been working smoothly :)

Mike