On Jul 25, 9:11 pm, [EMAIL PROTECTED] (Mike Blezien) wrote:
Rob,
- Original Message -
From: Rob Dixon [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Cc: Mike Blezien [EMAIL PROTECTED]
Sent: Wednesday, July 25, 2007 10:57 AM
Subject: Re: Parsing large XML file - Revisited
Mike Blezien wrote:
Mirod wrote:
On Jul 22, 3:33 am, [EMAIL PROTECTED] (Dr.Ruud) wrote:
Mike Blezien wrote:
my $article_number = $elt->first_child_text('article_number');
my $dist_number    = $elt->first_child_text('distributor_number');
my $dist_name      = $elt->first_child_text('distributor_name');
my $artist         = $elt->first_child_text('artist');
my $ean_upc        = $elt->first_child_text('ean_upc');
my $set_total      = $elt->first_child_text('set_total');
That looks awful. Isn't there some way with the module to do it cleaner?
Or do it more like:
my @text_tags = qw(article_number distributor_number etc);
my %data;
for my $tag (@text_tags) {
    $data{_text}{$tag} = $elt->first_child_text($tag);
}
Just a quick note: first_child_text can also be written as field,
which often makes more sense in a data-oriented context.
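A minimal sketch of the synonym in action. It assumes XML::Twig is installed (it is not a core module), and the `<catalog>`/`<article>` sample document is hypothetical. It shows that `field` and `first_child_text` return the same string for the same child tag.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample record; the real feed's element names may differ.
my $xml = <<'XML';
<catalog>
  <article>
    <article_number>1001</article_number>
    <artist>Some Artist</artist>
  </article>
</catalog>
XML

my $twig = XML::Twig->new;
$twig->parse($xml);
my $elt = $twig->root->first_child('article');

# field() is a synonym for first_child_text(): both return the text
# of the first child element with the given tag.
my $long  = $elt->first_child_text('artist');
my $short = $elt->field('artist');

print "$long\n";                                    # Some Artist
print $long eq $short ? "same\n" : "different\n";   # same
```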
That was an excellent idea :) A lot cleaner and a lot less coding involved.
I'm still fairly new to XML parsing.
Hi Mike
Using a shorter synonym for a method isn't a significant improvement. I prefer
the 'first_child_text' name as it is more descriptive, and if I were using
exactly the code above I would rewrite it as:
my ($article_number, $dist_number, $dist_name, $artist, $ean_upc, $set_total)
    = map { $elt->first_child_text($_) }
      qw/article_number distributor_number distributor_name artist ean_upc set_total/;
But I made no changes to your code apart from correcting the semantics, as it
wasn't at all obvious what you were doing. The code you posted just extracts
XML field text values into a number of lexical variables and then discards
them. If you give us an idea of what your final intention is, I'm sure we can
help. It probably won't involve using 'field' instead of 'first_child_text',
but a hash structure is likely to be more appropriate.
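As a sketch of the hash-based structure suggested here, assuming each record is an `<article>` element (a hypothetical layout) and that XML::Twig is installed: a `twig_handlers` callback fills one hash per record with a hash slice, then calls `purge` so already-processed records are released as parsing proceeds.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Tag list taken from the original post; the <article> wrapper is assumed.
my @tags = qw(article_number distributor_number distributor_name
              artist ean_upc set_total);

my @records;

my $twig = XML::Twig->new(
    twig_handlers => {
        article => sub {
            my ($t, $elt) = @_;
            my %data;
            # Hash slice: one text value per tag.
            @data{@tags} = map { $elt->first_child_text($_) } @tags;
            # Collected here for the demo; a real job would probably
            # write each record to a database instead.
            push @records, \%data;
            $t->purge;    # free the parsed record so memory stays flat
        },
    },
);

$twig->parse(<<'XML');    # use parsefile($path) for a file on disk
<catalog>
  <article><article_number>1</article_number><artist>A</artist></article>
  <article><article_number>2</article_number><artist>B</artist></article>
</catalog>
XML

printf "%s: %s\n", $_->{article_number}, $_->{artist} for @records;
```

Keying by tag name means adding a field is a one-word change to `@tags` rather than a new lexical variable.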
As I mentioned in an earlier post, it's important to separate XML nodes from
their textual content. XML::Twig methods return both types of data, and it's
best not to mix them up. More importantly, you can always extract the text
data value from an XML node, but the reverse isn't true.
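A small illustration of the node/text distinction, assuming XML::Twig and a hypothetical one-element document: `first_child` returns an element object, from which the text (and the surrounding tree) can still be reached, whereas the plain string returned by `text` carries no such link.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse('<article><artist>Some Artist</artist></article>');
my $article = $twig->root;

# first_child() returns a node (an XML::Twig::Elt object) ...
my $node = $article->first_child('artist');
# ... whose text can always be extracted later.
my $text = $node->text;

print ref($node), "\n";            # XML::Twig::Elt
print "$text\n";                   # Some Artist
# The node still knows its place in the tree; the plain string does not.
print $node->parent->tag, "\n";    # article
```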
Obviously there are several approaches to this task. With the help of
yourself and the others who posted, I have been able to put together a fairly
efficient script, as we need to parse approximately 5,000+ XML files ranging
from about 9 KB to 1000 KB in size. So far it has been working smoothly :)
Mike
Hi,
My requirement is to compare two large XML files (about 50 MB each) and
generate an XML file containing the differences (an XML delta). The problem
is that the module I installed (XML::Diff) takes a lot of memory (even 2 GB
of RAM is not sufficient) and a lot of time. I suspect it uses XML::Parser
to build a DOM tree of each file, which is why it needs so much memory.
Is there any way in Perl to do this with a SAX parser, or another approach
that uses less memory?
Please help.
Thanks in advance.
Regards,
L.Srikanth Kumar
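One low-memory approach to the question above is sketched below. It is not a general XML diff and not how XML::Diff works. It assumes the files are record-oriented, with each record a `<record>` element carrying a unique `<id>` child (hypothetical names). XML::Twig, which is itself built on XML::Parser, streams each file and purges every record after hashing it, so memory grows with the number of record ids rather than with document size. Limitations: formatting-only changes count as "changed", and producing a real XML delta document is left out.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;
use Digest::MD5 qw(md5_hex);
use Encode qw(encode_utf8);

# Stream an XML string, calling $callback->($id, $digest) once per <record>.
# Use parsefile($path) instead of parse() for files on disk.
sub stream_records {
    my ($xml, $callback) = @_;
    XML::Twig->new(
        twig_handlers => {
            record => sub {
                my ($t, $elt) = @_;
                # Digest the serialized record; encode first because
                # md5_hex expects bytes, not wide characters.
                $callback->($elt->field('id'),
                            md5_hex(encode_utf8($elt->sprint)));
                $t->purge;    # discard the record once it has been hashed
            },
        },
    )->parse($xml);
}

# Only the old file's id => digest table is held in memory.
sub diff_records {
    my ($old_xml, $new_xml) = @_;
    my (%old, @added, @changed);
    stream_records($old_xml, sub { $old{ $_[0] } = $_[1] });
    stream_records($new_xml, sub {
        my ($id, $digest) = @_;
        if    (!exists $old{$id})     { push @added,   $id }
        elsif ($old{$id} ne $digest)  { push @changed, $id }
        delete $old{$id};
    });
    my @removed = sort keys %old;    # ids never seen in the new file
    return { added => \@added, changed => \@changed, removed => \@removed };
}

my $old = '<db><record><id>1</id><v>a</v></record>'
        . '<record><id>2</id><v>b</v></record></db>';
my $new = '<db><record><id>2</id><v>B</v></record>'
        . '<record><id>3</id><v>c</v></record></db>';

my $d = diff_records($old, $new);
print "added: @{ $d->{added} }\n";      # added: 3
print "changed: @{ $d->{changed} }\n";  # changed: 2
print "removed: @{ $d->{removed} }\n";  # removed: 1
```

The id lists could then be used in a second streaming pass to write out the changed records as the delta file.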
--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
http://learn.perl.org/