Re: Parsing large XML file - Revisited

2007-07-26 Thread Srikanth
On Jul 25, 9:11 pm, [EMAIL PROTECTED] (Mike Blezien) wrote:
 Rob,

 - Original Message -
 From: Rob Dixon [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Cc: Mike Blezien [EMAIL PROTECTED]
 Sent: Wednesday, July 25, 2007 10:57 AM
 Subject: Re: Parsing large XML file - Revisited

  Mike Blezien wrote:

  Mirod wrote:

  On Jul 22, 3:33 am, [EMAIL PROTECTED] (Dr.Ruud) wrote:

  Mike Blezien schreef:

 my $article_number = $elt->first_child_text('article_number');
 my $dist_number    = $elt->first_child_text('distributor_number');
 my $dist_name      = $elt->first_child_text('distributor_name');
 my $artist         = $elt->first_child_text('artist');
 my $ean_upc        = $elt->first_child_text('ean_upc');
 my $set_total      = $elt->first_child_text('set_total');

  That looks awful. Isn't there some way with the module to do it cleaner?

  Or do it more like:

my @text_tags = qw(article_number distributor_number distributor_name
                   artist ean_upc set_total);
my %data;

for my $tag (@text_tags) {
    $data{_text}{$tag} = $elt->first_child_text($tag);
}

  Just a quick note: first_child_text can also be written as field,
  which often makes more sense in a data-oriented context.

  That was an excellent idea :) A lot cleaner and a lot less coding involved.
  I'm still fairly new to XML parsing.

  Hi Mike

  Using a shorter synonym for a method isn't a significant improvement. I prefer
  the 'first_child_text' name as it is more descriptive, and if I were using
  exactly the code above I would rewrite it as:

   my ($article_number, $dist_number, $dist_name, $artist, $ean_upc, $set_total) =
       map { $elt->first_child_text($_) }
       qw/article_number distributor_number distributor_name artist ean_upc set_total/;

  But I have made no changes to your code apart from correcting the semantics, as
  it isn't at all obvious what you're doing. The code you posted just extracts XML
  field text values into a number of lexical variables and then discards them. If
  you give us an idea of what your final intention is, I'm sure we can help. It
  probably won't involve using 'field' instead of 'first_child_text', but it is
  likely that a hash structure would be more appropriate.
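
  One possible shape for the hash-based approach is sketched below; it is an
  illustration, not code from this thread. It assumes, hypothetically, that each
  record is an <article> element holding the child fields from Mike's code (the
  real record element name is never given), and that XML::Twig is installed from
  CPAN. A twig_handlers callback fires as each record finishes parsing, a hash
  slice collects the text values, and purge releases the already-processed part
  of the tree so memory stays low on large files.

```perl
use strict;
use warnings;
use XML::Twig;

# Child elements whose text we want from each record (names from Mike's code).
my @text_tags = qw(article_number distributor_number distributor_name
                   artist ean_upc set_total);

my @records;    # one hash ref per record

my $twig = XML::Twig->new(
    twig_handlers => {
        # 'article' is a hypothetical record element name.
        article => sub {
            my ($t, $elt) = @_;
            my %data;
            # Hash slice: one text value per tag. first_child_text returns ''
            # when the child element is missing.
            @data{@text_tags} = map { $elt->first_child_text($_) } @text_tags;
            push @records, \%data;
            $t->purge;    # free the parts of the tree processed so far
        },
    },
);

# Small inline sample; for a real file use $twig->parsefile($path) instead.
$twig->parse(<<'XML');
<catalog>
  <article>
    <article_number>1001</article_number>
    <artist>Some Artist</artist>
    <set_total>2</set_total>
  </article>
  <article>
    <article_number>1002</article_number>
    <artist>Another Artist</artist>
  </article>
</catalog>
XML

for my $rec (@records) {
    print join(', ', map { "$_=$rec->{$_}" } @text_tags), "\n";
}
```

  Keeping each record as a hash means the list of tags lives in one place, and
  whatever happens next (database insert, comparison, output) can work on
  $rec->{artist} and friends rather than six loose variables.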

  As I mentioned in an earlier post, it's important to separate XML nodes from
  their textual content. XML::Twig methods return both types of data and it's
  best not to mix them up. More importantly, you can always extract the text data
  value from an XML node, but the reverse isn't true.
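
  To make the node/text distinction concrete, here is a small sketch using a
  made-up document (XML::Twig must be installed). first_child returns an element
  object that still knows its tag and children; text flattens it to a plain
  string, and there is no way back from that string to the element.

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse('<article><artist>Some <b>Artist</b></artist></article>');
my $article = $twig->root;

# A node: an XML::Twig::Elt object (or undef if there is no such child).
my $artist_node = $article->first_child('artist');

# Text content: a plain string, which can be taken from the node at any time.
my $artist_text = $artist_node->text;    # nested <b> text is included

print ref($artist_node), "\n";    # XML::Twig::Elt
print $artist_node->tag, "\n";    # artist
print "$artist_text\n";           # Some Artist
```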

 Obviously there are several approaches to accomplish this task. With the help
 of yourself and the others who posted, I have been able to put together a fairly
 efficient script, as we need to process and parse 5000+ XML files averaging
 9-1000KB in size. So far it has been working smoothly :)

 Mike


Hi,
My requirement is to compare two large XML files (about 50MB each) and
generate an XML file containing the differences (an XML delta). The problem
is that the modules I installed (XML::Diff) take a lot of memory (even 2 GB
of RAM is not sufficient) and a lot of time. I suspect those modules use
XML::Parser to build a DOM-style tree of each whole document, and that is
what takes so much memory.
Is there any way in Perl to do this with a SAX parser, or some other approach
that uses less memory?
Please help.
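
One low-memory alternative to a full tree diff is sketched below. It is a
swapped-in technique, not what XML::Diff does, and it rests on loud
assumptions: both files are a flat list of <record id="..."> elements (a
hypothetical layout) with unique ids, identical records serialise
identically, and XML::Twig is installed. Each file is streamed record by
record with purge, so only an MD5 digest per record is kept in memory; the
two digest maps are then compared to find added, removed and changed ids.

```perl
use strict;
use warnings;
use XML::Twig;
use Digest::MD5 qw(md5_hex);
use Encode qw(encode_utf8);

# Stream one document and return { id => md5 of the record's serialised XML }.
# Only the digests stay in memory; purge releases each record once seen.
sub record_digests {
    my ($xml) = @_;    # XML string here; use parsefile() for a real file
    my %digest;
    my $twig = XML::Twig->new(
        twig_handlers => {
            record => sub {    # 'record' and its 'id' attribute are hypothetical
                my ($t, $elt) = @_;
                $digest{ $elt->att('id') } = md5_hex(encode_utf8($elt->sprint));
                $t->purge;
            },
        },
    );
    $twig->parse($xml);
    return \%digest;
}

# Compare two documents record by record; return added/removed/changed ids.
sub diff_records {
    my ($old_xml, $new_xml) = @_;
    my $old = record_digests($old_xml);
    my $new = record_digests($new_xml);
    my (@added, @changed);
    for my $id (sort keys %$new) {
        if    (!exists $old->{$id})        { push @added,   $id }
        elsif ($old->{$id} ne $new->{$id}) { push @changed, $id }
    }
    my @removed = grep { !exists $new->{$_} } sort keys %$old;
    return { added => \@added, removed => \@removed, changed => \@changed };
}

my $old_doc = '<records><record id="1"><v>a</v></record>'
            . '<record id="2"><v>b</v></record></records>';
my $new_doc = '<records><record id="2"><v>B</v></record>'
            . '<record id="3"><v>c</v></record></records>';

my $delta = diff_records($old_doc, $new_doc);

# Emit a minimal delta document (a made-up format, ids assumed XML-safe).
print "<delta>\n";
for my $kind (qw(added removed changed)) {
    print qq{  <$kind id="$_"/>\n} for @{ $delta->{$kind} };
}
print "</delta>\n";
```

Memory use here grows with the number of records rather than the size of the
documents. If the delta needs the full content of added or changed records,
a second streaming pass over the new file can print just those ids.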

Thanks in advance.

Regards,
L.Srikanth Kumar



-- 
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
http://learn.perl.org/




RE: [PBML] Setting special variable $/ to a regular expression

2004-08-26 Thread Madani, Srikanth, VF-DE
Our fellow list member Denham wrote:
Is there some way to set the special variable $/ to a regular expression
such as (.+?):(\d+):(.*)

Well, man perlvar says:

perlvar Remember: the value of $/ is a string, not a regexp.
perlvar AWK has to be better for something :-)


I am trying to process a logfile in which each entry is identified by a
timestamp, e.g.:
Wed Mar 31 11:40:45 2004
Thread 1 advanced to log sequence 4636


I'm not sure I understand the parsing requirement correctly, but perhaps you can just
read each line and check whether it matches the timestamp pattern?
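
Since $/ can only be a string, the line-by-line idea might look like the sketch
below (core Perl only). The timestamp regex is an assumption based on the single
example line above, and the second entry in the sample log is invented for
illustration. A line matching the timestamp starts a new entry; every other line
is appended to the current one.

```perl
use strict;
use warnings;

# Matches a timestamp line such as "Wed Mar 31 11:40:45 2004".
my $stamp_re = qr/^\w{3} \w{3} +\d{1,2} \d\d:\d\d:\d\d \d{4}$/;

# Invented sample input; open the real logfile instead in practice.
my $log = <<'LOG';
Wed Mar 31 11:40:45 2004
Thread 1 advanced to log sequence 4636
Wed Mar 31 11:52:03 2004
Thread 1 advanced to log sequence 4637
  Current log# 2 seq# 4637
LOG

open my $fh, '<', \$log or die "cannot open in-memory log: $!";

my @entries;    # each entry: { time => timestamp line, lines => [...] }
while (my $line = <$fh>) {
    chomp $line;
    if ($line =~ $stamp_re) {
        push @entries, { time => $line, lines => [] };    # new entry
    }
    elsif (@entries) {
        push @{ $entries[-1]{lines} }, $line;            # body of current entry
    }
}
close $fh;

for my $e (@entries) {
    printf "%s: %d line(s)\n", $e->{time}, scalar @{ $e->{lines} };
}
```

Lines appearing before the first timestamp are skipped here; whether to keep
them depends on the log format.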

Cheers,

Srikanth Madani
A bird in the hand makes it awfully hard to blow your nose.
