On Dec 10 2007, 3:43 am, [EMAIL PROTECTED] (Chas. Owens) wrote:
> On Dec 10, 2007 8:24 AM, Tim Bowden <[EMAIL PROTECTED]> wrote:
>
>
>
> > On Mon, 2007-12-10 at 13:14 +0000, Beginner wrote:
> > > Hi,
>
> > > I have a huge XML file, 1.7GB, 53080215 lines. I am trying to extract
> > > an attribute from each record (code=). I have several problems, one of
> > > which is the size of the file is making it painful to test my scripts
> > > and methods for parsing.
>
> > > I would like to extract a few hundred records (by any means) so I can
> > > experiment.  I think XPath is the way to go here. The file
> > > (currently) sits on a *nix system but I was going to do the parsing
> > > on a Win32 workstation rather than steal all the memory on a
> > > server.
> > If your data file is on a *nix system, use
> > head -200 filename > sample_filename to take the first 200 records.
>
> snip
>
> Unfortunately that won't work with structured data like XML: head cuts
> by line count, not by record, so the sample would not be well-formed.
> Your best bet is to use something like XML::Twig to grab the top-level
> records and output them to a new file.  For instance, say we have an
> XML file that looks like this:
>
> <root>
>         <records set="1">
>                 <record>foo</record>
>                 <record>bar</record>
>                 <record>baz</record>
>         </records>
>         <records set="2">
>                 <record>quux</record>
>         </records>
>         <records set="3">
>                 <record>foofoo</record>
>                 <record>foobar</record>
>         </records>
> </root>
>
> and we only want the first two sets of records.  We could use this
> code to produce a new file with only those records:
>
> #!/usr/bin/perl
>
> use strict;
> use warnings;
>
> use XML::Twig;
>
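> my $i = 0;
> my $t = XML::Twig->new(
>         twig_handlers => {
>                 records => sub {
>                         my ($twig, $records) = @_;
>                         # after two sets, close the root element and stop
>                         if (++$i > 2) {
>                                 print "</root>\n";
>                                 exit;
>                         }
>                         $records->print;  # write this set to STDOUT
>                         $twig->purge;     # free memory without printing again
>                 }
>         }
> );
>
> print "<root>";
> $t->parsefile("t.xml");
> # reached only when the file holds two or fewer sets
> print "</root>\n";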


BTW, I forgot to mention: I have a huge XML file that has 4,621
records.  I am trying to compare them against an array of 1,187
specific record identifiers and print out just the matching records
(as XML), so that I end up with a new working XML file containing
those 1,187 records.
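
Roughly what I have in mind is the sketch below, adapted from the
XML::Twig approach quoted above.  It is only a sketch: the element
name "record", the "id" attribute, the identifiers in @wanted_ids and
the inline sample document are placeholders I made up for illustration,
not the names in my real file.

```perl
#!/usr/bin/perl

use strict;
use warnings;

use XML::Twig;

# Hypothetical identifiers; the real list has 1,187 entries.
my @wanted_ids = qw(a17 b42);

# Turn the array into a hash so each lookup is a single hash probe.
my %wanted = map { $_ => 1 } @wanted_ids;

# Return a new document holding only the records whose id is wanted.
sub filter_records {
    my ($xml) = @_;
    my @kept;
    my $twig = XML::Twig->new(
        twig_handlers => {
            # assumed layout: <record id="...">...</record>
            record => sub {
                my ($t, $rec) = @_;
                my $id = $rec->att('id');
                push @kept, $rec->sprint
                    if defined $id && $wanted{$id};
                $t->purge;    # release what has been parsed so far
            },
        },
    );
    $twig->parse($xml);       # use parsefile() for a file on disk
    return "<root>" . join('', @kept) . "</root>";
}

# Made-up sample input standing in for the real file.
my $sample = <<'XML';
<root>
  <record id="a17">foo</record>
  <record id="x99">bar</record>
  <record id="b42">baz</record>
</root>
XML

print filter_records($sample), "\n";
```

Keeping the identifiers in a hash rather than scanning the array for
every record should matter little at 4,621 records, but it keeps the
handler simple.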

Thanks!



