On Dec 10 2007, 3:43 am, [EMAIL PROTECTED] (Chas. Owens) wrote:
> On Dec 10, 2007 8:24 AM, Tim Bowden <[EMAIL PROTECTED]> wrote:
>
> > On Mon, 2007-12-10 at 13:14 +0000, Beginner wrote:
> > > Hi,
> > >
> > > I have a huge XML file, 1.7GB, 53080215 lines. I am trying to extract
> > > an attribute from each record (code=). I have several problems, one of
> > > which is that the size of the file is making it painful to test my
> > > scripts and methods for parsing.
> > >
> > > I would like to extract a few hundred records (by any means) so I can
> > > experiment. I think XPath is the way to go here. The file
> > > (currently) sits on a *nix system but I was going to do the parsing
> > > on a Win32 workstation rather than steal all the memory on a
> > > server.
> >
> > If your data file is on a *nix system, use
> > head -200 filename > sample_filename to take the first 200 records.
>
> snip
>
> Unfortunately that won't work with structured data like XML. Your best
> bet is to use something like XML::Twig to grab the top-level records
> and output them to a new file. For instance, say we have an XML file
> that looks like this
>
> <root>
>   <records set="1">
>     <record>foo</record>
>     <record>bar</record>
>     <record>baz</record>
>   </records>
>   <records set="2">
>     <record>quux</record>
>   </records>
>   <records set="3">
>     <record>foofoo</record>
>     <record>foobar</record>
>   </records>
> </root>
>
> and we only want the first two sets of records. We could use this
> code to produce a new file with only those records
>
> #!/usr/bin/perl
>
> use strict;
> use warnings;
>
> use XML::Twig;
>
> my $i = 0;
> my $t = XML::Twig->new(
>     twig_handlers => {
>         records => sub {
>             my ($twig, $elt) = @_;
>             # stop parsing after the second set; unlike exit, this
>             # returns from parsefile so the closing tag still prints
>             return $twig->finish_now if ++$i > 2;
>             $elt->print;
>             # free the element just printed (flush would also print it)
>             $twig->purge;
>         }
>     }
> );
>
> print "<root>";
> $t->parsefile("t.xml");
> print "</root>";

BTW, I forgot to mention. I have a huge XML file that has 4,621 records. I am trying to compare it against an array of 1,187 specific record identifiers and print out only the matching records (as XML). So I will end up with a new working XML file containing 1,187 records.

Thanks!
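
For what it's worth, the XML::Twig handler approach quoted above could probably be adapted to this. What follows is only a rough sketch under some assumptions: that each record is a <record> element, that its identifier lives in a code attribute (going by the "code=" mentioned earlier in the thread), and that the output should be wrapped in a <root> element. The filter_records helper and the sample data are made up for illustration. The wanted identifiers go into a hash so each record needs one lookup instead of a scan of the whole array.

```perl
#!/usr/bin/perl

use strict;
use warnings;

use XML::Twig;

# Hypothetical helper: returns an XML string holding only the <record>
# elements whose "code" attribute is listed in @$wanted_ids.
# The names "record", "code" and "root" are assumptions, not taken
# from the real file.
sub filter_records {
    my ($xml, $wanted_ids) = @_;

    # Build a lookup hash from the identifier list.
    my %wanted = map { $_ => 1 } @$wanted_ids;

    my $out  = "<root>\n";
    my $twig = XML::Twig->new(
        twig_handlers => {
            record => sub {
                my ($t, $elt) = @_;
                my $code = $elt->att('code');

                # Keep the record only if its identifier is wanted.
                $out .= $elt->sprint . "\n"
                    if defined $code && $wanted{$code};

                # Release the memory used by records already handled.
                $t->purge;
            },
        },
    );

    $twig->parse($xml);
    $out .= "</root>\n";
    return $out;
}

# Made-up sample standing in for the real 4,621-record file.
my $sample = <<'XML';
<root>
  <record code="A1">foo</record>
  <record code="B2">bar</record>
  <record code="C3">baz</record>
</root>
XML

print filter_records($sample, [ 'A1', 'C3' ]);
```

For the real file, calling parsefile with the file name in place of parse, and printing each matching record straight away rather than collecting it in a string, would likely keep memory use lower.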