I also recommend using MARC::Batch. Attached is a simple script I wrote for myself.

Saiful Amin
+91-9343826438

On Mon, Jan 25, 2010 at 8:33 PM, Robert Fox <rf...@nd.edu> wrote:
> Assuming that memory won't be an issue, you could use MARC::Batch to
> read in the record set and print out separate files where you split on
> X amount of records. You would have an iterative loop loading each
> record from the large batch, and a counter variable that would get
> reset after X amount of records. You might want to name the sets using
> another counter that keeps track of how many sets you have and name
> each file something like batch_$count.mrc and write them out to a
> specific directory. Just concatenate each record to the previous one
> when you're making your smaller batches.
>
> Rob Fox
> Hesburgh Libraries
> University of Notre Dame
>
> On Jan 25, 2010, at 9:48 AM, "Nolte, Jennifer"
> <jennifer.no...@yale.edu> wrote:
>
> > Hello-
> >
> > I am working with files of MARC records that are over a million
> > records each. I'd like to split them down into smaller chunks,
> > preferably using a command line. MARCedit works, but is slow and
> > made for the desktop. I've looked around and haven't found anything
> > truly useful- Endeavor's MARCsplit comes close but doesn't separate
> > files into even numbers, only by matching criteria, so there could
> > be lots of record duplication between files.
> >
> > Any idea where to begin? I am a (super) novice Perl person.
> >
> > Thank you!
> >
> > ~Jenn Nolte
> >
> >
> > Jenn Nolte
> > Applications Manager / Database Analyst
> > Production Systems Team
> > Information Technology Office
> > Yale University Library
> > 130 Wall St.
> > New Haven CT 06520
> > 203 432 4878
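
For the even-sized split Rob describes (one pass over the file, a counter that resets every X records, and output files named batch_$count.mrc), something along these lines might work. Treat it as a rough sketch rather than a tested tool: split_marc() is a helper name I made up for illustration, the usage line is hypothetical, and it assumes the output directory already exists. The MARC::Batch and MARC::Record calls are the same ones the attached script uses.

```perl
#!/usr/bin/perl
# Sketch: split one large MARC file into files of N records each.
use strict;
use warnings;
use MARC::Batch;

# split_marc() is a hypothetical helper, not part of MARC::Batch.
# It writes batch_1.mrc, batch_2.mrc, ... into $outdir (which must
# already exist) and returns the list of files written.
sub split_marc {
    my ($input, $size, $outdir) = @_;
    die "records per file must be a positive number\n"
        unless defined $size && $size > 0;

    my $batch = MARC::Batch->new('USMARC', $input);
    $batch->strict_off();      # keep going past malformed records
    $batch->warnings_off();

    my ($in_chunk, $chunk_no) = (0, 0);
    my ($out, @written);

    while (my $record = $batch->next()) {
        # Open the next output file when none is currently open.
        if (!$out) {
            $chunk_no++;
            my $name = "$outdir/batch_$chunk_no.mrc";
            open($out, '>', $name) or die "Cannot write $name: $!\n";
            # Raw bytes, so Windows does not rewrite newlines. Records
            # flagged as UTF-8 may need a ':utf8' layer instead; check
            # this against your own data.
            binmode($out);
            push @written, $name;
        }

        print {$out} $record->as_usmarc();

        # Reset the counter and close the file after $size records.
        if (++$in_chunk >= $size) {
            close($out) or die "Cannot close output file: $!\n";
            undef $out;
            $in_chunk = 0;
        }
    }

    # The last file may hold fewer than $size records.
    if ($out) {
        close($out) or die "Cannot close output file: $!\n";
    }
    return @written;
}

# Hypothetical usage: perl split_marc.pl big.mrc 50000 [output_dir]
if (!caller && @ARGV >= 2) {
    my ($input, $size, $outdir) = @ARGV;
    $outdir = '.' unless defined $outdir;
    print "$_\n" for split_marc($input, $size, $outdir);
}
```

Because it reads the input only once, this should scale better on million-record files than calling a range extractor once per chunk, which re-reads the file from the start every time.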

#!c:/perl/bin/perl.exe
#
# Name: mbreaker.pl
# Version: 0.1
# Date: Jan 2009
# Author: Saiful Amin <sai...@edutech.com>
#
# Description: Extract MARC records based on command-line parameters

use strict;
use warnings;
use Getopt::Long;
use MARC::Batch;

# Default range: first record only
my $start = 0;
my $end   = 1;

GetOptions ("start=i" => \$start,
            "end=i"   => \$end );

die "Usage: mbreaker.pl [options] file\n" unless @ARGV;

# Write raw bytes so that Windows does not alter newlines in the records
binmode(STDOUT);

my $batch = MARC::Batch->new('USMARC', $ARGV[0]);
$batch->strict_off();      # do not stop at malformed records
$batch->warnings_off();

my $num = 0;
while (my $record = $batch->next() ) {
    $num++;
    next if $num < $start;   # skip records before the range
    last if $num > $end;     # stop reading once past the range
    print $record->as_usmarc();
    warn "$num records\n" if ( $num % 1000 == 0 );   # progress on STDERR
}

__END__

=head1 NAME

mbreaker.pl

Breaks the MARC record file as per start and end position specified

=head1 SYNOPSIS

mbreaker.pl [options] file

Options:

  -start    start position for reading records
  -end      end position for reading records