Assuming that memory won't be an issue, you could use MARC::Batch to read in the record set and print out separate files, splitting every X records. You'd have an iterative loop pulling each record from the large batch, plus a counter variable that gets reset after every X records. You might name the sets using a second counter that keeps track of how many sets you've written so far, calling each file something like batch_$count.mrc and writing them out to a specific directory. Since raw MARC records just concatenate end to end, you can simply append each record to the current output file as you build the smaller batches. Something like the sketch below.
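Here's an untested sketch; the chunk size, file names, and usage are just placeholders, so adjust to taste:

#!/usr/bin/perl
use strict;
use warnings;
use MARC::Batch;

my $infile     = shift or die "Usage: $0 file.mrc\n";
my $chunk_size = 10000;   # records per output file -- pick whatever X you want

my $batch = MARC::Batch->new('USMARC', $infile);

my $rec_count = 0;   # records written to the current chunk
my $set_count = 1;   # which output file we're on

open my $out, '>', "batch_$set_count.mrc" or die "can't write: $!";
binmode $out;

while (my $record = $batch->next()) {
    # start a new file once the current one is full
    if ($rec_count == $chunk_size) {
        close $out;
        $set_count++;
        $rec_count = 0;
        open $out, '>', "batch_$set_count.mrc" or die "can't write: $!";
        binmode $out;
    }
    # as_usmarc() returns the raw transmission format, so the
    # records concatenate end to end with no separator needed
    print $out $record->as_usmarc();
    $rec_count++;
}
close $out;

With a million-plus records you may also want to call $batch->strict_off() before the loop so one malformed record doesn't abort the whole run.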
Rob Fox
Hesburgh Libraries
University of Notre Dame

On Jan 25, 2010, at 9:48 AM, "Nolte, Jennifer" <jennifer.no...@yale.edu> wrote:

> Hello-
>
> I am working with files of MARC records that are over a million
> records each. I'd like to split them down into smaller chunks,
> preferably using a command line. MARCedit works, but is slow and
> made for the desktop. I've looked around and haven't found anything
> truly useful- Endeavor's MARCsplit comes close but doesn't separate
> files into even numbers, only by matching criteria, so there could
> be lots of record duplication between files.
>
> Any idea where to begin? I am a (super) novice Perl person.
>
> Thank you!
>
> ~Jenn Nolte
>
>
> Jenn Nolte
> Applications Manager / Database Analyst
> Production Systems Team
> Information Technology Office
> Yale University Library
> 130 Wall St.
> New Haven CT 06520
> 203 432 4878