Assuming that memory won't be an issue, you could use MARC::Batch to
read in the record set and print out separate files, splitting every X
records. You would have an iterative loop loading each record from the
large batch, and a counter variable that gets reset after every X
records. You might also want a second counter that keeps track of how
many sets you have, so you can name each file something like
batch_$count.mrc and write them out to a specific directory. Just
concatenate each record onto the current output file as you build your
smaller batches. A rough sketch of that loop is below.
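
Here's a minimal sketch of the approach, assuming the big file is
called big_batch.mrc, you want 50,000 records per chunk, and the output
goes into a directory called split -- those names and numbers are just
placeholders, not anything official:

#!/usr/bin/perl
use strict;
use warnings;
use MARC::Batch;

my $infile     = 'big_batch.mrc';   # the million-record file (placeholder name)
my $chunk_size = 50_000;            # X: records per output file
my $outdir     = 'split';           # directory for the smaller files

my $batch = MARC::Batch->new('USMARC', $infile);

my $rec_count = 0;   # records written to the current chunk
my $set_count = 1;   # which set we're on, used in the file name
my $out;

while (my $record = $batch->next()) {
    # open a new output file at the start of each chunk
    if ($rec_count == 0) {
        open $out, '>', "$outdir/batch_$set_count.mrc"
            or die "can't write batch_$set_count.mrc: $!";
        binmode $out;
    }

    # concatenate the raw MARC for this record onto the current chunk
    print $out $record->as_usmarc();
    $rec_count++;

    # reset the counter and move on to the next file after X records
    if ($rec_count == $chunk_size) {
        close $out;
        $rec_count = 0;
        $set_count++;
    }
}
close $out if $rec_count;   # close the final, partial chunk

Since it only holds one record in memory at a time, it should cope with
a million-record file without trouble.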

Rob Fox
Hesburgh Libraries
University of Notre Dame

On Jan 25, 2010, at 9:48 AM, "Nolte, Jennifer"  
<jennifer.no...@yale.edu> wrote:

> Hello-
>
> I am working with files of MARC records that are over a million  
> records each. I'd like to split them down into smaller chunks,  
> preferably using a command line. MARCedit works, but is slow and  
> made for the desktop. I've looked around and haven't found anything  
> truly useful- Endeavor's MARCsplit comes close but doesn't separate  
> files into even numbers, only by matching criteria, so there could  
> be lots of record duplication between files.
>
> Any idea where to begin? I am a (super) novice Perl person.
>
> Thank you!
>
> ~Jenn Nolte
>
>
> Jenn Nolte
> Applications Manager / Database Analyst
> Production Systems Team
> Information Technology Office
> Yale University Library
> 130 Wall St.
> New Haven CT 06520
> 203 432 4878
>
>