ETL programs like Ab Initio know how to tell parallel processes to split up big 
files and process each part separately, even when the files are linefeed 
delimited (the processes all agree to search forward (or backward) for the 
dividing linefeed closest to the N-byte offset into the file).  Does anyone know 
of a utility that can split a file this way (without reading it sequentially)?  
Is this in GNU parallel?  
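To illustrate the kind of splitting I mean, here is a minimal sketch (my own, 
not taken from any existing utility): seek to each approximate N-byte offset 
and scan forward to the nearest linefeed, so chunk boundaries always fall on 
record boundaries and no part of the file has to be read sequentially.

    #!/usr/bin/env python3
    # Sketch: compute record-safe byte ranges for a linefeed-delimited file.
    import os

    def split_points(path, n_chunks):
        """Return (start, end) byte ranges whose boundaries fall on linefeeds."""
        size = os.path.getsize(path)
        target = size // n_chunks
        points = [0]
        with open(path, "rb") as f:
            for i in range(1, n_chunks):
                f.seek(i * target)        # jump to the approximate offset
                f.readline()              # skip to just past the next linefeed
                pos = f.tell()
                if pos >= size:
                    break
                if pos > points[-1]:
                    points.append(pos)
        points.append(size)
        return list(zip(points, points[1:]))

    if __name__ == "__main__":
        for start, end in split_points("big_input.txt", 4):
            print(start, end)

Each worker would then read only its (start, end) range, so the whole file is 
never scanned by a single process.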

It'd be nice to be able to take a list of mixed-size files and divide them by 
size into N chunks of approximately equal line counts, estimated from byte 
sizes, with an algorithm that searches for the record delimiter (linefeed) so 
that no records are lost.  Sort of a mixed-input leveller for parallel loads 
(see the sketch below).  If it is part of parallel, then parallel could launch 
processing for each chunk and combine the chunks afterwards.
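Again just a sketch of the leveling idea, not an existing tool: spread a list 
of mixed-size files into roughly equal-byte work units (file name and helper 
names are my own), reusing split_points() from above so every boundary falls 
on a linefeed.

    import os

    def level_files(paths, n_workers):
        """Yield (path, start, end) work units of roughly equal byte size."""
        total = sum(os.path.getsize(p) for p in paths)
        per_worker = max(1, total // n_workers)
        for path in paths:
            size = os.path.getsize(path)
            # big files get several chunks, small files become a single chunk
            chunks = max(1, round(size / per_worker))
            for start, end in split_points(path, chunks):
                yield (path, start, end)

Each (path, start, end) triple could then be handed to GNU parallel (or any 
worker pool) to read just that byte range, process its records, and let the 
results be combined at the end.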
