ooops, more like:

        tar -t -f big-file.tar.gz | parallel tar -O -x -f big-file.tar.gz {} '|' someCommandThatReadsFromStdIn
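Here is a slightly fuller sketch of the same idea. It assumes GNU tar and GNU parallel. `process_members` is a hypothetical helper name, and `wc -c` in the usage below only stands in for someCommandThatReadsFromStdIn. Directory entries are skipped because extracting them to stdout produces nothing useful.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the thread): run a stdin-reading command once
# per member of a gzipped tar archive, several members at a time.
# Assumes GNU tar and GNU parallel, member names without newlines, and an
# archive path without spaces or shell metacharacters (parallel joins the
# command words and hands the resulting line to a shell).
process_members() {
  local archive=$1
  shift                                   # the remaining args are the per-member command
  tar -tzf "$archive" |                   # list member names, one per line, in archive order
    sed '/\/$/d' |                        # drop directory entries such as "dir/"
    parallel --keep-order tar -xzOf "$archive" {} '|' "$@"   # {} = one member name
}
```

For example, `process_members big-file.tar.gz wc -c` would print one byte count per member, in archive order because of `--keep-order`. Each job decompresses the archive again from the start, since a gzip stream cannot be seeked, so this saves disk space rather than CPU. GNU tar documents an `--occurrence` option that may let each job stop reading once it has found its member; that is untested here.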


Malcolm Cook
Stowers Institute for Medical Research -  Bioinformatics
Kansas City, Missouri  USA
 
 

> -----Original Message-----
> From: [email protected] 
> [mailto:[email protected]] On Behalf 
> Of Cook, Malcolm
> Sent: Tuesday, March 29, 2011 4:35 PM
> To: 'Ole Tange'; 'Jay Hacker'
> Cc: '[email protected]'
> Subject: RE: Processing files from a tar archive in parallel
> 
> Hmmm
> 
> Use tar -t to list the filenames, pipe that into parallel to
> call tar again to extract just that file, and pipe it to some
> other command:
> 
> tar -t big-file.tar.gz  | parallel tar -f big-file.tar.gz - '|' someCommandThatReadsFromStdIn
> 
> > -----Original Message-----
> > From: [email protected]
> > [mailto:[email protected]] On Behalf Of Ole 
> > Tange
> > Sent: Tuesday, March 29, 2011 4:14 PM
> > To: Jay Hacker
> > Cc: [email protected]
> > Subject: Re: Processing files from a tar archive in parallel
> > 
> > On Tue, Mar 29, 2011 at 10:14 PM, Jay Hacker <[email protected]> wrote:
> > > On Tue, Mar 29, 2011 at 11:20 AM, Hans Schou <[email protected]> wrote:
> > >> On Tue, 29 Mar 2011, Jay Hacker wrote:
> > >>
> > >>> I have a large gzipped tar archive containing many small files; just
> > >>> untarring it takes a lot of time and space.  I'd like to be able to
> > >>> process each file in the archive, ideally without untarring the
> > >>> whole thing first,
> > :
> > >> tar xvf big-file.tar.gz | parallel echo "Proc this file {}"
> > >>
> > >> Parallel will start when the first file is untarred.
> > :
> > > That is a great idea.  However, can I be sure the file is completely
> > > written to disk before tar prints the filename?
> > 
> > While I loved Hans' idea, it does indeed have a race condition. This
> > should run 'ls -l' on each file after it has been decompressed, but it
> > clearly fails now and then:
> > 
> > $ tar xvf ../i.tgz | parallel ls -l > ls-l
> > ls: cannot access 1792: No such file or directory
> > ls: cannot access 209: No such file or directory
> > ls: cannot access 21: No such file or directory
> > ls: cannot access 2256: No such file or directory
> > ls: cannot access 2349: No such file or directory
> > ls: cannot access 2363: No such file or directory
> > ls: cannot access 246: No such file or directory
> > ls: cannot access 2712: No such file or directory
> > 
> > But you could unpack in a new dir and use:
> > http://www.gnu.org/software/parallel/man.html#example__gnu_parallel_as_dir_processor
> > 
> > That seems to work.
> > 
> > /Ole
> > 
> > 
> 
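One possible workaround for the race in Ole's `tar xvf ../i.tgz | parallel ls -l` example, not from the thread and resting on an assumption: GNU tar appears to print each member's name before it extracts that member, so once it has printed name N+1 it should be finished with member N. Holding every name back by one line, and releasing the last one only when tar exits, would then avoid the race while still streaming. `delay_by_one` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pass stdin through, but emit each line only after the
# next line has arrived; the final line is emitted when input ends, i.e. when
# the writer (here: tar) has exited and closed the pipe.
# Assumption: tar prints a member's name before extracting it, so seeing
# name N+1 implies member N has been fully written.
delay_by_one() {
  awk 'NR > 1 { print prev } { prev = $0 } END { if (NR > 0) print prev }'
}
```

Usage would be `tar xvf ../i.tgz | delay_by_one | parallel ls -l > ls-l`. The price is that each job starts one file later than it otherwise could, and the last file is processed only after tar has finished.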
