Sunil, you could use identity mappers, a single identity reducer, and disable output compression.
Raj
> ________________________________
> From: Sunil S Nandihalli <sunil.nandiha...@gmail.com>
> To: common-user@hadoop.apache.org
> Sent: Tuesday, April 24, 2012 7:01 AM
> Subject: Re: hadoop streaming and a directory containing large number of .tgz files
>
> Sorry for re-forwarding this email. I was not sure if it actually got
> through, since I just got the confirmation regarding my membership to the
> mailing list.
> Thanks,
> Sunil.
>
> On Tue, Apr 24, 2012 at 7:12 PM, Sunil S Nandihalli <
> sunil.nandiha...@gmail.com> wrote:
>
>> Hi everybody,
>> I am a newbie to Hadoop. I have about 40K .tgz files, each approximately
>> 3 MB. I would like to process them as if they were a single large file
>> formed by
>> "cat list-of-files | gnuparallel 'tar -Oxvf {} | sed 1d' > output.txt"
>> How can I achieve this using hadoop-streaming or some other similar
>> library?
>>
>> Thanks,
>> Sunil.
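A minimal sketch of how the two messages fit together (names and paths below are hypothetical, not from the thread): the input to the streaming job is a text file listing the HDFS paths of the .tgz files, one per line. Each mapper reads paths from stdin, cats the archive out of HDFS, untars it to stdout, and drops each member's first line (the `sed 1d` from the original pipeline) — the extraction work means the map side is a small script rather than a pure identity mapper, while a single identity reducer with output compression disabled merges everything into one plain-text output file, as suggested in the reply.

```shell
# extract.sh -- hypothetical streaming mapper script.
# Reads .tgz paths (one per line) from stdin; for each archive, streams
# the decompressed member contents to stdout, dropping each member's
# first line. CAT defaults to "hadoop fs -cat" for cluster use; set
# CAT=cat to exercise the same pipeline against local files.
CAT=${CAT:-"hadoop fs -cat"}
while read path; do
  $CAT "$path" | tar -Oxzf - | sed 1d
done
```

The job could then be submitted with something like `hadoop jar hadoop-streaming.jar -D mapred.reduce.tasks=1 -D mapred.output.compress=false -input /user/sunil/tgz-list.txt -output /user/sunil/merged -mapper extract.sh -reducer org.apache.hadoop.mapred.lib.IdentityReducer -file extract.sh` (input and output paths hypothetical). Note that with 40K archives at ~3 MB each, packing many paths per input split keeps the mapper count manageable.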