Here is part of a shell script I wrote which deals with compressed input and produces compressed output (for streaming):
hadoop dfs -rmr $4
hadoop jar /usr/local/share/hadoop/contrib/streaming/hadoop-*-streaming.jar \
    -mapper $1 -reducer $2 \
    -input $3/* -output $4 \
    -file $1 -file $2 \
    -jobconf mapred.job.name="$5" \
    -jobconf stream.recordreader.compression=gzip \
    -jobconf mapred.output.compress=true \
    -jobconf mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec

2009/7/14 Dmitry Pushkarev <u...@stanford.edu>:
> Dear hadoop users,
>
> Sorry for what is probably a very common question, but is there a way to
> process a folder of .gz files with streaming?
>
> The manual only describes how to create gzipped output, but I couldn't
> figure out how to use gzipped files as input.
>
> Right now I create a list of these files and process them like
> "hadoop -cat $file | gzip -dc |", but that doesn't use the data locality
> of the archives (each file is 64MB - exactly one block).
>
> Sample code or a link to the manual would be greatly appreciated.

--
The University of Edinburgh is a charitable body, registered in Scotland,
with registration number SC005336.
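As a side note, the streaming pipeline in the command above can be simulated locally with plain shell tools, which is handy for testing a mapper/reducer pair before submitting the job. This is only a sketch: `cat` stands in for the mapper and `sort | uniq -c` for the shuffle plus a word-count reducer (both are placeholders, not part of the original script), while on a real cluster Hadoop distributes the work and applies the gzip codecs itself.

```shell
#!/bin/sh
# Build a small gzipped input file, standing in for one 64MB .gz block.
printf 'apple\nbanana\napple\n' | gzip > input.gz

# mapper | shuffle (sort) | reducer, with gzip on both ends,
# mirroring stream.recordreader.compression=gzip on the input side
# and mapred.output.compress=true on the output side.
gzip -dc input.gz | cat | sort | uniq -c | gzip > output.gz

# Inspect the compressed result.
gzip -dc output.gz
```

Swapping in your real mapper and reducer commands for `cat` and `uniq -c` gives a quick local smoke test of the same data flow the streaming job runs.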