Raj,

What InputFormat are you using? Gzip is not a splittable compression
format, so with 73 gzip files you get exactly 73 mappers, one per file.
See the TextInputFormat.isSplitable() javadoc.
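As a rough sketch (plain Java, not the actual Hadoop source), this is essentially how FileInputFormat arrives at the split count: a non-splittable file always becomes a single split regardless of dfs.block.size, while a splittable one is chopped at block boundaries. The class and method names below are illustrative only:

```java
public class SplitSketch {
    // Simplified sketch of FileInputFormat.getSplits() behavior:
    // a non-splittable (e.g. gzip) file always becomes exactly one
    // split, no matter the block size; splittable inputs are divided
    // into roughly block-sized splits.
    static int countSplits(long[] fileSizes, long blockSize, boolean splittable) {
        int splits = 0;
        for (long size : fileSizes) {
            if (!splittable) {
                splits += 1; // whole compressed file goes to one mapper
            } else {
                splits += (int) ((size + blockSize - 1) / blockSize); // ceil(size / blockSize)
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        long MB = 1024L * 1024L;
        long[] files = {23 * MB, 255 * MB, 130 * MB};
        // Non-splittable (gzip): one mapper per file.
        System.out.println(countSplits(files, 128 * MB, false));
        // Splittable input at a 128MB block size: 1 + 2 + 2 splits.
        System.out.println(countSplits(files, 128 * MB, true));
    }
}
```

This is why shrinking dfs.block.size has no effect on a gzip-only input set: the per-file branch never consults the block size at all.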

Thanks,
~Anirudh

On Thu, Nov 10, 2011 at 2:40 PM, Raj V <[email protected]> wrote:

> All
>
> I assumed that the input splits for a streaming job will follow the same
> logic as a map reduce java job but I seem to be wrong.
>
> I started out with 73 gzipped files that vary between 23MB to 255MB in
> size. My default block size was 128MB.  8 of the 73 files are larger than
> 128 MB
>
> When I ran my streaming job, it ran, as expected, with 73 mappers (no
> reducers for this job).
>
> Since I have 128 Nodes in my cluster , I thought I would use more systems
> in the cluster by increasing the number of mappers. I changed all the gzip
> files into bzip2 files. I expected the number of mappers to increase to 81.
> The mappers remained at 73.
>
> I tried a second experiment: I changed my dfs.block.size to 32MB. That
> should have increased my mappers to ~250. It remained steadfast at 73.
>
> Is my understanding wrong? With a smaller block size and bzipped files,
> should I not get more mappers?
>
> Raj
