It looks like your Hadoop distro does not have https://issues.apache.org/jira/browse/HADOOP-4012 (splitting support for bzip2-compressed files).
- milind

On 11/10/11 2:40 PM, "Raj V" <[email protected]> wrote:

>All
>
>I assumed that the input splits for a streaming job will follow the same
>logic as a map reduce java job, but I seem to be wrong.
>
>I started out with 73 gzipped files that vary between 23 MB and 255 MB in
>size. My default block size was 128 MB. 8 of the 73 files are larger than
>128 MB.
>
>When I ran my streaming job, it ran, as expected, 73 mappers (no
>reducers for this job).
>
>Since I have 128 nodes in my cluster, I thought I would use more systems
>in the cluster by increasing the number of mappers. I changed all the
>gzip files into bzip2 files. I expected the number of mappers to increase
>to 81. The mappers remained at 73.
>
>I tried a second experiment: I changed my dfs.block.size to 32 MB. That
>should have increased my mappers to about ~250. It remained steadfast at
>73.
>
>Is my understanding wrong? With a smaller block size and bzipped files,
>should I not get more mappers?
>
>Raj
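The arithmetic in the question can be sketched as follows. This is a rough illustration only: with a non-splittable codec (gzip, or bzip2 without HADOOP-4012) you get one mapper per file, while a splittable codec yields roughly one split per block. The file sizes below are hypothetical, chosen to reproduce the 73-vs-81 numbers from the thread; real split counts also depend on min/max split settings.

```python
import math

def estimate_splits(file_sizes, block_size, splittable):
    """Rough mapper-count estimate for FileInputFormat-style splitting."""
    if not splittable:
        # Non-splittable compressed files: one mapper per file, regardless
        # of block size -- which is why the count stayed at 73.
        return len(file_sizes)
    # Splittable input: roughly one split per block of each file.
    return sum(max(1, math.ceil(s / block_size)) for s in file_sizes)

MB = 1024 * 1024
# Hypothetical sizes: 65 files under 128 MB plus 8 files of ~200 MB,
# mirroring "8 of the 73 files are larger than 128 MB".
sizes = [64 * MB] * 65 + [200 * MB] * 8

print(estimate_splits(sizes, 128 * MB, splittable=False))  # 73 (one per file)
print(estimate_splits(sizes, 128 * MB, splittable=True))   # 81 (8 files get 2 splits)
```

Only once the codec is actually splittable does shrinking dfs.block.size change the count; with a non-splittable codec the block size drops out of the formula entirely.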
