It looks like your Hadoop distro does not have https://issues.apache.org/jira/browse/HADOOP-4012 (splitting support for bzip2-compressed files).
- milind

On 11/10/11 2:40 PM, "Raj V" <[email protected]> wrote:

>All
>
>I assumed that the input splits for a streaming job will follow the same
>logic as a map reduce java job, but I seem to be wrong.
>
>I started out with 73 gzipped files that vary between 23 MB and 255 MB in
>size. My default block size was 128 MB. 8 of the 73 files are larger than
>128 MB.
>
>When I ran my streaming job, it ran, as expected, 73 mappers (no
>reducers for this job).
>
>Since I have 128 nodes in my cluster, I thought I would use more systems
>in the cluster by increasing the number of mappers. I changed all the
>gzip files into bzip2 files. I expected the number of mappers to increase
>to 81. The mappers remained at 73.
>
>I tried a second experiment: I changed my dfs.block.size to 32 MB. That
>should have increased my mappers to about ~250. It remained steadfast at
>73.
>
>Is my understanding wrong? With a smaller block size and bzipped files,
>should I not get more mappers?
>
>Raj
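The arithmetic in the question can be sketched as follows. This is a rough illustration only: with a non-splittable codec (gzip, or bzip2 without HADOOP-4012) you get one mapper per file, while a splittable codec yields roughly one split per block. The file sizes below are hypothetical, chosen to reproduce the 73-vs-81 numbers from the thread; real split counts also depend on min/max split settings.

```python
import math

def estimate_splits(file_sizes, block_size, splittable):
    """Rough mapper-count estimate for FileInputFormat-style splitting."""
    if not splittable:
        # Non-splittable compressed files: one mapper per file, regardless
        # of block size -- which is why the count stayed at 73.
        return len(file_sizes)
    # Splittable input: roughly one split per block of each file.
    return sum(max(1, math.ceil(s / block_size)) for s in file_sizes)

MB = 1024 * 1024
# Hypothetical sizes: 65 files under 128 MB plus 8 files of ~200 MB,
# mirroring "8 of the 73 files are larger than 128 MB".
sizes = [64 * MB] * 65 + [200 * MB] * 8

print(estimate_splits(sizes, 128 * MB, splittable=False))  # 73 (one per file)
print(estimate_splits(sizes, 128 * MB, splittable=True))   # 81 (8 files get 2 splits)
```

Only once the codec is actually splittable does shrinking dfs.block.size change the count; with a non-splittable codec the block size drops out of the formula entirely.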
