Milind, I realised that, thanks to Joey from Cloudera. I have given up on bzip.
Raj

>________________________________
> From: "[email protected]" <[email protected]>
> To: [email protected]; [email protected]; [email protected]
> Sent: Monday, November 14, 2011 2:02 PM
> Subject: Re: Input split for a streaming job!
>
> It looks like your hadoop distro does not have
> https://issues.apache.org/jira/browse/HADOOP-4012.
>
> - milind
>
> On 11/10/11 2:40 PM, "Raj V" <[email protected]> wrote:
>
>> All
>>
>> I assumed that the input splits for a streaming job would follow the same
>> logic as a Java MapReduce job, but I seem to be wrong.
>>
>> I started out with 73 gzipped files that vary between 23 MB and 255 MB in
>> size. My default block size was 128 MB. 8 of the 73 files are larger than
>> 128 MB.
>>
>> When I ran my streaming job, it ran, as expected, 73 mappers (no
>> reducers for this job).
>>
>> Since I have 128 nodes in my cluster, I thought I would use more systems
>> in the cluster by increasing the number of mappers. I changed all the
>> gzip files into bzip2 files. I expected the number of mappers to increase
>> to 81. The mappers remained at 73.
>>
>> I tried a second experiment: I changed my dfs.block.size to 32 MB. That
>> should have increased my mappers to about ~250. It remained steadfast at
>> 73.
>>
>> Is my understanding wrong? With a smaller block size and bzip2 files,
>> should I not get more mappers?
>>
>> Raj
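The split arithmetic behind the numbers in the thread can be sketched as follows. This is a hypothetical illustration, not Hadoop's actual split code: a non-splittable compressed file (gzip, or bzip2 on a distro without HADOOP-4012) yields one mapper per file, while a splittable input yields roughly one mapper per HDFS block. The file sizes below are made up to match the thread's counts (73 files, 8 larger than one 128 MB block).

```python
import math

def estimated_mappers(file_sizes_mb, block_size_mb, splittable):
    """Rough mapper estimate: non-splittable files get one mapper each;
    splittable files get about one mapper per block."""
    if not splittable:
        return len(file_sizes_mb)
    return sum(math.ceil(size / block_size_mb) for size in file_sizes_mb)

# Illustrative sizes only: 8 files of 200 MB (spanning two 128 MB blocks)
# plus 65 files of 64 MB (one block each).
sizes = [200] * 8 + [64] * 65

print(estimated_mappers(sizes, 128, splittable=False))  # 73 (one per gzip file)
print(estimated_mappers(sizes, 128, splittable=True))   # 81 (one per block)
```

This also explains the second experiment: with a non-splittable codec, shrinking dfs.block.size has no effect on the mapper count, because splits can never cross the per-file boundary anyway.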
