Hi, I made some progress: a combination of NLineInputFormat and mapred.max.split.size seems to work, but it is hard to set the byte value exactly. Input lines range from roughly 64 to 1024 bytes.

What I need is as many mappers as possible (to use the full potential of the cluster), where each receives N input lines. A sketch of what I have so far is below.
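For reference, a minimal sketch of the driver I have now (new "mapreduce" API; LineMapper, the paths, and the 10 KB split cap stand in for my real code, and mapred.max.split.size is the 1.x property name):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class NLineDriver {

    // Placeholder for the real per-line processing that takes minutes.
    public static class LineMapper
            extends Mapper<LongWritable, Text, LongWritable, Text> {
        @Override
        protected void map(LongWritable offset, Text line, Context ctx)
                throws java.io.IOException, InterruptedException {
            ctx.write(offset, line);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hard to set exactly: input lines are ~64-1024 bytes each.
        conf.setLong("mapred.max.split.size", 10L * 1024);

        Job job = new Job(conf, "n-lines-per-mapper");
        job.setJarByClass(NLineDriver.class);
        job.setMapperClass(LineMapper.class);
        job.setNumReduceTasks(0);                    // map-only job

        // Ask for (at most) 10 input lines per map task.
        job.setInputFormatClass(NLineInputFormat.class);
        NLineInputFormat.setNumLinesPerSplit(job, 10);

        NLineInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}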


On 06/17/2012 05:02 AM, Harsh J wrote:
Ondřej,

While NLineInputFormat will indeed give you N lines per task, it does
not guarantee that the map tasks it produces for a file will all be
sent to different nodes. Which one do you need exactly: simply N lines
per map task, or N more widely distributed maps?

On Sat, Jun 16, 2012 at 3:01 PM, Ondřej Klimpera <klimp...@fit.cvut.cz> wrote:
I tried this approach, but the job is not distributed among 10 mapper nodes.
It seems Hadoop ignores this property :(

My first thought is that the small file size is the problem and Hadoop
doesn't split it properly.

Thanks for any ideas.



On 06/16/2012 11:27 AM, Bejoy KS wrote:
Hi Ondrej

You can use NLineInputFormat with N set to 10.
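For example, with the old "mapred" API it is just a couple of lines in your driver (a rough sketch; MyJob stands for your own job class):

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.NLineInputFormat;

JobConf conf = new JobConf(MyJob.class);     // MyJob: your job class
conf.setInputFormat(NLineInputFormat.class);
// Each map task then gets (up to) 10 lines of the input file.
conf.setInt("mapred.line.input.format.linespermap", 10);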

------Original Message------
From: Ondřej Klimpera
To: common-user@hadoop.apache.org
ReplyTo: common-user@hadoop.apache.org
Subject: Setting number of mappers according to number of TextInput lines
Sent: Jun 16, 2012 14:31

Hello,

I have a very small input size (kB), but processing it to produce the
output takes several minutes. Is there a way to say: the file has 100
lines, I need 10 mappers, and each mapper node has to process 10 lines
of the input file?

Thanks for advice.
Ondrej Klimpera


Regards
Bejoy KS

Sent from handheld, please excuse typos.



