Hi, I made some progress: a combination of NLineInputFormat and mapred.max.split.size seems to work, but it is hard to set the byte value exactly. Input lines range from roughly 64 to 1024 bytes.

What I need is as many mappers as possible (to use the full potential of the cluster), where each receives N input lines. A sketch of what I have so far is below.
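For reference, a minimal sketch of the driver I have now (new "mapreduce" API; LineMapper, the paths, and the 10 KB split cap stand in for my real code, and mapred.max.split.size is the 1.x property name):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class NLineDriver {

    // Placeholder for the real per-line processing that takes minutes.
    public static class LineMapper
            extends Mapper<LongWritable, Text, LongWritable, Text> {
        @Override
        protected void map(LongWritable offset, Text line, Context ctx)
                throws java.io.IOException, InterruptedException {
            ctx.write(offset, line);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hard to set exactly: input lines are ~64-1024 bytes each.
        conf.setLong("mapred.max.split.size", 10L * 1024);

        Job job = new Job(conf, "n-lines-per-mapper");
        job.setJarByClass(NLineDriver.class);
        job.setMapperClass(LineMapper.class);
        job.setNumReduceTasks(0);                    // map-only job

        // Ask for (at most) 10 input lines per map task.
        job.setInputFormatClass(NLineInputFormat.class);
        NLineInputFormat.setNumLinesPerSplit(job, 10);

        NLineInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}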


On 06/17/2012 05:02 AM, Harsh J wrote:
Ondřej,

While NLineInputFormat will indeed give you N lines per task, it does
not guarantee that the map tasks it produces for a file will all be
sent to different nodes. Which one do you need exactly: simply N lines
per map task, or N more widely distributed maps?

On Sat, Jun 16, 2012 at 3:01 PM, Ondřej Klimpera <klimp...@fit.cvut.cz> wrote:
I tried this approach, but the job is not distributed among 10 mapper nodes.
It seems Hadoop ignores this property :(

My first thought is that the small file size is the problem and Hadoop
doesn't split it properly.

Thanks for any ideas.



On 06/16/2012 11:27 AM, Bejoy KS wrote:
Hi Ondrej

You can use NLineInputFormat with N set to 10.
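For example, with the old "mapred" API it is just a couple of lines in your driver (a rough sketch; MyJob stands for your own job class):

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.NLineInputFormat;

JobConf conf = new JobConf(MyJob.class);     // MyJob: your job class
conf.setInputFormat(NLineInputFormat.class);
// Each map task then gets (up to) 10 lines of the input file.
conf.setInt("mapred.line.input.format.linespermap", 10);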

------Original Message------
From: Ondřej Klimpera
To: common-user@hadoop.apache.org
ReplyTo: common-user@hadoop.apache.org
Subject: Setting number of mappers according to number of TextInput lines
Sent: Jun 16, 2012 14:31

Hello,

I have a very small input size (kB), but processing it to produce the
output takes several minutes. Is there a way to say: the file has 100
lines, I need 10 mappers, and each mapper node has to process 10 lines
of the input file?

Thanks for advice.
Ondrej Klimpera


Regards
Bejoy KS

Sent from handheld, please excuse typos.



