Re: Map and Reduce numbers are not restricted by setNumMapTasks and setNumReduceTasks, JobConf related?

2008-10-06 Thread Samuel Guo
On Tue, Oct 7, 2008 at 1:12 PM, Andy Li <[EMAIL PROTECTED]> wrote:

> I think there should be some documentation indicating that
> "setNumMapTasks" and "setNumReduceTasks" will be overridden by the
> splitting.  It was misleading the first time I used them.  I expected
> that the files would be split into blocks/chunks and each block/chunk
> would be assigned to a Mapper, with the maximum Mapper count controlled
> by the numbers specified in "setNumMapTasks" and "setNumReduceTasks".
>

Check the documentation:
http://hadoop.apache.org/core/docs/r0.18.1/api/org/apache/hadoop/mapred/JobConf.html#setNumMapTasks(int)
http://hadoop.apache.org/core/docs/r0.18.1/api/org/apache/hadoop/mapred/JobConf.html#setNumReduceTasks(int)
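
In short (my reading of those javadocs, as an untested sketch):

= EXAMPLE =
// setNumMapTasks() is only a hint to the framework: the actual number of
// map tasks equals the number of InputSplits produced by the InputFormat.
jconf.setNumMapTasks(5);
// setNumReduceTasks() should set the exact number of reduce tasks.
jconf.setNumReduceTasks(3);
= = =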


>
> Unfortunately, this was not the exact expectation based on the method
> names.  =(
> Does anyone know if this is the correct answer to this problem, or are
> they actually two different things?
>
> Thanks,
> -Andy
>


Re: Map and Reduce numbers are not restricted by setNumMapTasks and setNumReduceTasks, JobConf related?

2008-10-06 Thread Andy Li
Thanks, Samuel.  I have tried to look for answers and done some trial and
error in my own program.

The only way I know of so far to enforce the Mapper count is to assign each
file to one Mapper with your own customized InputFormat and RecordReader,
overriding isSplitable() to always return false.

In another mail thread, someone posted this link:
http://www.nabble.com/1-file-per-record-td19644985.html
which shows how to prevent a file from being split.
Hadoop Wiki FAQ 10 also shows several ways to assign one file to one
Mapper.
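
For example, something like this untested sketch (the class name is mine)
should force one Mapper per file:

= EXAMPLE =
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.TextInputFormat;

// Hypothetical InputFormat: refuse to split files at block boundaries so
// each whole file becomes one InputSplit, i.e. one Mapper per file.
public class WholeFileTextInputFormat extends TextInputFormat {
    @Override
    protected boolean isSplitable(FileSystem fs, Path file) {
        return false;
    }
}
= = =

Then register it with jconf.setInputFormat(WholeFileTextInputFormat.class).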

I think there should be some documentation indicating that "setNumMapTasks"
and "setNumReduceTasks" will be overridden by the splitting.  It was
misleading the first time I used them.  I expected that the files would be
split into blocks/chunks and each block/chunk would be assigned to a Mapper,
with the maximum Mapper count controlled by the numbers specified in
"setNumMapTasks" and "setNumReduceTasks".

Unfortunately, this was not the exact expectation based on the method names.
=(
Does anyone know if this is the correct answer to this problem, or are they
actually two different things?

Thanks,
-Andy

On Mon, Oct 6, 2008 at 7:02 PM, Samuel Guo <[EMAIL PROTECTED]> wrote:

> The number of Mappers depends on your InputFormat.
> The default InputFormat tries to treat every file block of a file as an
> InputSplit, so you will get the same number of mappers as the number of
> your InputSplits.
> Try configuring "mapred.min.split.size" to reduce the number of your
> mappers if you want to.
>
> And I don't know why your reducer is just one. Does anyone know?
>
> On Tue, Oct 7, 2008 at 9:06 AM, Andy Li <[EMAIL PROTECTED]> wrote:
>
> > Dears,
> >
> > Sorry, I did not mean to cross post.  But the previous article was
> > accidentally posted to the HBase user list.  I would like to bring it
> > back to the Hadoop user list since it is confusing me a lot and it is
> > mainly MapReduce related.
> >
> > Currently running version hadoop-0.18.1 on 25 nodes.  Map and Reduce Task
> > Capacity is 92.  When I do this in my MapReduce program:
> >
> > = SAMPLE CODE =
> >JobConf jconf = new JobConf(conf, TestTask.class);
> >jconf.setJobName("my.test.TestTask");
> >jconf.setOutputKeyClass(Text.class);
> >jconf.setOutputValueClass(Text.class);
> >jconf.setOutputFormat(TextOutputFormat.class);
> >jconf.setMapperClass(MyMapper.class);
> >jconf.setCombinerClass(MyReducer.class);
> >jconf.setReducerClass(MyReducer.class);
> >jconf.setInputFormat(TextInputFormat.class);
> >try {
> >jconf.setNumMapTasks(5);
> >jconf.setNumReduceTasks(3);
> >JobClient.runJob(jconf);
> >} catch (Exception e) {
> >e.printStackTrace();
> >}
> > = = =
> >
> > When I run the job, I'm always getting 300 mappers and 1 reducer from
> > the JobTracker webpage running on the default port 50030.
> > No matter how I configure the numbers in the methods "setNumMapTasks"
> > and "setNumReduceTasks", I get the same result.
> > Does anyone know why this is happening?
> > Am I missing or misunderstanding something in the picture?  =(
> >
> > Here's a reference to the parameters we have overridden in
> > "hadoop-site.xml".
> > ===
> > <property>
> >   <name>mapred.tasktracker.map.tasks.maximum</name>
> >   <value>4</value>
> > </property>
> >
> > <property>
> >   <name>mapred.tasktracker.reduce.tasks.maximum</name>
> >   <value>4</value>
> > </property>
> > ===
> > Other parameters are the defaults from hadoop-default.xml.
> >
> > Any idea how this is happening?
> >
> > Any inputs are appreciated.
> >
> > Thanks,
> > -Andy
> >
>


Re: Map and Reduce numbers are not restricted by setNumMapTasks and setNumReduceTasks, JobConf related?

2008-10-06 Thread Samuel Guo
The number of Mappers depends on your InputFormat.
The default InputFormat tries to treat every file block of a file as an
InputSplit, so you will get the same number of mappers as the number of
your InputSplits.
Try configuring "mapred.min.split.size" to reduce the number of your
mappers if you want to.
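
For example (an untested sketch; the 128 MB figure is just illustrative):

= EXAMPLE =
// Raising mapred.min.split.size makes each InputSplit cover more bytes,
// so the job is launched with fewer map tasks.
jconf.setLong("mapred.min.split.size", 128L * 1024 * 1024); // 128 MB
= = =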

And I don't know why your reducer is just one. Does anyone know?

On Tue, Oct 7, 2008 at 9:06 AM, Andy Li <[EMAIL PROTECTED]> wrote:

> Dears,
>
> Sorry, I did not mean to cross post.  But the previous article was
> accidentally posted to the HBase user list.  I would like to bring it back
> to the Hadoop user list since it is confusing me a lot and it is mainly
> MapReduce related.
>
> Currently running version hadoop-0.18.1 on 25 nodes.  Map and Reduce Task
> Capacity is 92.  When I do this in my MapReduce program:
>
> = SAMPLE CODE =
>JobConf jconf = new JobConf(conf, TestTask.class);
>jconf.setJobName("my.test.TestTask");
>jconf.setOutputKeyClass(Text.class);
>jconf.setOutputValueClass(Text.class);
>jconf.setOutputFormat(TextOutputFormat.class);
>jconf.setMapperClass(MyMapper.class);
>jconf.setCombinerClass(MyReducer.class);
>jconf.setReducerClass(MyReducer.class);
>jconf.setInputFormat(TextInputFormat.class);
>try {
>jconf.setNumMapTasks(5);
>jconf.setNumReduceTasks(3);
>JobClient.runJob(jconf);
>} catch (Exception e) {
>e.printStackTrace();
>}
> = = =
>
> When I run the job, I'm always getting 300 mappers and 1 reducer from the
> JobTracker webpage running on the default port 50030.
> No matter how I configure the numbers in the methods "setNumMapTasks" and
> "setNumReduceTasks", I get the same result.
> Does anyone know why this is happening?
> Am I missing or misunderstanding something in the picture?  =(
>
> Here's a reference to the parameters we have overridden in "hadoop-site.xml".
> ===
> <property>
>   <name>mapred.tasktracker.map.tasks.maximum</name>
>   <value>4</value>
> </property>
>
> <property>
>   <name>mapred.tasktracker.reduce.tasks.maximum</name>
>   <value>4</value>
> </property>
> ===
> Other parameters are the defaults from hadoop-default.xml.
>
> Any idea how this is happening?
>
> Any inputs are appreciated.
>
> Thanks,
> -Andy
>


Map and Reduce numbers are not restricted by setNumMapTasks and setNumReduceTasks, JobConf related?

2008-10-06 Thread Andy Li
Dears,

Sorry, I did not mean to cross post.  But the previous article was
accidentally posted to the HBase user list.  I would like to bring it back
to the Hadoop user list since it is confusing me a lot and it is mainly
MapReduce related.

Currently running version hadoop-0.18.1 on 25 nodes.  Map and Reduce Task
Capacity is 92.  When I do this in my MapReduce program:

= SAMPLE CODE =
JobConf jconf = new JobConf(conf, TestTask.class);
jconf.setJobName("my.test.TestTask");
jconf.setOutputKeyClass(Text.class);
jconf.setOutputValueClass(Text.class);
jconf.setOutputFormat(TextOutputFormat.class);
jconf.setMapperClass(MyMapper.class);
jconf.setCombinerClass(MyReducer.class);
jconf.setReducerClass(MyReducer.class);
jconf.setInputFormat(TextInputFormat.class);
try {
    // Request 5 map tasks and 3 reduce tasks.
    jconf.setNumMapTasks(5);
    jconf.setNumReduceTasks(3);
    JobClient.runJob(jconf);
} catch (Exception e) {
    e.printStackTrace();
}
= = =

When I run the job, I'm always getting 300 mappers and 1 reducer from the
JobTracker webpage running on the default port 50030.
No matter how I configure the numbers in the methods "setNumMapTasks" and
"setNumReduceTasks", I get the same result.
Does anyone know why this is happening?
Am I missing or misunderstanding something in the picture?  =(

Here's a reference to the parameters we have overridden in "hadoop-site.xml".
===
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>
</property>

<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>
</property>
===
Other parameters are the defaults from hadoop-default.xml.

Any idea how this is happening?

Any inputs are appreciated.

Thanks,
-Andy