Re: Increase the number of mappers in PM mode

2013-03-16 Thread Harsh J

In MR2, to have more mappers executed per NM, set the memory request for each
map so that multiple requests fit within the NM's configured memory allowance.
For example, assume a cluster with just 1 NM whose memory is set to 16 GB. If I
submit a job with mapreduce.map.memory.mb and yarn.app.mapreduce.am.resource.mb
both set to 1 GB (i.e. 1024), then the NM can execute 15 maps in parallel, each
consuming up to 1 GB of memory, while the remaining 1 GB is used by the AM that
coordinates those executions.
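
As a rough sketch of that arithmetic (not how the scheduler is actually implemented), the Python snippet below estimates how many map containers fit on a single NM next to one AM. The function name max_parallel_maps is hypothetical, and the sketch assumes memory is the only constraint: it ignores the scheduler's minimum-allocation rounding, CPU limits, and reduce containers.

```python
def max_parallel_maps(nm_memory_mb, map_memory_mb, am_memory_mb):
    """Estimate how many map containers fit on one NM alongside one AM.

    Hypothetical helper: assumes memory is the only limit and that
    requests are not rounded up by the scheduler.
    """
    # Memory left on the NM once the single AM container is placed.
    remaining_mb = nm_memory_mb - am_memory_mb
    if remaining_mb < 0:
        return 0
    # Each map container needs a full map_memory_mb slice.
    return remaining_mb // map_memory_mb


nm_mb = 16 * 1024  # NM memory allowance of 16 GB, as in the example above

# 1 GB per map, 1 GB for the AM
print(max_parallel_maps(nm_mb, map_memory_mb=1024, am_memory_mb=1024))  # 15

# Doubling the per-map request roughly halves the parallelism
print(max_parallel_maps(nm_mb, map_memory_mb=2048, am_memory_mb=1024))  # 7
```

Lowering the per-map request is what lets more maps run at once on the same NM; the number of map tasks the job has in total is decided separately, by the input splits.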

-- 
Harsh J

Re: Increase the number of mappers in PM mode

2013-03-15 Thread yypvsxf19870706

Hi,
   I think I have got it. Thank you.

Sent from my iPhone


Re: Increase the number of mappers in PM mode

2013-03-15 Thread Zheyi RONG

Indeed, you cannot explicitly set the number of mappers, but you can still
gain some control over it by setting mapred.max.split.size or
mapred.min.split.size.

For example, if you have a 10 GB file (10737418240 B) and would like 10
mappers, then each mapper has to process 1 GB of data.
According to "splitsize = max(minimumSize, min(maximumSize, blockSize))",
you can set mapred.min.split.size=1073741824 (1 GB), i.e.
$ hadoop jar yourjar -Dmapred.min.split.size=1073741824 yourargs
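
To make the formula concrete, here is a small Python sketch that applies it and estimates the resulting number of mappers. The helper names, the 64 MB block size, and the "no maximum" constant are assumptions for illustration only. The estimate is a plain ceiling division, so it ignores details of the real split computation such as multiple input files or non-splittable formats. In MR2 the minimum is documented as mapreduce.input.fileinputformat.split.minsize, with a corresponding ...split.maxsize.

```python
def split_size(min_size, max_size, block_size):
    """splitsize = max(minimumSize, min(maximumSize, blockSize))"""
    return max(min_size, min(max_size, block_size))


def estimated_mappers(file_size, min_size, max_size, block_size):
    """Rough estimate: one mapper per split of a single splittable file."""
    size = split_size(min_size, max_size, block_size)
    # Integer ceiling division, so a partial last split still counts.
    return -(-file_size // size)


MB = 1024 ** 2
GB = 1024 ** 3
BLOCK = 64 * MB        # assumed block size
NO_MAX = 2 ** 63 - 1   # assumed stand-in for "no maximum configured"

# 10 GB file with defaults: one split per block
print(estimated_mappers(10 * GB, 1, NO_MAX, BLOCK))       # 160

# Minimum split size raised to 1 GB: fewer, larger splits
print(estimated_mappers(10 * GB, 1 * GB, NO_MAX, BLOCK))  # 10

# Maximum split size lowered to 16 MB: more mappers for a 64 MB file
print(estimated_mappers(64 * MB, 1, 16 * MB, BLOCK))      # 4

# A tiny (hypothetical 100-byte) file still yields a single split
print(estimated_mappers(100, 1, NO_MAX, BLOCK))           # 1
```

Raising the minimum reduces the number of mappers, while lowering the maximum below the block size increases it. A file much smaller than one block gets one mapper unless the maximum is set below the file size.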

It is well explained in this thread:
http://stackoverflow.com/questions/9678180/change-file-split-size-in-hadoop

Regards,
Zheyi.


Re: Increase the number of mappers in PM mode

2013-03-15 Thread YouPeng Yang

Hi,
  I found these interview questions by googling:

Q29. How can you set an arbitrary number of mappers to be created for a job
in Hadoop?

This is a trick question. You cannot set it.

 >> The test from my earlier message shows that you cannot set an arbitrary
 >> number of mappers.

Q30. How can you set an arbitrary number of reducers to be created for a job
in Hadoop?

You can either do it programmatically by using the method setNumReduceTasks
in the JobConf class, or set it up as a configuration setting.


 I tested Q30, and it seems right.

 My logs:

[hadoop@Hadoop01 bin]$ ./hadoop jar \
    ../share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.0-cdh4.1.2.jar \
    wordcount -D mapreduce.job.reduces=2 \
    -D mapreduce.jobtracker.address=10.167.14.221:50030 \
    /user/hadoop/yyp/input /user/hadoop/yyp/output3

===

Job Counters
    Launched map tasks=1
    Launched reduce tasks=2    <- it actually changed
    Rack-local map tasks=1
    Total time spent by all maps in occupied slots (ms)=60356
    Total time spent by all reduces in occupied slots (ms)=135224

Regards


Re: Increase the number of mappers in PM mode

2013-03-14 Thread YouPeng Yang

Hi,
  The docs only list the property
mapreduce.input.fileinputformat.split.minsize (default value is 0).
  Does it matter?


Re: Increase the number of mappers in PM mode

2013-03-14 Thread Zheyi RONG

Have you considered changing mapred.max.split.size?
As in:
http://stackoverflow.com/questions/9678180/change-file-split-size-in-hadoop

Zheyi


Re: Increase the number of mappers in PM mode

2013-03-14 Thread YouPeng Yang

Hi,

  I have done some tests in my Pseudo Mode setup (CDH 4.1.2) with MRv2 (YARN).
  According to the docs:
  *mapreduce.jobtracker.address*: The host and port that the MapReduce job
tracker runs at. If "local", then jobs are run in-process as a single map
and reduce task.
  *mapreduce.job.maps (default value is 2)*: The default number of map
tasks per job. Ignored when mapreduce.jobtracker.address is "local".

  I changed mapreduce.jobtracker.address to Hadoop:50031.

  And then ran the wordcount example:
  hadoop jar hadoop-mapreduce-examples-2.0.0-cdh4.1.2.jar wordcount input output

  The output logs are as follows:

  Job Counters
      Launched map tasks=1
      Launched reduce tasks=1
      Data-local map tasks=1
      Total time spent by all maps in occupied slots (ms)=60336
      Total time spent by all reduces in occupied slots (ms)=63264
  Map-Reduce Framework
      Map input records=5
      Map output records=7
      Map output bytes=56
      Map output materialized bytes=76

  It seems it does not work: only 1 map task was launched.

  I thought maybe my input file is too small (just 5 records). Is that right?

Regards


Re: Increase the number of mappers in PM mode

2013-03-14 Thread Sai Sai




In Pseudo Mode, where is the setting to increase the number of mappers, or is
this not possible?
Thanks
Sai
