Re: Pseudo -distributed mode

2014-08-13 Thread Sergey Murylev
You may have misunderstood that phrase. There are three ways to
configure Hadoop:

 1. Local mode - when you install Hadoop, the configs are empty and
    there are no daemon processes. In this case your local file system
    is used instead of HDFS, and map-reduce jobs are processed in the
    same process as the Hadoop client.
 2. Pseudo-distributed mode - you have a simple configuration with
    namenode, secondarynamenode, datanode, jobtracker and tasktracker
    daemons. In this case map-reduce jobs are processed in separate
    child processes of the tasktracker.
 3. Distributed mode - you have multiple computers with a more
    complicated distribution of daemons. In the general case your
    map-reduce program can run on every node in the cluster.
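
As a rough illustration of the three modes above, here is a minimal
Python sketch that guesses the mode from two Hadoop 1.x settings. It
assumes the 1.x property names fs.default.name and mapred.job.tracker
with their stock defaults (file:/// and local). The function guess_mode
and the example host names are hypothetical, and the check is only a
heuristic: a single machine configured by its real host name would be
reported as distributed.

```python
from urllib.parse import urlparse

# Host names treated as "this machine" (an assumption of this sketch).
LOCAL_HOSTS = {"localhost", "127.0.0.1"}


def guess_mode(fs_default_name="file:///", mapred_job_tracker="local"):
    """Guess the Hadoop 1.x mode from two config values (heuristic only)."""
    fs = urlparse(fs_default_name)
    # Empty configs: local file system, jobs run inside the client process.
    if fs.scheme == "file" and mapred_job_tracker == "local":
        return "local"
    jobtracker_host = mapred_job_tracker.split(":")[0]
    # All daemons addressed on the local machine.
    if fs.hostname in LOCAL_HOSTS and jobtracker_host in LOCAL_HOSTS:
        return "pseudo-distributed"
    return "distributed"


if __name__ == "__main__":
    print(guess_mode())
    print(guess_mode("hdfs://localhost:9000", "localhost:9001"))
    print(guess_mode("hdfs://nn.example.com:9000", "jt.example.com:9001"))
```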

> But if my Hadoop is in pseudo-distributed mode, why does it still run
> as a single Java process and utilize only 1 CPU core even if there
> are many more?
>
Having only one tasktracker doesn't mean that all jobs are processed
on a single CPU. Hadoop has an abstraction over threads and processes
called 'slots'. There are two types of slots, map and reduce, and each
slot type processes the corresponding kind of task. For each task (map
or reduce) the tasktracker can create a child process, but the total
number of child processes is limited. According to the Hadoop wiki you
need to set the properties 'mapred.tasktracker.map.tasks.maximum' and
'mapred.tasktracker.reduce.tasks.maximum' to appropriate values.
Actually, I think that if you run Hadoop in pseudo-distributed mode you
don't need to set the number of slots manually; Hadoop is clever enough
to set an appropriate number of slots automatically.
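
For illustration, a minimal Python sketch of how the two slot properties
named above could be written out in the usual Hadoop
configuration/property/name/value layout for mapred-site.xml. The helper
name build_mapred_site and the values 4 and 2 are assumptions made for
this example, not recommendations.

```python
import xml.etree.ElementTree as ET


def build_mapred_site(map_slots, reduce_slots):
    """Return mapred-site.xml text that sets the tasktracker slot limits."""
    configuration = ET.Element("configuration")
    settings = {
        "mapred.tasktracker.map.tasks.maximum": map_slots,
        "mapred.tasktracker.reduce.tasks.maximum": reduce_slots,
    }
    # One <property> element with <name> and <value> children per setting.
    for name, value in settings.items():
        prop = ET.SubElement(configuration, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(configuration, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical values: 4 map slots and 2 reduce slots.
    print(build_mapred_site(4, 2))
```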

--
Thanks,
Sergey

On 12/08/14 18:36, sindhu hosamane wrote:

> I have read "By default, Hadoop is configured to run in a
> non-distributed mode, as a single Java process".
>
> But if my Hadoop is in pseudo-distributed mode, why does it still run
> as a single Java process and utilize only 1 CPU core even if there
> are many more?
>
>
> On Tue, Aug 12, 2014 at 4:32 PM, Sergey Murylev wrote:
>
> Yes :)
>
> Pseudo-distributed mode is a configuration in which a Hadoop
> environment runs on a single computer.
>
> On 12/08/14 18:25, sindhu hosamane wrote:
> > Can setting up 2 datanodes on the same machine be considered
> > pseudo-distributed mode Hadoop?
> >
> > Thanks,
> > Sindhu
>
>
>





Re: Pseudo -distributed mode

2014-08-12 Thread sindhu hosamane
I have read "By default, Hadoop is configured to run in a non-distributed
mode, as a single Java process".

But if my Hadoop is in pseudo-distributed mode, why does it still run as a
single Java process and utilize only 1 CPU core even if there are many
more?


On Tue, Aug 12, 2014 at 4:32 PM, Sergey Murylev wrote:

> Yes :)
>
> Pseudo-distributed mode is a configuration in which a Hadoop
> environment runs on a single computer.
>
> On 12/08/14 18:25, sindhu hosamane wrote:
> > Can setting up 2 datanodes on the same machine be considered
> > pseudo-distributed mode Hadoop?
> >
> > Thanks,
> > Sindhu
>
>
>


Re: Pseudo -distributed mode

2014-08-12 Thread Sergey Murylev
Yes :)

Pseudo-distributed mode is a configuration in which a Hadoop
environment runs on a single computer.

On 12/08/14 18:25, sindhu hosamane wrote:
> Can setting up 2 datanodes on the same machine be considered
> pseudo-distributed mode Hadoop?
>
> Thanks,
> Sindhu






Re: pseudo distributed mode

2013-05-03 Thread Roman Shaposhnik
On Fri, May 3, 2013 at 12:15 AM, mouna laroussi wrote:
> Hi,
>
> I want to configure my Hadoop in pseudo-distributed mode.
> When I get to the step of formatting the namenode, the web page on port
> 50070 says "there are no namenode in the cluster".
> What should I do?
> Is there any path to change?

Not sure which version of Hadoop you're interested in, but
if it is Hadoop 2 and you're planning to run it on a Linux
system, I'd recommend using Bigtop's hadoop-conf-pseudo package.
It is specifically designed to get you up and running with a Hadoop
pseudo-distributed cluster in a matter of minutes.

If this sounds appealing, more details are available over here:

https://cwiki.apache.org/confluence/display/BIGTOP/How+to+install+Hadoop+distribution+from+Bigtop+0.5.0

Thanks,
Roman.


Re: pseudo distributed mode

2013-05-03 Thread Mohammad Tariq
After formatting the NN, start the daemons using "bin/start-dfs.sh" and
"bin/start-mapred.sh". If it still doesn't work, show us the logs.

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com


On Fri, May 3, 2013 at 10:29 PM, Nitin Pawar wrote:

> Once you format the namenode, it needs to be started again for normal
> use.
>
>
> On Fri, May 3, 2013 at 12:45 PM, mouna laroussi wrote:
>
>> Hi,
>>
>> I want to configure my Hadoop in pseudo-distributed mode.
>> When I get to the step of formatting the namenode, the web page on port
>> 50070 says "there are no namenode in the cluster".
>> What should I do?
>> Is there any path to change?
>>
>> Thanks
>>
>> --
>> LAROUSSI Mouna
>> Élève ingénieur en Génie Logiciel - INSAT
>> N°: 21 227 880
>>
>
>
>
> --
> Nitin Pawar
>


Re: pseudo distributed mode

2013-05-03 Thread Nitin Pawar
Once you format the namenode, it needs to be started again for normal
use.


On Fri, May 3, 2013 at 12:45 PM, mouna laroussi wrote:

> Hi,
>
> I want to configure my Hadoop in pseudo-distributed mode.
> When I get to the step of formatting the namenode, the web page on port
> 50070 says "there are no namenode in the cluster".
> What should I do?
> Is there any path to change?
>
> Thanks
>
> --
> LAROUSSI Mouna
> Élève ingénieur en Génie Logiciel - INSAT
> N°: 21 227 880
>



-- 
Nitin Pawar


Re: Pseudo distributed mode : How to increase no of concurrent map task

2012-09-29 Thread Harsh J
Jay,

The right answer would be: do not use config strings; use the API points
available for each config tweak allowed at the client/job level.
However, the API may not be complete in places, and there may be
advanced tuning knobs that we didn't want to provide/support an API
for. So… yeah.

With 2.x+, which renamed much of the configuration for consistency,
you can hopefully rely on the names. If a config name follows the
pattern <project>.<daemon>.configname, such as "yarn.nodemanager.foo",
then it is daemon-specific. Otherwise, it is client-overridable in one
way or another. This hint can be applied to 1.x/0.20.x to a certain
extent as well, but some property names from the past flout that
standard format.
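
To make the naming hint concrete, here is a minimal Python sketch that
checks whether the second dot-separated component of a property name is
a daemon name. The DAEMONS set and the function looks_daemon_specific
are assumptions made for this example, not part of any Hadoop API, and
as noted above some older property names will not follow the pattern.

```python
# Assumed list of daemon names; extend it for your Hadoop version.
DAEMONS = {
    "namenode", "datanode", "secondarynamenode",
    "jobtracker", "tasktracker",
    "resourcemanager", "nodemanager",
}


def looks_daemon_specific(prop_name):
    """Apply the naming hint: is the second name component a daemon?"""
    parts = prop_name.split(".")
    # Needs at least project, daemon and one more component.
    return len(parts) >= 3 and parts[1] in DAEMONS


if __name__ == "__main__":
    for name in ("yarn.nodemanager.foo",
                 "mapred.tasktracker.map.tasks.maximum",
                 "mapred.map.tasks"):
        print(name, looks_daemon_specific(name))
```

A name flagged by this check would be set in the daemon's XML config
and picked up on restart rather than set through conf.setInt(...) in a
job.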

On Sun, Sep 30, 2012 at 2:34 AM, Jay Vyas  wrote:
> Hmmm...  I always make this mistake on my hadoop vm -- trying to set
> parameters which require xml settings in the conf.setInt(...) API at
> runtime, which sometimes has no effect.
>
> How can we know, (without having to individually troubleshoot a parameter)
> which parameters CAN versus CANNOT be set programmatically during a m/r job?



-- 
Harsh J


Re: Pseudo distributed mode : How to increase no of concurrent map task

2012-09-29 Thread Jay Vyas
Hmmm...  I always make this mistake on my hadoop vm -- trying to set
parameters which require xml settings in the conf.setInt(...) API at
runtime, which sometimes has no effect.

How can we know, (without having to individually troubleshoot a parameter)
which parameters CAN versus CANNOT be set programmatically during a m/r job?


Re: Pseudo distributed mode : How to increase no of concurrent map task

2012-09-29 Thread Shing Hing Man
I did restart the TaskTracker after setting
mapred.tasktracker.map.tasks.maximum.

But I had been using
Configuration.setInt("mapred.tasktracker.map.tasks.maximum", 6).

When I set mapred.tasktracker.map.tasks.maximum to 6 in
mapred-site.xml, I see 6 concurrent map tasks running.

That solves my problem!

Thanks!

Shing

 


- Original Message -
From: Harsh J 
To: user@hadoop.apache.org; Shing Hing Man 
Cc: 
Sent: Saturday, September 29, 2012 8:23 PM
Subject: Re: Pseudo distributed mode : How to increase no of concurrent map task

Did you restart your TaskTrackers after increasing the
mapred.tasktracker.map.tasks.maximum value in mapred-site.xml? It is a
TaskTracker property, not a per-job one.

On Sun, Sep 30, 2012 at 12:36 AM, Shing Hing Man  wrote:
>
> Hi,
> I am running Hadoop 1.03 in pseudo-distributed mode, on a quad-core Xeon
> processor with hyper-threading enabled.
> When I submit a job to process a file of about 1.6 GB, only two
> concurrent map tasks are running.
> I have set
> mapred.tasktracker.map.tasks.maximum to 6.
> In job.xml,
> mapred.map.tasks = 25
> mapred.min.split.size = 0
> dfs.block.size = 64MB
> How can I increase the number of concurrent map tasks?
>
> Thanks in advance for any assistance !
>
> Shing
>



-- 
Harsh J



Re: Pseudo distributed mode : How to increase no of concurrent map task

2012-09-29 Thread Harsh J
Did you restart your TaskTrackers after increasing the
mapred.tasktracker.map.tasks.maximum value in mapred-site.xml? It is a
TaskTracker property, not a per-job one.
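
A rough sketch of why the slot limit matters, using the numbers from
the question quoted below (25 map tasks, with 2 versus 6 map slots). It
assumes a single TaskTracker and that every map task holds one slot for
one equally long round, which real jobs only approximate. The function
names are made up for this example.

```python
import math


def concurrent_maps(total_map_tasks, map_slots):
    """Map tasks that can run at once on a single tasktracker."""
    return min(total_map_tasks, map_slots)


def map_waves(total_map_tasks, map_slots):
    """Rounds needed if every task holds one slot for one round."""
    return math.ceil(total_map_tasks / map_slots)


if __name__ == "__main__":
    for slots in (2, 6):
        print(slots, concurrent_maps(25, slots), map_waves(25, slots))
```

Under these assumptions, raising the limit from 2 to 6 slots cuts the
25 map tasks from 13 rounds to 5.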

On Sun, Sep 30, 2012 at 12:36 AM, Shing Hing Man  wrote:
>
> Hi,
> I am running Hadoop 1.03 in pseudo-distributed mode, on a quad-core Xeon
> processor with hyper-threading enabled.
> When I submit a job to process a file of about 1.6 GB, only two
> concurrent map tasks are running.
> I have set
> mapred.tasktracker.map.tasks.maximum to 6.
> In job.xml,
> mapred.map.tasks = 25
> mapred.min.split.size = 0
> dfs.block.size = 64MB
> How can I increase the number of concurrent map tasks?
>
> Thanks in advance for any assistance !
>
> Shing
>



-- 
Harsh J