Re: dynamically adding slaves to hadoop cluster

2008-03-10 Thread Owen O'Malley


On Mar 10, 2008, at 8:22 AM, Jason Venner wrote:

> Is there a /proper/ way to bring up the processes on the slave node
> so that the master will recognize them at *stop* time?


Yes, you can set up the pid files by running these commands (directly on
the newly added node!):


% bin/hadoop-daemon.sh start datanode
% bin/hadoop-daemon.sh start tasktracker

Then stop-all.sh will know the pids to shut down. It is unfortunate
that hadoop-daemon.sh and hadoop-daemons.sh differ only in the "s";
hadoop-daemons.sh should probably be named start-slave-daemons.sh or something.
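
(A note on where those pids land: hadoop-daemon.sh writes one pid file per
daemon under HADOOP_PID_DIR, which defaults to /tmp. The sketch below assumes
default settings and a user named hadoop; the pid value is made up.)

% bin/hadoop-daemon.sh start datanode
% cat /tmp/hadoop-hadoop-datanode.pid    # hadoop-<user>-<daemon>.pid, read by the stop scripts
12345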


-- Owen


Re: dynamically adding slaves to hadoop cluster

2008-03-10 Thread Jason Venner
We have done this, and it works well. The one downside is that 
stop-dfs.sh and stop-mapred.sh (and of course stop-all.sh) don't seem 
to control the hand-started datanodes/tasktrackers. I am assuming it is 
because the pid files haven't been written to the pid directory, but I have 
not investigated.


Is there a /proper/ way to bring up the processes on the slave node so 
that the master will recognize them at *stop* time?


tjohn wrote:
> Mafish Liu wrote:
>> On Mon, Mar 10, 2008 at 9:47 AM, Mafish Liu <[EMAIL PROTECTED]> wrote:
>>
>>> You should do the following steps:
>>> 1. Have hadoop deployed on the new node with the same directory structure
>>> and configuration.
>>> 2. Just run $HADOOP_HOME/bin/hadoop datanode and tasktracker.
>>
>> Addition: do not run "bin/hadoop namenode -format" before you run the
>> datanode, or you will get an error like "Incompatible namespaceIDs ..."
>>
>>> The datanode and tasktracker will contact the namenode and jobtracker
>>> specified in the hadoop configuration file automatically, and the new
>>> node will join the cluster.
>>>
>>> On Mon, Mar 10, 2008 at 4:56 AM, Aaron Kimball <[EMAIL PROTECTED]>
>>> wrote:
>>>
>>>> Yes. You should have the same hadoop-site across all your slaves. They
>>>> will need to know the DNS name for the namenode and jobtracker.
>>>>
>>>> - Aaron
>>>>
>>>> tjohn wrote:
>>>>>
>>>>> Mahadev Konar wrote:
>>>>>
>>>>>> I believe (as far as I remember) you should be able to add the node
>>>>>> by bringing up the datanode or tasktracker on the remote machine. The
>>>>>> Namenode or the jobtracker (I think) does not check for the nodes in
>>>>>> the slaves file. The slaves file is just used to start up all the
>>>>>> daemons by sshing to all the nodes listed in it during startup. So
>>>>>> you should just be able to start up the datanode pointing to the
>>>>>> correct namenode and it should work.
>>>>>>
>>>>>> Regards
>>>>>> Mahadev
>>>>>
>>>>> Sorry for my ignorance... To make a datanode/tasktracker point to the
>>>>> namenode, what should I do? Do I have to edit hadoop-site.xml? Thanks
>>>>>
>>>>> John
>>>
>>> --
>>> [EMAIL PROTECTED]
>>> Institute of Computing Technology, Chinese Academy of Sciences, Beijing.
>>
>> --
>> [EMAIL PROTECTED]
>> Institute of Computing Technology, Chinese Academy of Sciences, Beijing.
>
> Thanks a lot guys! It worked fine and it was exactly what I was looking for.
> Best wishes,
> John.


--
Jason Venner
Attributor - Publish with Confidence 
Attributor is hiring Hadoop Wranglers, contact if interested


Re: dynamically adding slaves to hadoop cluster

2008-03-10 Thread tjohn



Mafish Liu wrote:
> 
> On Mon, Mar 10, 2008 at 9:47 AM, Mafish Liu <[EMAIL PROTECTED]> wrote:
> 
>> You should do the following steps:
>> 1. Have hadoop deployed on the new node with the same directory structure
>> and configuration.
>> 2. Just run $HADOOP_HOME/bin/hadoop datanode and tasktracker.
> 
> Addition: do not run "bin/hadoop namenode -format" before you run the
> datanode, or you will get an error like "Incompatible namespaceIDs ..."
> 
>>
>>
>> The datanode and tasktracker will contact the namenode and jobtracker
>> specified in the hadoop configuration file automatically, and the new
>> node will join the cluster.
>>
>>
>> On Mon, Mar 10, 2008 at 4:56 AM, Aaron Kimball <[EMAIL PROTECTED]>
>> wrote:
>>
>> > Yes. You should have the same hadoop-site across all your slaves. They
>> > will need to know the DNS name for the namenode and jobtracker.
>> >
>> > - Aaron
>> >
>> > tjohn wrote:
>> > >
>> > > Mahadev Konar wrote:
>> > >
>> > >> I believe (as far as I remember) you should be able to add the node
>> > >> by bringing up the datanode or tasktracker on the remote machine. The
>> > >> Namenode or the jobtracker (I think) does not check for the nodes in
>> > >> the slaves file. The slaves file is just used to start up all the
>> > >> daemons by sshing to all the nodes listed in it during startup. So
>> > >> you should just be able to start up the datanode pointing to the
>> > >> correct namenode and it should work.
>> > >>
>> > >> Regards
>> > >> Mahadev
>> > >>
>> > >>
>> > >>
>> > >
>> > > Sorry for my ignorance... To make a datanode/tasktracker point to the
>> > > namenode, what should I do? Do I have to edit hadoop-site.xml? Thanks
>> > >
>> > > John
>> > >
>> > >
>> >
>>
>>
>>
>> --
>> [EMAIL PROTECTED]
>> Institute of Computing Technology, Chinese Academy of Sciences, Beijing.
>>
> 
> 
> 
> -- 
> [EMAIL PROTECTED]
> Institute of Computing Technology, Chinese Academy of Sciences, Beijing.
> 
> 

Thanks a lot guys! It worked fine and it was exactly what I was looking for.
Best wishes, 
John.




Re: dynamically adding slaves to hadoop cluster

2008-03-09 Thread Mafish Liu
On Mon, Mar 10, 2008 at 9:47 AM, Mafish Liu <[EMAIL PROTECTED]> wrote:

> You should do the following steps:
> 1. Have hadoop deployed on the new node with the same directory structure
> and configuration.
> 2. Just run $HADOOP_HOME/bin/hadoop datanode and tasktracker.

Addition: do not run "bin/hadoop namenode -format" before you run the
datanode, or you will get an error like "Incompatible namespaceIDs ..."
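
(If you did format by mistake, a commonly used recovery for a brand-new node
is to clear its empty data directory and restart the datanode. This sketch
assumes default paths, where dfs.data.dir falls under hadoop.tmp.dir, i.e.
/tmp/hadoop-${USER}; adjust to your configuration.)

% bin/hadoop-daemon.sh stop datanode
% rm -rf /tmp/hadoop-${USER}/dfs/data    # safe only if this node holds no blocks yet
% bin/hadoop-daemon.sh start datanode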

>
>
> The datanode and tasktracker will contact the namenode and jobtracker
> specified in the hadoop configuration file automatically, and the new
> node will join the cluster.
>
>
> On Mon, Mar 10, 2008 at 4:56 AM, Aaron Kimball <[EMAIL PROTECTED]>
> wrote:
>
> > Yes. You should have the same hadoop-site across all your slaves. They
> > will need to know the DNS name for the namenode and jobtracker.
> >
> > - Aaron
> >
> > tjohn wrote:
> > >
> > > Mahadev Konar wrote:
> > >
> > >> I believe (as far as I remember) you should be able to add the node
> > >> by bringing up the datanode or tasktracker on the remote machine. The
> > >> Namenode or the jobtracker (I think) does not check for the nodes in
> > >> the slaves file. The slaves file is just used to start up all the
> > >> daemons by sshing to all the nodes listed in it during startup. So
> > >> you should just be able to start up the datanode pointing to the
> > >> correct namenode and it should work.
> > >>
> > >> Regards
> > >> Mahadev
> > >>
> > >>
> > >>
> > >
> > > Sorry for my ignorance... To make a datanode/tasktracker point to the
> > > namenode, what should I do? Do I have to edit hadoop-site.xml? Thanks
> > >
> > > John
> > >
> > >
> >
>
>
>
> --
> [EMAIL PROTECTED]
> Institute of Computing Technology, Chinese Academy of Sciences, Beijing.
>



-- 
[EMAIL PROTECTED]
Institute of Computing Technology, Chinese Academy of Sciences, Beijing.


Re: dynamically adding slaves to hadoop cluster

2008-03-09 Thread Mafish Liu
You should do the following steps:
1. Have hadoop deployed on the new node with the same directory structure
and configuration.
2. Just run $HADOOP_HOME/bin/hadoop datanode and tasktracker (see the
sketch below).
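
(A minimal sketch of step 2, assuming $HADOOP_HOME is the Hadoop install
root on the new node. bin/hadoop runs each daemon in the foreground, so
background them here; bin/hadoop-daemon.sh start ... is the detached
equivalent Owen describes elsewhere in the thread.)

% cd $HADOOP_HOME
% bin/hadoop datanode &       # registers with the namenode from the config
% bin/hadoop tasktracker &    # registers with the jobtracker from the config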

The datanode and tasktracker will contact the namenode and jobtracker
specified in the hadoop configuration file automatically, and the new
node will join the cluster.

On Mon, Mar 10, 2008 at 4:56 AM, Aaron Kimball <[EMAIL PROTECTED]> wrote:

> Yes. You should have the same hadoop-site across all your slaves. They
> will need to know the DNS name for the namenode and jobtracker.
>
> - Aaron
>
> tjohn wrote:
> >
> > Mahadev Konar wrote:
> >
> >> I believe (as far as I remember) you should be able to add the node by
> >> bringing up the datanode or tasktracker on the remote machine. The
> >> Namenode or the jobtracker (I think) does not check for the nodes in the
> >> slaves file. The slaves file is just used to start up all the daemons by
> >> sshing to all the nodes listed in it during startup. So you should just
> >> be able to start up the datanode pointing to the correct namenode and it
> >> should work.
> >>
> >> Regards
> >> Mahadev
> >>
> >>
> >>
> >
> > Sorry for my ignorance... To make a datanode/tasktracker point to the
> > namenode, what should I do? Do I have to edit hadoop-site.xml? Thanks
> >
> > John
> >
> >
>



-- 
[EMAIL PROTECTED]
Institute of Computing Technology, Chinese Academy of Sciences, Beijing.


Re: dynamically adding slaves to hadoop cluster

2008-03-09 Thread Aaron Kimball
Yes. You should have the same hadoop-site across all your slaves. They 
will need to know the DNS name for the namenode and jobtracker.
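
(For illustration, a minimal hadoop-site.xml sketch with the two relevant
properties; the hostnames and ports are placeholders, not values from this
thread.)

<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <!-- where datanodes find the namenode -->
    <value>master.example.com:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <!-- where tasktrackers find the jobtracker -->
    <value>master.example.com:9001</value>
  </property>
</configuration>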


- Aaron

tjohn wrote:
> Mahadev Konar wrote:
>
>> I believe (as far as I remember) you should be able to add the node by
>> bringing up the datanode or tasktracker on the remote machine. The
>> Namenode or the jobtracker (I think) does not check for the nodes in the
>> slaves file. The slaves file is just used to start up all the daemons by
>> sshing to all the nodes listed in it during startup. So you should just
>> be able to start up the datanode pointing to the correct namenode and it
>> should work.
>>
>> Regards
>> Mahadev
>
> Sorry for my ignorance... To make a datanode/tasktracker point to the
> namenode, what should I do? Do I have to edit hadoop-site.xml? Thanks
>
> John



RE: dynamically adding slaves to hadoop cluster

2008-03-09 Thread tjohn



Mahadev Konar wrote:
> 
> I believe (as far as I remember) you should be able to add the node by
> bringing up the datanode or tasktracker on the remote machine. The
> Namenode or the jobtracker (I think) does not check for the nodes in the
> slaves file. The slaves file is just used to start up all the daemons by
> sshing to all the nodes listed in it during startup. So you should just
> be able to start up the datanode pointing to the correct namenode and it
> should work.
> 
> Regards
> Mahadev
> 
> 

Sorry for my ignorance... To make a datanode/tasktracker point to the
namenode, what should I do? Do I have to edit hadoop-site.xml? Thanks

John




RE: dynamically adding slaves to hadoop cluster

2008-03-09 Thread Mahadev Konar
I believe (as far as I remember) you should be able to add the node by
bringing up the datanode or tasktracker on the remote machine. The
Namenode or the jobtracker (I think) does not check for the nodes in the
slaves file. The slaves file is just used to start up all the daemons by
sshing to all the nodes listed in it during startup. So you should just be
able to start up the datanode pointing to the correct namenode and it
should work (see the conf/slaves sketch below).

Regards
Mahadev
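
(To make that concrete, a sketch of the role conf/slaves plays; the hostnames
are hypothetical. The file only drives the ssh loop in the start/stop
scripts, so a running cluster never rereads it, but adding the new node
keeps future start-all.sh/stop-all.sh runs complete.)

% cat conf/slaves
slave1.example.com
slave2.example.com
slave3.example.com    # the hand-started node, added for future script runs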

> -Original Message-
> From: tjohn [mailto:[EMAIL PROTECTED]
> Sent: Sunday, March 09, 2008 1:18 PM
> To: core-user@hadoop.apache.org
> Subject: Re: dynamically adding slaves to hadoop cluster
> 
> 
> 
> 
> Owen O'Malley-2 wrote:
> >
> >
> > On Mar 9, 2008, at 9:54 AM, tjohn wrote:
> >
> >>
> >> Hi all, I'm new to Hadoop and I wanted to know how to dynamically add
> >> a slave to my cluster, obviously while it's running.
> >
> > If you start a new data node (and/or task tracker), they will join
> > the cluster of the configured name node / job tracker. After adding
> > datanodes, you should rebalance your hdfs data.
> >
> > -- Owen
> >
> >
> 
> Yeah, thanks Owen, that's useful, but what I wanted to know is how to add
> a new remote machine even though it's not listed in the conf/slaves file.
> I just don't understand how to do it without stopping the cluster or the
> processes running on it. (Sorry for my English, it's not my native
> language, so I probably sound a bit rude.)
> 
> John



Re: dynamically adding slaves to hadoop cluster

2008-03-09 Thread tjohn



Owen O'Malley-2 wrote:
> 
> 
> On Mar 9, 2008, at 9:54 AM, tjohn wrote:
> 
>>
>> Hi all, I'm new to Hadoop and I wanted to know how to dynamically add a
>> slave to my cluster, obviously while it's running.
> 
> If you start a new data node (and/or task tracker), they will join  
> the cluster of the configured name node / job tracker. After adding  
> datanodes, you should rebalance your hdfs data.
> 
> -- Owen
> 
> 

Yeah, thanks Owen, that's useful, but what I wanted to know is how to add a
new remote machine even though it's not listed in the conf/slaves file. I
just don't understand how to do it without stopping the cluster or the
processes running on it. (Sorry for my English, it's not my native language,
so I probably sound a bit rude.)

John



Re: dynamically adding slaves to hadoop cluster

2008-03-09 Thread Owen O'Malley


On Mar 9, 2008, at 9:54 AM, tjohn wrote:

> Hi all, I'm new to Hadoop and I wanted to know how to dynamically add a
> slave to my cluster, obviously while it's running.


If you start a new data node (and/or task tracker), they will join  
the cluster of the configured name node / job tracker. After adding  
datanodes, you should rebalance your hdfs data.
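
(For reference, a sketch of the rebalance step, assuming a Hadoop release
that ships the HDFS balancer; the threshold value is a placeholder.)

% bin/hadoop balancer -threshold 10    # move blocks until every datanode is within 10% of average utilization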


-- Owen


dynamically adding slaves to hadoop cluster

2008-03-09 Thread tjohn

Hi all, I'm new to Hadoop and I wanted to know how to dynamically add a
slave to my cluster, obviously while it's running.
Thanks in advance,

John
