Adding nodes

2012-03-01 Thread Mohit Anchlia
Is this the right procedure to add nodes? I took these steps from the Hadoop wiki FAQ:

http://wiki.apache.org/hadoop/FAQ

1. Update conf/slaves
2. On the new slave nodes, start the datanode and tasktracker
3. Run hadoop balancer

Do I also need to run hadoop dfsadmin -refreshNodes?
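Roughly, I mean something like this (the hostname is made up):

  # on the master: add the new host to the slaves file
  echo "slave-new.example.com" >> $HADOOP_HOME/conf/slaves

  # on the new node: bring up the daemons
  $HADOOP_HOME/bin/hadoop-daemon.sh start datanode
  $HADOOP_HOME/bin/hadoop-daemon.sh start tasktracker

  # from any node: spread existing blocks onto the new datanode
  $HADOOP_HOME/bin/hadoop balancer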


Re: Adding nodes

2012-03-01 Thread Joey Echeverria
You only have to refresh nodes if you're making use of an allow file (dfs.hosts).
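For example, something like this in the namenode's hdfs-site.xml (the file
path is made up), after which edits to the file are picked up with a refresh:

  <property>
    <name>dfs.hosts</name>
    <value>/etc/hadoop/conf/allowed-datanodes</value>
  </property>

  # pick up changes to the allow file:
  hadoop dfsadmin -refreshNodes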

Sent from my iPhone



Re: Adding nodes

2012-03-01 Thread Mohit Anchlia
On Thu, Mar 1, 2012 at 4:46 PM, Joey Echeverria j...@cloudera.com wrote:

 You only have to refresh nodes if you're making use of an allow file (dfs.hosts).

Thanks. Does that mean that when the tasktracker/datanode starts up, it
communicates with the namenode using the masters file?




Re: Adding nodes

2012-03-01 Thread Joey Echeverria
Not quite. Datanodes get the namenode host from fs.default.name in 
core-site.xml. Tasktrackers find the jobtracker from the mapred.job.tracker 
setting in mapred-site.xml. 
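As a sketch, the two settings would look like this on the new node (hosts
and ports are examples):

  <!-- core-site.xml -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>

  <!-- mapred-site.xml -->
  <property>
    <name>mapred.job.tracker</name>
    <value>jobtracker.example.com:8021</value>
  </property>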

Sent from my iPhone



Re: Adding nodes

2012-03-01 Thread Raj Vishwanathan
The masters and slaves files, if I remember correctly, are used to start the 
correct daemons on the correct nodes from the master node.
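Something like this, if memory serves (stock start scripts):

  # run on the master; each script SSHes to the hosts listed in
  # conf/slaves and starts the matching daemon there (the masters
  # file similarly tells start-dfs.sh where to run the secondary
  # namenode)
  $HADOOP_HOME/bin/start-dfs.sh
  $HADOOP_HOME/bin/start-mapred.sh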


Raj






Re: Adding nodes

2012-03-01 Thread Mohit Anchlia
On Thu, Mar 1, 2012 at 4:57 PM, Joey Echeverria j...@cloudera.com wrote:

 Not quite. Datanodes get the namenode host from fs.default.name in
 core-site.xml. Tasktrackers find the jobtracker from the
 mapred.job.tracker setting in mapred-site.xml.


I actually meant to ask: how does the namenode/jobtracker know there is a new
node in the cluster? Is it initiated by the namenode when the slaves file is
edited, or is it initiated by the tasktracker when the tasktracker is started?





Re: Adding nodes

2012-03-01 Thread anil gupta
What Joey said is correct for Cloudera's distribution. That said, I am not
confident about other distributions, as I haven't tried them.

Thanks,
Anil




-- 
Thanks & Regards,
Anil Gupta


Re: Adding nodes

2012-03-01 Thread Raj Vishwanathan
What Joey said is correct for both the Apache and Cloudera distros. The DN/TT 
daemons will connect to the NN/JT using the config files. The masters and slaves 
files are used for starting the correct daemons.








Re: Adding nodes

2012-03-01 Thread Arpit Gupta
It is initiated by the slave.

If you have defined files to state which slaves can talk to the namenode
(using config dfs.hosts) and which hosts cannot (using property
dfs.hosts.exclude), then you would need to edit these files and issue the
refresh command.
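To sketch the exclude side next to the dfs.hosts example earlier in the
thread (the file path is again just a placeholder):

  <!-- hdfs-site.xml on the namenode -->
  <property>
    <name>dfs.hosts.exclude</name>
    <value>/etc/hadoop/conf/excluded-datanodes</value>
  </property>

  # after editing the include or exclude file:
  hadoop dfsadmin -refreshNodes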
--
Arpit
Hortonworks, Inc.
email: ar...@hortonworks.com



Re: Adding nodes

2012-03-01 Thread Mohit Anchlia
Thanks all for the answers!!




Re: Adding nodes

2012-03-01 Thread George Datskos

Mohit,

New datanodes will connect to the namenode, so that's how the namenode 
knows.  Just make sure the datanodes have the correct {fs.default.name} 
in their core-site.xml and then start them.  The namenode can, however, 
choose to reject a datanode if you are using the {dfs.hosts} and 
{dfs.hosts.exclude} settings in the namenode's hdfs-site.xml.


The namenode doesn't actually care about the slaves file.  It's only 
used by the start/stop scripts.
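To confirm the namenode has picked a new datanode up, the standard report
command lists every datanode it currently knows about:

  # run from any machine with the cluster config on it
  hadoop dfsadmin -report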









Re: Dynamically adding nodes in Hadoop

2012-01-03 Thread madhu phatak
Thanks for all the input. I am trying to do a cluster setup in EC2 but am not
able to find how I can manage DNS updates centrally. If anyone knows how to do
this, please help me.




-- 
Join me at http://hadoopworkshop.eventbrite.com/


Dynamically adding nodes in Hadoop

2011-12-17 Thread madhu phatak
Hi,
 I am trying to add nodes dynamically to a running Hadoop cluster. I started
the tasktracker and datanode on the new node, and it works fine. But when some
node tries to fetch values (for the reduce phase), it fails with an unknown
host exception. When I add a node to a running cluster, do I have to add its
hostname to the /etc/hosts file on all nodes (slaves + master)? Or is there
some other way?
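Something like this entry on every node is what I mean (the address and
names are made up):

  # /etc/hosts
  10.1.2.51   hadoop-slave51.example.com   hadoop-slave51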


-- 
Join me at http://hadoopworkshop.eventbrite.com/


Re: Dynamically adding nodes in Hadoop

2011-12-17 Thread Harsh J
Madhu,

On Sat, Dec 17, 2011 at 4:36 PM, madhu phatak phatak@gmail.com wrote:
 When I add a node to a running cluster, do I have to add its hostname to
 the /etc/hosts file on all nodes (slaves + master)?

Yes.

 Or is there some other way?

You can run a DNS server and have name resolution centrally managed.

-- 
Harsh J


Re: Dynamically adding nodes in Hadoop

2011-12-17 Thread alo alt
Hi,

Add the node to the slaves file too. Using /etc/hosts as well is recommended,
to avoid DNS issues. After it is added to slaves, the new node has to be
started, and it should quickly appear in the web UI. If you don't need the
nodes all the time, you can set up an exclude file and refresh your cluster
(http://wiki.apache.org/hadoop/FAQ#I_want_to_make_a_large_cluster_smaller_by_taking_out_a_bunch_of_nodes_simultaneously._How_can_this_be_done.3F)

- Alex




-- 
Alexander Lorenz
http://mapredit.blogspot.com

Think of the environment: please don't print this email unless you
really need to.


Re: Dynamically adding nodes in Hadoop

2011-12-17 Thread Michel Segel
Actually, I would recommend avoiding /etc/hosts and using DNS if this is going 
to be a production-grade cluster...

Sent from a remote device. Please excuse any typos...

Mike Segel



After adding nodes to 0.20.2 cluster, getting "Could not complete file" errors and hung JobTracker

2010-10-15 Thread Bobby Dennett
Hi all,

We are currently in the process of replacing the servers in our Hadoop
0.20.2 production cluster and in the last couple of days have
experienced an error similar to the following (from the JobTracker
log) several times, which then appears to hang the JobTracker:

2010-10-15 04:13:38,980 INFO org.apache.hadoop.mapred.JobInProgress:
Job job_201010140844_0510 has completed successfully.
2010-10-15 04:13:44,192 INFO org.apache.hadoop.hdfs.DFSClient: Could
not complete file
/user/kaduindexer-18509/us/201010150300/dealdocid_pre_merged_1/_logs/history/phx-phadoop34_1287060250080_job_201010140844_0510_se_DocID_Merge_1_201010150300
retrying...
2010-10-15 04:13:44,592 INFO org.apache.hadoop.hdfs.DFSClient: Could
not complete file
/user/kaduindexer-18509/us/201010150300/dealdocid_pre_merged_1/_logs/history/phx-phadoop34_1287060250080_job_201010140844_0510_se_DocID_Merge_1_201010150300
retrying...
2010-10-15 04:13:44,993 INFO org.apache.hadoop.hdfs.DFSClient: Could
not complete file
/user/kaduindexer-18509/us/201010150300/dealdocid_pre_merged_1/_logs/history/phx-phadoop34_1287060250080_job_201010140844_0510_se_DocID_Merge_1_201010150300
retrying...
2010-10-15 04:13:45,393 INFO org.apache.hadoop.hdfs.DFSClient: Could
not complete file
/user/kaduindexer-18509/us/201010150300/dealdocid_pre_merged_1/_logs/history/phx-phadoop34_1287060250080_job_201010140844_0510_se_DocID_Merge_1_201010150300
retrying...
2010-10-15 04:13:45,794 INFO org.apache.hadoop.hdfs.DFSClient: Could
not complete file
/user/kaduindexer-18509/us/201010150300/dealdocid_pre_merged_1/_logs/history/phx-phadoop34_1287060250080_job_201010140844_0510_se_DocID_Merge_1_201010150300
retrying...

We hadn't seen an issue like this until we added 6 new nodes to our
existing 65-node cluster. The only other configuration change made
recently was to set up include/exclude files for DFS and MapReduce to
enable Hadoop's node-decommissioning functionality.

Once we encounter this issue (which has happened twice in the last 24
hours), we end up needing to restart the MapReduce processes, which we
cannot do on a frequent basis. After the last occurrence, I increased
the value of mapred.job.tracker.handler.count to 60 and am waiting
to see if it has an impact.
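For reference, a sketch of that change as it would appear in
mapred-site.xml on the JobTracker:

  <property>
    <!-- number of RPC handler threads serving tasktracker requests -->
    <name>mapred.job.tracker.handler.count</name>
    <value>60</value>
  </property>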

Has anyone else seen this behavior before? Are there any
recommendations for trying to prevent this from happening in the
future?

Thanks in advance,
-Bobby