*Step-wise Details (Ubuntu 10.x): go through it carefully and run the steps one by one; this should solve your problem. (You can change the paths, IPs and host names as you like.)*
---------------------------------------------------------------------------------------------------------
1. Start the terminal
2. Disable IPv6 on all machines
pico /etc/sysctl.conf
Add these lines at the end of the file:
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
3. Reboot the system
sudo reboot
4. Install Java
sudo apt-get install openjdk-6-jdk openjdk-6-jre
5. Check if ssh is installed; if not, install it:
sudo apt-get install openssh-server openssh-client
6. Create a group and a user, both called hadoop
sudo addgroup hadoop
sudo adduser --ingroup hadoop hadoop
7. Give the hadoop user sudo permissions
sudo visudo
Add the following line in the file:
hadoop ALL=(ALL) ALL
8. Set up passwordless ssh for the hadoop user
su hadoop
ssh-keygen -t rsa -P ""
Press Enter when asked.
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
ssh localhost
Copy the server's RSA public key from the server into the authorized_keys file on all nodes, as in the step above (see the sketch below).
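The guide does not spell out the command for pushing the key to the other nodes. A minimal sketch, assuming the hadoop user already exists on each slave and using <IP of slave> as a placeholder:

    # run as the hadoop user on the server; copies the public key into the slave's authorized_keys
    ssh-copy-id -i $HOME/.ssh/id_rsa.pub hadoop@<IP of slave>
    # if ssh-copy-id is not available, append the key manually instead
    cat $HOME/.ssh/id_rsa.pub | ssh hadoop@<IP of slave> "mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys"
    # check that the login no longer asks for a password
    ssh hadoop@<IP of slave>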
9. Make the hadoop installation directory:
sudo mkdir /usr/local/hadoop
10. Download hadoop:
cd /usr/local/hadoop
sudo wget -c http://archive.cloudera.com/cdh/3/hadoop-0.20.2-cdh3u2.tar.gz
11. Unpack the tar:
sudo tar -zxvf /usr/local/hadoop/hadoop-0.20.2-cdh3u2.tar.gz
12. Change the permissions on the hadoop folder by granting it all to the hadoop user:
sudo chown -R hadoop:hadoop /usr/local/hadoop
sudo chmod -R 750 /usr/local/hadoop
13. Create the HDFS directory inside /usr/local/hadoop, and make sure the hadoop user owns it:
cd /usr/local/hadoop
sudo mkdir hadoop-datastore
sudo mkdir hadoop-datastore/hadoop-hadoop
sudo chown -R hadoop:hadoop hadoop-datastore
14. Add the binaries path and the hadoop home in the environment file, then reload it (see the sketch after this step):
sudo pico /etc/environment
source /etc/environment
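A minimal sketch of what to set in /etc/environment, assuming the install path used above (adjust it if your version differs); /etc/environment takes plain KEY="value" lines, so the paths are written out in full:

    HADOOP_HOME="/usr/local/hadoop/hadoop-0.20.2-cdh3u2"
    PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/hadoop/hadoop-0.20.2-cdh3u2/bin"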
15. Configure the hadoop-env.sh file
cd /usr/local/hadoop/hadoop-0.20.2-cdh3u2/
sudo pico conf/hadoop-env.sh
Add the following lines in there:
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
export JAVA_HOME="/usr/lib/jvm/java-6-openjdk"
16. Configure the core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/hadoop-datastore/hadoop-${user.name}</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://<IP of namenode>:54310</value>
    <description>Location of the Namenode</description>
  </property>
</configuration>
17. Configure the hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
    <description>Default block replication.</description>
  </property>
</configuration>
18. Configure the mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value><IP of job tracker>:54311</value>
    <description>Host and port of the jobtracker.</description>
  </property>
</configuration>
19. Add all the IP addresses in the conf/slaves file
sudo pico /usr/local/hadoop/hadoop-0.20.2-cdh3u2/conf/slaves
Add the list of IP addresses that will host data nodes in this file, one per line.
---------------------------------------------------------------------------------------------------------
*Hadoop Commands: Now start the hadoop cluster*
start-all.sh / stop-all.sh
start-dfs.sh / stop-dfs.sh
start-mapred.sh / stop-mapred.sh
hadoop dfs -ls /<virtual dfs path>
hadoop dfs -copyFromLocal <local path> <dfs path>
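A minimal first-run sketch as the hadoop user (the file names are just examples); on a fresh cluster the HDFS metadata has to be formatted once on the namenode before the daemons are started:

    # format HDFS once, on the namenode only -- this wipes any existing HDFS data
    hadoop namenode -format
    # start the HDFS and MapReduce daemons
    start-all.sh
    # copy a local file into HDFS and list it back
    hadoop dfs -copyFromLocal /home/hadoop/test.txt /test.txt
    hadoop dfs -ls /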