Do Hadoop installations need to be at the same location across the cluster?
When installing Hadoop on slave machines, do we have to install Hadoop at the same location on each machine? Can we have Hadoop installed at different locations on different machines in the same cluster? If yes, what do we have to take care of in that case? Thanks, Praveenesh
Re: Do Hadoop installations need to be at the same location across the cluster?
What I mean to say is: does Hadoop internally assume that the installations on all nodes are in the same location? I had Hadoop installed at different locations on two different nodes, and I configured the Hadoop config files to make them part of the same cluster. But when I started Hadoop on the master, I saw it was also searching for the Hadoop start-up scripts in the same location as on the master. Is there any workaround for this kind of situation, or do I have to reinstall Hadoop at the same location as on the master? Thanks, Praveenesh On Fri, Dec 23, 2011 at 6:26 PM, Michael Segel michael_se...@hotmail.com wrote: Sure, you could do that, but in doing so you will make your life a living hell. Literally. Think about it... you will have to manually manage each node's config files, so if something goes wrong you will have a hard time diagnosing the issue. Why make life harder? Why not just do the simple thing and make all of your DNs the same? Sent from my iPhone On Dec 23, 2011, at 6:51 AM, praveenesh kumar praveen...@gmail.com wrote: When installing Hadoop on slave machines, do we have to install Hadoop at the same location on each machine? Can we have Hadoop installed at different locations on different machines in the same cluster? If yes, what do we have to take care of in that case? Thanks, Praveenesh
How does the JobTracker choose which DataNodes run its tasks?
Okay, so I have one question in mind. Suppose I have a replication factor of 3 on my cluster of some N nodes, where N > 3, and there is a data block B1 that exists on 3 datanodes -- DD1, DD2, DD3. I want to run some mapper function on this block. My JT will communicate with the NN to find out where it can find the block. My assumption is that the NN will give the JT all the datanode information about where the block resides, in this case DD1, DD2, DD3. Am I right about this? Now my question is: how will the JT decide which DD it should send its mapper code to? Suppose it chose DD1, and my tasktracker starts running on that machine. For some reason, DD1 takes more time than the task would have taken had it run on DD2. How does Hadoop understand and make these decisions? Thanks, Praveenesh
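For what it's worth, the first half of the question -- which nodes hold a block -- is visible through the public client API, the same information the JobTracker uses when it computes input splits; the scheduling half (which TaskTracker actually gets the task, plus speculative re-execution of slow attempts) is internal to the JobTracker. A minimal sketch, assuming only a file path passed on the command line and a reachable cluster configuration:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocations {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    FileStatus st = fs.getFileStatus(new Path(args[0]));
    // one BlockLocation per block, each listing the hosts holding a replica
    for (BlockLocation loc : fs.getFileBlockLocations(st, 0, st.getLen())) {
      System.out.println(loc.getOffset() + " -> " + java.util.Arrays.toString(loc.getHosts()));
    }
  }
}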
More cores vs. more nodes?
Hey guys, I have a very naive question in my mind regarding Hadoop cluster nodes: more cores or more nodes? Shall I spend money on going from 2-core to 4-core machines, or spend it on buying more nodes with fewer cores, e.g. two 2-core machines? Thanks, Praveenesh
Hive on Hadoop 0.20.205
Did anyone try Hive on Hadoop 0.20.205? I am trying to build Hive from svn, but I see it downloading hadoop-0.20.3-CDH3-SNAPSHOT.tar.gz and hadoop-0.20.1.tar.gz. If I try ant -Dhadoop.version="0.20.205" package, the build fails. Any ideas or suggestions on what I may be doing wrong? Thanks, Praveenesh
Re: Hive on Hadoop 0.20.205
/jdbc/src/java/org/apache/hadoop/hive/jdbc/HivePreparedStatement.java:52: error: HivePreparedStatement is not abstract and does not override abstract method isCloseOnCompletion() in Statement
[javac] public class HivePreparedStatement implements PreparedStatement {
[javac] ^
[javac] /usr/local/hadoop/hive/release-0.7.1/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveQueryResultSet.java:47: error: HiveQueryResultSet is not abstract and does not override abstract method <T>getObject(String,Class<T>) in ResultSet
[javac] public class HiveQueryResultSet extends HiveBaseResultSet {
[javac] ^
[javac] where T is a type-variable:
[javac] T extends Object declared in method <T>getObject(String,Class<T>)
[javac] /usr/local/hadoop/hive/release-0.7.1/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveStatement.java:33: error: HiveStatement is not abstract and does not override abstract method isCloseOnCompletion() in Statement
[javac] public class HiveStatement implements java.sql.Statement {
[javac] ^
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: /usr/local/hadoop/hive/release-0.7.1/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveDatabaseMetaData.java uses unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.
[javac] 14 errors

BUILD FAILED
/usr/local/hadoop/hive/release-0.7.1/build.xml:196: The following error occurred while executing this line:
/usr/local/hadoop/hive/release-0.7.1/build.xml:130: The following error occurred while executing this line:
/usr/local/hadoop/hive/release-0.7.1/jdbc/build.xml:51: Compile failed; see the compiler error output for details.

Total time: 29 minutes 46 seconds

Thanks, Praveenesh On Fri, Dec 9, 2011 at 2:08 PM, praveenesh kumar praveen...@gmail.com wrote: Did anyone try Hive on Hadoop 0.20.205? I am trying to build Hive from svn, but I see it downloading hadoop-0.20.3-CDH3-SNAPSHOT.tar.gz and hadoop-0.20.1.tar.gz. If I try ant -Dhadoop.version="0.20.205" package, the build fails. Any ideas or suggestions on what I may be doing wrong? Thanks, Praveenesh
Re: HDFS Backup nodes
This means we are still relying on the Secondary NameNode ideology for the NameNode's backup. Is OS mirroring of the NameNode a good alternative to keep it alive all the time? Thanks, Praveenesh On Wed, Dec 7, 2011 at 1:35 PM, Uma Maheswara Rao G mahesw...@huawei.com wrote: AFAIK the backup node was introduced from the 0.21 version onwards. From: praveenesh kumar [praveen...@gmail.com] Sent: Wednesday, December 07, 2011 12:40 PM To: common-user@hadoop.apache.org Subject: HDFS Backup nodes Does Hadoop 0.20.205 support configuring HDFS backup nodes? Thanks, Praveenesh
Warning: $HADOOP_HOME is deprecated
How do I avoid the "Warning: $HADOOP_HOME is deprecated" message on Hadoop 0.20.205? I tried adding export HADOOP_HOME_WARN_SUPPRESS= in hadoop-env.sh on the NameNode, but it still appears. Am I doing the right thing? Thanks, Praveenesh
Re: Warning: $HADOOP_HOME is deprecated
Okay, I fixed it. I had to add export HADOOP_HOME_WARN_SUPPRESS=TRUE in hadoop-env.sh on all my Hadoop nodes. Thanks, Praveenesh On Wed, Dec 7, 2011 at 4:11 PM, alo alt wget.n...@googlemail.com wrote: Hi, looks like a bug in .205: https://issues.apache.org/jira/browse/HADOOP-7816 - Alex On Wed, Dec 7, 2011 at 11:37 AM, praveenesh kumar praveen...@gmail.com wrote: How do I avoid the "Warning: $HADOOP_HOME is deprecated" message on Hadoop 0.20.205? I tried adding export HADOOP_HOME_WARN_SUPPRESS= in hadoop-env.sh on the NameNode, but it still appears. Am I doing the right thing? Thanks, Praveenesh -- Alexander Lorenz http://mapredit.blogspot.com P Think of the environment: please don't print this email unless you really need to.
HDFS Backup nodes
Does Hadoop 0.20.205 support configuring HDFS backup nodes? Thanks, Praveenesh
Utilizing multiple hard disks for Hadoop HDFS?
Hi everyone, I have a blade server with 4 x 500 GB hard disks, and I want to use all of them for Hadoop HDFS. How can I achieve this? Suppose I install Hadoop on one hard disk and use the other hard disks as normal mounted partitions, e.g.:

/dev/sda1 -- HDD 1 -- primary partition -- Linux + Hadoop installed on it
/dev/sda2 -- HDD 2 -- mounted partition -- /mnt/dev/sda2
/dev/sda3 -- HDD 3 -- mounted partition -- /mnt/dev/sda3
/dev/sda4 -- HDD 4 -- mounted partition -- /mnt/dev/sda4

If I then create a hadoop.tmp.dir on each partition, say /tmp/hadoop-datastore/hadoop-hadoop, and configure core-site.xml like this:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/tmp/hadoop-datastore/hadoop-hadoop,/mnt/dev/sda2/tmp/hadoop-datastore/hadoop-hadoop,/mnt/dev/sda3/tmp/hadoop-datastore/hadoop-hadoop,/mnt/dev/sda4/tmp/hadoop-datastore/hadoop-hadoop</value>
  <description>A base for other temporary directories.</description>
</property>

will it work? Can I set the above property for dfs.data.dir also? Thanks, Praveenesh
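For what it's worth, a comma-separated list is how dfs.data.dir is normally spread across disks, while hadoop.tmp.dir is treated as a single base path. A minimal sketch (paths taken from the mail above) showing how Hadoop's Configuration splits such a list, which is what the DataNode does with each dfs.data.dir entry:

import org.apache.hadoop.conf.Configuration;

public class DirListCheck {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("dfs.data.dir",
        "/tmp/hadoop-datastore/hadoop-hadoop,"
      + "/mnt/dev/sda2/tmp/hadoop-datastore/hadoop-hadoop,"
      + "/mnt/dev/sda3/tmp/hadoop-datastore/hadoop-hadoop,"
      + "/mnt/dev/sda4/tmp/hadoop-datastore/hadoop-hadoop");
    // getStrings() splits on commas -- each entry becomes a separate storage dir
    for (String dir : conf.getStrings("dfs.data.dir")) {
      System.out.println(dir);
    }
  }
}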
Hadoop 0.20.205
Hi all, any idea when Hadoop 0.20.205 is officially going to be released? Is Hadoop 0.20.205 rc2 stable enough to go into production? I am using hadoop-0.20-append now with HBase 0.90.3 and want to switch to 205, but I am looking for some valuable suggestions/recommendations. Thanks, Praveenesh
Re: Too much fetch failure
Try commenting out the '127.0.0.1 localhost' line in your /etc/hosts, then restart the cluster and try again. Thanks, Praveenesh On Sun, Oct 16, 2011 at 2:00 PM, Humayun gmail humayun0...@gmail.com wrote: We are using Hadoop on VirtualBox. When it is a single node, it works fine for big datasets larger than the default block size. But in the case of a multinode cluster (2 nodes) we are facing some problems: when the input dataset is smaller than the default block size (64 MB), it works fine, but when the input dataset is larger than the default block size, it shows ‘too much fetch failure’ in the reduce state. Here is the output link: http://paste.ubuntu.com/707517/ From the comments there, many users have faced this problem, and different users suggested modifying the /etc/hosts file in different ways to fix it, but there is no definitive solution. We need the actual solution; that's why we are writing here. This is our /etc/hosts file:

192.168.60.147 humayun # Added by NetworkManager
127.0.0.1 localhost.localdomain localhost
::1 humayun localhost6.localdomain6 localhost6
127.0.1.1 humayun
# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
192.168.60.1 master
192.168.60.2 slave
Re: Too much fetch failure
Why are you formatting the namenode again? 1. Just stop the cluster. 2. Just comment out the '127.0.0.1 localhost' line. 3. Restart the cluster. How have you defined your Hadoop config files? Have you mentioned localhost there? Thanks, Praveenesh On Sun, Oct 16, 2011 at 7:42 PM, Humayun gmail humayun0...@gmail.com wrote: Commenting out the 127.0.0.1 line in /etc/hosts is not working. If I format the namenode, this line is added back automatically. Any other solution? On 16 October 2011 19:13, praveenesh kumar praveen...@gmail.com wrote: Try commenting out the '127.0.0.1 localhost' line in your /etc/hosts, then restart the cluster and try again. Thanks, Praveenesh On Sun, Oct 16, 2011 at 2:00 PM, Humayun gmail humayun0...@gmail.com wrote: We are using Hadoop on VirtualBox. When it is a single node, it works fine for big datasets larger than the default block size. But in the case of a multinode cluster (2 nodes) we are facing some problems: when the input dataset is smaller than the default block size (64 MB), it works fine, but when the input dataset is larger than the default block size, it shows ‘too much fetch failure’ in the reduce state. Here is the output link: http://paste.ubuntu.com/707517/ From the comments there, many users have faced this problem, and different users suggested modifying the /etc/hosts file in different ways to fix it, but there is no definitive solution. We need the actual solution; that's why we are writing here. This is our /etc/hosts file:

192.168.60.147 humayun # Added by NetworkManager
127.0.0.1 localhost.localdomain localhost
::1 humayun localhost6.localdomain6 localhost6
127.0.1.1 humayun
# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
192.168.60.1 master
192.168.60.2 slave
Re: Error using hadoop distcp
I tried that as well; when I use the IP address, it says I should use the hostname.

hadoop@ub13:~$ hadoop distcp hdfs://162.192.100.53:54310/user/hadoop/weblog hdfs://162.192.100.16:54310/user/hadoop/weblog
11/10/05 14:53:50 INFO tools.DistCp: srcPaths=[hdfs://162.192.100.53:54310/user/hadoop/weblog]
11/10/05 14:53:50 INFO tools.DistCp: destPath=hdfs://162.192.100.16:54310/user/hadoop/weblog
java.lang.IllegalArgumentException: Wrong FS: hdfs://162.192.100.53:54310/user/hadoop/weblog, expected: hdfs://ub13:54310
        at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:310)
        at org.apache.hadoop.hdfs.DistributedFileSystem.checkPath(DistributedFileSystem.java:99)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:155)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:464)
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:648)
        at org.apache.hadoop.tools.DistCp.checkSrcPath(DistCp.java:621)
        at org.apache.hadoop.tools.DistCp.copy(DistCp.java:638)
        at org.apache.hadoop.tools.DistCp.run(DistCp.java:857)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.tools.DistCp.main(DistCp.java:884)

I have the entries for both machines in /etc/hosts... On Wed, Oct 5, 2011 at 1:55 PM, bejoy.had...@gmail.com wrote: Hi Praveenesh, can you try repeating the distcp using IPs instead of host names? From the error it looks like an RPC exception caused by not being able to identify the host, so I believe it can't be due to a missing passwordless ssh. Just try it out. Regards, Bejoy K S -----Original Message----- From: trang van anh anh...@vtc.vn Date: Wed, 05 Oct 2011 14:06:11 To: common-user@hadoop.apache.org Reply-To: common-user@hadoop.apache.org Subject: Re: Error using hadoop distcp Which host runs the task that throws the exception? Ensure that each datanode knows the other datanodes in the Hadoop cluster -- add a ub16 entry in /etc/hosts on the host where the task is running. On 10/5/2011 12:15 PM, praveenesh kumar wrote: I am trying to use distcp to copy a file from one HDFS to another. But while copying I am getting the following exception:

hadoop distcp hdfs://ub13:54310/user/hadoop/weblog hdfs://ub16:54310/user/hadoop/weblog
11/10/05 10:41:01 INFO mapred.JobClient: Task Id : attempt_201110031447_0005_m_07_0, Status : FAILED
java.net.UnknownHostException: unknown host: ub16
        at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:195)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:850)
        at org.apache.hadoop.ipc.Client.call(Client.java:720)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
        at $Proxy1.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
        at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:113)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:215)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:177)
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
        at org.apache.hadoop.mapred.FileOutputCommitter.setupJob(FileOutputCommitter.java:48)
        at org.apache.hadoop.mapred.OutputCommitter.setupJob(OutputCommitter.java:124)
        at org.apache.hadoop.mapred.Task.runJobSetupTask(Task.java:835)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:296)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

It's saying it can't find ub16, but the entry is there in the /etc/hosts files and I am able to ssh both machines. Do I need passwordless ssh between these two NNs? What can be the issue? Anything I am missing before using distcp? Thanks, Praveenesh
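As a side note on the "Wrong FS" message above: FileSystem.checkPath() requires the scheme and authority of a path to match the URI the filesystem instance was created from, so a client whose fs.default.name says hdfs://ub13:54310 rejects the very same namenode addressed by raw IP. A minimal sketch reproducing this, assuming only the hostnames from the mail:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WrongFsDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.default.name", "hdfs://ub13:54310");
    FileSystem fs = FileSystem.get(conf);
    // authority matches fs.getUri() -> accepted
    System.out.println(fs.exists(new Path("hdfs://ub13:54310/user/hadoop/weblog")));
    // same cluster addressed by IP -> IllegalArgumentException: Wrong FS
    System.out.println(fs.exists(new Path("hdfs://162.192.100.53:54310/user/hadoop/weblog")));
  }
}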
Error using hadoop distcp
I am trying to use distcp to copy a file from one HDFS to another. But while copying I am getting the following exception:

hadoop distcp hdfs://ub13:54310/user/hadoop/weblog hdfs://ub16:54310/user/hadoop/weblog
11/10/05 10:41:01 INFO mapred.JobClient: Task Id : attempt_201110031447_0005_m_07_0, Status : FAILED
java.net.UnknownHostException: unknown host: ub16
        at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:195)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:850)
        at org.apache.hadoop.ipc.Client.call(Client.java:720)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
        at $Proxy1.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
        at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:113)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:215)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:177)
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
        at org.apache.hadoop.mapred.FileOutputCommitter.setupJob(FileOutputCommitter.java:48)
        at org.apache.hadoop.mapred.OutputCommitter.setupJob(OutputCommitter.java:124)
        at org.apache.hadoop.mapred.Task.runJobSetupTask(Task.java:835)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:296)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

It's saying it can't find ub16, but the entry is there in the /etc/hosts files and I am able to ssh both machines. Do I need passwordless ssh between these two NNs? What can be the issue? Anything I am missing before using distcp? Thanks, Praveenesh
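Note that the failing stack ends in org.apache.hadoop.mapred.Child -- the lookup happens inside a map task, so it is the task node's /etc/hosts that matters, not only the machine the command was typed on (which is also what the reply in the thread above suggests). A minimal probe, using the hostname from the mail, that can be run on each node to check resolution:

import java.net.InetAddress;

public class ResolveCheck {
  public static void main(String[] args) throws Exception {
    // prints the address ub16 resolves to, or throws UnknownHostException
    System.out.println(InetAddress.getByName("ub16").getHostAddress());
  }
}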
Is SAN storage a good option for Hadoop?
Hi, I want to know whether we can use SAN storage for a Hadoop cluster setup. If yes, what are the best practices? Is it a good idea, considering that the underlying power of Hadoop is co-locating the processing power (CPU) with the data storage, which implies the storage must be local to be effective? But is it still right to say "local is better" in a situation where I have a single local 5400 RPM IDE drive, which would be dramatically slower than SAN storage striped across many drives spinning at 10k RPM and accessed via fibre channel? Thanks, Praveenesh
Hadoop question using VMware
Hi, suppose I have 10 Windows machines, each running an individual VM instance. Can these VM instances communicate with each other so that I can build a Hadoop cluster out of them? Has anyone tried this? I know we can set up multiple VM instances on the same machine, but can we do it across different machines as well? And if I do it this way, is it a good approach, considering I don't have dedicated Ubuntu machines for Hadoop? Thanks, Praveenesh
Re: Hadoop question using VMware
"it's not something you can do for production nor performance analysis" -- can you please tell me what that means? Why can't we use this approach for production? Thanks On Tue, Sep 27, 2011 at 11:56 PM, N Keywal nkey...@gmail.com wrote: Hi, yes, it will work. HBase won't see the difference; it's pure VMware stuff. Obviously, it's not something you can do for production nor performance analysis. Cheers, N. On Wed, Sep 28, 2011 at 8:38 AM, praveenesh kumar praveen...@gmail.com wrote: Hi, suppose I have 10 Windows machines, each running an individual VM instance. Can these VM instances communicate with each other so that I can build a Hadoop cluster out of them? Has anyone tried this? I know we can set up multiple VM instances on the same machine, but can we do it across different machines as well? And if I do it this way, is it a good approach, considering I don't have dedicated Ubuntu machines for Hadoop? Thanks, Praveenesh
How to run Java code using Mahout from the command line?
Hey, I have this code written using Mahout, and I am able to run it from Eclipse. How can I run code written with Mahout from the command line? My question is: do I have to make a jar file and run it as 'hadoop jar jarfilename.jar class', or shall I run it using a simple java command? Can anyone clear up my confusion? I am not able to run this code. Thanks, Praveenesh
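For what it's worth, 'hadoop jar' is the usual route, because the hadoop launcher script puts the cluster configuration and the Hadoop jars on the classpath for you; a plain 'java' invocation only works if you recreate all of that by hand. A minimal driver sketch (class name illustrative; the Mahout jars the code uses still have to be bundled, e.g. in the job jar's lib/ directory):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyMahoutJob extends Configured implements Tool {
  public int run(String[] args) throws Exception {
    // launch the Mahout/MapReduce work here, using getConf() for cluster settings
    return 0;
  }
  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new Configuration(), new MyMahoutJob(), args));
  }
}

It would then be launched as: hadoop jar myjob.jar MyMahoutJob (plus any job arguments).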
Re: Can we replace the namenode machine with some other machine?
But apart from storing metadata info, is there anything more the NN/JT machines are doing? So can I say that I can survive with a poor NN machine if I am not dealing with lots of files in HDFS? On Thu, Sep 22, 2011 at 11:08 AM, Uma Maheswara Rao G 72686 mahesw...@huawei.com wrote: Just changing the configs will not affect your data. You need to restart your DNs to connect to the new NN. For the second question: it again depends on your usage. If there are more files in DFS, then the NN will consume more memory, as it needs to store all the metadata info of the files in its namespace. If your files keep growing, then it is recommended not to put the NN and JT on the same machine. Coming to the DN case: the configured space is used for storing the block files. Once that space is filled, the NN will not select this DN for further writes. So one DN having less space is fine, compared with the NN having too little space in big clusters. If you configure a well-provisioned DN with a very good amount of space but the NN has too little space to store your files' metadata info, then all that DN space is of no use :-) Regards, Uma - Original Message - From: praveenesh kumar praveen...@gmail.com Date: Thursday, September 22, 2011 10:42 am Subject: Re: Can we replace namenode machine with some other machine ? To: common-user@hadoop.apache.org If I just change the configuration settings on the slave machines, will it affect any of the data that is currently residing in the cluster? And my second question was: does the master node (the NN/JT hosting machine) need a better configuration than our slave machines (the DN/TT hosting machines)? Actually my master node is a weaker machine than my slave machines, because I am assuming that the master does not do much additional work and it's okay to have a weak machine as master. Now I have a new big server machine just added to my cluster, so I am wondering whether I should make this new machine my new master (NN/JT) or just add it as a slave. Thanks, Praveenesh On Thu, Sep 22, 2011 at 10:20 AM, Uma Maheswara Rao G 72686 mahesw...@huawei.com wrote: You copy the same installation to the new machine and change the IP address. After that, configure the new NN address for your clients and DNs. "Also does the Namenode/JobTracker machine's configuration need to be better than the datanodes/tasktrackers'?" I did not get this question. Regards, Uma - Original Message - From: praveenesh kumar praveen...@gmail.com Date: Thursday, September 22, 2011 10:13 am Subject: Can we replace namenode machine with some other machine ? To: common-user@hadoop.apache.org Hi all, can we replace our namenode machine later with some other machine? Actually I got a new server machine in my cluster and now I want to make this machine my new namenode and jobtracker node. Also, does the Namenode/JobTracker machine's configuration need to be better than the datanodes/tasktrackers'? How can I achieve this with the least overhead? Thanks, Praveenesh
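To put rough numbers on "more files means more NN memory" (a commonly cited rule of thumb, not a figure from this thread): the NameNode holds every file, directory, and block as an in-memory object of very roughly 150 bytes. A namespace of 10 million single-block files is therefore about 20 million objects, i.e. on the order of 20,000,000 x 150 B, or about 3 GB of heap -- which is why a weak NN machine can be fine while the file count stays small, but stops being fine as the namespace grows.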
Any other way to copy to HDFS?
Guys, as far as I know Hadoop, I think that to copy files to HDFS, they first need to be copied to the NameNode's local filesystem. Is that right? So does it mean that even if I have a Hadoop cluster of 10 nodes with an overall capacity of 6 TB, but my NameNode's hard disk capacity is 500 GB, I cannot copy any file to HDFS greater than 500 GB? Is there any other way to copy directly to HDFS without first copying the file to the namenode's local filesystem? What are other ways to copy files larger than the namenode's disk capacity? Thanks, Praveenesh.
Re: Any other way to copy to HDFS?
So I want to copy a file from a Windows machine to the Linux namenode. How can I define NAMENODE_URI in the code you mention if I want to copy data from the Windows machine to the namenode machine? Thanks, Praveenesh On Wed, Sep 21, 2011 at 2:37 PM, Uma Maheswara Rao G 72686 mahesw...@huawei.com wrote: For more understanding of the flows, I would recommend you go through the docs below once: http://hadoop.apache.org/common/docs/r0.16.4/hdfs_design.html#The+File+System+Namespace Regards, Uma - Original Message - From: Uma Maheswara Rao G 72686 mahesw...@huawei.com Date: Wednesday, September 21, 2011 2:36 pm Subject: Re: Any other way to copy to HDFS ? To: common-user@hadoop.apache.org Hi, you need not copy the files to the NameNode. Hadoop provides client code as well to copy files. To copy files from another node (non-DFS), you need to put the hadoop**.jar's into the classpath and use the code snippet below:

FileSystem fs = new DistributedFileSystem();
fs.initialize(NAMENODE_URI, configuration);
fs.copyFromLocalFile(srcPath, dstPath);

Using this API, you can copy files from any machine. Regards, Uma - Original Message - From: praveenesh kumar praveen...@gmail.com Date: Wednesday, September 21, 2011 2:14 pm Subject: Any other way to copy to HDFS ? To: common-user@hadoop.apache.org Guys, as far as I know Hadoop, I think that to copy files to HDFS, they first need to be copied to the NameNode's local filesystem. Is that right? So does it mean that even if I have a Hadoop cluster of 10 nodes with an overall capacity of 6 TB, but my NameNode's hard disk capacity is 500 GB, I cannot copy any file to HDFS greater than 500 GB? Is there any other way to copy directly to HDFS without first copying the file to the namenode's local filesystem? What are other ways to copy files larger than the namenode's disk capacity? Thanks, Praveenesh.
Fwd: Any other way to copy to HDFS?
Suppose your NameNode is running on hdfs://10.18.52.63:9000. Then you can connect to your NameNode like below:

FileSystem fs = new DistributedFileSystem();
fs.initialize(new URI("hdfs://10.18.52.63:9000/"), new Configuration());

Please go through the docs mentioned below; you will understand more. "if I want to copy data from windows machine to namenode machine ?" In DFS, the namenode is responsible only for the namespace. In simple words, to understand the flow quickly: clients ask the NameNode to give them some DNs to copy the data to. The NN creates the file entry in the namespace and also returns the block entries based on the client's request. Then the clients connect directly to the DNs and copy the data. Reading data back works the same way. I hope you understand better now :-) Regards, Uma - Original Message - From: praveenesh kumar praveen...@gmail.com Date: Wednesday, September 21, 2011 3:11 pm Subject: Re: Any other way to copy to HDFS ? To: common-user@hadoop.apache.org So I want to copy a file from a Windows machine to the Linux namenode. How can I define NAMENODE_URI in the code you mention if I want to copy data from the Windows machine to the namenode machine? Thanks, Praveenesh On Wed, Sep 21, 2011 at 2:37 PM, Uma Maheswara Rao G 72686 mahesw...@huawei.com wrote: For more understanding of the flows, I would recommend you go through the docs below once: http://hadoop.apache.org/common/docs/r0.16.4/hdfs_design.html#The+File+System+Namespace Regards, Uma - Original Message - From: Uma Maheswara Rao G 72686 mahesw...@huawei.com Date: Wednesday, September 21, 2011 2:36 pm Subject: Re: Any other way to copy to HDFS ? To: common-user@hadoop.apache.org Hi, you need not copy the files to the NameNode. Hadoop provides client code as well to copy files. To copy files from another node (non-DFS), you need to put the hadoop**.jar's into the classpath and use the code snippet below:

FileSystem fs = new DistributedFileSystem();
fs.initialize(NAMENODE_URI, configuration);
fs.copyFromLocalFile(srcPath, dstPath);

Using this API, you can copy files from any machine. Regards, Uma - Original Message - From: praveenesh kumar praveen...@gmail.com Date: Wednesday, September 21, 2011 2:14 pm Subject: Any other way to copy to HDFS ? To: common-user@hadoop.apache.org Guys, as far as I know Hadoop, I think that to copy files to HDFS, they first need to be copied to the NameNode's local filesystem. Is that right? So does it mean that even if I have a Hadoop cluster of 10 nodes with an overall capacity of 6 TB, but my NameNode's hard disk capacity is 500 GB, I cannot copy any file to HDFS greater than 500 GB? Is there any other way to copy directly to HDFS without first copying the file to the namenode's local filesystem? What are other ways to copy files larger than the namenode's disk capacity? Thanks, Praveenesh.
Re: Fwd: Any other way to copy to HDFS?
Thanks a lot..!! I guess I can play around with the permissions of DFS for a while. On Wed, Sep 21, 2011 at 3:59 PM, Uma Maheswara Rao G 72686 mahesw...@huawei.com wrote: Hello Praveenesh, if you really do not care about permissions, then you can disable them on the NN side using the property dfs.permissions.enable. You can also set the permission for the path before creating it. From the docs:

Changes to the File System API
All methods that use a path parameter will throw AccessControlException if permission checking fails.
New methods:
public FSDataOutputStream create(Path f, FsPermission permission, boolean overwrite, int bufferSize, short replication, long blockSize, Progressable progress) throws IOException;
public boolean mkdirs(Path f, FsPermission permission) throws IOException;
public void setPermission(Path p, FsPermission permission) throws IOException;
public void setOwner(Path p, String username, String groupname) throws IOException;
public FileStatus getFileStatus(Path f) throws IOException; will additionally return the user, group and mode associated with the path.

http://hadoop.apache.org/common/docs/r0.20.2/hdfs_permissions_guide.html Regards, Uma - Original Message - From: praveenesh kumar praveen...@gmail.com Date: Wednesday, September 21, 2011 3:41 pm Subject: Fwd: Any other way to copy to HDFS ? To: common-user@hadoop.apache.org Thanks a lot. I am trying to run the following code on my Windows machine, which is not part of the cluster:

public static void main(String args[]) throws IOException, URISyntaxException {
    FileSystem fs = new DistributedFileSystem();
    fs.initialize(new URI("hdfs://162.192.100.53:54310/"), new Configuration());
    fs.copyFromLocalFile(new Path("C:\\Positive.txt"), new Path("/user/hadoop/Positive.txt"));
    System.out.println("Done");
}

But I am getting the following exception:

Exception in thread "main" org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=DrWho, access=WRITE, inode=hadoop:hadoop:supergroup:rwxr-xr-x
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:96)
        at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:58)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:2836)
        at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:500)
        at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:206)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:484)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:465)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:372)
        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:208)
        at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1189)
        at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1165)
        at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1137)
        at com.musigma.hdfs.HdfsBackup.main(HdfsBackup.java:20)
Caused by: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.security.AccessControlException: Permission denied: user=DrWho, access=WRITE, inode=hadoop:hadoop:supergroup:rwxr-xr-x
        at org.apache.hadoop.hdfs.server.namenode.PermissionChecker.check(PermissionChecker.java:176)
        at org.apache.hadoop.hdfs.server.namenode.PermissionChecker.check(PermissionChecker.java:157)
        at org.apache.hadoop.hdfs.server.namenode.PermissionChecker.checkPermission(PermissionChecker.java:105)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4702)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:4672)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1048)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1002)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:381)
        at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:961)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:957)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:416
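A minimal sketch of the second option Uma mentions -- pre-creating a target directory with the permission/ownership methods quoted above, so that a remote Windows user (seen by HDFS as DrWho here) can write into it. The URI and paths are the ones from the mail; setOwner() has to be run as the HDFS superuser:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class PrepareTargetDir {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new URI("hdfs://162.192.100.53:54310/"), new Configuration());
    Path dir = new Path("/user/DrWho");
    fs.mkdirs(dir, new FsPermission((short) 0755)); // create with an explicit mode
    fs.setOwner(dir, "DrWho", "supergroup");        // superuser-only call
    fs.close();
  }
}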
Can we run a job on some datanodes?
Is there any way to run a particular job in Hadoop on a subset of datanodes? My problem is that I don't want to use all the nodes to run a job; I am trying to make a job-completion-time vs. number-of-nodes graph for a particular job. One way to do it is to remove datanodes and then see how much time the job takes. Just out of curiosity, I want to know whether there is any other way to do this without removing datanodes. I am afraid that if I remove datanodes, I may lose some data blocks that reside on those machines, as I have some files with replication = 1. Thanks, Praveenesh
Re: Can we run a job on some datanodes?
Oh wow, I didn't know that. Actually for me the datanodes/tasktrackers run on the same machines. I mentioned datanodes because if I delete those machines from the slaves list, chances are the data will also be lost, so I don't want to do that. But now I guess that by stopping tasktrackers individually, I can decrease the strength of my cluster by decreasing the number of nodes that run a tasktracker, right? And this way I won't lose my data either, right? On Wed, Sep 21, 2011 at 6:39 PM, Harsh J ha...@cloudera.com wrote: Praveenesh, TaskTrackers run your jobs' tasks for you, not DataNodes directly. So you can statically control load on nodes by removing TaskTrackers from your cluster, i.e. if you 'service hadoop-0.20-tasktracker stop' or 'hadoop-daemon.sh stop tasktracker' on the specific nodes, jobs won't run there anymore. Is this what you're looking for? (There are ways to achieve the exclusion dynamically, by writing a scheduler, but it's hard to tell without knowing what you need specifically -- and why do you require it?) On Wed, Sep 21, 2011 at 6:32 PM, praveenesh kumar praveen...@gmail.com wrote: Is there any way to run a particular job in Hadoop on a subset of datanodes? My problem is that I don't want to use all the nodes to run a job; I am trying to make a job-completion-time vs. number-of-nodes graph for a particular job. One way to do it is to remove datanodes and then see how much time the job takes. Just out of curiosity, I want to know whether there is any other way to do this without removing datanodes. I am afraid that if I remove datanodes, I may lose some data blocks that reside on those machines, as I have some files with replication = 1. Thanks, Praveenesh -- Harsh J
Re: Can we replace the namenode machine with some other machine?
If I just change the configuration settings on the slave machines, will it affect any of the data that is currently residing in the cluster? And my second question was: does the master node (the NN/JT hosting machine) need a better configuration than our slave machines (the DN/TT hosting machines)? Actually my master node is a weaker machine than my slave machines, because I am assuming that the master does not do much additional work and it's okay to have a weak machine as master. Now I have a new big server machine just added to my cluster, so I am wondering whether I should make this new machine my new master (NN/JT) or just add it as a slave. Thanks, Praveenesh On Thu, Sep 22, 2011 at 10:20 AM, Uma Maheswara Rao G 72686 mahesw...@huawei.com wrote: You copy the same installation to the new machine and change the IP address. After that, configure the new NN address for your clients and DNs. "Also does the Namenode/JobTracker machine's configuration need to be better than the datanodes/tasktrackers'?" I did not get this question. Regards, Uma - Original Message - From: praveenesh kumar praveen...@gmail.com Date: Thursday, September 22, 2011 10:13 am Subject: Can we replace namenode machine with some other machine ? To: common-user@hadoop.apache.org Hi all, can we replace our namenode machine later with some other machine? Actually I got a new server machine in my cluster and now I want to make this machine my new namenode and jobtracker node. Also, does the Namenode/JobTracker machine's configuration need to be better than the datanodes/tasktrackers'? How can I achieve this with the least overhead? Thanks, Praveenesh
Re: Multiple Mappers and One Reducer
Harsh, can you please tell how we can use MultipleInputs with the Job object on Hadoop 0.20.2? As you can see, MultipleInputs uses the JobConf object, but I want to use the Job object as in the new Hadoop 0.21 API. I remember you talked about pulling things out of the new API and adding them into our project. Can you please shed more light on how we can do this? Thanks, Praveenesh. On Wed, Sep 7, 2011 at 2:57 AM, Harsh J ha...@cloudera.com wrote: Sahana, yes, this is possible as well. Please take a look at the MultipleInputs API @ http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/mapred/lib/MultipleInputs.html It will allow you to add a path each with its own mapper implementation, and you can then have a common reducer, since the key is what you'll be matching against. On Wed, Sep 7, 2011 at 3:02 PM, Sahana Bhat sana.b...@gmail.com wrote: Hi, I understand that, given a file, the file is split across 'n' mapper instances, which is the normal case. The scenario I have is: 1. Two files which are not totally identical in terms of number of columns (but have data that is similar in a few columns) need to be processed, and after computation a single output file has to be generated. Note: CV = computed value. File1, belonging to one dataset, has data for: Date, counter1, counter2, CV1, CV2. File2, belonging to another dataset, has data for: Date, counter1, counter2, CV3, CV4, CV5. The computation to be carried out on these two files is: CV6 = (CV1*CV5)/100. And the final emitted output file should have data in the sequence: Date, counter1, counter2, CV6. The idea is to have two mappers (not instances), one run on each file, and a single reducer that emits the final result file. Thanks, Sahana On Wed, Sep 7, 2011 at 2:40 PM, Harsh J ha...@cloudera.com wrote: Sahana, yes. But isn't that how it is normally? What makes you question this capability? On Wed, Sep 7, 2011 at 2:37 PM, Sahana Bhat sana.b...@gmail.com wrote: Hi, is it possible to have multiple mappers where each mapper operates on a different input file and whose results (key/value pairs from different mappers) are processed by a single reducer? Regards, Sahana -- Harsh J -- Harsh J
Re: MultipleInputs in Hadoop 0.20.2
"FWIW the trunk/future branches have a new-API MultipleInputs you can pull and include in your project." Can anyone please tell me how I can do this? How can I use the MultipleInputs of a higher Hadoop version in a lower Hadoop version? Thanks On Wed, Aug 24, 2011 at 5:50 PM, Harsh J ha...@cloudera.com wrote: 0.20.x supports the older API and it has been 're-deemed' as the stable one. You shouldn't hesitate to use it, as even 0.23 will carry it (although there it is properly deprecated). There is quite some confusion here, but I guess you still won't have some of the old API features in the new one. FWIW the trunk/future branches have a new-API MultipleInputs you can pull and include in your project. Also, alternative distributions that do stable backports may carry MultipleInputs in the new API (I use CDH3 and it has mapreduce.lib.input.MultipleInputs backported into it). On Wed, Aug 24, 2011 at 2:40 PM, praveenesh kumar praveen...@gmail.com wrote: Hello guys, I am looking to use the MultipleInputs.addInputPath() method in Hadoop 0.20.2, but when I look at its signature in the API, it is like this:

public static void addInputPath(JobConf conf, Path path, Class<? extends InputFormat> inputFormatClass)
public static void addInputPath(JobConf conf, Path path, Class<? extends InputFormat> inputFormatClass, Class<? extends Mapper> mapperClass)

But as far as I know, in Hadoop 0.20.2 the JobConf object is deprecated. How can I use MultipleInputs.addInputPath() in Hadoop? Is there any other way, or any new class introduced instead of this one? Thanks, Praveenesh -- Harsh J
MultipleInputs in Hadoop 0.20.2
Hello guys, I am looking to use the MultipleInputs.addInputPath() method in Hadoop 0.20.2, but when I look at its signature in the API, it is like this:

public static void addInputPath(JobConf conf, Path path, Class<? extends InputFormat> inputFormatClass)
public static void addInputPath(JobConf conf, Path path, Class<? extends InputFormat> inputFormatClass, Class<? extends Mapper> mapperClass)

But as far as I know, in Hadoop 0.20.2 the JobConf object is deprecated. How can I use MultipleInputs.addInputPath() in Hadoop? Is there any other way, or any new class introduced instead of this one? Thanks, Praveenesh
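For reference, a minimal runnable sketch of the old (mapred) API that these signatures belong to -- despite the deprecation warnings, this is the supported route in stock 0.20.2 (the identity mapper/reducer below merely stand in for real implementations):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;
import org.apache.hadoop.mapred.lib.MultipleInputs;

public class MultiInputJob {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(MultiInputJob.class);
    conf.setJobName("multi-input-demo");
    // each input path gets its own mapper; a real job would use two different mappers
    MultipleInputs.addInputPath(conf, new Path(args[0]), TextInputFormat.class, IdentityMapper.class);
    MultipleInputs.addInputPath(conf, new Path(args[1]), TextInputFormat.class, IdentityMapper.class);
    conf.setReducerClass(IdentityReducer.class); // a single reduce stage sees both inputs
    conf.setOutputKeyClass(LongWritable.class);  // TextInputFormat keys are byte offsets
    conf.setOutputValueClass(Text.class);
    FileOutputFormat.setOutputPath(conf, new Path(args[2]));
    JobClient.runJob(conf);
  }
}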
YCSB Benchmarking for HBase
Hi, is anyone working with YCSB (Yahoo! Cloud Serving Benchmark) for HBase? I am trying to run it, and it's giving me an error:

$ java -cp build/ycsb.jar com.yahoo.ycsb.CommandLine -db com.yahoo.ycsb.db.HBaseClient
YCSB Command Line client
Type "help" for command line help
Start with "-help" for usage info
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
        at java.lang.Class.getDeclaredConstructors0(Native Method)
        at java.lang.Class.privateGetDeclaredConstructors(Class.java:2406)
        at java.lang.Class.getConstructor0(Class.java:2716)
        at java.lang.Class.newInstance0(Class.java:343)
        at java.lang.Class.newInstance(Class.java:325)
        at com.yahoo.ycsb.CommandLine.main(Unknown Source)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
        at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
        ... 6 more

From the error, it seems it's not able to find the hadoop-core jar file, but it's already on the classpath. Has anyone worked with YCSB and HBase? Thanks, Praveenesh
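For what it's worth, the -cp in the transcript above lists only ycsb.jar, so the JVM actually has no Hadoop or HBase classes at all. A sketch of an invocation that adds them explicitly (jar names and versions are illustrative -- use the ones your cluster actually runs):

java -cp build/ycsb.jar:$HADOOP_HOME/hadoop-core-0.20.205.0.jar:$HBASE_HOME/hbase-0.90.3.jar:$HBASE_HOME/lib/* \
    com.yahoo.ycsb.CommandLine -db com.yahoo.ycsb.db.HBaseClient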
Giving the filename as the key to the mapper?
Hi, how can I give the filename as the key to the mapper? I want to count the occurrences of a word in a set of docs, so I want to keep the filename as the key. Is it possible to give the filename as the input key in the map function? Thanks, Praveenesh
Re: Giving the filename as the key to the mapper?
I am new to this Hadoop API. Can anyone give me some tutorial or code snippet on how to write my own input format to do this kind of thing? Thanks. On Fri, Jul 15, 2011 at 8:07 PM, Robert Evans ev...@yahoo-inc.com wrote: To add to that: if you really want the file name to be the key, instead of just calling a different API in your map to get it, you will probably need to write your own input format. It should be fairly simple, and you can base it off of an existing input format. --Bobby On 7/15/11 7:40 AM, Harsh J ha...@cloudera.com wrote: You can retrieve the filename in the new API as described here: http://search-hadoop.com/m/ZOmmJ1PZJqt1/map+input+filenamesubj=Retrieving+Filename In the old API, it's available in the configuration instance of the mapper under the key map.input.file. See the table below this section http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html#Task+JVM+Reuse for more such goodies. On Fri, Jul 15, 2011 at 5:44 PM, praveenesh kumar praveen...@gmail.com wrote: Hi, how can I give the filename as the key to the mapper? I want to count the occurrences of a word in a set of docs, so I want to keep the filename as the key. Is it possible to give the filename as the input key in the map function? Thanks, Praveenesh -- Harsh J
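A minimal old-API sketch (class name illustrative) of the map.input.file approach from the reply above: the mapper reads the current split's file name from its JobConf, so no custom input format is needed unless the filename really has to arrive as the input key itself.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class FilenameKeyMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {
  private final Text filename = new Text();

  @Override
  public void configure(JobConf job) {
    filename.set(job.get("map.input.file")); // full path of the current split's file
  }

  public void map(LongWritable key, Text value,
      OutputCollector<Text, IntWritable> out, Reporter reporter) throws IOException {
    // emits (filename, 1) per line; a word-count variant would tokenize value first
    out.collect(filename, new IntWritable(1));
  }
}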
Re: How does Hadoop parse input files into (key, value) pairs?
Hi, so I have a file in which the records are comma-separated (Record1,Record2). I want to make the first field (Record1) the key and Record2 the value. I am using the hadoop-0.20-append version. I am looking to use KeyValueTextInputFormat and then set key.value.separator.in.input.line to ','. Is this possible with hadoop-0.20-append? I am not able to do it. Any help? Thanks, Praveenesh On Mon, May 23, 2011 at 3:45 AM, Mark question markq2...@gmail.com wrote: The case you're talking about is when you use FileInputFormat... So usually the InputFormat interface is the one responsible for that. FileInputFormat uses a LineRecordReader, which will take your text file and assign the key to be the offset within your text file and the value to be the line (until '\n' is seen). If you want to use other InputFormats, check the API and pick what is suitable for you. In my case, I'm hooked on SequenceFileInputFormat, where my input files are (key, value) records written by a regular Java program (or parser). Then my Hadoop job looks at the keys and values that I wrote. I hope this helps a little, Mark On Thu, May 5, 2011 at 4:31 AM, praveenesh kumar praveen...@gmail.com wrote: Hi, as we know, a Hadoop mapper takes input as (key, value) pairs and generates intermediate (key, value) pairs, and usually we give the input to our mapper as a text file. How does Hadoop understand this and parse our input text file into (key, value) pairs? Usually our mapper looks like this:

public void map(LongWritable key, Text value, OutputCollector<Text, Text> outputCollector, Reporter reporter) throws IOException {
    String word = value.toString();
    // Some lines of code
}

So if I pass any text file as input, it takes every line as the VALUE to the mapper, on which I do some processing and put it into the OutputCollector. But how did Hadoop parse my text file into (key, value) pairs, and how can we tell Hadoop what (key, value) it should give to the mapper? Thanks.
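A minimal sketch, assuming the old (mapred) API that ships with hadoop-0.20-append, of the KeyValueTextInputFormat setup described above; the default identity mapper/reducer simply pass the (Text, Text) pairs through:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;

public class CommaSeparatedJob {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(CommaSeparatedJob.class);
    conf.setInputFormat(KeyValueTextInputFormat.class);
    // split each line at the first ',' -- Record1 becomes the key, the rest the value
    conf.set("key.value.separator.in.input.line", ",");
    // the mapper now receives (Text, Text) instead of (LongWritable, Text)
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
  }
}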
Re: Is hadoop-0.20-append compatible with Pig 0.8?
Hi, there is no Hadoop jar in my Pig lib directory. I tried copying my Hadoop jar files into the Pig lib folder; I also tried adding that jar file to the Pig lib path. The error is still the same. Is there any other way to make it run with the hadoop-0.20-append version? Guys, I am stuck with this issue and need your guidance. Thanks, Praveenesh On Sat, Jul 2, 2011 at 1:36 PM, Joey Echeverria j...@cloudera.com wrote: Try replacing the hadoop jar from the pig lib directory with the one from your cluster. -Joey On Jul 2, 2011, at 0:38, praveenesh kumar praveen...@gmail.com wrote: Hi guys, I have been using Hadoop and HBase. For HBase to run perfectly fine, we need the hadoop-0.20-append jar files, so I am using the hadoop-0.20-append jar files, which made both my Hadoop and HBase work fine. Now I want to use Pig with my Hadoop and HBase clusters. I downloaded Pig 0.8.0 and configured Pig to run in map-reduce mode by setting PIG_CLASSPATH to point to the $HADOOP_HOME/conf directory. Then running ‘pig’ gives the following error message:

hadoop@ub13:/usr/local/pig/bin$ pig
2011-07-01 17:41:52,150 [main] INFO org.apache.pig.Main - Logging error messages to: /usr/local/pig/bin/pig_1309522312144.log
2011-07-01 17:41:52,454 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://ub13:54310
2011-07-01 17:41:52,654 [main] ERROR org.apache.pig.Main - ERROR 2999: Unexpected internal error. Failed to create DataStorage

LOG MESSAGE - Error before Pig is launched ---
ERROR 2999: Unexpected internal error. Failed to create DataStorage
java.lang.RuntimeException: Failed to create DataStorage
        at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:75)
        at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:58)
        at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:214)
        at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:134)
        at org.apache.pig.impl.PigContext.connect(PigContext.java:183)
        at org.apache.pig.PigServer.init(PigServer.java:226)
        at org.apache.pig.PigServer.init(PigServer.java:215)
        at org.apache.pig.tools.grunt.Grunt.init(Grunt.java:55)
        at org.apache.pig.Main.run(Main.java:452)
        at org.apache.pig.Main.main(Main.java:107)
Caused by: org.apache.hadoop.ipc.RPC$VersionMismatch: Protocol org.apache.hadoop.hdfs.protocol.ClientProtocol version mismatch. (client = 41, server = 43)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:364)
        at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:106)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:207)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:170)
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
        at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:72)
        ... 9 more

I guess the problem is a version mismatch between the hadoop-append core jar files that my Hadoop/HBase cluster is currently using and the hadoop-core jar files that Pig is using. Has anyone faced a similar kind of issue? On the documentation website the requirement is written as hadoop-0.20.2, but the problem is that I want to use my Hadoop and HBase along with Pig as well. Any suggestions on how to resolve this issue? Can anyone please mention which versions of each of them are compatible with each other, so they work fine together in production? Thanks, Praveenesh
Is hadoop-0.20-append compatible with Pig 0.8?
Hi guys, I have been using Hadoop and HBase. For HBase to run perfectly fine, we need the hadoop-0.20-append jar files, so I am using the hadoop-0.20-append jar files, which made both my Hadoop and HBase work fine. Now I want to use Pig with my Hadoop and HBase clusters. I downloaded Pig 0.8.0 and configured Pig to run in map-reduce mode by setting PIG_CLASSPATH to point to the $HADOOP_HOME/conf directory. Then running ‘pig’ gives the following error message:

hadoop@ub13:/usr/local/pig/bin$ pig
2011-07-01 17:41:52,150 [main] INFO org.apache.pig.Main - Logging error messages to: /usr/local/pig/bin/pig_1309522312144.log
2011-07-01 17:41:52,454 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://ub13:54310
2011-07-01 17:41:52,654 [main] ERROR org.apache.pig.Main - ERROR 2999: Unexpected internal error. Failed to create DataStorage

LOG MESSAGE - Error before Pig is launched ---
ERROR 2999: Unexpected internal error. Failed to create DataStorage
java.lang.RuntimeException: Failed to create DataStorage
        at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:75)
        at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:58)
        at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:214)
        at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:134)
        at org.apache.pig.impl.PigContext.connect(PigContext.java:183)
        at org.apache.pig.PigServer.init(PigServer.java:226)
        at org.apache.pig.PigServer.init(PigServer.java:215)
        at org.apache.pig.tools.grunt.Grunt.init(Grunt.java:55)
        at org.apache.pig.Main.run(Main.java:452)
        at org.apache.pig.Main.main(Main.java:107)
Caused by: org.apache.hadoop.ipc.RPC$VersionMismatch: Protocol org.apache.hadoop.hdfs.protocol.ClientProtocol version mismatch. (client = 41, server = 43)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:364)
        at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:106)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:207)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:170)
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
        at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:72)
        ... 9 more

I guess the problem is a version mismatch between the hadoop-append core jar files that my Hadoop/HBase cluster is currently using and the hadoop-core jar files that Pig is using. Has anyone faced a similar kind of issue? On the documentation website the requirement is written as hadoop-0.20.2, but the problem is that I want to use my Hadoop and HBase along with Pig as well. Any suggestions on how to resolve this issue? Can anyone please mention which versions of each of them are compatible with each other, so they work fine together in production? Thanks, Praveenesh
Re: Hadoop eclipse plugin stopped working after replacing hadoop-0.20.2 jar files with hadoop-0.20-append jar files
I followed Michael Noll's tutorial for building the hadoop-0.20-append jars: http://www.michael-noll.com/blog/2011/04/14/building-an-hadoop-0-20-x-version-for-hbase-0-90-2/ After following the article, we get 5 jar files that we need to swap in for the hadoop-0.20.2 jar files. There is no jar file for the hadoop-eclipse plugin that I can see in my repository if I follow that tutorial. Also, the Hadoop plugin I am using has no info on JIRA MAPREDUCE-1280 regarding whether it is compatible with hadoop-0.20-append. Has anyone else faced this kind of issue? Thanks, Praveenesh On Wed, Jun 22, 2011 at 11:48 AM, Devaraj K devara...@huawei.com wrote: The Hadoop eclipse plugin also uses the hadoop-core.jar file to communicate with the Hadoop cluster. For this it needs the same version of hadoop-core.jar on the client as on the server (the Hadoop cluster). Update the Hadoop eclipse plugin for your Eclipse with the one provided with the hadoop-0.20-append release, and it will work fine. Devaraj K -----Original Message----- From: praveenesh kumar [mailto:praveen...@gmail.com] Sent: Wednesday, June 22, 2011 11:25 AM To: common-user@hadoop.apache.org Subject: Hadoop eclipse plugin stopped working after replacing hadoop-0.20.2 jar files with hadoop-0.20-append jar files Guys, I was using the Hadoop eclipse plugin on a Hadoop 0.20.2 cluster, and it was working fine for me. I was using Eclipse SDK Helios 3.6.2 with the plugin hadoop-eclipse-plugin-0.20.3-SNAPSHOT.jar downloaded from JIRA MAPREDUCE-1280. Now for the HBase installation I had to use the hadoop-0.20-append compiled jars, and I had to replace the old jar files with the new 0.20-append compiled jar files. But after replacing them, my Hadoop eclipse plugin is not working well for me. Whenever I try to connect to my Hadoop master node from it and look at the DFS locations, it gives me the following error:

Error: Protocol org.apache.hadoop.hdfs.protocol.ClientProtocol version mismatch (client 41, server 43)

However, the Hadoop cluster itself is working fine: if I go directly to the Hadoop namenode and use hadoop commands, I can add files to HDFS and run jobs from there, and the HDFS web console and Map-Reduce web console are also working fine. I am just not able to use my previous Hadoop eclipse plugin. Any suggestions or help with this issue? Thanks, Praveenesh
Hadoop eclipse plugin stopped working after replacing hadoop-0.20.2 jar files with hadoop-0.20-append jar files
Guys, I was using the Hadoop eclipse plugin on a Hadoop 0.20.2 cluster, and it was working fine for me. I was using Eclipse SDK Helios 3.6.2 with the plugin hadoop-eclipse-plugin-0.20.3-SNAPSHOT.jar downloaded from JIRA MAPREDUCE-1280. Now for the HBase installation I had to use the hadoop-0.20-append compiled jars, and I had to replace the old jar files with the new 0.20-append compiled jar files. But after replacing them, my Hadoop eclipse plugin is not working well for me. Whenever I try to connect to my Hadoop master node from it and look at the DFS locations, it gives me the following error:

Error: Protocol org.apache.hadoop.hdfs.protocol.ClientProtocol version mismatch (client 41, server 43)

However, the Hadoop cluster itself is working fine: if I go directly to the Hadoop namenode and use hadoop commands, I can add files to HDFS and run jobs from there, and the HDFS web console and Map-Reduce web console are also working fine. I am just not able to use my previous Hadoop eclipse plugin. Any suggestions or help with this issue? Thanks, Praveenesh
NameNode is starting with exceptions whenever it's trying to start datanodes
Hello. My namenode is starting with the following exceptions and going into safe mode every time it's trying to start the datanodes. Why so? I deleted all the files in HDFS and ran it again! 2011-06-07 15:02:19,467 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG: / STARTUP_MSG: Starting NameNode STARTUP_MSG: host = ub13/162.192.100.53 STARTUP_MSG: args = [] STARTUP_MSG: version = 0.20-append-r1056497 STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-append-r 1056491; compiled by 'stack' on Fri Jan 7 20:43:30 UTC 2011 / 2011-06-07 15:02:19,637 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=54310 2011-06-07 15:02:19,645 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: ub13/ 162.192.100.53:54310 2011-06-07 15:02:19,651 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null 2011-06-07 15:02:19,653 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext 2011-06-07 15:02:19,991 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hadoop,hadoop 2011-06-07 15:02:19,992 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup 2011-06-07 15:02:19,992 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=true 2011-06-07 15:02:20,034 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext 2011-06-07 15:02:20,036 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean 2011-06-07 15:02:20,276 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 56 2011-06-07 15:02:20,310 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 0 2011-06-07 15:02:20,310 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 5718 loaded in 0 seconds. 2011-06-07 15:02:20,320 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Invalid opcode, reached end of edit log Number of transactions found 7 2011-06-07 15:02:20,321 INFO org.apache.hadoop.hdfs.server.common.Storage: Edits file /usr/local/hadoop/hadoop-datastore/hadoop-hadoop/dfs/name/current/edits of size 1049092 edits # 7 loaded in 0 seconds. 2011-06-07 15:02:20,337 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 5718 saved in 0 seconds. 2011-06-07 15:02:20,784 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 5718 saved in 0 seconds. 2011-06-07 15:02:21,227 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Finished loading FSImage in 1482 msecs 2011-06-07 15:02:21,242 INFO org.apache.hadoop.hdfs.StateChange: STATE* Safe mode ON. The ratio of reported blocks 0. has not reached the threshold 0.9990. Safe mode will be turned off automatically. 2011-06-07 15:02:26,941 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog 2011-06-07 15:02:27,031 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. 
Opening the listener on 50070 2011-06-07 15:02:27,033 INFO org.apache.hadoop.http.HttpServer: listener.getLocalPort() returned 50070 webServer.getConnectors()[0].getLocalPort() returned 50070 2011-06-07 15:02:27,033 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50070 2011-06-07 15:02:27,033 INFO org.mortbay.log: jetty-6.1.14 2011-06-07 15:02:27,537 INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:50070 2011-06-07 15:02:27,538 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Web-server up at: 0.0.0.0:50070 2011-06-07 15:02:27,549 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting 2011-06-07 15:02:27,559 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 54310: starting 2011-06-07 15:02:27,565 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 54310: starting 2011-06-07 15:02:27,573 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 54310: starting 2011-06-07 15:02:27,585 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 54310: starting 2011-06-07 15:02:27,597 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 54310: starting 2011-06-07 15:02:27,613 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 54310: starting 2011-06-07 15:02:27,621 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 54310: starting 2011-06-07 15:02:27,632 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 54310: starting 2011-06-07 15:02:27,633 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 54310: starting 2011-06-07
Re: NameNode is starting with exceptions whenever it's trying to start datanodes
But I don't have any data on my HDFS. I had some data before, but now I have deleted all the files from HDFS. I don't know why the datanodes are taking time to start; I guess because of this exception it's taking more time to start. On Tue, Jun 7, 2011 at 3:34 PM, Steve Loughran ste...@apache.org wrote: On 06/07/2011 10:50 AM, praveenesh kumar wrote: The logs say The ratio of reported blocks 0.9091 has not reached the threshold 0.9990. Safe mode will be turned off automatically. not enough datanodes reported in, or they are missing data
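For what it's worth, the safe-mode message here just reflects the block-report ratio Steve quotes: the NameNode stays read-only until enough datanodes have reported their blocks. Besides watching the logs (or running hadoop dfsadmin -safemode get), the state can be queried programmatically; a small sketch, assuming the 0.20-era DistributedFileSystem API and a valid config on the classpath (the class name is hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.FSConstants;

public class SafeModeCheck {
    public static void main(String[] args) throws Exception {
        // Uses the fs.default.name from the site config on the classpath
        DistributedFileSystem dfs =
            (DistributedFileSystem) FileSystem.get(new Configuration());
        // SAFEMODE_GET only reports the state; it does not change it
        boolean on = dfs.setSafeMode(FSConstants.SafeModeAction.SAFEMODE_GET);
        System.out.println("NameNode in safe mode: " + on);
        dfs.close();
    }
}

If this keeps returning true long after startup, some datanodes have not reported in, which matches the diagnosis in the replies below.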
Re: NameNode is starting with exceptions whenever it's trying to start datanodes
"1. Some of your data node is getting connected, that means password less SSH is not working within nodes." So you mean that passwordless SSH should also be set up among the datanodes? In hadoop we normally set up passwordless SSH from the namenode to the datanodes. Do we have to set up passwordless ssh among the datanodes as well? On Tue, Jun 7, 2011 at 11:15 PM, jagaran das jagaran_...@yahoo.co.in wrote: Check two things: 1. Some of your data node is getting connected, that means password less SSH is not working within nodes. 2. Then Clear the Dir where you data is persisted in data nodes and format the namenode. It should definitely work then Cheers, Jagaran From: praveenesh kumar praveen...@gmail.com To: common-user@hadoop.apache.org Sent: Tue, 7 June, 2011 3:14:01 AM Subject: Re: NameNode is starting with exceptions whenever it's trying to start datanodes But I don't have any data on my HDFS. I had some data before, but now I have deleted all the files from HDFS. I don't know why the datanodes are taking time to start; I guess because of this exception it's taking more time to start. On Tue, Jun 7, 2011 at 3:34 PM, Steve Loughran ste...@apache.org wrote: On 06/07/2011 10:50 AM, praveenesh kumar wrote: The logs say The ratio of reported blocks 0.9091 has not reached the threshold 0.9990. Safe mode will be turned off automatically. not enough datanodes reported in, or they are missing data
Re: NameNode is starting with exceptions whenever it's trying to start datanodes
"Sorry I mean Some of your data nodes are not getting connected." So are you sticking to your suggestion that I should set up passwordless ssh for all the datanodes? Because in my hadoop cluster all the datanodes are running fine. On Tue, Jun 7, 2011 at 11:32 PM, jagaran das jagaran_...@yahoo.co.in wrote: Sorry I mean Some of your data nodes are not getting connected From: jagaran das jagaran_...@yahoo.co.in To: common-user@hadoop.apache.org Sent: Tue, 7 June, 2011 10:45:59 AM Subject: Re: NameNode is starting with exceptions whenever it's trying to start datanodes Check two things: 1. Some of your data node is getting connected, that means password less SSH is not working within nodes. 2. Then Clear the Dir where you data is persisted in data nodes and format the namenode. It should definitely work then Cheers, Jagaran From: praveenesh kumar praveen...@gmail.com To: common-user@hadoop.apache.org Sent: Tue, 7 June, 2011 3:14:01 AM Subject: Re: NameNode is starting with exceptions whenever it's trying to start datanodes But I don't have any data on my HDFS. I had some data before, but now I have deleted all the files from HDFS. I don't know why the datanodes are taking time to start; I guess because of this exception it's taking more time to start. On Tue, Jun 7, 2011 at 3:34 PM, Steve Loughran ste...@apache.org wrote: On 06/07/2011 10:50 AM, praveenesh kumar wrote: The logs say The ratio of reported blocks 0.9091 has not reached the threshold 0.9990. Safe mode will be turned off automatically. not enough datanodes reported in, or they are missing data
Re: NameNode is starting with exceptions whenever it's trying to start datanodes
Dude, passwordless ssh between my namenode and datanodes is working all fine! My question is: *are you talking about passwordless ssh between the datanodes themselves* or *are you talking about passwordless ssh between the datanodes and the namenode*? Because if you are talking about the 2nd case, that is working fine; as I already mentioned, all my datanodes in hadoop are working fine. I can see all those datanodes using hadoop fsck / as well as in the HDFS web UI. On Tue, Jun 7, 2011 at 11:35 PM, jagaran das jagaran_...@yahoo.co.in wrote: Yes Correct Password less SSH between your name node and some of your datanode is not working From: praveenesh kumar praveen...@gmail.com To: common-user@hadoop.apache.org Sent: Tue, 7 June, 2011 10:56:08 AM Subject: Re: NameNode is starting with exceptions whenever it's trying to start datanodes "1. Some of your data node is getting connected, that means password less SSH is not working within nodes." So you mean that passwordless SSH should also be set up among the datanodes? In hadoop we normally set up passwordless SSH from the namenode to the datanodes. Do we have to set up passwordless ssh among the datanodes as well? On Tue, Jun 7, 2011 at 11:15 PM, jagaran das jagaran_...@yahoo.co.in wrote: Check two things: 1. Some of your data node is getting connected, that means password less SSH is not working within nodes. 2. Then Clear the Dir where you data is persisted in data nodes and format the namenode. It should definitely work then Cheers, Jagaran From: praveenesh kumar praveen...@gmail.com To: common-user@hadoop.apache.org Sent: Tue, 7 June, 2011 3:14:01 AM Subject: Re: NameNode is starting with exceptions whenever it's trying to start datanodes But I don't have any data on my HDFS. I had some data before, but now I have deleted all the files from HDFS. I don't know why the datanodes are taking time to start; I guess because of this exception it's taking more time to start. On Tue, Jun 7, 2011 at 3:34 PM, Steve Loughran ste...@apache.org wrote: On 06/07/2011 10:50 AM, praveenesh kumar wrote: The logs say The ratio of reported blocks 0.9091 has not reached the threshold 0.9990. Safe mode will be turned off automatically. not enough datanodes reported in, or they are missing data
Re: NameNode is starting with exceptions whenever it's trying to start datanodes
How shall I clean my data dir? By cleaning the data dir, do you mean deleting all the files from HDFS? And is there any special command to clean all the datanodes in one step? On Tue, Jun 7, 2011 at 11:46 PM, jagaran das jagaran_...@yahoo.co.in wrote: Cleaning data from data dir of datanode and formatting the name node may help you From: praveenesh kumar praveen...@gmail.com To: common-user@hadoop.apache.org Sent: Tue, 7 June, 2011 11:05:03 AM Subject: Re: NameNode is starting with exceptions whenever it's trying to start datanodes "Sorry I mean Some of your data nodes are not getting connected." So are you sticking to your suggestion that I should set up passwordless ssh for all the datanodes? Because in my hadoop cluster all the datanodes are running fine. On Tue, Jun 7, 2011 at 11:32 PM, jagaran das jagaran_...@yahoo.co.in wrote: Sorry I mean Some of your data nodes are not getting connected From: jagaran das jagaran_...@yahoo.co.in To: common-user@hadoop.apache.org Sent: Tue, 7 June, 2011 10:45:59 AM Subject: Re: NameNode is starting with exceptions whenever it's trying to start datanodes Check two things: 1. Some of your data node is getting connected, that means password less SSH is not working within nodes. 2. Then Clear the Dir where you data is persisted in data nodes and format the namenode. It should definitely work then Cheers, Jagaran From: praveenesh kumar praveen...@gmail.com To: common-user@hadoop.apache.org Sent: Tue, 7 June, 2011 3:14:01 AM Subject: Re: NameNode is starting with exceptions whenever it's trying to start datanodes But I don't have any data on my HDFS. I had some data before, but now I have deleted all the files from HDFS. I don't know why the datanodes are taking time to start; I guess because of this exception it's taking more time to start. On Tue, Jun 7, 2011 at 3:34 PM, Steve Loughran ste...@apache.org wrote: On 06/07/2011 10:50 AM, praveenesh kumar wrote: The logs say The ratio of reported blocks 0.9091 has not reached the threshold 0.9990. Safe mode will be turned off automatically. not enough datanodes reported in, or they are missing data
Hadoop is not working after adding hadoop-core-0.20-append-r1056497.jar
Hello guys! I am currently working with HBase 0.90.3 and Hadoop 0.20.2. Since this hadoop version does not support sync on HDFS, I copied the *hadoop-core-append jar* file from the *hbase/lib* folder into the *hadoop folder* and replaced *hadoop-0.20.2-core.jar* with it, which was suggested in the following link: http://www.apacheserver.net/Using-Hadoop-bundled-in-lib-directory-HBase-at1136240.htm I guess what I am doing is what has been mentioned in the link; if I am doing something wrong, kindly tell me. But now, after adding that jar file, I am not able to run my hadoop; I am getting the following exception messages on my screen: ub13: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/util/PlatformName ub13: Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.util.PlatformName ub13: at java.net.URLClassLoader$1.run(URLClassLoader.java:217) ub13: at java.security.AccessController.doPrivileged(Native Method) ub13: at java.net.URLClassLoader.findClass(URLClassLoader.java:205) ub13: at java.lang.ClassLoader.loadClass(ClassLoader.java:321) ub13: at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294) ub13: at java.lang.ClassLoader.loadClass(ClassLoader.java:266) ub13: Could not find the main class: org.apache.hadoop.util.PlatformName. Program will exit. ub13: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hdfs/server/datanode/DataNode ub13: starting secondarynamenode, logging to /usr/local/hadoop/hadoop/bin/../logs/hadoop-hadoop-secondarynamenode-ub13.out ub13: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/util/PlatformName ub13: Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.util.PlatformName ub13: at java.net.URLClassLoader$1.run(URLClassLoader.java:217) ub13: at java.security.AccessController.doPrivileged(Native Method) ub13: at java.net.URLClassLoader.findClass(URLClassLoader.java:205) ub13: at java.lang.ClassLoader.loadClass(ClassLoader.java:321) ub13: at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294) ub13: at java.lang.ClassLoader.loadClass(ClassLoader.java:266) ub13: Could not find the main class: org.apache.hadoop.util.PlatformName. Program will exit. ub13: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hdfs/server/namenode/SecondaryNameNode Have I done something wrong? Please guide me! Thanks, Praveenesh
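A note on the failure mode above: the 0.20 start scripts build their classpath by globbing for a jar named hadoop-*-core.jar under HADOOP_HOME, so a replacement jar with a different naming pattern is simply never put on the classpath, and the daemons fail with NoClassDefFoundError (the rename fix in the follow-up thread below is consistent with this). One way to debug such an error is to ask the JVM where, if anywhere, it can find the class; a small sketch (hypothetical class name, plain JDK APIs) to run on the affected node with the same classpath the scripts use:

public class WhichJar {
    public static void main(String[] args) throws Exception {
        // Resolve the class the datanode/namenode startup failed to find
        Class<?> c = Class.forName("org.apache.hadoop.util.PlatformName");
        // Print the jar (or directory) the class was actually loaded from
        System.out.println(c.getProtectionDomain().getCodeSource().getLocation());
    }
}

If Class.forName throws, the class is not on the classpath at all; if it prints an unexpected jar, there are two competing hadoop-core versions lying around.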
Hadoop not working after replacing hadoop-core.jar with hadoop-core-append.jar
Hi, I was not able to see my email in the mail archive, so I am sending it again. Guys, I need your feedback! Thanks, Praveenesh -- Forwarded message -- From: praveenesh kumar praveen...@gmail.com Date: Mon, Jun 6, 2011 at 12:09 PM Subject: Hadoop is not working after adding hadoop-core-0.20-append-r1056497.jar To: common-user@hadoop.apache.org, u...@hbase.apache.org Hello guys! I am currently working with HBase 0.90.3 and Hadoop 0.20.2. Since this hadoop version does not support sync on HDFS, I copied the *hadoop-core-append jar* file from the *hbase/lib* folder into the *hadoop folder* and replaced *hadoop-0.20.2-core.jar* with it, which was suggested in the following link: http://www.apacheserver.net/Using-Hadoop-bundled-in-lib-directory-HBase-at1136240.htm I guess what I am doing is what has been mentioned in the link; if I am doing something wrong, kindly tell me. But now, after adding that jar file, I am not able to run my hadoop; I am getting the following exception messages on my screen: ub13: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/util/PlatformName ub13: Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.util.PlatformName ub13: at java.net.URLClassLoader$1.run(URLClassLoader.java:217) ub13: at java.security.AccessController.doPrivileged(Native Method) ub13: at java.net.URLClassLoader.findClass(URLClassLoader.java:205) ub13: at java.lang.ClassLoader.loadClass(ClassLoader.java:321) ub13: at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294) ub13: at java.lang.ClassLoader.loadClass(ClassLoader.java:266) ub13: Could not find the main class: org.apache.hadoop.util.PlatformName. Program will exit. ub13: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hdfs/server/datanode/DataNode ub13: starting secondarynamenode, logging to /usr/local/hadoop/hadoop/bin/../logs/hadoop-hadoop-secondarynamenode-ub13.out ub13: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/util/PlatformName ub13: Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.util.PlatformName ub13: at java.net.URLClassLoader$1.run(URLClassLoader.java:217) ub13: at java.security.AccessController.doPrivileged(Native Method) ub13: at java.net.URLClassLoader.findClass(URLClassLoader.java:205) ub13: at java.lang.ClassLoader.loadClass(ClassLoader.java:321) ub13: at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294) ub13: at java.lang.ClassLoader.loadClass(ClassLoader.java:266) ub13: Could not find the main class: org.apache.hadoop.util.PlatformName. Program will exit. ub13: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hdfs/server/namenode/SecondaryNameNode Have I done something wrong? Please guide me! Thanks, Praveenesh
HBase Web UI showing exception everytime I am running it
Hello guys. I am not able to run my HBase 0.90.3 cluster on top of my Hadoop 0.20.2 cluster. I don't know why it's happening; it runs only one time, and after that it doesn't. The HBase web URL is showing the following exception. Why is it happening? Please help! Thanks, Praveenesh HTTP ERROR 500 Problem accessing /master.jsp. Reason: Trying to contact region server ub1:60020 for region .META.,,1, row '', but failed after 10 attempts. Exceptions: org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Region is not online: .META.,,1 at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2318) at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1771) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039) org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Region is not online: .META.,,1 at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2318) at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1771) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039) org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Region is not online: .META.,,1 at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2318) at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1771) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039) org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Region is not online: .META.,,1 at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2318) at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1771) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039) org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Region is not 
online: .META.,,1 at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2318) at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1771) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039) org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Region is not online: .META.,,1 at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2318) at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1771) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570) at
Re: Hadoop not working after replacing hadoop-core.jar with hadoop-core-append.jar
Hello guys. Changing the name of the hadoop-append-core.jar file to hadoop-0.20.2-core.jar did the trick; it's working now. But is this the right solution to this problem? Thanks, Praveenesh On Mon, Jun 6, 2011 at 2:18 PM, praveenesh kumar praveen...@gmail.com wrote: Hi, I was not able to see my email in the mail archive, so I am sending it again. Guys, I need your feedback! Thanks, Praveenesh -- Forwarded message -- From: praveenesh kumar praveen...@gmail.com Date: Mon, Jun 6, 2011 at 12:09 PM Subject: Hadoop is not working after adding hadoop-core-0.20-append-r1056497.jar To: common-user@hadoop.apache.org, u...@hbase.apache.org Hello guys! I am currently working with HBase 0.90.3 and Hadoop 0.20.2. Since this hadoop version does not support sync on HDFS, I copied the *hadoop-core-append jar* file from the *hbase/lib* folder into the *hadoop folder* and replaced *hadoop-0.20.2-core.jar* with it, which was suggested in the following link: http://www.apacheserver.net/Using-Hadoop-bundled-in-lib-directory-HBase-at1136240.htm I guess what I am doing is what has been mentioned in the link; if I am doing something wrong, kindly tell me. But now, after adding that jar file, I am not able to run my hadoop; I am getting the following exception messages on my screen: ub13: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/util/PlatformName ub13: Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.util.PlatformName ub13: at java.net.URLClassLoader$1.run(URLClassLoader.java:217) ub13: at java.security.AccessController.doPrivileged(Native Method) ub13: at java.net.URLClassLoader.findClass(URLClassLoader.java:205) ub13: at java.lang.ClassLoader.loadClass(ClassLoader.java:321) ub13: at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294) ub13: at java.lang.ClassLoader.loadClass(ClassLoader.java:266) ub13: Could not find the main class: org.apache.hadoop.util.PlatformName. Program will exit. ub13: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hdfs/server/datanode/DataNode ub13: starting secondarynamenode, logging to /usr/local/hadoop/hadoop/bin/../logs/hadoop-hadoop-secondarynamenode-ub13.out ub13: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/util/PlatformName ub13: Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.util.PlatformName ub13: at java.net.URLClassLoader$1.run(URLClassLoader.java:217) ub13: at java.security.AccessController.doPrivileged(Native Method) ub13: at java.net.URLClassLoader.findClass(URLClassLoader.java:205) ub13: at java.lang.ClassLoader.loadClass(ClassLoader.java:321) ub13: at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294) ub13: at java.lang.ClassLoader.loadClass(ClassLoader.java:266) ub13: Could not find the main class: org.apache.hadoop.util.PlatformName. Program will exit. ub13: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hdfs/server/namenode/SecondaryNameNode Have I done something wrong? Please guide me! Thanks, Praveenesh
Re: Hadoop not working after replacing hadoop-core.jar with hadoop-core-append.jar
It worked by renaming the hadoop-append*.jar file to hadoop-0.20.2-core.jar. I don't know why, but it worked! Also, after this, my hbase started well one time, but since then it is not working fine; there is some problem in starting the region servers. I have sent the exceptions in my other email; I hope it will reach the mailing group after some time. Thanks, Praveenesh On Mon, Jun 6, 2011 at 8:59 PM, Stack st...@duboce.net wrote: On Mon, Jun 6, 2011 at 6:23 AM, praveenesh kumar praveen...@gmail.com wrote: Changing the name of the hadoop-append-core.jar file to hadoop-0.20.2-core.jar did the trick; it's working now. But is this the right solution to this problem? It would seem to be. Did you have two hadoop*jar versions in your lib directory by any chance? You did not remove the first? St.Ack
Are Hadoop 0.20.2 and HBase 0.90.3 compatible ??
Guys, I am very confused. Please, I really need your feedback and suggestions. The scenario is like this: I set up a *Hadoop 0.20.2 cluster* of *12 nodes*, and then I set up an *HBase 0.90.3* *12-node cluster* on top of it. But after all that experimenting and struggling, I read the following SHOCKING line on my HBase web UI: *You are currently running the HMaster without HDFS append support enabled. This may result in data loss. Please see the HBase wiki for details.* And when I searched more about it, I found Michael G. Noll's article saying that *Hadoop 0.20.2 and HBase 0.90.2 are not compatible*. Is *Hadoop 0.20.2 also not compatible with HBase 0.90.3?* So does it mean I have to re-install hadoop-0.20-append if I want to use HBase? I did a lot of struggling to reach this stage; do I have to do all of it again? Is there any other workaround, short of re-installing everything? Please help! :-( Thanks, Praveenesh
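Context for the warning quoted above: HBase 0.90 requires a durable HDFS sync/append, which stock Hadoop 0.20.2 does not provide; the branch-0.20-append builds gate the behavior behind the dfs.support.append property. A small sketch, assuming that property name (which is what the append builds and the HBase documentation of that era use) and a hypothetical class name, to see what the client config resolves:

import org.apache.hadoop.conf.Configuration;

public class AppendSupportCheck {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Should be true on an append-capable build for HBase to avoid data loss
        System.out.println("dfs.support.append = "
            + conf.getBoolean("dfs.support.append", false));
    }
}

A false result on a stock 0.20.2 install matches the HMaster warning: the HDFS underneath cannot guarantee that HBase's write-ahead log is durable.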
Fwd: Data node is taking time to start.. Error register getProtocolVersion in namenode..!!
Hey guys! Any suggestions? -- Forwarded message -- From: praveenesh kumar praveen...@gmail.com Date: Wed, Jun 1, 2011 at 2:48 PM Subject: Data node is taking time to start.. Error register getProtocolVersion in namenode..!! To: common-user@hadoop.apache.org Hello Hadoop users! Well, I am doing a simple hadoop single-node installation, but my datanode is taking some time to start. If I go through the namenode logs, I am seeing a strange exception. 2011-06-02 03:59:59,959 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG: / STARTUP_MSG: Starting NameNode STARTUP_MSG: host = ub4/162.192.100.44 STARTUP_MSG: args = [] STARTUP_MSG: version = 0.20.2 STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb $ / 2011-06-02 04:00:00,034 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=54310 2011-06-02 04:00:00,038 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: ub4/ 162.192.100.44:54310 2011-06-02 04:00:00,039 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=nu$ 2011-06-02 04:00:00,040 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using contex$ 2011-06-02 04:00:00,074 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hadoop,hadoop 2011-06-02 04:00:00,074 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup 2011-06-02 04:00:00,074 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=true 2011-06-02 04:00:00,084 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using$ 2011-06-02 04:00:00,085 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean 2011-06-02 04:00:00,109 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 1 2011-06-02 04:00:00,114 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 0 2011-06-02 04:00:00,114 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 96 loaded in 0 seconds. 2011-06-02 04:00:00,550 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Finished loading FSImage in 489 msecs 2011-06-02 04:00:00,552 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Total number of blocks = 0 2011-06-02 04:00:00,552 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of invalid blocks = 0 2011-06-02 04:00:00,552 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of under-replicated blocks = 0 2011-06-02 04:00:00,552 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of over-replicated blocks = 0 2011-06-02 04:00:00,552 INFO org.apache.hadoop.hdfs.StateChange: STATE* Leaving safe mode after 0 secs. 
2011-06-02 04:00:00,553 INFO org.apache.hadoop.hdfs.StateChange: STATE* Network topology has 0 racks and 0 datanodes 2011-06-02 04:00:00,553 INFO org.apache.hadoop.hdfs.StateChange: STATE* UnderReplicatedBlocks has 0 blocks 2011-06-02 04:00:01,093 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog 2011-06-02 04:00:01,137 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before ope$ 2011-06-02 04:00:01,138 INFO org.apache.hadoop.http.HttpServer: listener.getLocalPort() returned 50070 webServer.getConnectors()[0].get$ 2011-06-02 04:00:01,138 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50070 2011-06-02 04:00:01,138 INFO org.mortbay.log: jetty-6.1.14 2011-06-02 04:00:48,495 INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:50070 2011-06-02 04:00:48,495 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Web-server up at: 0.0.0.0:50070 2011-06-02 04:00:48,501 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting 2011-06-02 04:00:48,501 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 54310: starting 2011-06-02 04:00:48,501 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 54310: starting 2011-06-02 04:00:48,501 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 54310: starting 2011-06-02 04:00:48,502 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 54310: starting 2011-06-02 04:00:48,502 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 54310: starting 2011-06-02 04:00:48,502 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 54310: starting 2011-06-02 04:00:48,502 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 54310: starting 2011-06-02 04:00:48,502 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 54310: starting 2011-06-02 04:00:48,503 INFO org.apache.hadoop.ipc.Server
HBase Web UI on HBase 0.90.3 ?
Hello guys. I have just installed HBase on my hadoop cluster. HMaster, HRegionServer and HQuorumPeer are all working fine, as I can see these processes running through jps. Is there any way to know which region servers are running and which are not? I mean, is there some kind of HBase web UI or any way to know the status of the HBase cluster? For hadoop we can use the hadoop fsck command and the various web UIs. When I used HBase 0.20.6, I could see the HBase web UI hosted on port 60010 by default, but I cannot see that web UI now that I am using HBase 0.90.3. Also, I cannot see hbase-default.xml in my $HBASE_HOME/conf folder. Is this the reason for that? So do I need to set all those configurations on my own for this new version? Thanks, Praveenesh
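Two notes that may help here: in 0.90, hbase-default.xml is packaged inside the hbase jar rather than shipped in conf/, so its absence from $HBASE_HOME/conf is expected, and the master UI still defaults to port 60010. A small sketch (hypothetical class name) to print the port the client config actually resolves, assuming the hbase jars are on the classpath:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class UiPortCheck {
    public static void main(String[] args) {
        // Loads hbase-default.xml from the hbase jar, then applies hbase-site.xml overrides
        Configuration conf = HBaseConfiguration.create();
        System.out.println("hbase.master.info.port = "
            + conf.get("hbase.master.info.port", "60010"));
    }
}

If the port is what you expect but the UI still does not come up, the master itself likely failed to start fully, which is worth checking in the master log.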
Data node is taking time to start.. Error register getProtocolVersion in namenode..!!
Hello Hadoop users! Well, I am doing a simple hadoop single-node installation, but my datanode is taking some time to start. If I go through the namenode logs, I am seeing a strange exception. 2011-06-02 03:59:59,959 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG: / STARTUP_MSG: Starting NameNode STARTUP_MSG: host = ub4/162.192.100.44 STARTUP_MSG: args = [] STARTUP_MSG: version = 0.20.2 STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb $ / 2011-06-02 04:00:00,034 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=54310 2011-06-02 04:00:00,038 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: ub4/ 162.192.100.44:54310 2011-06-02 04:00:00,039 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=nu$ 2011-06-02 04:00:00,040 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using contex$ 2011-06-02 04:00:00,074 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hadoop,hadoop 2011-06-02 04:00:00,074 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup 2011-06-02 04:00:00,074 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=true 2011-06-02 04:00:00,084 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using$ 2011-06-02 04:00:00,085 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean 2011-06-02 04:00:00,109 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 1 2011-06-02 04:00:00,114 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 0 2011-06-02 04:00:00,114 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 96 loaded in 0 seconds. 2011-06-02 04:00:00,550 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Finished loading FSImage in 489 msecs 2011-06-02 04:00:00,552 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Total number of blocks = 0 2011-06-02 04:00:00,552 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of invalid blocks = 0 2011-06-02 04:00:00,552 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of under-replicated blocks = 0 2011-06-02 04:00:00,552 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of over-replicated blocks = 0 2011-06-02 04:00:00,552 INFO org.apache.hadoop.hdfs.StateChange: STATE* Leaving safe mode after 0 secs. 
2011-06-02 04:00:00,553 INFO org.apache.hadoop.hdfs.StateChange: STATE* Network topology has 0 racks and 0 datanodes 2011-06-02 04:00:00,553 INFO org.apache.hadoop.hdfs.StateChange: STATE* UnderReplicatedBlocks has 0 blocks 2011-06-02 04:00:01,093 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog 2011-06-02 04:00:01,137 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before ope$ 2011-06-02 04:00:01,138 INFO org.apache.hadoop.http.HttpServer: listener.getLocalPort() returned 50070 webServer.getConnectors()[0].get$ 2011-06-02 04:00:01,138 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50070 2011-06-02 04:00:01,138 INFO org.mortbay.log: jetty-6.1.14 2011-06-02 04:00:48,495 INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:50070 2011-06-02 04:00:48,495 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Web-server up at: 0.0.0.0:50070 2011-06-02 04:00:48,501 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting 2011-06-02 04:00:48,501 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 54310: starting 2011-06-02 04:00:48,501 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 54310: starting 2011-06-02 04:00:48,501 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 54310: starting 2011-06-02 04:00:48,502 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 54310: starting 2011-06-02 04:00:48,502 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 54310: starting 2011-06-02 04:00:48,502 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 54310: starting 2011-06-02 04:00:48,502 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 54310: starting 2011-06-02 04:00:48,502 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 54310: starting 2011-06-02 04:00:48,503 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 54310: starting 2011-06-02 04:00:48,503 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 54310: starting 2011-06-02 04:00:48,504 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 54310: starting *2011-06-02 04:00:48,532 INFO
How to compile HBase code ?
Hello guys, in case any of you are working on HBase: I just wrote a program by reading some tutorials, but nowhere is it mentioned how to run code on HBase. If any of you have done some coding on HBase, can you please tell me how to run it? I am able to compile my code by adding hbase-core.jar and hadoop-core.jar to the classpath while compiling, but I am not able to figure out how to run it. Whenever I do java ExampleClient (which is my HBase program), I get the following error: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration at ExampleClient.main(ExampleClient.java:20) Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.HBaseConfiguration at java.net.URLClassLoader$1.run(URLClassLoader.java:217) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:205) at java.lang.ClassLoader.loadClass(ClassLoader.java:321) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294) at java.lang.ClassLoader.loadClass(ClassLoader.java:266) ... 1 more Thanks, Praveenesh
Re: How to compile HBase code ?
I am simply using the HBase API, not doing any map-reduce work with it. Following is the code I have written, which simply creates a table in HBase:

import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ExampleClient {
  public static void main(String[] args) throws IOException {
    HBaseConfiguration config = new HBaseConfiguration();
    HBaseAdmin admin = new HBaseAdmin(config);
    HTableDescriptor htd = new HTableDescriptor("test");
    HColumnDescriptor hcd = new HColumnDescriptor("data");
    htd.addFamily(hcd);
    admin.createTable(htd);
    byte[] tablename = htd.getName();
    HTableDescriptor[] tables = admin.listTables();
    if (tables.length != 1 || !Bytes.equals(tablename, tables[0].getName())) {
      throw new IOException("Failed to create table");
    }
    HTable table = new HTable(config, tablename);
    byte[] row1 = Bytes.toBytes("row1");
    Put p1 = new Put(row1);
    byte[] databytes = Bytes.toBytes("data");
    p1.add(databytes, Bytes.toBytes("1"), Bytes.toBytes("value1"));
    table.put(p1);
    Get g = new Get(row1);
    Result result = table.get(g);
    System.out.println("Get: " + result);
    Scan scan = new Scan();
    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result scannerResult : scanner) {
        System.out.println("Scan: " + scannerResult);
      }
    } catch (Exception e) {
      e.printStackTrace();
    } finally {
      scanner.close();
    }
    table.close();
  }
}

Now I have set the classpath variable in /etc/environment as MYCLASSPATH=/usr/local/hadoop/hadoop/hadoop-0.20.2-core.jar:/usr/local/hadoop/hbase/hbase/hbase-0.20.6.jar:/usr/local/hadoop/hbase/hbase/lib/zookeeper-3.2.2.jar and I am compiling my code with the javac command *$javac -classpath $MYCLASSPATH ExampleClient.java* It works fine. When running, I use the java command *$java -classpath $MYCLASSPATH ExampleClient*, and then I get the following error: Exception in thread "main" java.lang.NoClassDefFoundError: ExampleClient Caused by: java.lang.ClassNotFoundException: ExampleClient at java.net.URLClassLoader$1.run(URLClassLoader.java:217) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:205) at java.lang.ClassLoader.loadClass(ClassLoader.java:321) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294) at java.lang.ClassLoader.loadClass(ClassLoader.java:266) Could not find the main class: ExampleClient. Program will exit. But I am running the code from the same location, and the ExampleClient.class file exists at that location. On Tue, May 24, 2011 at 3:07 PM, Kleegrewe, Christian christian.kleegr...@siemens.com wrote: How do you execute the client (command line)? Do you use the java or the hadoop command? It seems that there is an error in your classpath when running the client. The classpath used when compiling the classes that implement the client is different from the classpath when your client is executed, since hadoop and hbase carry their own environment. Maybe the following link helps: http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#classpath regards Christian ---8<--- Siemens AG Corporate Technology Corporate Research and Technologies CT T DE IT3 Otto-Hahn-Ring 6 81739 München, Deutschland Tel.: +49 (89) 636-42722 Fax: +49 (89) 636-41423 mailto:christian.kleegr...@siemens.com -----Original Message----- From: praveenesh kumar [mailto:praveen...@gmail.com] Sent: Tuesday, 24 May 2011 11:08 To: common-user@hadoop.apache.org Subject: How to compile HBase code ? Hello guys, in case any of you are working on HBase: I just wrote a program by reading some tutorials, but nowhere is it mentioned how to run code on HBase. If any of you have done some coding on HBase, can you please tell me how to run it? I am able to compile my code by adding hbase-core.jar and hadoop-core.jar to the classpath while compiling, but I am not able to figure out how to run it. Whenever I am doing java ExampleClient ( which is my
Re: How to compile HBase code ?
Hey Harsh, actually I mailed the HBase mailing list also, but since I wanted to get this done as soon as possible I mailed this group as well; anyway, I will take care of this in future, although I got more responses on this mailing list :-) Anyway, the problem is solved. What I did was add the folder containing my .class file to the classpath, along with commons-logging-1.0.4.jar and log4j-1.2.15.jar, so now my classpath variable looks like: MYCLASSPATH=/usr/local/hadoop/hadoop/hadoop-0.20.2-core.jar:/usr/local/hadoop/hbase/hbase/hbase-0.20.6.jar:/usr/local/hadoop/hbase/hbase/lib/zookeeper-3.2.2.jar:/usr/local/hadoop/hbase/hbase/lib/commons-logging-1.0.4.jar:/usr/local/hadoop/hbase/hbase/lib/log4j-1.2.15.jar:/usr/local/hadoop/hbase/ and then I used *java -classpath $MYCLASSPATH ExampleClient* and now it is running. Thanks! Praveenesh On Tue, May 24, 2011 at 3:55 PM, Harsh J ha...@cloudera.com wrote: Praveenesh, HBase has its own user mailing lists where such queries ought to go. Am moving the discussion to u...@hbase.apache.org and bcc-ing common-user@ here. Also added you to cc. Regarding your first error: going forward you can use the useful `hbase classpath` to generate an HBase-provided classpath list for you automatically. Something like: $ MYCLASSPATH=`hbase classpath` Regarding the second, latest one as below: your ExampleClient.class isn't on the MYCLASSPATH (nor is the directory it is under, i.e. '.'), so Java can't really find it. This is not a HBase issue. HTH. On Tue, May 24, 2011 at 3:23 PM, praveenesh kumar praveen...@gmail.com wrote: I am simply using the HBase API, not doing any map-reduce work with it. Following is the code I have written, which simply creates a table in HBase:

import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ExampleClient {
  public static void main(String[] args) throws IOException {
    HBaseConfiguration config = new HBaseConfiguration();
    HBaseAdmin admin = new HBaseAdmin(config);
    HTableDescriptor htd = new HTableDescriptor("test");
    HColumnDescriptor hcd = new HColumnDescriptor("data");
    htd.addFamily(hcd);
    admin.createTable(htd);
    byte[] tablename = htd.getName();
    HTableDescriptor[] tables = admin.listTables();
    if (tables.length != 1 || !Bytes.equals(tablename, tables[0].getName())) {
      throw new IOException("Failed to create table");
    }
    HTable table = new HTable(config, tablename);
    byte[] row1 = Bytes.toBytes("row1");
    Put p1 = new Put(row1);
    byte[] databytes = Bytes.toBytes("data");
    p1.add(databytes, Bytes.toBytes("1"), Bytes.toBytes("value1"));
    table.put(p1);
    Get g = new Get(row1);
    Result result = table.get(g);
    System.out.println("Get: " + result);
    Scan scan = new Scan();
    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result scannerResult : scanner) {
        System.out.println("Scan: " + scannerResult);
      }
    } catch (Exception e) {
      e.printStackTrace();
    } finally {
      scanner.close();
    }
    table.close();
  }
}

Now I have set the classpath variable in /etc/environment as MYCLASSPATH=/usr/local/hadoop/hadoop/hadoop-0.20.2-core.jar:/usr/local/hadoop/hbase/hbase/hbase-0.20.6.jar:/usr/local/hadoop/hbase/hbase/lib/zookeeper-3.2.2.jar and I am compiling my code with the javac command *$javac -classpath $MYCLASSPATH ExampleClient.java* It works fine. When running, I use the java command *$java -classpath $MYCLASSPATH ExampleClient*, and then I get the following error: Exception in thread "main" java.lang.NoClassDefFoundError: ExampleClient Caused by: java.lang.ClassNotFoundException: ExampleClient at java.net.URLClassLoader$1.run(URLClassLoader.java:217) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:205) at java.lang.ClassLoader.loadClass(ClassLoader.java:321) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294) at java.lang.ClassLoader.loadClass(ClassLoader.java:266) Could not find the main class: ExampleClient. Program will exit. But I am running the code from the same location, and the ExampleClient.class file exists at that location. On Tue, May 24, 2011 at 3:07 PM, Kleegrewe, Christian christian.kleegr...@siemens.com wrote: How do you execute the client (command line) do you use
Re: How to compile HBase code ?
Hey Harsh, I tried that; it's not working. I am using HBase 0.20.6, and there is no such command as bin/hbase classpath:

hadoop@ub6:/usr/local/hadoop/hbase$ hbase
Usage: hbase <command>
where <command> is one of:
  shell          run the HBase shell
  master         run an HBase HMaster node
  regionserver   run an HBase HRegionServer node
  rest           run an HBase REST server
  thrift         run an HBase Thrift server
  zookeeper      run a Zookeeper server
  migrate        upgrade an hbase.rootdir
 or
  CLASSNAME      run the class named CLASSNAME

Thanks, Praveenesh On Tue, May 24, 2011 at 4:59 PM, Harsh J ha...@cloudera.com wrote: Praveenesh, On Tue, May 24, 2011 at 4:31 PM, praveenesh kumar praveen...@gmail.com wrote: Hey Harsh, actually I mailed the HBase mailing list also, but since I wanted to get this done as soon as possible I mailed this group as well; anyway, I will take care of this in future, although I got more responses on this mailing list :-) Anyway, the problem is solved. Good to know your problem is resolved. You can also use the `bin/hbase classpath` utility to generate the HBase parts of the classpath automatically in the future, instead of adding classes manually; it saves you time. -- Harsh J
Fwd: HBase question..!!
Any suggestions, please! -- Forwarded message -- From: praveenesh kumar praveen...@gmail.com Date: Sun, May 22, 2011 at 2:23 PM Subject: HBase question..!! To: common-user@hadoop.apache.org Okay guys, so I have a hadoop cluster of 5 nodes; the configuration looks like this. 162.192.100.53 -- master as well as slave. Slave nodes: 162.192.100.52 162.192.100.51 162.192.100.50 162.192.100.49 Now I want to implement HBase on my hadoop cluster. What would be the best configuration for HBase based on my hadoop structure? Thanks, Praveenesh
Re: Installing Hadoop
Or you can refer to the following tutorial for reference (note that from 0.20 onwards the old hadoop-site.xml is split into core-site.xml, hdfs-site.xml and mapred-site.xml, which is why the 0.17 instructions no longer match what you see in conf/): http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/ On Mon, May 23, 2011 at 11:06 PM, jgroups mohitanch...@gmail.com wrote: I am trying to install hadoop in a cluster environment with multiple nodes, following instructions from http://hadoop.apache.org/common/docs/r0.17.0/cluster_setup.html That page refers to hadoop-site.xml, but I don't see that in /hadoop-0.20.203.0/conf. Are there more up-to-date installation instructions somewhere else?
HBase question..!!
Okay guys, so I have a hadoop cluster of 5 nodes; the configuration looks like this. 162.192.100.53 -- master as well as slave. Slave nodes: 162.192.100.52 162.192.100.51 162.192.100.50 162.192.100.49 Now I want to implement HBase on my hadoop cluster. What would be the best configuration for HBase based on my hadoop structure? Thanks, Praveenesh
Re: Why Only 1 Reducer is running ??
My program is a basic program like this:

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class WordCount {

  public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
      String line = value.toString();
      StringTokenizer tokenizer = new StringTokenizer(line);
      while (tokenizer.hasMoreTokens()) {
        word.set(tokenizer.nextToken());
        output.collect(word, one);
      }
    }
  }

  public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
      int sum = 0;
      while (values.hasNext()) {
        sum += values.next().get();
      }
      output.collect(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(WordCount.class);
    conf.setJobName("wordcount");
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);
    conf.setMapperClass(Map.class);
    conf.setCombinerClass(Reduce.class);
    conf.setReducerClass(Reduce.class);
    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    Job.setNumReduceTasks(10);
    JobClient.runJob(conf);
  }
}

How do I use the Job.setNumReduceTasks(int) function here? I am not using any Job class object here. Thanks. Praveenesh On Fri, May 20, 2011 at 7:07 PM, Evert Lammerts evert.lamme...@sara.nl wrote: Hi Praveenesh, * You can set the maximum number of reducers per node in your mapred-site.xml using mapred.tasktracker.reduce.tasks.maximum (default set to 2). * You can set the default number of reduce tasks with mapred.reduce.tasks (default set to 1 - this causes your single reducer). * Your job can try to override this setting by calling Job.setNumReduceTasks(int) ( http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/Job.html#setNumReduceTasks(int) ). Cheers, Evert -Original Message- From: modemide [mailto:modem...@gmail.com] Sent: Friday, 20 May 2011 15:26 To: common-user@hadoop.apache.org Subject: Re: Why Only 1 Reducer is running ?? What does your mapred-site.xml file say? I've used wordcount and had close to 12 reduces running on a 6-datanode cluster on a 3 GB file. I have a configuration in there which says: mapred.reduce.tasks = 12 The reason I chose 12 was because it was recommended that I choose 2x the number of tasktrackers. On 5/20/11, praveenesh kumar praveen...@gmail.com wrote: Hello everyone, I am using the wordcount application to test my hadoop cluster of 5 nodes. The file size is around 5 GB. It's taking around 2 min 40 sec for execution. But when I check the JobTracker web portal, I see only one reducer running. Why so? How can I change the code so that it runs multiple reducers? Thanks, Praveenesh
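To answer the compile question inline: with the old mapred API used above there is no separate Job object; JobConf itself carries the reducer count. A minimal sketch (hypothetical class name, reusing the WordCount class from the post):

import org.apache.hadoop.mapred.JobConf;

public class ReducerCountExample {
    public static void main(String[] args) {
        // JobConf (old mapred API) exposes the reducer count directly;
        // no org.apache.hadoop.mapreduce.Job object is needed here.
        JobConf conf = new JobConf(WordCount.class);
        conf.setNumReduceTasks(10); // equivalent to setting mapred.reduce.tasks for this job
        System.out.println("reduce tasks requested: " + conf.getNumReduceTasks());
    }
}

So the line Job.setNumReduceTasks(10) in the driver above should simply read conf.setNumReduceTasks(10), which is exactly the fix reached in the reply below.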
Re: Why Only 1 Reducer is running ??
Okie I figured it out.. it was simple.. conf.setsetNumReduceTasks(10); my mistake.. Anyhow when I am running 10 reducers for Wordcount problem.. I am seeing only slight increase in the speed of the program... Why so ?? So more reducers do not gauranteee faster execution ?? How can we decide to use how many reducers to make our program run in the best way possible ?? Thanks, Praveenesh On Mon, May 23, 2011 at 10:08 AM, praveenesh kumar praveen...@gmail.comwrote: My program is a basic program like this : import java.io.IOException; import java.util.*; import org.apache.hadoop.fs.Path; import org.apache.hadoop.conf.*; import org.apache.hadoop.io.*; import org.apache.hadoop.mapred.*; import org.apache.hadoop.util.*; public class WordCount { public static class Map extends MapReduceBase implements MapperLongWritable, Text, Text, IntWritable { private final static IntWritable one = new IntWritable(1); private Text word = new Text(); public void map(LongWritable key, Text value, OutputCollectorText, IntWritable output, Reporter reporter) throws IOException { String line = value.toString(); StringTokenizer tokenizer = new StringTokenizer(line); while (tokenizer.hasMoreTokens()) { word.set(tokenizer.nextToken()); output.collect(word, one); } } } public static class Reduce extends MapReduceBase implements ReducerText, IntWritable, Text, IntWritable { public void reduce(Text key, IteratorIntWritable values, OutputCollectorText, IntWritable output, Reporter reporter) throws IOException { int sum = 0; while (values.hasNext()) { sum += values.next().get(); } output.collect(key, new IntWritable(sum)); } } public static void main(String[] args) throws Exception { JobConf conf = new JobConf(WordCount.class); conf.setJobName(wordcount); conf.setOutputKeyClass(Text.class); conf.setOutputValueClass(IntWritable.class); conf.setMapperClass(Map.class); conf.setCombinerClass(Reduce.class); conf.setReducerClass(Reduce.class); conf.setInputFormat(TextInputFormat.class); conf.setOutputFormat(TextOutputFormat.class); FileInputFormat.setInputPaths(conf, new Path(args[0])); FileOutputFormat.setOutputPath(conf, new Path(args[1])); Job.setNumReduceTasks(10); JobClient.runJob(conf); } } How to use Job.setNumReduceTasks(INT) function here.. I am not using any Job class object here. Thanks. Praveenesh On Fri, May 20, 2011 at 7:07 PM, Evert Lammerts evert.lamme...@sara.nlwrote: Hi Praveenesh, * You can set the maximum amount of reducers per node in your mapred-site.xml using mapred.tasktracker.reduce.tasks.maximum (default set to 2). * You can set the default number of reduce tasks with mapred.reduce.tasks (default set to 1 - this causes your single reducer). * Your job can try to override this setting by calling Job.setNumReduceTasks(INT) ( http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/Job.html#setNumReduceTasks(int) ). Cheers, Evert -Original Message- From: modemide [mailto:modem...@gmail.com] Sent: vrijdag 20 mei 2011 15:26 To: common-user@hadoop.apache.org Subject: Re: Why Only 1 Reducer is running ?? what does your mapred-site.xml file say? I've used wordcount and had close to 12 reduces running on a 6 datanode cluster on a 3 GB file. I have a configuration in there which says: mapred.reduce.tasks = 12 The reason I chose 12 was because it was recommended that I choose 2x number of tasktrackers. On 5/20/11, praveenesh kumar praveen...@gmail.com wrote: Hello everyone, I am using wordcount application to test on my hadoop cluster of 5 nodes. The file size is around 5 GB. 
It's taking around 2 min 40 sec to execute. But when I check the JobTracker web portal, I see only one reducer running. Why so ?? How can I change the code so that it runs multiple reducers as well ?? Thanks, Praveenesh
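The thread never spells out the working version, so here is a minimal sketch of the fix being discussed: with the old org.apache.hadoop.mapred API the reducer count is set on the JobConf itself, since there is no Job object to call setNumReduceTasks on. The count of 10 just follows the rule of thumb quoted above (roughly 1-2x the cluster's reduce slots) and is an assumption, not a measured optimum:

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class ReducerCountFix {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(ReducerCountFix.class);
    // ... mapper/reducer/input/output setup as in the WordCount job above ...
    conf.setNumReduceTasks(10); // on JobConf, not Job - this is the corrected call
    JobClient.runJob(conf);
  }
}

As to why more reducers gave only a slight speed-up: extra reducers only help while there are idle reduce slots and enough map output to spread across them; beyond that, each additional reducer mostly adds scheduling and output-file overhead.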
How to see block information on NameNode ?
Hey..!! I have a question. If I copy some file onto the HDFS file system, it will get split into blocks and the Namenode will keep all the meta info about them. How can I see that info ? I copied a 5 GB file onto the NameNode, but I see that file only on the NameNode.. It does not seem to get split into blocks..?? How can I see whether my file is getting split into blocks, and which data node is keeping which block ?? Thanks, Praveenesh
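No reply is archived for this question, but the block layout can be read back from the NameNode through the public FileSystem API. A minimal sketch (the class name and path argument are hypothetical; it assumes core-site.xml/hdfs-site.xml are on the classpath so the client can locate the NameNode):

import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowBlocks {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    FileStatus status = fs.getFileStatus(new Path(args[0]));
    // The NameNode answers with one BlockLocation per block,
    // including the hosts holding each replica.
    BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
    for (BlockLocation b : blocks) {
      System.out.println("offset=" + b.getOffset() + " length=" + b.getLength()
          + " hosts=" + Arrays.toString(b.getHosts()));
    }
  }
}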
Why Only 1 Reducer is running ??
Hello everyone, I am using the wordcount application to test on my hadoop cluster of 5 nodes. The file size is around 5 GB. It's taking around 2 min 40 sec to execute. But when I check the JobTracker web portal, I see only one reducer running. Why so ?? How can I change the code so that it runs multiple reducers as well ?? Thanks, Praveenesh
Re: Why Only 1 Reducer is running ??
I am using the wordcount example that comes along with hadoop. How can I configure it to use multiple reducers ? I guess multiple reducers will make it run faster .. Does it ?? On Fri, May 20, 2011 at 6:51 PM, James Seigel Tynt ja...@tynt.com wrote: The job could be designed to use one reducer. On 2011-05-20, at 7:19 AM, praveenesh kumar praveen...@gmail.com wrote: Hello everyone, I am using the wordcount application to test on my hadoop cluster of 5 nodes. The file size is around 5 GB. It's taking around 2 min 40 sec to execute. But when I check the JobTracker web portal, I see only one reducer running. Why so ?? How can I change the code so that it runs multiple reducers as well ?? Thanks, Praveenesh
How does hadoop parse input files into (Key,Value) pairs ??
Hi, As we know, a hadoop mapper takes its input as (Key,Value) pairs and generates intermediate (Key,Value) pairs, and usually we give the input to our Mapper as a text file. How does hadoop understand this and parse our input text file into (Key,Value) pairs ? Usually our mapper looks like this:

public void map(LongWritable key, Text value, OutputCollector<Text, Text> outputCollector, Reporter reporter) throws IOException {
  String word = value.toString();
  // Some lines of code
}

So if I pass any text file as input, it is taking every line as the VALUE to the Mapper.. on which I will do some processing and put it to the OutputCollector. But how did hadoop parse my text file into (Key,Value) pairs, and how can we tell hadoop what (key,value) it should give to the mapper ?? Thanks.
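There is no reply in the archive, but the mechanism being asked about is the InputFormat: the job's InputFormat supplies a RecordReader that turns each input split into (key, value) records, and with the default TextInputFormat the key is each line's byte offset (LongWritable) and the value is the line itself (Text). You change the pairing by choosing a different InputFormat. A minimal sketch against the old mapred API (class name hypothetical):

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.TextInputFormat;

public class InputFormatChoice {
  public static void main(String[] args) {
    JobConf conf = new JobConf(InputFormatChoice.class);
    // Default: each line becomes (LongWritable byteOffset, Text line).
    conf.setInputFormat(TextInputFormat.class);
    // Alternative: split each line at the first tab, so the mapper
    // receives (Text keyBeforeTab, Text valueAfterTab) instead.
    // conf.setInputFormat(KeyValueTextInputFormat.class);
  }
}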
Can we access NameNode HDFS from slave Nodes ??
Hey, can we access the NameNode's HDFS from our slave machines ?? I am just running the command hadoop dfs -ls on my slave machine (running TaskTracker and DataNode), and it is giving me the following output :

hadoop@ub12:~$ hadoop dfs -ls
11/05/05 18:31:54 INFO ipc.Client: Retrying connect to server: ub13/162.192.100.53:54310. Already tried 0 time(s).
11/05/05 18:31:55 INFO ipc.Client: Retrying connect to server: ub13/162.192.100.53:54310. Already tried 1 time(s).
11/05/05 18:31:56 INFO ipc.Client: Retrying connect to server: ub13/162.192.100.53:54310. Already tried 2 time(s).
11/05/05 18:31:57 INFO ipc.Client: Retrying connect to server: ub13/162.192.100.53:54310. Already tried 3 time(s).
11/05/05 18:31:58 INFO ipc.Client: Retrying connect to server: ub13/162.192.100.53:54310. Already tried 4 time(s).
11/05/05 18:31:59 INFO ipc.Client: Retrying connect to server: ub13/162.192.100.53:54310. Already tried 5 time(s).
11/05/05 18:32:00 INFO ipc.Client: Retrying connect to server: ub13/162.192.100.53:54310. Already tried 6 time(s).
11/05/05 18:32:01 INFO ipc.Client: Retrying connect to server: ub13/162.192.100.53:54310. Already tried 7 time(s).
11/05/05 18:32:02 INFO ipc.Client: Retrying connect to server: ub13/162.192.100.53:54310. Already tried 8 time(s).
11/05/05 18:32:03 INFO ipc.Client: Retrying connect to server: ub13/162.192.100.53:54310. Already tried 9 time(s).
Bad connection to FS. command aborted.

I just restarted my Master Node (and ran start-all.sh). The output on my master node is :

hadoop@ub13:/usr/local/hadoop$ start-all.sh
starting namenode, logging to /usr/local/hadoop/hadoop/bin/../logs/hadoop-hadoop-namenode-ub13.out
ub11: starting datanode, logging to /usr/local/hadoop/hadoop/bin/../logs/hadoop-hadoop-datanode-ub11.out
ub10: starting datanode, logging to /usr/local/hadoop/hadoop/bin/../logs/hadoop-hadoop-datanode-ub10.out
ub12: starting datanode, logging to /usr/local/hadoop/hadoop/bin/../logs/hadoop-hadoop-datanode-ub12.out
ub13: starting datanode, logging to /usr/local/hadoop/hadoop/bin/../logs/hadoop-hadoop-datanode-ub13.out
ub13: starting secondarynamenode, logging to /usr/local/hadoop/hadoop/bin/../logs/hadoop-hadoop-secondarynamenode-ub13.out
starting jobtracker, logging to /usr/local/hadoop/hadoop/bin/../logs/hadoop-hadoop-jobtracker-ub13.out
ub10: starting tasktracker, logging to /usr/local/hadoop/hadoop/bin/../logs/hadoop-hadoop-tasktracker-ub10.out
ub11: starting tasktracker, logging to /usr/local/hadoop/hadoop/bin/../logs/hadoop-hadoop-tasktracker-ub11.out
ub12: starting tasktracker, logging to /usr/local/hadoop/hadoop/bin/../logs/hadoop-hadoop-tasktracker-ub12.out
ub13: starting tasktracker, logging to /usr/local/hadoop/hadoop/bin/../logs/hadoop-hadoop-tasktracker-ub13.out
hadoop@ub13:/usr/local/hadoop$ jps
6471 NameNode
7070 Jps
6875 JobTracker
6632 DataNode
7030 TaskTracker
6795 SecondaryNameNode

Thanks, Praveenesh
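No answer appears in the archive, but the short answer is yes: any node that can reach the NameNode can act as an HDFS client, provided its fs.default.name points at the NameNode and the NameNode is actually up (the retries above suggest it was down until the restart). A minimal sketch of a client that targets the NameNode explicitly instead of relying on the local config (URI taken from the log output above; the listed path is an assumption):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListFromSlave {
  public static void main(String[] args) throws Exception {
    // Connect to the NameNode named in the log output above.
    FileSystem fs = FileSystem.get(URI.create("hdfs://ub13:54310"), new Configuration());
    for (FileStatus f : fs.listStatus(new Path("/user/hadoop"))) {
      System.out.println(f.getPath());
    }
  }
}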
org.apache.hadoop.mapred.InvalidInputException ??
Hi, I am new to hadoop and the scenario is like this : I have hadoop installed on a linux machine having IP (162.192.100.46), and I have another Windows machine with eclipse and the hadoop plugin installed.. I am able to connect to the linux hadoop machine and can see the dfs location and mapred folder using my plugin. I copied all the hadoop jar files from linux to windows and set them in my eclipse. I am trying to run a small sample code from windows against the linux hadoop machine.

Code :

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class TestDriver {
  public static void main(String[] args) {
    JobClient client = new JobClient();
    JobConf conf = new JobConf(TestDriver.class);
    // TODO: specify output types
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);
    // TODO: specify input and output DIRECTORIES (not files)
    //conf.setInputPath(new Path("src"));
    //conf.setOutputPath(new Path("out"));
    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);
    FileInputFormat.setInputPaths(conf, new Path("In"));
    FileOutputFormat.setOutputPath(conf, new Path("Out"));
    // TODO: specify a mapper
    conf.setMapperClass(org.apache.hadoop.mapred.lib.IdentityMapper.class);
    // TODO: specify a reducer
    conf.setReducerClass(org.apache.hadoop.mapred.lib.IdentityReducer.class);
    client.setConf(conf);
    try {
      JobClient.runJob(conf);
    } catch (Exception e) {
      e.printStackTrace();
    }
  }
}

Whenever I try to run the code from the eclipse plugin, I get the following error :

11/04/25 13:39:16 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
11/04/25 13:39:16 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://162.192.100.46:54310/user/hadoop/In
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:190)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:201)
at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
at TestDriver.main(TestDriver.java:46)

I know I am doing something wrong. Can anyone tell me where I am wrong, and how I can run my code from windows on that linux hadoop machine ? Thanks, Praveenesh
Re: org.apache.hadoop.mapred.InvalidInputException ??
Hi, I am able to run the hadoop map-reduce wordcount example on my linux machine.. which means my hadoop settings are correct on the linux machine.. I don't know about the valid path you are talking about ?? Where do I set this ??

On Mon, Apr 25, 2011 at 11:58 AM, Harsh J ha...@cloudera.com wrote: Do you have a valid path /user/hadoop/In (it must be a file, or a directory with files)?

On Mon, Apr 25, 2011 at 11:32 AM, praveenesh kumar praveen...@gmail.com wrote:

Hi, I am new to hadoop and the scenario is like this : I have hadoop installed on a linux machine having IP (162.192.100.46), and I have another Windows machine with eclipse and the hadoop plugin installed.. I am able to connect to the linux hadoop machine and can see the dfs location and mapred folder using my plugin. I copied all the hadoop jar files from linux to windows and set them in my eclipse. I am trying to run a small sample code from windows against the linux hadoop machine.

Code :

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class TestDriver {
  public static void main(String[] args) {
    JobClient client = new JobClient();
    JobConf conf = new JobConf(TestDriver.class);
    // TODO: specify output types
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);
    // TODO: specify input and output DIRECTORIES (not files)
    //conf.setInputPath(new Path("src"));
    //conf.setOutputPath(new Path("out"));
    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);
    FileInputFormat.setInputPaths(conf, new Path("In"));
    FileOutputFormat.setOutputPath(conf, new Path("Out"));
    // TODO: specify a mapper
    conf.setMapperClass(org.apache.hadoop.mapred.lib.IdentityMapper.class);
    // TODO: specify a reducer
    conf.setReducerClass(org.apache.hadoop.mapred.lib.IdentityReducer.class);
    client.setConf(conf);
    try {
      JobClient.runJob(conf);
    } catch (Exception e) {
      e.printStackTrace();
    }
  }
}

Whenever I try to run the code from the eclipse plugin, I get the following error :

11/04/25 13:39:16 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
11/04/25 13:39:16 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://162.192.100.46:54310/user/hadoop/In
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:190)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:201)
at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
at TestDriver.main(TestDriver.java:46)

I know I am doing something wrong. Can anyone tell me where I am wrong, and how I can run my code from windows on that linux hadoop machine ? Thanks, Praveenesh

-- Harsh J
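Harsh's hint is the whole answer: the job fails before running because hdfs://162.192.100.46:54310/user/hadoop/In does not exist on the cluster. A minimal sketch (the local path and class name are hypothetical) of creating the input directory and uploading a file into it before submitting the job:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PrepareInput {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path in = new Path("In"); // relative, so it resolves to /user/<user>/In on HDFS
    fs.mkdirs(in);
    // Upload a local file into the input directory.
    fs.copyFromLocalFile(new Path("/tmp/sample.txt"), in);
  }
}

The shell equivalent would be hadoop dfs -mkdir followed by hadoop dfs -copyFromLocal, as used elsewhere in these threads.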
Error while compiling the program
Hi, I am running the following code (Gender.java) on my hadoop :

import java.io.IOException;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class Gender {

  private static String genderCheck = "female";

  public static class Map extends MapReduceBase implements Mapper {
    private final static IntWritable one = new IntWritable(1);
    private Text locText = new Text();

    public void map(LongWritable key, Text value, OutputCollector output, Reporter reporter) throws IOException {
      String line = value.toString();
      String location = line.split(",")[14] + "," + line.split(",")[15];
      long male = 0L;
      long female = 0L;
      if (line.split(",")[17].matches("\\d+") && line.split(",")[18].matches("\\d+")) {
        male = Long.parseLong(line.split(",")[17]);
        female = Long.parseLong(line.split(",")[18]);
      }
      long diff = male - female;
      locText.set(location);
      if (Gender.genderCheck.toLowerCase().equals("female") && diff < 0) {
        output.collect(locText, new LongWritable(diff * -1L));
      } else if (Gender.genderCheck.toLowerCase().equals("male") && diff > 0) {
        output.collect(locText, new LongWritable(diff));
      }
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(Gender.class);
    conf.setJobName("gender");
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(LongWritable.class);
    conf.setMapperClass(Map.class);
    if (args.length != 3) {
      System.out.println("Usage:");
      System.out.println("[male/female] /path/to/2kh/files /path/to/output");
      System.exit(1);
    }
    if (!args[0].equalsIgnoreCase("male") && !args[0].equalsIgnoreCase("female")) {
      System.out.println("first argument must be male or female");
      System.exit(1);
    }
    Gender.genderCheck = args[0];
    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);
    FileInputFormat.setInputPaths(conf, new Path(args[1]));
    FileOutputFormat.setOutputPath(conf, new Path(args[2]));
    JobClient.runJob(conf);
  }
}

I am getting the following exception while compiling this :

Gender.java:14: Gender.Map is not abstract and does not override abstract method map(java.lang.Object,java.lang.Object,org.apache.hadoop.mapred.OutputCollector,org.apache.hadoop.mapred.Reporter) in org.apache.hadoop.mapred.Mapper
public static class Map extends MapReduceBase implements Mapper {
^
Note: Gender.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Note: Gender.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.

Can anyone suggest how to debug this error ??
Re: Error while compiling the program
Thanks Joey..!! It compiled.. Regards, Praveenesh

On Mon, Apr 25, 2011 at 3:47 PM, Joey Echeverria j...@cloudera.com wrote: Your declaration of the Map class needs to include the input and output types, e.g.:

public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, LongWritable> {
...
}

-Joey

On Mon, Apr 25, 2011 at 4:38 AM, praveenesh kumar praveen...@gmail.com wrote:

Hi, I am running the following code (Gender.java) on my hadoop :

import java.io.IOException;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class Gender {

  private static String genderCheck = "female";

  public static class Map extends MapReduceBase implements Mapper {
    private final static IntWritable one = new IntWritable(1);
    private Text locText = new Text();

    public void map(LongWritable key, Text value, OutputCollector output, Reporter reporter) throws IOException {
      String line = value.toString();
      String location = line.split(",")[14] + "," + line.split(",")[15];
      long male = 0L;
      long female = 0L;
      if (line.split(",")[17].matches("\\d+") && line.split(",")[18].matches("\\d+")) {
        male = Long.parseLong(line.split(",")[17]);
        female = Long.parseLong(line.split(",")[18]);
      }
      long diff = male - female;
      locText.set(location);
      if (Gender.genderCheck.toLowerCase().equals("female") && diff < 0) {
        output.collect(locText, new LongWritable(diff * -1L));
      } else if (Gender.genderCheck.toLowerCase().equals("male") && diff > 0) {
        output.collect(locText, new LongWritable(diff));
      }
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(Gender.class);
    conf.setJobName("gender");
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(LongWritable.class);
    conf.setMapperClass(Map.class);
    if (args.length != 3) {
      System.out.println("Usage:");
      System.out.println("[male/female] /path/to/2kh/files /path/to/output");
      System.exit(1);
    }
    if (!args[0].equalsIgnoreCase("male") && !args[0].equalsIgnoreCase("female")) {
      System.out.println("first argument must be male or female");
      System.exit(1);
    }
    Gender.genderCheck = args[0];
    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);
    FileInputFormat.setInputPaths(conf, new Path(args[1]));
    FileOutputFormat.setOutputPath(conf, new Path(args[2]));
    JobClient.runJob(conf);
  }
}

I am getting the following exception while compiling this :

Gender.java:14: Gender.Map is not abstract and does not override abstract method map(java.lang.Object,java.lang.Object,org.apache.hadoop.mapred.OutputCollector,org.apache.hadoop.mapred.Reporter) in org.apache.hadoop.mapred.Mapper
public static class Map extends MapReduceBase implements Mapper {
^
Note: Gender.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Note: Gender.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.

Can anyone suggest how to debug this error ??

-- Joseph Echeverria Cloudera, Inc. 443.305.9434
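For completeness (not part of the thread): the error occurs because a raw Mapper erases to map(Object, Object, OutputCollector, Reporter), which the typed map method above does not override. A minimal compilable sketch of the fixed declaration (class name and trivial body are illustrative only):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class TypedMap extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, LongWritable> {
  // This signature now matches Mapper<LongWritable, Text, Text, LongWritable>.map,
  // so the "is not abstract and does not override" error disappears.
  public void map(LongWritable key, Text value,
      OutputCollector<Text, LongWritable> output, Reporter reporter)
      throws IOException {
    output.collect(value, new LongWritable(1L)); // placeholder body for illustration
  }
}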
hadoop dfs -copyFromLocal ??
Hi, I am learning hadoop. Whenever we use

hadoop dfs -copyFromLocal <input file name> <output file name>

I assume the file is copied from the linux file system to the hadoop file system. However, the output of the command shows us that the file is stored somewhere under /user/hadoop/*. But if we search for it from linux, we cannot see those files.. Why so ??? Can I go to the location of the files which get copied to DFS from linux ?? Suppose I have copied some file into DFS and now, from another system, I want to give that file as an input.. How can I give that file as input to the program ? I mean, how can I remotely access the files that are copied to DFS and pass them as input to my programs ?? Thanks, Praveenesh
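No reply is archived, but two facts answer most of this: paths like /user/hadoop/* live in the NameNode's namespace rather than in the linux directory tree (on disk the data sits as opaque blk_* block files under each DataNode's dfs.data.dir), and a remote program can use such a file as job input by naming it with a fully qualified HDFS URI. A minimal sketch of the latter (host, port, and path are hypothetical):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobConf;

public class RemoteInput {
  public static void main(String[] args) {
    JobConf conf = new JobConf(RemoteInput.class);
    // A fully qualified URI works from any machine that can reach the NameNode.
    FileInputFormat.setInputPaths(conf,
        new Path("hdfs://namenode:54310/user/hadoop/input"));
  }
}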
Hadoop from Windows ??
The problem I am facing is :

1. I have one Windows system. I am running eclipse with the hadoop plugin.. It is not a part of the hadoop cluster. I am able to connect to the hadoop systems and can view the DFS and MAPRED folders using this plugin. Since I am able to view the contents of hadoop, I am assuming that I can connect to the hadoop system from my windows machine.

2. Now I am writing a program on my windows machine and trying to run it on the hadoop machines, but whenever I try to do that, I get the following error :

11/04/25 18:24:21 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
11/04/25 18:24:21 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
11/04/25 18:24:21 INFO input.FileInputFormat: Total input paths to process : 1
Exception in thread "main" java.io.IOException: Cannot run program "chmod": CreateProcess error=2, The system cannot find the file specified
at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:149)
at org.apache.hadoop.util.Shell.run(Shell.java:134)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:286)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:354)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:337)
at org.apache.hadoop.fs.RawLocalFileSystem.execCommand(RawLocalFileSystem.java:481)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:473)
at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:280)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:372)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:484)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:465)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:372)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:208)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:142)
at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1216)
at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1197)
at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:92)
at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:373)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:800)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
at WordCount.run(WordCount.java:94)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at WordCount.main(WordCount.java:98)
Caused by: java.io.IOException: CreateProcess error=2, The system cannot find the file specified
at java.lang.ProcessImpl.create(Native Method)
at java.lang.ProcessImpl.<init>(ProcessImpl.java:81)
at java.lang.ProcessImpl.start(ProcessImpl.java:30)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)
... 24 more

My program is :

import java.io.*;
import java.util.*;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.util.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount extends Configured implements Tool {

  public static class MapClass extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
      String line = value.toString();
      StringTokenizer itr = new StringTokenizer(line);
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  /**
   * A reducer class that just emits the sum of the input values.
   */
  public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable value : values) {
        sum += value.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }

  static int printUsage() {
    System.out.println("wordcount [-r <reduces>] <input> <output>");
    ToolRunner.printGenericCommandUsage(System.out);
    return -1;
  }

  public int run(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "WordCount example for hadoop 0.20.1");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(MapClass.class);
    job.setCombinerClass(Reduce.class);
    job.setReducerClass(Reduce.class);
    // the keys are words (strings)
    job.setOutputKeyClass(Text.class);
    // the values are counts (ints)
    job.setOutputValueClass(IntWritable.class);
    List<String> other_args = new ArrayList<String>();
    for (int i = 0; i < args.length; ++i) {
      try {
        // The number of map tasks was earlier configurable,
        // But with hadoop 0.20.1, it is decided by the framework.
        // Since this heavily
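No reply is archived here, but the stack trace already narrows the problem down: the job goes through LocalJobRunner, meaning it is running on the Windows machine itself rather than on the cluster, and there RawLocalFileSystem shells out to chmod, which Windows does not have (Hadoop 0.20 on Windows generally expects Cygwin's tools on the PATH). One thing worth checking is whether the client configuration actually points at the cluster; a minimal sketch (the JobTracker port is an assumption taken from common tutorial setups):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SubmitToCluster {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Without these, the client defaults to the local file system and
    // LocalJobRunner, which is what the stack trace above shows.
    conf.set("fs.default.name", "hdfs://162.192.100.46:54310");
    conf.set("mapred.job.tracker", "162.192.100.46:54311"); // port assumed
    Job job = new Job(conf, "WordCount example");
    // ... set jar, mapper, reducer, and input/output paths as in the program above ...
  }
}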
HBASE on Hadoop
Hello everyone, Thanks everyone for guiding me every time. I was able to set up a hadoop cluster of 10 nodes. Now comes HBASE..!!! I am new to all this... My problem is that I have huge data to analyze, so shall I go for a single-node HBase installation on all nodes, or go for a distributed HBase installation ?? How is a distributed installation different from a single-node installation ?? Now suppose I have distributed HBase... and I design some table on my master node.. and then store data in it.. say around 100M. How is the data going to be distributed ?? Will HBASE do it automatically, or do we have to write code to get it distributed ?? Is there any good tutorial that tells us more about HBase and how to work with it ??? Thanks, Praveenesh
Re: java.net.ConnectException
Hi, Have you checked the ports on which the map-reduce server and hdfs are running ? I guess the plugin comes with its own default ports; you have to replace them with the ports on which you are running your map-reduce and hdfs. I guess that might help you..!! Thanks, Praveenesh

On Mon, Apr 18, 2011 at 4:44 PM, RAGHAVENDRA PRASAD raghav.npra...@gmail.com wrote: I am a newbie to hadoop. We are trying to set up hadoop infrastructure at our company. I am not sure whether this is the right forum to ask this question. Our application server is windows. I was looking for a tutorial on how to connect from a windows system to hadoop (on ubuntu) and run MR jobs. I downloaded the Eclipse plugin (runs on Windows Server) and I gave the ip address for the Host (hadoop location). When I clicked on finish, I got an error - Failed on Connection Exception: java.net.ConnectException. Please let me know how to proceed; any tutorial would be helpful. Regards, Raghavendra Prasad
Hadoop Speed Efficiency ??
Hello everyone, I am new to hadoop... I set up a hadoop cluster of 4 ubuntu systems (Hadoop 0.20.2), and I am running the well-known wordcount (gutenberg) example to test how fast my hadoop is working.. But whenever I run the wordcount example, I am not able to see much difference in processing time.. On a single node the wordcount takes the same time.. and on the cluster of 4 systems it also takes almost the same time.. Am I doing anything wrong here ?? Can anyone explain to me why this is happening.. and how I can make maximum use of my cluster ?? Thanks. Praveenesh
Error : Too many fetch-failures
Hello, I am new to hadoop. I am using hadoop 0.20.2 on ubuntu. I recently installed and configured hadoop using the tutorials available on the internet. My hadoop is running properly, but whenever I try to run a wordcount example, the program gets stuck at the reduce part. After a long time, I get the following error :

hadoop@50:/usr/local/hadoop/hadoop$ hadoop jar hadoop-0.20.2-examples.jar wordcount gutenberg gutenberg-output
11/04/14 23:24:20 INFO input.FileInputFormat: Total input paths to process : 3
11/04/14 23:24:25 INFO mapred.JobClient: Running job: job_201104142306_0001
11/04/14 23:24:26 INFO mapred.JobClient: map 0% reduce 0%
11/04/14 23:24:45 INFO mapred.JobClient: map 66% reduce 0%
11/04/14 23:24:54 INFO mapred.JobClient: map 100% reduce 0%
11/04/14 23:32:50 INFO mapred.JobClient: Task Id : attempt_201104142306_0001_m_00_0, Status : FAILED
Too many fetch-failures
11/04/14 23:32:50 WARN mapred.JobClient: Error reading task outputInvalid argument or cannot assign requested address
11/04/14 23:32:50 WARN mapred.JobClient: Error reading task outputInvalid argument or cannot assign requested address
11/04/14 23:32:54 INFO mapred.JobClient: map 66% reduce 0%
11/04/14 23:33:00 INFO mapred.JobClient: map 100% reduce 0%

Can somebody help me solve this issue ? It's urgent.. I wasted my whole day figuring out the problem. Thanks, Praveenesh
Re: Error : Too many fetch-failures
Hi, Where can I see the logs ? I have done a single-node cluster installation and I am running hadoop on a single machine only. Both Map and Reduce are running on the same machine. Thanks, Praveenesh

On Thu, Apr 14, 2011 at 4:43 PM, Harsh J ha...@cloudera.com wrote: Hello Praveenesh,

On Thu, Apr 14, 2011 at 3:42 PM, praveenesh kumar praveen...@gmail.com wrote:
attempt_201104142306_0001_m_00_0, Status : FAILED Too many fetch-failures
11/04/14 23:32:50 WARN mapred.JobClient: Error reading task outputInvalid argument or cannot assign requested address
11/04/14 23:32:50 WARN mapred.JobClient: Error reading task outputInvalid argument or cannot assign requested address

In most cases, this is a simple DNS/hostnames configuration issue. The machine on which your Reducer is trying to run could be unable to contact the HTTP service of the TaskTracker the mappers ran on (one or many), due to problems on either end. If this is a single pseudo-distributed setup, you may want to verify the contents of your /etc/hosts file. A full log output pasted somewhere would also be helpful in determining the exact cause. -- Harsh J