Do Hadoop installations need to be in the same location on every node of a cluster?

2011-12-23 Thread praveenesh kumar
When installing Hadoop on the slave machines, do we have to install it in the
same location on each machine?
Can the Hadoop installation live in a different location on different
machines in the same cluster?
If yes, what do we have to take care of in that case?

Thanks,
Praveenesh


Re: Do Hadoop installations need to be in the same location on every node of a cluster?

2011-12-23 Thread praveenesh kumar
What I mean to say is: does Hadoop internally assume that the installation
on every node is in the same location?
I had Hadoop installed in different locations on 2 different nodes and
configured the Hadoop config files so that they form one cluster.
But when I started Hadoop on the master, I saw it was also looking for the
Hadoop start-up scripts on the slaves in the same location as on the master.
Is there any workaround for this kind of situation, or do I have to
reinstall Hadoop in the same location as on the master?
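For context, the reason the same path is expected is that, as far as I can tell, the
stock start scripts ssh into every slave and run the daemon script at the master's
own $HADOOP_HOME. A simplified sketch of what bin/hadoop-daemons.sh and bin/slaves.sh
do in 0.20.x (details vary by version; check the scripts in your own bin/ directory):

# loop over conf/slaves and start the daemon remotely, reusing the master's paths
for slave in $(grep -v '^#' "$HADOOP_CONF_DIR/slaves"); do
  ssh "$slave" cd "$HADOOP_HOME" \; "$HADOOP_HOME/bin/hadoop-daemon.sh" start datanode &
done
wait

# workaround: skip the ssh loop and start the daemons on each slave by hand,
# using that node's own installation path (the path below is just an example)
/opt/hadoop-node2/bin/hadoop-daemon.sh start datanode
/opt/hadoop-node2/bin/hadoop-daemon.sh start tasktracker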

Thanks,
Praveenesh

On Fri, Dec 23, 2011 at 6:26 PM, Michael Segel
michael_se...@hotmail.com wrote:
 Sure,
 You could do that, but in doing so, you will make your life a living hell.
 Literally.

 Think about it... You will have to manually manage each node's config files...

 So if something goes wrong you will have a hard time diagnosing the issue.

 Why make life harder?

 Why not just do the simple thing and make all of your DNs the same?

 Sent from my iPhone

 On Dec 23, 2011, at 6:51 AM, praveenesh kumar praveen...@gmail.com wrote:

 When installing Hadoop on the slave machines, do we have to install it in the
 same location on each machine?
 Can the Hadoop installation live in a different location on different
 machines in the same cluster?
 If yes, what do we have to take care of in that case?

 Thanks,
 Praveenesh


How does the JobTracker choose which DataNodes to run tasks on?

2011-12-15 Thread praveenesh kumar
Okay, so I have one question in mind.

Suppose I have a replication factor of 3 on my cluster of some N
nodes, where N > 3, and there is a data block B1 that exists on 3
data nodes -- DD1, DD2, DD3.

I want to run some mapper function on this block. My JT will
communicate with the NN to find out where it can locate the block.
My assumption is that the NN will give the JT the information for all the data
nodes where the block resides, in this case DD1, DD2, DD3. Am I right about this?

Now my question is: how will the JT decide to which DD it should send
its mapper code?

Suppose it chose DD1, and my TaskTracker starts running the task on that
machine. For some reason, DD1 is taking more time than the task would have
taken had it been running on DD2. How does Hadoop understand and handle
these situations?
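(As far as I understand it, the part of the framework that re-runs an unusually slow
copy of a task on another node is speculative execution; a mapred-site.xml sketch of
the switches that control it, using the 0.20.x property names as I recall them --
verify against your mapred-default.xml:)

<property>
  <name>mapred.map.tasks.speculative.execution</name>
  <value>true</value>
</property>
<property>
  <name>mapred.reduce.tasks.speculative.execution</name>
  <value>true</value>
</property>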

Thanks,
Praveenesh


More cores vs. more nodes?

2011-12-12 Thread praveenesh kumar
Hey Guys,

So I have a very naive question in my mind regarding Hadoop cluster nodes.

More cores or more nodes -- shall I spend money on going from 2-core to 4-core
machines, or spend it on buying more nodes with fewer cores each, e.g. two
2-core machines instead?

Thanks,
Praveenesh


Hive on hadoop 0.20.205

2011-12-09 Thread praveenesh kumar
Has anyone tried Hive on Hadoop 0.20.205?

I am trying to build Hive from svn, but I see it downloading
hadoop-0.20.3-CDH3-SNAPSHOT.tar.gz and hadoop-0.20.1.tar.gz.

When I try to run ant -Dhadoop.version="0.20.205" package, the build
fails.

Any ideas or suggestions on what I may be doing wrong?

Thanks,
Praveenesh


Re: Hive on hadoop 0.20.205

2011-12-09 Thread praveenesh kumar
/jdbc/src/java/org/apache/hadoop/hive/jdbc/HivePreparedStatement.java:52:
error: HivePreparedStatement is not abstract and does not override abstract
method isCloseOnCompletion() in Statement
[javac] public class HivePreparedStatement implements PreparedStatement
{
[javac]^
[javac]
/usr/local/hadoop/hive/release-0.7.1/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveQueryResultSet.java:47:
error: HiveQueryResultSet is not abstract and does not override abstract
method <T> getObject(String, Class<T>) in ResultSet
[javac] public class HiveQueryResultSet extends HiveBaseResultSet {
[javac]^
[javac]   where T is a type-variable:
[javac] T extends Object declared in method
<T> getObject(String, Class<T>)
[javac]
/usr/local/hadoop/hive/release-0.7.1/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveStatement.java:33:
error: HiveStatement is not abstract and does not override abstract method
isCloseOnCompletion() in Statement
[javac] public class HiveStatement implements java.sql.Statement {
[javac]^
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note:
/usr/local/hadoop/hive/release-0.7.1/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveDatabaseMetaData.java
uses unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.
[javac] 14 errors

BUILD FAILED
/usr/local/hadoop/hive/release-0.7.1/build.xml:196: The following error
occurred while executing this line:
/usr/local/hadoop/hive/release-0.7.1/build.xml:130: The following error
occurred while executing this line:
/usr/local/hadoop/hive/release-0.7.1/jdbc/build.xml:51: Compile failed; see
the compiler error output for details.

Total time: 29 minutes 46 seconds

Thanks,
Praveenesh

On Fri, Dec 9, 2011 at 2:08 PM, praveenesh kumar praveen...@gmail.com wrote:

 Has anyone tried Hive on Hadoop 0.20.205?

 I am trying to build Hive from svn, but I see it downloading
 hadoop-0.20.3-CDH3-SNAPSHOT.tar.gz and hadoop-0.20.1.tar.gz.

 When I try to run ant -Dhadoop.version="0.20.205" package, the build
 fails.

 Any ideas or suggestions on what I may be doing wrong?

 Thanks,
 Praveenesh



Re: HDFS Backup nodes

2011-12-07 Thread praveenesh kumar
This means we are still relying on the Secondary NameNode approach for the
NameNode's backup.
Is OS-level mirroring of the NameNode a good alternative to keep it alive all
the time?

Thanks,
Praveenesh

On Wed, Dec 7, 2011 at 1:35 PM, Uma Maheswara Rao G mahesw...@huawei.com wrote:

 AFAIK backup node introduced in 0.21 version onwards.
 
 From: praveenesh kumar [praveen...@gmail.com]
 Sent: Wednesday, December 07, 2011 12:40 PM
 To: common-user@hadoop.apache.org
 Subject: HDFS Backup nodes

 Does Hadoop 0.20.205 support configuring HDFS backup nodes?

 Thanks,
 Praveenesh



Warning: $HADOOP_HOME is deprecated

2011-12-07 Thread praveenesh kumar
How do I avoid the "Warning: $HADOOP_HOME is deprecated" messages on Hadoop
0.20.205?

I tried adding export HADOOP_HOME_WARN_SUPPRESS= in hadoop-env.sh on the
NameNode.

But the warning still appears. Am I doing the right thing?

Thanks,
Praveenesh


Re: Warning: $HADOOP_HOME is deprecated

2011-12-07 Thread praveenesh kumar
Okay, I fixed it.

I had to add export HADOOP_HOME_WARN_SUPPRESS=TRUE in hadoop-env.sh
on all my Hadoop nodes.

Thanks,
Praveenesh

On Wed, Dec 7, 2011 at 4:11 PM, alo alt wget.n...@googlemail.com wrote:

 Hi,

 looks like a bug in .205:
 https://issues.apache.org/jira/browse/HADOOP-7816

 - Alex

 On Wed, Dec 7, 2011 at 11:37 AM, praveenesh kumar praveen...@gmail.com
 wrote:

  How to avoid Warning: $HADOOP_HOME is deprecated messages on hadoop
  0.20.205 ?
 
  I tried adding *export HADOOP_HOME_WARN_SUPPRESS=  *in hadoop-env.sh on
  Namenode.
 
  But its still coming. Am I doing the right thing ?
 
  Thanks,
  Praveenesh
 



 --
 Alexander Lorenz
 http://mapredit.blogspot.com




HDFS Backup nodes

2011-12-06 Thread praveenesh kumar
Does Hadoop 0.20.205 support configuring HDFS backup nodes?

Thanks,
Praveenesh


Utilizing multiple hard disks for hadoop HDFS ?

2011-12-01 Thread praveenesh kumar
Hi everyone,

So I have this blade server with 4x500 GB hard disks.
I want to use all of these hard disks for Hadoop HDFS.
How can I achieve this?

Suppose I install Hadoop on one hard disk and use the other hard disks as
normally mounted partitions, e.g.:

/dev/sda1 -- HDD 1 -- Primary partition -- Linux + Hadoop installed on it
/dev/sda2 -- HDD 2 -- Mounted partition -- /mnt/dev/sda2
/dev/sda3 -- HDD 3 -- Mounted partition -- /mnt/dev/sda3
/dev/sda4 -- HDD 4 -- Mounted partition -- /mnt/dev/sda4

And suppose I create a hadoop.tmp.dir directory on each partition, say
/tmp/hadoop-datastore/hadoop-hadoop,

and in core-site.xml I configure it like this:
<property>
  <name>hadoop.tmp.dir</name>
  <value>/tmp/hadoop-datastore/hadoop-hadoop,/mnt/dev/sda2/tmp/hadoop-datastore/hadoop-hadoop,/mnt/dev/sda3/tmp/hadoop-datastore/hadoop-hadoop,/mnt/dev/sda4/tmp/hadoop-datastore/hadoop-hadoop</value>
  <description>A base for other temporary directories.</description>
</property>

Will it work?

Can I set the above kind of value for dfs.data.dir as well?
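(For comparison, a hdfs-site.xml sketch that lists one data directory per disk:
dfs.data.dir does accept a comma-separated list, while hadoop.tmp.dir itself is a
single base path. The directory names below are only illustrative, reusing the
mount points from above.)

<property>
  <name>dfs.data.dir</name>
  <value>/tmp/hadoop-datastore/hadoop-hadoop/dfs/data,/mnt/dev/sda2/hdfs/data,/mnt/dev/sda3/hdfs/data,/mnt/dev/sda4/hdfs/data</value>
</property>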

Thanks,
Praveenesh


Hadoop 0.20.205

2011-10-16 Thread praveenesh kumar
Hi all,

Any idea when Hadoop 0.20.205 is officially going to be released?
Is the Hadoop 0.20.205 rc2 stable enough to put into production?
I am using hadoop-0.20-append now with HBase 0.90.3 and want to switch to 205,
but I am looking for some valuable suggestions/recommendations.

Thanks,
Praveenesh


Re: Too much fetch failure

2011-10-16 Thread praveenesh kumar
Try commenting out the 127.0.0.1 localhost line in your /etc/hosts, then
restart the cluster and try again.
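Roughly, the edited file would look like this (based on the /etc/hosts you posted
below; the same idea applies on both nodes, adjusted to each node's own IP):

# 127.0.0.1 localhost.localdomain localhost    <- commented out as suggested
192.168.60.147 humayun
192.168.60.1   master
192.168.60.2   slave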

Thanks,
Praveenesh

On Sun, Oct 16, 2011 at 2:00 PM, Humayun gmail humayun0...@gmail.com wrote:

 We are using Hadoop on VirtualBox. When it is a single node, it works
 fine for big datasets larger than the default block size, but in the case of a
 multinode cluster (2 nodes) we are facing some problems.
 When the input dataset is smaller than the default block size (64 MB)
 it works fine, but when the input dataset is larger than the default
 block size it shows 'too much fetch failure' in the reduce state.
 Here is the output link:
 http://paste.ubuntu.com/707517/

 From the above comments, there are many users who have faced this problem.
 Different users suggested modifying the /etc/hosts file in different ways
 to fix the problem, but there is no definitive solution. We need the actual
 solution -- that's why we are writing here.

 this is our /etc/hosts file
 192.168.60.147 humayun # Added by NetworkManager
 127.0.0.1 localhost.localdomain localhost
 ::1 humayun localhost6.localdomain6 localhost6
 127.0.1.1 humayun

 # The following lines are desirable for IPv6 capable hosts
 ::1 localhost ip6-localhost ip6-loopback
 fe00::0 ip6-localnet
 ff00::0 ip6-mcastprefix
 ff02::1 ip6-allnodes
 ff02::2 ip6-allrouters
 ff02::3 ip6-allhosts

 192.168.60.1 master
 192.168.60.2 slave



Re: Too much fetch failure

2011-10-16 Thread praveenesh kumar
Why are you formatting the NameNode again?
1. Just stop the cluster.
2. Comment out the 127.0.0.1 localhost line.
3. Restart the cluster.

How have you defined your Hadoop config files? Have you mentioned localhost
there?

Thanks,
Praveenesh

On Sun, Oct 16, 2011 at 7:42 PM, Humayun gmail humayun0...@gmail.com wrote:

 Commenting out the 127.0.0.1 line in /etc/hosts is not working. If I format
 the namenode, this line gets added back automatically.
 Any other solution?

 On 16 October 2011 19:13, praveenesh kumar praveen...@gmail.com wrote:

  try commenting 127.0.0.1 localhost line in your /etc/hosts and then
 restart
  the cluster and then try again.
 
  Thanks,
  Praveenesh
 
  On Sun, Oct 16, 2011 at 2:00 PM, Humayun gmail humayun0...@gmail.com
  wrote:
 
   we are using hadoop on virtual box. when it is a single node then it
  works
   fine for big dataset larger than the default block size. but in case of
   multinode cluster (2 nodes) we are facing some problems.
   Like when the input dataset is smaller than the default block size(64
 MB)
   then it works fine. but when the input dataset is larger than the
 default
   block size then it shows ‘too much fetch failure’ in reduce state.
   here is the output link
   http://paste.ubuntu.com/707517/
  
   From the above comments , there are many users who faced this problem.
   different users suggested to modify the /etc/hosts file in different
  manner
   to fix the problem. but there is no ultimate solution.we need the
 actual
   solution thats why we are writing here.
  
   this is our /etc/hosts file
   192.168.60.147 humayun # Added by NetworkManager
   127.0.0.1 localhost.localdomain localhost
   ::1 humayun localhost6.localdomain6 localhost6
   127.0.1.1 humayun
  
   # The following lines are desirable for IPv6 capable hosts
   ::1 localhost ip6-localhost ip6-loopback
   fe00::0 ip6-localnet
   ff00::0 ip6-mcastprefix
   ff02::1 ip6-allnodes
   ff02::2 ip6-allrouters
   ff02::3 ip6-allhosts
  
   192.168.60.1 master
   192.168.60.2 slave
  
 



Re: Error using hadoop distcp

2011-10-05 Thread praveenesh kumar
I tried that as well. When I use the IP addresses, it says I should use the
hostnames.

hadoop@ub13:~$ hadoop distcp hdfs://162.192.100.53:54310/user/hadoop/weblog hdfs://162.192.100.16:54310/user/hadoop/weblog
11/10/05 14:53:50 INFO tools.DistCp: srcPaths=[hdfs://162.192.100.53:54310/user/hadoop/weblog]
11/10/05 14:53:50 INFO tools.DistCp: destPath=hdfs://162.192.100.16:54310/user/hadoop/weblog
java.lang.IllegalArgumentException: Wrong FS: hdfs://162.192.100.53:54310/user/hadoop/weblog, expected: hdfs://ub13:54310
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:310)
at
org.apache.hadoop.hdfs.DistributedFileSystem.checkPath(DistributedFileSystem.java:99)
at
org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:155)
at
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:464)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:648)
at org.apache.hadoop.tools.DistCp.checkSrcPath(DistCp.java:621)
at org.apache.hadoop.tools.DistCp.copy(DistCp.java:638)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:857)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:884)

I have the entries of both machines in /etc/hosts...
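The suggestion quoted below amounts to something like this on every node of both
clusters, not just the two NameNodes (the IPs are the ones from this thread), after
which each cluster is addressed by the hostname its NameNode was started with:

# append to /etc/hosts on every node
162.192.100.53 ub13
162.192.100.16 ub16

hadoop distcp hdfs://ub13:54310/user/hadoop/weblog hdfs://ub16:54310/user/hadoop/weblog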


On Wed, Oct 5, 2011 at 1:55 PM, bejoy.had...@gmail.com wrote:

 Hi praveenesh
 Can you try repeating the distcp using the IPs instead of the host names?
 From the error it looks like an RPC exception where the host cannot be resolved,
 so I believe it can't be due to not setting up passwordless ssh. Just try it
 out.
 Regards
 Bejoy K S

 -Original Message-
 From: trang van anh anh...@vtc.vn
 Date: Wed, 05 Oct 2011 14:06:11
 To: common-user@hadoop.apache.org
 Reply-To: common-user@hadoop.apache.org
 Subject: Re: Error using hadoop distcp

 Which host runs the task that throws the exception? Ensure that each
 data node knows the other data nodes in the hadoop cluster -- add a ub16 entry
 to /etc/hosts on the node where the task is running.
 On 10/5/2011 12:15 PM, praveenesh kumar wrote:
  I am trying to use distcp to copy a file from one HDFS to another.
 
  But while copying I am getting the following exception :
 
  hadoop distcp hdfs://ub13:54310/user/hadoop/weblog
  hdfs://ub16:54310/user/hadoop/weblog
 
  11/10/05 10:41:01 INFO mapred.JobClient: Task Id :
  attempt_201110031447_0005_m_07_0, Status : FAILED
  java.net.UnknownHostException: unknown host: ub16
   at
 org.apache.hadoop.ipc.Client$Connection.init(Client.java:195)
   at org.apache.hadoop.ipc.Client.getConnection(Client.java:850)
   at org.apache.hadoop.ipc.Client.call(Client.java:720)
   at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
   at $Proxy1.getProtocolVersion(Unknown Source)
   at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
   at
  org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:113)
   at org.apache.hadoop.hdfs.DFSClient.init(DFSClient.java:215)
   at org.apache.hadoop.hdfs.DFSClient.init(DFSClient.java:177)
   at
 
 org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
   at
  org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
   at
 org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
   at
 org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
   at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
   at
 
 org.apache.hadoop.mapred.FileOutputCommitter.setupJob(FileOutputCommitter.java:48)
   at
 
 org.apache.hadoop.mapred.OutputCommitter.setupJob(OutputCommitter.java:124)
   at org.apache.hadoop.mapred.Task.runJobSetupTask(Task.java:835)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:296)
   at org.apache.hadoop.mapred.Child.main(Child.java:170)
 
  Its saying its not finding ub16. But the entry is there in /etc/hosts
 files.
  I am able to ssh both the machines. Do I need password less ssh between
  these two NNs ?
  What can be the issue ? Any thing I am missing before using distcp ?
 
  Thanks,
  Praveenesh
 




Error using hadoop distcp

2011-10-04 Thread praveenesh kumar
I am trying to use distcp to copy a file from one HDFS to another.

But while copying I am getting the following exception :

hadoop distcp hdfs://ub13:54310/user/hadoop/weblog
hdfs://ub16:54310/user/hadoop/weblog

11/10/05 10:41:01 INFO mapred.JobClient: Task Id :
attempt_201110031447_0005_m_07_0, Status : FAILED
java.net.UnknownHostException: unknown host: ub16
at org.apache.hadoop.ipc.Client$Connection.init(Client.java:195)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:850)
at org.apache.hadoop.ipc.Client.call(Client.java:720)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
at $Proxy1.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
at
org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:113)
at org.apache.hadoop.hdfs.DFSClient.init(DFSClient.java:215)
at org.apache.hadoop.hdfs.DFSClient.init(DFSClient.java:177)
at
org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
at
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
at
org.apache.hadoop.mapred.FileOutputCommitter.setupJob(FileOutputCommitter.java:48)
at
org.apache.hadoop.mapred.OutputCommitter.setupJob(OutputCommitter.java:124)
at org.apache.hadoop.mapred.Task.runJobSetupTask(Task.java:835)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:296)
at org.apache.hadoop.mapred.Child.main(Child.java:170)

It says it cannot find ub16, but the entry is there in the /etc/hosts files.
I am able to ssh to both machines. Do I need passwordless ssh between
these two NameNodes?
What can be the issue? Is there anything I am missing before using distcp?

Thanks,
Praveenesh


Is SAN storage a good option for Hadoop?

2011-09-29 Thread praveenesh kumar
Hi,

I want to know whether we can use SAN storage for a Hadoop cluster setup.
If yes, what are the best practices?

Is it a good idea, considering that the underlying power of Hadoop comes from
co-locating the processing power (CPU) with the data storage, which suggests
the storage must be local to be effective?
But also, is it still right to say "local is better" in a situation where I
have a single local 5400 RPM IDE drive, which would be dramatically slower
than SAN storage striped across many drives spinning at 10k RPM and
accessed via Fibre Channel?
Thanks,
Praveenesh


hadoop question using VMWARE

2011-09-28 Thread praveenesh kumar
Hi,

Suppose I have 10 Windows machines, each running an individual VM
instance. Can these VM instances communicate with each other so that I can
build a Hadoop cluster out of them?

Has anyone tried this?

I know we can set up multiple VM instances on the same machine, but can we do it
across different machines as well?
And if I do it like this, is it a good approach, considering I don't have
dedicated Ubuntu machines for Hadoop?

Thanks,
Praveenesh


Re: hadoop question using VMWARE

2011-09-28 Thread praveenesh kumar
 it's not something you can do for production nor performance
analysis.
Can you please tell me what that means?
Why can't we use this approach for production?

Thanks

On Tue, Sep 27, 2011 at 11:56 PM, N Keywal nkey...@gmail.com wrote:

 Hi,

 Yes, it will work. HBase won't see the difference, it's a pure vmware
 stuff.
 Obviously, it's not something you can do for production nor performance
 analysis.

 Cheers,

 N.

 On Wed, Sep 28, 2011 at 8:38 AM, praveenesh kumar praveen...@gmail.com
 wrote:

  Hi,
 
  Suppose I am having 10 windows machines and if I have 10 VM individual
  instances running on these machines independently, can I use these VM
  instances to communicate with each other so that I can make hadoop
 cluster
  using those VM instances.
 
  Did anyone tried that thing ?
 
  I know we can setup multiple VM instances on same machine, but can we do
 it
  across different machines also ?
  And if I do like this, Is it a good approach, considering I don't have
  dedicated ubuntu machines for hadoop ?
 
  Thanks,
  Praveenesh
 



How to run java code using Mahout from commandline ?

2011-09-23 Thread praveenesh kumar
Hey,
I have some code written using Mahout, and I am able to run it from Eclipse.
How can I run this Mahout code from the command line?

My question is: do I have to build a jar file and run it as "hadoop jar
jarfilename.jar <class>", or shall I run it with a plain java command?

Can anyone clear up my confusion?
I am not able to run this code from the command line.
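In other words, the two options being weighed look roughly like this (the driver
class and jar names are placeholders for your own code, and the exact jar file
names under $HADOOP_HOME and $MAHOUT_HOME depend on the installed versions):

# option 1: the hadoop launcher sets up the Hadoop classpath and submits the job
hadoop jar mymahoutjob.jar com.example.MyMahoutDriver <args>

# option 2: plain java also works, but then the Hadoop and Mahout jars have to be
# put on the classpath by hand
java -cp "mymahoutjob.jar:$HADOOP_HOME/hadoop-core-0.20.205.0.jar:$HADOOP_HOME/lib/*:$MAHOUT_HOME/mahout-core-0.5.jar" \
    com.example.MyMahoutDriver <args>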

Thanks,
Praveenesh


Re: Can we replace namenode machine with some other machine ?

2011-09-22 Thread praveenesh kumar
But apart from storing the metadata info, is there anything more the NN/JT
machines are doing?
So can I say that I can survive with a weaker NN machine if I am not dealing
with lots of files in HDFS?
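(For reference, the client/DataNode side of the switch Uma describes below is just
repointing these two addresses at the new master on every node; the host name and
port numbers here are only illustrative.)

<!-- core-site.xml -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://new-master:54310</value>
</property>

<!-- mapred-site.xml -->
<property>
  <name>mapred.job.tracker</name>
  <value>new-master:54311</value>
</property>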

On Thu, Sep 22, 2011 at 11:08 AM, Uma Maheswara Rao G 72686 
mahesw...@huawei.com wrote:

 Just changing the configs will not affect your data. You need to restart
 your DNs so that they connect to the new NN.

 For the second question:
  It again depends on your usage. If there are more files in DFS, the NN
 will consume more memory, as it needs to store all the metadata info of
 the files in the namespace.

  If your files keep growing in number, then it is recommended not to put the
 NN and JT on the same machine.

 Coming to the DN case: the configured space will be used for storing the block
 files. Once that space is filled, the NN will not select this DN for
 further writes. So one DN having less space is a smaller problem than the NN
 having less space in big clusters.

 If you configure DNs with a very good amount of space but the NN has too
 little space to store your files' metadata info, then the extra space in the
 DNs is of no use, right :-)


 Regards,
 Uma
 - Original Message -
 From: praveenesh kumar praveen...@gmail.com
 Date: Thursday, September 22, 2011 10:42 am
 Subject: Re: Can we replace namenode machine with some other machine ?
 To: common-user@hadoop.apache.org

  If I just change configuration settings in slave machines, Will it
  effectany of the data that is currently residing in the cluster ??
 
  And my second question was...
  Do we need the master node (NN/JT hosting machine) to have good
  configuration than our slave machines(DN/TT hosting machines).
 
  Actually my master node is a weaker machine than my slave
  machines,because I
  am assuming that master machines does not do much additional work,
  and its
  okay to have a weak machine as master.
  Now I have a new big server machine just being added to my
  cluster. So I am
  thinking shall I make this new machine as my new master(NN/JT) or
  just add
  this machine as slave ?
 
  Thanks,
  Praveenesh
 
 
  On Thu, Sep 22, 2011 at 10:20 AM, Uma Maheswara Rao G 72686 
  mahesw...@huawei.com wrote:
 
   You copy the same installations to new machine and change ip
  address. After that configure the new NN addresses to your
  clients and DNs.
  
   Also Does Namenode/JobTracker machine's configuration needs to
  be better
   than datanodes/tasktracker's ??
I did not get this question.
  
   Regards,
   Uma
  
   - Original Message -
   From: praveenesh kumar praveen...@gmail.com
   Date: Thursday, September 22, 2011 10:13 am
   Subject: Can we replace namenode machine with some other machine ?
   To: common-user@hadoop.apache.org
  
Hi all,
   
Can we replace our namenode machine later with some other
  machine. ?
Actually I got a new  server machine in my cluster and now I want
to make
this machine as my new namenode and jobtracker node ?
Also Does Namenode/JobTracker machine's configuration needs to be
betterthan datanodes/tasktracker's ??
   
How can I achieve this target with least overhead ?
   
Thanks,
Praveenesh
   
  
 



Any other way to copy to HDFS ?

2011-09-21 Thread praveenesh kumar
Guys,

As far as I know Hadoop, I think that to copy files to HDFS they first need
to be copied to the NameNode's local filesystem. Is that right?
Does it mean that even if I have a Hadoop cluster of 10 nodes with an
overall capacity of 6 TB, but my NameNode's hard disk capacity is 500 GB,
I cannot copy any file greater than 500 GB to HDFS?

Is there any other way to copy directly to HDFS without first copying the
file to the NameNode's local filesystem?
What are other ways to copy large files greater than the NameNode's disk
capacity?

Thanks,
Praveenesh.


Re: Any other way to copy to HDFS ?

2011-09-21 Thread praveenesh kumar
So I want to copy a file from a Windows machine to the Linux NameNode.
How should I define NAMENODE_URI in the code you mention, if I want to
copy data from the Windows machine to the NameNode machine?
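Concretely, the kind of minimal client being discussed would look roughly like this
(a sketch: the host/port must match fs.default.name in the cluster's core-site.xml,
and the Windows source path is just an example):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CopyToHdfs {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // the client only asks the NameNode for metadata; the file's blocks stream
    // straight to the DataNodes, so nothing lands on the NameNode's local disk
    FileSystem fs = FileSystem.get(new URI("hdfs://namenode-host:54310/"), conf);
    fs.copyFromLocalFile(new Path("C:\\data\\input.txt"),
                         new Path("/user/hadoop/input.txt"));
    fs.close();
  }
}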

Thanks,
Praveenesh

On Wed, Sep 21, 2011 at 2:37 PM, Uma Maheswara Rao G 72686 
mahesw...@huawei.com wrote:

 For more understanding the flows, i would recommend you to go through once
 below docs

 http://hadoop.apache.org/common/docs/r0.16.4/hdfs_design.html#The+File+System+Namespace

 Regards,
 Uma

 - Original Message -
 From: Uma Maheswara Rao G 72686 mahesw...@huawei.com
 Date: Wednesday, September 21, 2011 2:36 pm
 Subject: Re: Any other way to copy to HDFS ?
 To: common-user@hadoop.apache.org

 
  Hi,
 
  You need not copy the files to NameNode.
 
  Hadoop provide Client code as well to copy the files.
  To copy the files from other node ( non dfs), you need to put the
  hadoop**.jar's into classpath and use the below code snippet.
 
   FileSystem fs = new DistributedFileSystem();
   fs.initialize(NAMENODE_URI, configuration);

   fs.copyFromLocalFile(srcPath, dstPath);
 
  using this API, you can copy the files from any machine.
 
  Regards,
  Uma
 
 
 
 
 
  - Original Message -
  From: praveenesh kumar praveen...@gmail.com
  Date: Wednesday, September 21, 2011 2:14 pm
  Subject: Any other way to copy to HDFS ?
  To: common-user@hadoop.apache.org
 
   Guys,
  
   As far as I know hadoop, I think, to copy the files to HDFS,
  first
   it needs
   to be copied to the NameNode's local filesystem. Is it right ??
   So does it mean that even if I have a hadoop cluster of 10 nodes
  with overall capacity of 6TB, but if my NameNode's hard disk
  capacity
   is 500 GB,
   I can not copy any file to HDFS greater than 500 GB ?
  
   Is there any other way to directly copy to HDFS without copy the
   file to
   namenode's local filesystem ?
   What can be other ways to copy large files greater than
  namenode's
   diskcapacity ?
  
   Thanks,
   Praveenesh.
  
 



Fwd: Any other way to copy to HDFS ?

2011-09-21 Thread praveenesh kumar
 is running on hdfs://10.18.52.63:9000

Then you can connect to your NameNode like below.

FileSystem fs = new DistributedFileSystem();
fs.initialize(new URI("hdfs://10.18.52.63:9000/"), new Configuration());

Please go through the docs mentioned below; you will get a better understanding.

if I want to
 copy data from windows machine to namenode machine ?
 In DFS the NameNode is responsible only for the namespace.

 In simple words, to understand the flow quickly:
  Clients ask the NameNode to give them some DNs to copy the data to. The NN will
create the file entry in the namespace and also return the block locations based
on the client's request. Then the clients will connect directly to the DNs and copy
the data.
Reading data back works the same way.

I hope you will understand better now :-)


Regards,
Uma

- Original Message -
From: praveenesh kumar praveen...@gmail.com
 Date: Wednesday, September 21, 2011 3:11 pm
Subject: Re: Any other way to copy to HDFS ?
To: common-user@hadoop.apache.org

 So I want to copy the file from windows machine to linux namenode.
 How can I define NAMENODE_URI in the code you mention, if I want to
 copy data from windows machine to namenode machine ?

 Thanks,
 Praveenesh

 On Wed, Sep 21, 2011 at 2:37 PM, Uma Maheswara Rao G 72686 
 mahesw...@huawei.com wrote:

  For more understanding the flows, i would recommend you to go
 through once
  below docs
 
 

http://hadoop.apache.org/common/docs/r0.16.4/hdfs_design.html#The+File+System+Namespace

  Regards,
  Uma
 
  - Original Message -
  From: Uma Maheswara Rao G 72686 mahesw...@huawei.com
  Date: Wednesday, September 21, 2011 2:36 pm
  Subject: Re: Any other way to copy to HDFS ?
  To: common-user@hadoop.apache.org
 
  
   Hi,
  
   You need not copy the files to NameNode.
  
   Hadoop provide Client code as well to copy the files.
   To copy the files from other node ( non dfs), you need to put the
   hadoop**.jar's into classpath and use the below code snippet.
  
   FileSystem fs =new DistributedFileSystem();
   fs.initialize(NAMENODE_URI, configuration);
  
   fs.copyFromLocal(srcPath, dstPath);
  
   using this API, you can copy the files from any machine.
  
   Regards,
   Uma
  
  
  
  
  
   - Original Message -
   From: praveenesh kumar praveen...@gmail.com
   Date: Wednesday, September 21, 2011 2:14 pm
   Subject: Any other way to copy to HDFS ?
   To: common-user@hadoop.apache.org
  
Guys,
   
As far as I know hadoop, I think, to copy the files to HDFS,
   first
it needs
to be copied to the NameNode's local filesystem. Is it right ??
So does it mean that even if I have a hadoop cluster of 10 nodes
   with overall capacity of 6TB, but if my NameNode's hard disk
   capacity
is 500 GB,
I can not copy any file to HDFS greater than 500 GB ?
   
Is there any other way to directly copy to HDFS without copy the
file to
namenode's local filesystem ?
What can be other ways to copy large files greater than
   namenode's
diskcapacity ?
   
Thanks,
Praveenesh.
   
  
 



Re: Fwd: Any other way to copy to HDFS ?

2011-09-21 Thread praveenesh kumar
Thanks a lot!
I guess I can play around with the DFS permissions for a while.
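For example, permission checking can be switched off entirely on the NameNode with
a single hdfs-site.xml property (the 0.20.x name from the permissions guide linked
below; it takes effect after a NameNode restart):

<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>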

On Wed, Sep 21, 2011 at 3:59 PM, Uma Maheswara Rao G 72686 
mahesw...@huawei.com wrote:

 Hello Praveenesh,

 If you really do not care about permissions, then you can disable the checking
 on the NN side by using the property dfs.permissions.

 You can also set the permission for the path before creating it.

 from docs:
 Changes to the File System API
 All methods that use a path parameter will throw AccessControlException if
 permission checking fails.

 New methods:

 public FSDataOutputStream create(Path f, FsPermission permission, boolean
 overwrite, int bufferSize, short replication, long blockSize, Progressable
 progress) throws IOException;
 public boolean mkdirs(Path f, FsPermission permission) throws IOException;
 public void setPermission(Path p, FsPermission permission) throws
 IOException;
 public void setOwner(Path p, String username, String groupname) throws
 IOException;
 public FileStatus getFileStatus(Path f) throws IOException; will
 additionally return the user, group and mode associated with the path.


 http://hadoop.apache.org/common/docs/r0.20.2/hdfs_permissions_guide.html


 Regards,
 Uma
 - Original Message -
 From: praveenesh kumar praveen...@gmail.com
 Date: Wednesday, September 21, 2011 3:41 pm
 Subject: Fwd: Any other way to copy to HDFS ?
 To: common-user@hadoop.apache.org

  Thanks a lot. I am trying to run the following code on my Windows
  machine, which is not part of the cluster.

  public static void main(String args[]) throws IOException, URISyntaxException {

      FileSystem fs = new DistributedFileSystem();

      fs.initialize(new URI("hdfs://162.192.100.53:54310/"), new Configuration());
      fs.copyFromLocalFile(new Path("C:\\Positive.txt"),
                           new Path("/user/hadoop/Positive.txt"));

      System.out.println("Done");
  }
 
  But I am getting the following exception :
 
  Exception in thread main
  org.apache.hadoop.security.AccessControlException:
  org.apache.hadoop.security.AccessControlException: Permission denied:
  user=DrWho, access=WRITE, inode=hadoop:hadoop:supergroup:rwxr-xr-x
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
  Method) at
 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
  at
 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
  at
 
 org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:96)
  at
 
 org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:58)
  at
 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.init(DFSClient.java:2836)
  at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:500)
  at
 
 org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:206)
  at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:484)
  at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:465)
  at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:372)
  at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:208)
  at
  org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1189)
 at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1165)
  at
  org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1137)
 at com.musigma.hdfs.HdfsBackup.main(HdfsBackup.java:20)
  Caused by: org.apache.hadoop.ipc.RemoteException:
  org.apache.hadoop.security.AccessControlException: Permission denied:
  user=DrWho, access=WRITE, inode=hadoop:hadoop:supergroup:rwxr-xr-x
  at
 
 org.apache.hadoop.hdfs.server.namenode.PermissionChecker.check(PermissionChecker.java:176)
  at
 
 org.apache.hadoop.hdfs.server.namenode.PermissionChecker.check(PermissionChecker.java:157)
  at
 
 org.apache.hadoop.hdfs.server.namenode.PermissionChecker.checkPermission(PermissionChecker.java:105)
  at
 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4702)
  at
 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:4672)
  at
 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1048)
  at
 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1002)
  at
  org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:381)
  at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
  at
 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:616)
  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:961)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:957)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:416

Can we run job on some datanodes ?

2011-09-21 Thread praveenesh kumar
Is there any way to run a particular Hadoop job on a subset of the
datanodes?

My problem is that I don't want to use all the nodes to run a given job;
I am trying to make a job-completion-time vs. number-of-nodes graph for a
particular job.
One way to do this is to remove datanodes and then see how much time the job
takes.

Just for curiosity's sake, I want to know whether there is any other way to do
this without removing datanodes.
I am afraid that if I remove datanodes I may lose some data blocks that reside
on those machines, as I have some files with replication = 1.

Thanks,
Praveenesh


Re: Can we run job on some datanodes ?

2011-09-21 Thread praveenesh kumar
Oh wow, I didn't know that.
Actually, for me the datanodes/tasktrackers are running on the same machines.
I mentioned datanodes because if I delete those machines from the cluster's
node list, chances are the data will also be lost, so I don't want to do that.
But now I guess that by stopping the tasktrackers individually I can decrease
the strength of my cluster by decreasing the number of nodes that run a
tasktracker, right? And this way I won't lose my data either, right?
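i.e., something like this on each node to be excluded (the script is the one Harsh
names below; the exact path depends on where Hadoop is installed):

# the datanode keeps running, so no blocks are lost
$HADOOP_HOME/bin/hadoop-daemon.sh stop tasktracker

# to bring the node back into the compute pool later
$HADOOP_HOME/bin/hadoop-daemon.sh start tasktracker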



On Wed, Sep 21, 2011 at 6:39 PM, Harsh J ha...@cloudera.com wrote:

 Praveenesh,

 TaskTrackers run your jobs' tasks for you, not DataNodes directly. So
 you can statically control loads on nodes by removing away
 TaskTrackers from your cluster.

 i.e, if you service hadoop-0.20-tasktracker stop or
 hadoop-daemon.sh stop tasktracker on the specific nodes, jobs won't
 run there anymore.

 Is this what you're looking for?

 (There are ways to achieve the exclusion dynamically, by writing a
 scheduler, but hard to tell without knowing what you need
 specifically, and why do you require it?)

 On Wed, Sep 21, 2011 at 6:32 PM, praveenesh kumar praveen...@gmail.com
 wrote:
  Is there any way that we can run a particular job in a hadoop on subset
 of
  datanodes ?
 
  My problem is I don't want to use all the nodes to run some job,
  I am trying to make Job completion Vs No. of nodes graph for a particular
  job.
  One way to do is I can remove datanodes, and then see how much time the
 job
  is taking.
 
  Just for curiosity sake, want to know is there any other way possible to
 do
  this, without removing datanodes.
  I am afraid, if I remove datanodes, I can loose some data blocks that
 reside
  on those machines as I have some files with replication = 1 ?
 
  Thanks,
  Praveenesh
 



 --
 Harsh J



Re: Can we replace namenode machine with some other machine ?

2011-09-21 Thread praveenesh kumar
If I just change the configuration settings on the slave machines, will it
affect any of the data currently residing in the cluster?

And my second question was:
do we need the master node (the NN/JT hosting machine) to have a better
configuration than our slave machines (the DN/TT hosting machines)?

Actually my master node is a weaker machine than my slave machines, because I
am assuming that the master does not do much additional work and it is
okay to have a weak machine as master.
Now I have a new, bigger server machine that has just been added to my cluster.
So I am wondering: shall I make this new machine my new master (NN/JT), or just
add it as a slave?

Thanks,
Praveenesh


On Thu, Sep 22, 2011 at 10:20 AM, Uma Maheswara Rao G 72686 
mahesw...@huawei.com wrote:

 Copy the same installation to the new machine and change the IP address.
 After that, configure the new NN address on your clients and DNs.

 Also Does Namenode/JobTracker machine's configuration needs to be better
 than datanodes/tasktracker's ??
  I did not get this question.

 Regards,
 Uma

 - Original Message -
 From: praveenesh kumar praveen...@gmail.com
 Date: Thursday, September 22, 2011 10:13 am
 Subject: Can we replace namenode machine with some other machine ?
 To: common-user@hadoop.apache.org

  Hi all,
 
  Can we replace our namenode machine later with some other machine. ?
  Actually I got a new  server machine in my cluster and now I want
  to make
  this machine as my new namenode and jobtracker node ?
  Also Does Namenode/JobTracker machine's configuration needs to be
  betterthan datanodes/tasktracker's ??
 
  How can I achieve this target with least overhead ?
 
  Thanks,
  Praveenesh
 



Re: Multiple Mappers and One Reducer

2011-09-07 Thread praveenesh kumar
Harsh, can you please tell us how we can use MultipleInputs with the Job object
on Hadoop 0.20.2? As you can see, MultipleInputs uses the JobConf object, but I
want to use the Job object as in the new Hadoop 0.21 API.
I remember you talked about pulling things out of the new API and adding them
into our project.
Can you please shed more light on how we can do this?

Thanks ,
Praveenesh.

On Wed, Sep 7, 2011 at 2:57 AM, Harsh J ha...@cloudera.com wrote:

 Sahana,

 Yes this is possible as well. Please take a look at the MultipleInputs
 API @
 http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/mapred/lib/MultipleInputs.html

 It will allow you to add a path each with its own mapper
 implementation, and you can then have a common reducer since the key
 is what you'll be matching against.

 On Wed, Sep 7, 2011 at 3:02 PM, Sahana Bhat sana.b...@gmail.com wrote:
  Hi,
  I understand that given a file, the file is split across 'n'
 mapper
  instances, which is the normal case.
  The scenario i have is :
  1. Two files which are not totally identical in terms of number of
 columns
  (but have data that is similar in a few columns) need to be processed and
  after computation a single output file has to be generated.
  Note : CV - computedvalue
  File1 belonging to one dataset has data for :
  Date,counter1,counter2, CV1,CV2
  File2 belonging to another dataset has data for :
  Date,counter1,counter2,CV3,CV4,CV5
  Computation to be carried out on these two files is :
  CV6 =(CV1*CV5)/100
  And the final emitted output file should have data in the sequence:
  Date,counter1,counter2,CV6
  The idea is to have two mappers (not instances) run on each of the file,
 and
  a single reducer that emits the final result file.
  Thanks,
  Sahana
  On Wed, Sep 7, 2011 at 2:40 PM, Harsh J ha...@cloudera.com wrote:
 
  Sahana,
 
  Yes. But, isn't that how it is normally? What makes you question this
  capability?
 
  On Wed, Sep 7, 2011 at 2:37 PM, Sahana Bhat sana.b...@gmail.com
 wrote:
   Hi,
Is it possible to have multiple mappers  where each mapper is
   operating on a different input file and whose result (which is a key
   value
   pair from different mappers) is processed by a single reducer?
   Regards,
   Sahana
 
 
 
  --
  Harsh J
 
 



 --
 Harsh J



Re: MultipleInputs in hadoop 0.20.2

2011-08-26 Thread praveenesh kumar
FWIW the trunk/future-branches have new API MultipleInputs you can
pull and include in your project

   Can anyone please tell me how I can do that? How can I take the MultipleInputs
from a higher Hadoop version and use it in a lower Hadoop version?
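If staying on the old (mapred) API that 0.20.x ships is acceptable, a minimal
runnable sketch looks like this (IdentityMapper/IdentityReducer are stand-ins for
real per-input mappers and a real reducer):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;
import org.apache.hadoop.mapred.lib.MultipleInputs;

public class TwoInputsJob {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(TwoInputsJob.class);
    conf.setJobName("two-inputs");
    // with TextInputFormat + IdentityMapper the map output is (LongWritable, Text)
    conf.setOutputKeyClass(LongWritable.class);
    conf.setOutputValueClass(Text.class);
    // each input path can get its own mapper class; both feed the single reducer
    MultipleInputs.addInputPath(conf, new Path(args[0]), TextInputFormat.class, IdentityMapper.class);
    MultipleInputs.addInputPath(conf, new Path(args[1]), TextInputFormat.class, IdentityMapper.class);
    conf.setReducerClass(IdentityReducer.class);
    FileOutputFormat.setOutputPath(conf, new Path(args[2]));
    JobClient.runJob(conf);
  }
}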

Thanks

On Wed, Aug 24, 2011 at 5:50 PM, Harsh J ha...@cloudera.com wrote:

 0.20.x supports the older API and it has been 're-deemed' as the
 stable one. You shouldn't face any hesitation in using it as even 0.23
 would carry it (although there its properly deprecated). This is quite
 some confusion but I guess you still won't have some of the old API
 features in the new one.

 FWIW the trunk/future-branches have new API MultipleInputs you can
 pull and include in your project. Also, alternative distributions that
 do stable backports may carry MultipleInputs in the new API (I use
 CDH3 and it does have mapreduce.lib.input.MultipleInputs backported in
 it).

 On Wed, Aug 24, 2011 at 2:40 PM, praveenesh kumar praveen...@gmail.com
 wrote:
  Hello guys,
 
  I am looking to use MultipleInputs.addInputPath() method in hadoop
 0.20.2.
   But when I look at its signature in the API, it is like this:

    public static void addInputPath(JobConf conf,
                                    Path path,
                                    Class<? extends InputFormat> inputFormatClass)

    public static void addInputPath(JobConf conf,
                                    Path path,
                                    Class<? extends InputFormat> inputFormatClass,
                                    Class<? extends Mapper> mapperClass)
 
  But as far as I know in hadoop 0.20.2, JobConf object is deprecated.
  How can I use MultipleInputs.addInputPath() in hadoop. Is there any other
  way or any new class introduced instead of this one.
 
  Thanks,
  Praveenesh
 



 --
 Harsh J



MultipleInputs in hadoop 0.20.2

2011-08-24 Thread praveenesh kumar
Hello guys,

I am looking to use the MultipleInputs.addInputPath() method in hadoop 0.20.2.
But when I look at its signature in the API, it is like this:

  public static void addInputPath(JobConf conf,
                                  Path path,
                                  Class<? extends InputFormat> inputFormatClass)

  public static void addInputPath(JobConf conf,
                                  Path path,
                                  Class<? extends InputFormat> inputFormatClass,
                                  Class<? extends Mapper> mapperClass)

But as far as I know, in hadoop 0.20.2 the JobConf object is deprecated.
So how can I use MultipleInputs.addInputPath() in hadoop? Is there any other
way, or any new class introduced instead of this one?

Thanks,
Praveenesh


YCSB Benchmarking for HBase

2011-08-03 Thread praveenesh kumar
Hi,

Is anyone working with YCSB (the Yahoo! Cloud Serving Benchmark) for HBase?

I am trying to run it, and it is giving me an error:

$ java -cp build/ycsb.jar com.yahoo.ycsb.CommandLine -db
com.yahoo.ycsb.db.HBaseClient

YCSB Command Line client
Type help for command line help
Start with -help for usage info
Exception in thread main java.lang.NoClassDefFoundError:
org/apache/hadoop/conf/Configuration
at java.lang.Class.getDeclaredConstructors0(Native Method)
at java.lang.Class.privateGetDeclaredConstructors(Class.java:2406)
at java.lang.Class.getConstructor0(Class.java:2716)
at java.lang.Class.newInstance0(Class.java:343)
at java.lang.Class.newInstance(Class.java:325)
at com.yahoo.ycsb.CommandLine.main(Unknown Source)
Caused by: java.lang.ClassNotFoundException:
org.apache.hadoop.conf.Configuration
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
... 6 more

From the error, it seems it is not able to find the hadoop-core jar file, but
that jar is already on the classpath.
Has anyone got YCSB working with HBase?
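One thing worth trying is naming the Hadoop (and HBase) jars explicitly on the
command line instead of relying on the environment; the jar file names below are
only examples and depend on the versions installed:

java -cp "build/ycsb.jar:$HADOOP_HOME/hadoop-core-0.20.205.0.jar:$HADOOP_HOME/lib/*:$HBASE_HOME/hbase-0.90.3.jar:$HBASE_HOME/conf" \
    com.yahoo.ycsb.CommandLine -db com.yahoo.ycsb.db.HBaseClient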

Thanks,
Praveenesh


Giving filename as key to mapper ?

2011-07-15 Thread praveenesh kumar
Hi,
How can I give the filename as the key to the mapper?
I want to count the occurrences of a word across a set of documents, so I want
to use the filename as the key. Is it possible to pass the filename as the input
key to the map function?
Thanks,
Praveenesh


Re: Giving filename as key to mapper ?

2011-07-15 Thread praveenesh kumar
I am new to this Hadoop API. Can anyone point me to a tutorial or code snippet
on how to write your own input format to do this kind of thing?
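The simpler route Harsh mentions below, which avoids a custom input format, is to
read map.input.file inside the mapper and emit it as the key; a sketch using the
old (mapred) API:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class FileNameMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private String fileName;

  @Override
  public void configure(JobConf job) {
    // full path of the file backing the current input split
    fileName = job.get("map.input.file");
  }

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    // the file name becomes the key, the line stays the value
    output.collect(new Text(fileName), value);
  }
}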
Thanks.

On Fri, Jul 15, 2011 at 8:07 PM, Robert Evans ev...@yahoo-inc.com wrote:

 To add to that if you really want the file name to be the key instead of
 just calling a different API in your map to get it you will probably need to
 write your own input format to do it.  It should be fairly simple and you
 can base it off of an existing input format to do it.

 --Bobby

 On 7/15/11 7:40 AM, Harsh J ha...@cloudera.com wrote:

 You can retrieve the filename in the new API as described here:


 http://search-hadoop.com/m/ZOmmJ1PZJqt1/map+input+filenamesubj=Retrieving+Filename

 In the old API, its available in the configuration instance of the
 mapper as key map.input.file. See the table below this section

 http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html#Task+JVM+Reuse
 for more such goodies.

 On Fri, Jul 15, 2011 at 5:44 PM, praveenesh kumar praveen...@gmail.com
 wrote:
  Hi,
  How can I give filename as key to mapper ?
  I want to know the occurence of word in set of docs, so I want to keep
 key
  as filename. Is it possible to give input key as filename in map function
 ?
  Thanks,
  Praveenesh
 



 --
 Harsh J




Re: How does Hadoop parse input files into (Key,Value) pairs?

2011-07-12 Thread praveenesh kumar
Hi,
So I have a file in which the records on each line are comma separated
(Record1,Record2). I want to make the first record (Record1) the key and
Record2 the value.
I am using the hadoop-0.20-append version.
I am planning to use KeyValueTextInputFormat and to set
key.value.separator.in.input.line to ",". Is that possible with
hadoop-0.20-append? I have not been able to get it working.
Any help?
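For reference, this is roughly the combination being attempted (a sketch assuming
the old-API KeyValueTextInputFormat that ships in the 0.20 line; the driver class
name is just a placeholder):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;

public class CommaKeyValueJob {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(CommaKeyValueJob.class);
    conf.setInputFormat(KeyValueTextInputFormat.class);
    // everything before the first ',' on a line becomes the key, the rest the
    // value, so the mapper receives (Text, Text) pairs
    conf.set("key.value.separator.in.input.line", ",");
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);   // identity mapper/reducer by default
  }
}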

Thanks,
Praveenesh

On Mon, May 23, 2011 at 3:45 AM, Mark question markq2...@gmail.com wrote:

 The case your talking about is when you use FileInputFormat ... So usually
 the InputFormat Interface is the one responsible for that.

 For FileInputFormat, it uses a LineRecordReader which will take your text
 file and assigns key to be the offset within your text file and value to be
 the line (until '\n') is seen.

 If you want to use other InputFormats, check the API and pick what is
 suitable for you. In my case, I'm hooked on SequenceFileInputFormat, where
 my input files are key,value records written by a regular Java program (or
 parser). Then my Hadoop job will look at the keys and values that I wrote.

 I hope this helps a little,
 Mark

 On Thu, May 5, 2011 at 4:31 AM, praveenesh kumar praveen...@gmail.com
 wrote:

  Hi,
 
  As we know hadoop mapper takes input as (Key,Value) pairs and generate
  intermediate (Key,Value) pairs and usually we give input to our Mapper as
 a
  text file.
  How hadoop understand this and parse our input text file into (Key,Value)
  Pairs
 
  Usually our mapper looks like  --
   public void map(LongWritable key, Text value,
                   OutputCollector<Text, Text> outputCollector,
                   Reporter reporter) throws IOException {

       String word = value.toString();

       // Some lines of code

   }
 
  So if I pass any text file as input, it is taking every line as VALUE to
  Mapper..on which I will do some processing and put it to OutputCollector.
  But how hadoop parsed my text file into ( Key,Value ) pair and how can we
  tell hadoop what (key,value) it should give to mapper ??
 
  Thanks.
 



Re: Does hadoop-0.20-append compatible with PIG 0.8 ?

2011-07-03 Thread praveenesh kumar
Hi,

There is no hadoop jar in my Pig lib directory. I tried copying my hadoop
jar files into the Pig lib folder, and I also tried adding that jar file to the
Pig classpath, but the error stays the same.
Is there any other way to make Pig run with the hadoop-0.20-append version?
Guys, I am stuck on this issue and need your guidance.

Thanks,
Praveenesh

On Sat, Jul 2, 2011 at 1:36 PM, Joey Echeverria j...@cloudera.com wrote:

 Try replacing the hadoop jar from the pig lib directory with the one from
 your cluster.

 -Joey



 On Jul 2, 2011, at 0:38, praveenesh kumar praveen...@gmail.com wrote:

  Hi guys..
 
 
 
  I am previously using hadoop and Hbase...
 
 
 
  So for Hbase to run perfectly fine we need Hadoop-0.20-append for Hbase
 jar
  files.. So I am using Hadoop-0.20-append jar files.. which made both my
  hadoop and hbase to work fine..
 
  Now I want to use pig for my hadoop and hbase clusters..
 
  I downloaded pig 0.8.0... and configured pig to run in map-reduce mode by
  setting the pig_classpath to point to the $HADOOP_HOME/conf directory.
 Then
  running ‘pig’ gives the following error messeage.
 
 
 
  hadoop@ub13:/usr/local/pig/bin$ pig
 
  2011-07-01 17:41:52,150 [main] INFO  org.apache.pig.Main - Logging error
  messages to: /usr/local/pig/bin/pig_1309522312144.log
 
  2011-07-01 17:41:52,454 [main] INFO
  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
 Connecting
  to hadoop file system at: hdfs://ub13:54310
 
  2011-07-01 17:41:52,654 [main] ERROR org.apache.pig.Main - ERROR 2999:
  Unexpected internal error. Failed to create DataStorage
 
  LOG MESSAGE -
 
  Error before Pig is launched---
 
  ERROR 2999: Unexpected internal error. Failed to create DataStorage
 
  java.lang.RuntimeException: Failed to create DataStorage
 
  at
 
 org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:75)
 
  at
 
 org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:58)
 
  at
 
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:214)
 
  at
 
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:134)
 
  at org.apache.pig.impl.PigContext.connect(PigContext.java:183)
 
  at org.apache.pig.PigServer.init(PigServer.java:226)
 
  at org.apache.pig.PigServer.init(PigServer.java:215)
 
  at org.apache.pig.tools.grunt.Grunt.init(Grunt.java:55)
 
  at org.apache.pig.Main.run(Main.java:452)
 
  at org.apache.pig.Main.main(Main.java:107)
 
  Caused by: org.apache.hadoop.ipc.RPC$VersionMismatch: Protocol
  org.apache.hadoop.hdfs.protocol.ClientProtocol version mismatch. (client
 =
  41, server = 43)
 
  at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:364)
 
  at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:106)
 
  at org.apache.hadoop.hdfs.DFSClient.init(DFSClient.java:207)
 
  at org.apache.hadoop.hdfs.DFSClient.init(DFSClient.java:170)
 
  at
 
 org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
 
  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
 
  at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
 
  at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
 
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
 
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
 
  at
 
 org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:72)
 
  ... 9 more
 
 
 
  I guess the problem is the version mismatch between the
 hadoop-append-core
  jar files that my hadoop/hbase clusters is using currently and the
  hadoop-core jar files that pig is using.Anyone faced any similar kind of
  issue..???
  On the documentation website... its written requirement as hadoop-0.20.2,
  but the problem is I want to use my hadoop and hbase along with pig also.
 
  Any suggestions.. how to resolve this issue..!!
  Can anyone please mention which version of each one of them, are
 compatible
  with each other to work fine to put them in production.
 
  Thanks,
  Praveenesh



Does hadoop-0.20-append compatible with PIG 0.8 ?

2011-07-02 Thread praveenesh kumar
Hi guys,

I have been using Hadoop and HBase.

For HBase to run perfectly fine we need the Hadoop-0.20-append jar files,
so I am using the Hadoop-0.20-append jars, which made both my
Hadoop and HBase work fine.

Now I want to use Pig with my Hadoop and HBase clusters.

I downloaded Pig 0.8.0 and configured Pig to run in map-reduce mode by
setting PIG_CLASSPATH to point to the $HADOOP_HOME/conf directory. Then
running 'pig' gives the following error message:



hadoop@ub13:/usr/local/pig/bin$ pig

2011-07-01 17:41:52,150 [main] INFO  org.apache.pig.Main - Logging error
messages to: /usr/local/pig/bin/pig_1309522312144.log

2011-07-01 17:41:52,454 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting
to hadoop file system at: hdfs://ub13:54310

2011-07-01 17:41:52,654 [main] ERROR org.apache.pig.Main - ERROR 2999:
Unexpected internal error. Failed to create DataStorage

LOG MESSAGE -

Error before Pig is launched---

ERROR 2999: Unexpected internal error. Failed to create DataStorage

java.lang.RuntimeException: Failed to create DataStorage

at
org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:75)

at
org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:58)

at
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:214)

at
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:134)

at org.apache.pig.impl.PigContext.connect(PigContext.java:183)

at org.apache.pig.PigServer.init(PigServer.java:226)

at org.apache.pig.PigServer.init(PigServer.java:215)

at org.apache.pig.tools.grunt.Grunt.init(Grunt.java:55)

at org.apache.pig.Main.run(Main.java:452)

at org.apache.pig.Main.main(Main.java:107)

Caused by: org.apache.hadoop.ipc.RPC$VersionMismatch: Protocol
org.apache.hadoop.hdfs.protocol.ClientProtocol version mismatch. (client =
41, server = 43)

at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:364)

at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:106)

at org.apache.hadoop.hdfs.DFSClient.init(DFSClient.java:207)

at org.apache.hadoop.hdfs.DFSClient.init(DFSClient.java:170)

at
org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)

at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)

at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)

at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)

at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)

at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)

at
org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:72)

... 9 more


I guess the problem is a version mismatch between the hadoop-append core
jar files my Hadoop/HBase cluster is currently using and the hadoop-core
jar files that Pig is using. Has anyone faced a similar kind of issue?
On the documentation website the requirement is listed as hadoop-0.20.2,
but I want to use Pig alongside my existing Hadoop and HBase.

Any suggestions on how to resolve this issue?
Can anyone please mention which versions of each of them are compatible
with one another, so they can be put into production together?

Thanks,
Praveenesh
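
For anyone who lands on this thread later: the RPC mismatch (client = 41, server = 43) means the hadoop-core classes Pig loads are older than the ones the cluster runs, so the usual fix is to make Pig use the cluster's own hadoop-core jar. A rough sketch of how to check and work around it, assuming Pig 0.8.0 lives under /usr/local/pig and the cluster jar is the hadoop-core-0.20-append-r1056497.jar shipped with HBase (the paths and jar names are assumptions, not taken from this thread):

# note the exact build the cluster is running
$HADOOP_HOME/bin/hadoop version

# see whether the Pig jar bundles its own copy of the HDFS client classes
unzip -l /usr/local/pig/pig-0.8.0-core.jar | grep hdfs/protocol/ClientProtocol

# one workaround: put the cluster's jar and conf dir on Pig's classpath
export PIG_CLASSPATH=$HADOOP_HOME/hadoop-core-0.20-append-r1056497.jar:$HADOOP_HOME/conf
pig

This only helps if the cluster's jar ends up ahead of Pig's bundled copy; if the Hadoop classes are baked into the pig jar itself, rebuilding Pig from source against the cluster's jars is the more reliable route.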


Re: Hadoop eclipse plugin stopped working after replacing hadoop-0.20.2 jar files with hadoop-0.20-append jar files

2011-06-22 Thread praveenesh kumar
I followed Michael Noll's tutorial for building the hadoop-0.20-append jars:

http://www.michael-noll.com/blog/2011/04/14/building-an-hadoop-0-20-x-version-for-hbase-0-90-2/

After following the article we get five jar files, which we use to replace
the corresponding hadoop-0.20.2 jar files.
There is no jar file for the hadoop-eclipse plugin that I can see in my
repository after following that tutorial.

Also, the hadoop plugin I am using has no information on JIRA MAPREDUCE-1280
about whether it is compatible with hadoop-0.20-append.

Has anyone else faced this kind of issue?

Thanks,
Praveenesh


On Wed, Jun 22, 2011 at 11:48 AM, Devaraj K devara...@huawei.com wrote:

 Hadoop eclipse plugin also uses hadoop-core.jar file communicate to the
 hadoop cluster. For this it needs to have same version of hadoop-core.jar
 for client as well as server(hadoop cluster).

 Update the hadoop eclipse plugin for your eclipse which is provided with
 hadoop-0.20-append release, it will work fine.


 Devaraj K

 -Original Message-
 From: praveenesh kumar [mailto:praveen...@gmail.com]
 Sent: Wednesday, June 22, 2011 11:25 AM
 To: common-user@hadoop.apache.org
 Subject: Hadoop eclipse plugin stopped working after replacing
 hadoop-0.20.2
 jar files with hadoop-0.20-append jar files

 Guys,
 I was using hadoop eclipse plugin on hadoop 0.20.2 cluster..
 It was working fine for me.
 I was using Eclipse SDK Helios 3.6.2 with the plugin
 hadoop-eclipse-plugin-0.20.3-SNAPSHOT.jar downloaded from JIRA
 MAPREDUCE-1280

 Now for Hbase installation.. I had to use hadoop-0.20-append compiled
 jars..and I had to replace the old jar files with new 0.20-append compiled
 jar files..
 But now after replacing .. my hadoop eclipse plugin is not working well for
 me.
 Whenever I am trying to connect to my hadoop master node from that and try
 to see DFS locations..
 it is giving me the following error:
 *
 Error : Protocol org.apache.hadoop.hdfs.protocol.clientprotocol version
 mismatch (client 41 server 43)*

 However the hadoop cluster is working fine if I go directly on hadoop
 namenode use hadoop commands..
 I can add files to HDFS.. run jobs from there.. HDFS web console and
 Map-Reduce web console are also working fine. but not able to use my
 previous hadoop eclipse plugin.

 Any suggestions or help for this issue ?

 Thanks,
 Praveenesh
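
Before rebuilding anything, it is worth confirming the mismatch Devaraj describes by comparing the Hadoop build on the cluster with the hadoop-core jar the plugin was compiled against. A quick sketch, assuming the plugin jar sits in Eclipse's plugins directory (the paths are assumptions):

# on the cluster: shows the exact version and build revision
$HADOOP_HOME/bin/hadoop version

# on the workstation: list what the plugin jar bundles
unzip -l ~/eclipse/plugins/hadoop-eclipse-plugin-0.20.3-SNAPSHOT.jar | grep -i hadoop

If the plugin bundles or was compiled against an older hadoop-core, it keeps speaking the old protocol version; rebuilding the plugin from the same hadoop-0.20-append source tree, as suggested above, is what brings the two sides back in sync.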




Hadoop eclipse plugin stopped working after replacing hadoop-0.20.2 jar files with hadoop-0.20-append jar files

2011-06-21 Thread praveenesh kumar
Guys,
I was using the Hadoop eclipse plugin on a hadoop 0.20.2 cluster,
and it was working fine for me.
I was using Eclipse SDK Helios 3.6.2 with the plugin
hadoop-eclipse-plugin-0.20.3-SNAPSHOT.jar downloaded from JIRA
MAPREDUCE-1280.

For the HBase installation I had to use the hadoop-0.20-append compiled
jars, so I replaced the old jar files with the new 0.20-append compiled
jar files.
But now, after replacing them, my hadoop eclipse plugin is not working for
me.
Whenever I try to connect to my hadoop master node from the plugin and try
to browse the DFS locations,
it gives me the following error:
*
Error : Protocol org.apache.hadoop.hdfs.protocol.clientprotocol version
mismatch (client 41 server 43)*

However, the hadoop cluster works fine if I go directly to the hadoop
namenode and use the hadoop commands:
I can add files to HDFS and run jobs from there, and the HDFS and
Map-Reduce web consoles are also working fine, but I am not able to use my
previous hadoop eclipse plugin.

Any suggestions or help for this issue?

Thanks,
Praveenesh


NameNode is starting with exceptions whenever its trying to start datanodes

2011-06-07 Thread praveenesh kumar
Hello,

My namenode is starting with the following exceptions and goes into safe mode
every time it tries to start the datanodes. Why is that?
I deleted all the files in HDFS and ran it again.

2011-06-07 15:02:19,467 INFO
org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
/
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = ub13/162.192.100.53
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20-append-r1056497
STARTUP_MSG:   build =
https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-append-r
1056491; compiled by 'stack' on Fri Jan  7 20:43:30 UTC 2011
/
2011-06-07 15:02:19,637 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
Initializing RPC Metrics with hostName=NameNode, port=54310
2011-06-07 15:02:19,645 INFO
org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: ub13/
162.192.100.53:54310
2011-06-07 15:02:19,651 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=NameNode, sessionId=null
2011-06-07 15:02:19,653 INFO
org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing
NameNodeMeterics using context
object:org.apache.hadoop.metrics.spi.NullContext
2011-06-07 15:02:19,991 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hadoop,hadoop
2011-06-07 15:02:19,992 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2011-06-07 15:02:19,992 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
isPermissionEnabled=true
2011-06-07 15:02:20,034 INFO
org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics:
Initializing FSNamesystemMetrics using context
object:org.apache.hadoop.metrics.spi.NullContext
2011-06-07 15:02:20,036 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
FSNamesystemStatusMBean
2011-06-07 15:02:20,276 INFO org.apache.hadoop.hdfs.server.common.Storage:
Number of files = 56
2011-06-07 15:02:20,310 INFO org.apache.hadoop.hdfs.server.common.Storage:
Number of files under construction = 0
2011-06-07 15:02:20,310 INFO org.apache.hadoop.hdfs.server.common.Storage:
Image file of size 5718 loaded in 0 seconds.
2011-06-07 15:02:20,320 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Invalid opcode, reached
end of edit log Number of transactions found 7
2011-06-07 15:02:20,321 INFO org.apache.hadoop.hdfs.server.common.Storage:
Edits file
/usr/local/hadoop/hadoop-datastore/hadoop-hadoop/dfs/name/current/edits of
size 1049092 edits # 7 loaded in 0 seconds.
2011-06-07 15:02:20,337 INFO org.apache.hadoop.hdfs.server.common.Storage:
Image file of size 5718 saved in 0 seconds.
2011-06-07 15:02:20,784 INFO org.apache.hadoop.hdfs.server.common.Storage:
Image file of size 5718 saved in 0 seconds.
2011-06-07 15:02:21,227 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Finished loading
FSImage in 1482 msecs
2011-06-07 15:02:21,242 INFO org.apache.hadoop.hdfs.StateChange: STATE* Safe
mode ON.
The ratio of reported blocks 0. has not reached the threshold 0.9990.
Safe mode will be turned off automatically.
2011-06-07 15:02:26,941 INFO org.mortbay.log: Logging to
org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
org.mortbay.log.Slf4jLog
2011-06-07 15:02:27,031 INFO org.apache.hadoop.http.HttpServer: Port
returned by webServer.getConnectors()[0].getLocalPort() before open() is -1.
Opening the listener on 50070
2011-06-07 15:02:27,033 INFO org.apache.hadoop.http.HttpServer:
listener.getLocalPort() returned 50070
webServer.getConnectors()[0].getLocalPort() returned 50070
2011-06-07 15:02:27,033 INFO org.apache.hadoop.http.HttpServer: Jetty bound
to port 50070
2011-06-07 15:02:27,033 INFO org.mortbay.log: jetty-6.1.14
2011-06-07 15:02:27,537 INFO org.mortbay.log: Started
SelectChannelConnector@0.0.0.0:50070
2011-06-07 15:02:27,538 INFO
org.apache.hadoop.hdfs.server.namenode.NameNode: Web-server up at:
0.0.0.0:50070
2011-06-07 15:02:27,549 INFO org.apache.hadoop.ipc.Server: IPC Server
Responder: starting
2011-06-07 15:02:27,559 INFO org.apache.hadoop.ipc.Server: IPC Server
listener on 54310: starting
2011-06-07 15:02:27,565 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 0 on 54310: starting
2011-06-07 15:02:27,573 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 1 on 54310: starting
2011-06-07 15:02:27,585 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 2 on 54310: starting
2011-06-07 15:02:27,597 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 3 on 54310: starting
2011-06-07 15:02:27,613 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 4 on 54310: starting
2011-06-07 15:02:27,621 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 5 on 54310: starting
2011-06-07 15:02:27,632 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 7 on 54310: starting
2011-06-07 15:02:27,633 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 8 on 54310: starting
2011-06-07 

Re: NameNode is starting with exceptions whenever its trying to start datanodes

2011-06-07 Thread praveenesh kumar
But I don't have any data on my HDFS. I had some data before, but now
I have deleted all the files from HDFS.
I don't know why the datanodes are taking so long to start; I guess this
exception is what is slowing the startup.

On Tue, Jun 7, 2011 at 3:34 PM, Steve Loughran ste...@apache.org wrote:

 On 06/07/2011 10:50 AM, praveenesh kumar wrote:

 The logs say


  The ratio of reported blocks 0.9091 has not reached the threshold 0.9990.
 Safe mode will be turned off automatically.



 not enough datanodes reported in, or they are missing data
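
For readers hitting the same message: the namenode stays in safe mode until enough datanodes have reported their blocks, so the first step is to see which datanodes have actually checked in. These are standard 0.20 commands; the only assumption is that the hadoop script is on the PATH:

hadoop dfsadmin -safemode get    # shows whether safe mode is still ON
hadoop dfsadmin -report          # lists the datanodes the namenode currently knows about
hadoop fsck /                    # reports missing or under-replicated blocks

# only if you are sure nothing is missing and just want out of safe mode:
hadoop dfsadmin -safemode leave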



Re: NameNode is starting with exceptions whenever its trying to start datanodes

2011-06-07 Thread praveenesh kumar
1. Some of your data node is getting connected, that means password less
SSH is
not working within nodes.

So you mean that passwordless SSH should also be set up among the datanodes?
In Hadoop we normally set up passwordless SSH from the namenode to the datanodes.
Do we have to set up passwordless SSH among the datanodes as well?

On Tue, Jun 7, 2011 at 11:15 PM, jagaran das jagaran_...@yahoo.co.inwrote:

 Check two things:

 1. Some of your data node is getting connected, that means password less
 SSH is
 not working within nodes.
 2. Then Clear the Dir where you data is persisted in data nodes and format
 the
 namenode.

 It should definitely work then

 Cheers,
 Jagaran



 
 From: praveenesh kumar praveen...@gmail.com
 To: common-user@hadoop.apache.org
 Sent: Tue, 7 June, 2011 3:14:01 AM
 Subject: Re: NameNode is starting with exceptions whenever its trying to
 start
 datanodes

 But I dnt have any data on my HDFS.. I was having some date before.. but
 now
 I deleted all the files from HDFS..
 I dnt know why datanodes are taking time to start.. I guess because of this
 exception its taking more time to start.

 On Tue, Jun 7, 2011 at 3:34 PM, Steve Loughran ste...@apache.org wrote:

  On 06/07/2011 10:50 AM, praveenesh kumar wrote:
 
  The logs say
 
 
   The ratio of reported blocks 0.9091 has not reached the threshold
 0.9990.
  Safe mode will be turned off automatically.
 
 
 
  not enough datanodes reported in, or they are missing data
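
On the SSH question itself: the start-all.sh / start-dfs.sh scripts only ssh from the machine where they are run to each host listed in conf/slaves (and conf/masters); datanode-to-datanode SSH is not required. A minimal sketch of that setup, assuming the hadoop user and placeholder hostnames:

# on the master, as the hadoop user
ssh-keygen -t rsa -P ""

# copy the public key to every host listed in conf/slaves
ssh-copy-id hadoop@slave1
ssh-copy-id hadoop@slave2

# verify there is no password prompt
ssh hadoop@slave1 hostname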
 



Re: NameNode is starting with exceptions whenever its trying to start datanodes

2011-06-07 Thread praveenesh kumar
Sorry I mean Some of your data nodes are not  getting connected..

So are you sticking with the solution you suggested, i.e. setting up
passwordless SSH for all datanodes?
Because in my Hadoop cluster all the datanodes are running fine.



On Tue, Jun 7, 2011 at 11:32 PM, jagaran das jagaran_...@yahoo.co.inwrote:

 Sorry I mean Some of your data nodes are not  getting connected



 
 From: jagaran das jagaran_...@yahoo.co.in
 To: common-user@hadoop.apache.org
 Sent: Tue, 7 June, 2011 10:45:59 AM
  Subject: Re: NameNode is starting with exceptions whenever its trying to
 start
 datanodes

 Check two things:

 1. Some of your data node is getting connected, that means password less
 SSH is
 not working within nodes.
 2. Then Clear the Dir where you data is persisted in data nodes and format
 the
 namenode.

 It should definitely work then

 Cheers,
 Jagaran



 
 From: praveenesh kumar praveen...@gmail.com
 To: common-user@hadoop.apache.org
 Sent: Tue, 7 June, 2011 3:14:01 AM
 Subject: Re: NameNode is starting with exceptions whenever its trying to
 start
 datanodes

 But I dnt have any data on my HDFS.. I was having some date before.. but
 now
 I deleted all the files from HDFS..
 I dnt know why datanodes are taking time to start.. I guess because of this
 exception its taking more time to start.

 On Tue, Jun 7, 2011 at 3:34 PM, Steve Loughran ste...@apache.org wrote:

  On 06/07/2011 10:50 AM, praveenesh kumar wrote:
 
  The logs say
 
 
   The ratio of reported blocks 0.9091 has not reached the threshold
 0.9990.
  Safe mode will be turned off automatically.
 
 
 
  not enough datanodes reported in, or they are missing data
 



Re: NameNode is starting with exceptions whenever its trying to start datanodes

2011-06-07 Thread praveenesh kumar
Dude, passwordless SSH between my namenode and datanodes is working
perfectly fine.

My question is:

*Are you talking about passwordless SSH between datanodes*

 or

*Are you talking about passwordless SSH between the datanodes and the namenode*

Because if you are talking about the second case, that is already working
fine. As I already mentioned, all my datanodes in Hadoop are
working fine; I can see all of them with hadoop fsck / as
well as in the HDFS web UI.



On Tue, Jun 7, 2011 at 11:35 PM, jagaran das jagaran_...@yahoo.co.inwrote:

 Yes Correct
 Password less SSH between your name node and some of your datanode is not
 working




 
 From: praveenesh kumar praveen...@gmail.com
 To: common-user@hadoop.apache.org
 Sent: Tue, 7 June, 2011 10:56:08 AM
  Subject: Re: NameNode is starting with exceptions whenever its trying to
 start
 datanodes

 1. Some of your data node is getting connected, that means password less
 SSH is
 not working within nodes.

 So you mean that passwordless SSH should be there among datanodes also.
 In hadoop we used to do password less SSH from namenode to data nodes
 Do we have to do passwordless ssh among datanodes also ???

 On Tue, Jun 7, 2011 at 11:15 PM, jagaran das jagaran_...@yahoo.co.in
 wrote:

  Check two things:
 
  1. Some of your data node is getting connected, that means password less
  SSH is
  not working within nodes.
  2. Then Clear the Dir where you data is persisted in data nodes and
 format
  the
  namenode.
 
  It should definitely work then
 
  Cheers,
  Jagaran
 
 
 
  
  From: praveenesh kumar praveen...@gmail.com
  To: common-user@hadoop.apache.org
  Sent: Tue, 7 June, 2011 3:14:01 AM
  Subject: Re: NameNode is starting with exceptions whenever its trying to
  start
  datanodes
 
  But I dnt have any data on my HDFS.. I was having some date before.. but
  now
  I deleted all the files from HDFS..
  I dnt know why datanodes are taking time to start.. I guess because of
 this
  exception its taking more time to start.
 
  On Tue, Jun 7, 2011 at 3:34 PM, Steve Loughran ste...@apache.org
 wrote:
 
   On 06/07/2011 10:50 AM, praveenesh kumar wrote:
  
   The logs say
  
  
The ratio of reported blocks 0.9091 has not reached the threshold
  0.9990.
   Safe mode will be turned off automatically.
  
  
  
   not enough datanodes reported in, or they are missing data
  
 



Re: NameNode is starting with exceptions whenever its trying to start datanodes

2011-06-07 Thread praveenesh kumar
How shall I clean my data dir?
By cleaning the data dir, do you mean deleting all the files from HDFS?

Is there any special command to clean all the datanodes in one step?

On Tue, Jun 7, 2011 at 11:46 PM, jagaran das jagaran_...@yahoo.co.inwrote:

 Cleaning data from data dir of datanode and formatting the name node may
 help
 you




 
 From: praveenesh kumar praveen...@gmail.com
 To: common-user@hadoop.apache.org
 Sent: Tue, 7 June, 2011 11:05:03 AM
  Subject: Re: NameNode is starting with exceptions whenever its trying to
 start
 datanodes

 Sorry I mean Some of your data nodes are not  getting connected..

 So are you sticking with your solution that you are saying to me.. to go
 for
 passwordless ssh for all datanodes..
 because for my hadoop.. all datanodes are running fine



 On Tue, Jun 7, 2011 at 11:32 PM, jagaran das jagaran_...@yahoo.co.in
 wrote:

  Sorry I mean Some of your data nodes are not  getting connected
 
 
 
  
  From: jagaran das jagaran_...@yahoo.co.in
  To: common-user@hadoop.apache.org
  Sent: Tue, 7 June, 2011 10:45:59 AM
   Subject: Re: NameNode is starting with exceptions whenever its trying to
  start
  datanodes
 
  Check two things:
 
  1. Some of your data node is getting connected, that means password less
  SSH is
  not working within nodes.
  2. Then Clear the Dir where you data is persisted in data nodes and
 format
  the
  namenode.
 
  It should definitely work then
 
  Cheers,
  Jagaran
 
 
 
  
  From: praveenesh kumar praveen...@gmail.com
  To: common-user@hadoop.apache.org
  Sent: Tue, 7 June, 2011 3:14:01 AM
  Subject: Re: NameNode is starting with exceptions whenever its trying to
  start
  datanodes
 
  But I dnt have any data on my HDFS.. I was having some date before.. but
  now
  I deleted all the files from HDFS..
  I dnt know why datanodes are taking time to start.. I guess because of
 this
  exception its taking more time to start.
 
  On Tue, Jun 7, 2011 at 3:34 PM, Steve Loughran ste...@apache.org
 wrote:
 
   On 06/07/2011 10:50 AM, praveenesh kumar wrote:
  
   The logs say
  
  
The ratio of reported blocks 0.9091 has not reached the threshold
  0.9990.
   Safe mode will be turned off automatically.
  
  
  
   not enough datanodes reported in, or they are missing data
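
To answer the "one step" question: there is no built-in command that wipes every datanode at once; the reset jagaran describes is done per node and it destroys all HDFS data. A rough sketch, assuming the dfs.data.dir path of a typical setup (check hdfs-site.xml for the real value before deleting anything):

# on the master
$HADOOP_HOME/bin/stop-all.sh

# on every datanode: clear the directory configured as dfs.data.dir
rm -rf /usr/local/hadoop/hadoop-datastore/hadoop-hadoop/dfs/data/*

# back on the master: reformat the namenode, then restart
$HADOOP_HOME/bin/hadoop namenode -format
$HADOOP_HOME/bin/start-all.sh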
  
 



Hadoop is not working after adding hadoop-core-0.20-append-r1056497.jar

2011-06-06 Thread praveenesh kumar
Hello guys..!!!

I am currently working on Hbase 0.90.3 and Hadoop 0.20.2

Since this Hadoop version does not support a durable sync on HDFS,
I copied the *hadoop-core-append jar* file from the *hbase/lib* folder
into the *hadoop folder*
and used it to replace *hadoop-0.20.2-core.jar*,
as suggested in the following link:

http://www.apacheserver.net/Using-Hadoop-bundled-in-lib-directory-HBase-at1136240.htm

I believe what I am doing is what is described in that link. If I
am doing something wrong, kindly tell me.

But now, after adding that jar file, I am not able to run my Hadoop; I am
getting the following exception messages on my screen:

ub13: Exception in thread main java.lang.NoClassDefFoundError:
org/apache/hadoop/util/PlatformName
ub13: Caused by: java.lang.ClassNotFoundException:
org.apache.hadoop.util.PlatformName
ub13:   at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
ub13:   at java.security.AccessController.doPrivileged(Native Method)
ub13:   at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
ub13:   at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
ub13:   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
ub13:   at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
ub13: Could not find the main class: org.apache.hadoop.util.PlatformName.
Program will exit.
ub13: Exception in thread main java.lang.NoClassDefFoundError:
org/apache/hadoop/hdfs/server/datanode/DataNode
ub13: starting secondarynamenode, logging to
/usr/local/hadoop/hadoop/bin/../logs/hadoop-hadoop-secondarynamenode-ub13.out
ub13: Exception in thread main java.lang.NoClassDefFoundError:
org/apache/hadoop/util/PlatformName
ub13: Caused by: java.lang.ClassNotFoundException:
org.apache.hadoop.util.PlatformName
ub13:   at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
ub13:   at java.security.AccessController.doPrivileged(Native Method)
ub13:   at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
ub13:   at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
ub13:   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
ub13:   at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
ub13: Could not find the main class: org.apache.hadoop.util.PlatformName.
Program will exit.
ub13: Exception in thread main java.lang.NoClassDefFoundError:
org/apache/hadoop/hdfs/server/namenode/SecondaryNameNode
Have I done something wrong? Please guide me.

Thanks,
Praveenesh


Hadoop not working after replacing hadoop-core.jar with hadoop-core-append.jar

2011-06-06 Thread praveenesh kumar
Hi,

I was not able to see my email in the mail archive, so I am sending it again.
Guys, I need your feedback!

Thanks,
Praveenesh
-- Forwarded message --
From: praveenesh kumar praveen...@gmail.com
Date: Mon, Jun 6, 2011 at 12:09 PM
Subject: Hadoop is not working after adding
hadoop-core-0.20-append-r1056497.jar
To: common-user@hadoop.apache.org, u...@hbase.apache.org


Hello guys..!!!

I am currently working on Hbase 0.90.3 and Hadoop 0.20.2

Since this Hadoop version does not support a durable sync on HDFS,
I copied the *hadoop-core-append jar* file from the *hbase/lib* folder
into the *hadoop folder*
and used it to replace *hadoop-0.20.2-core.jar*,
as suggested in the following link:

http://www.apacheserver.net/Using-Hadoop-bundled-in-lib-directory-HBase-at1136240.htm

I believe what I am doing is what is described in that link. If I
am doing something wrong, kindly tell me.

But now, after adding that jar file, I am not able to run my Hadoop; I am
getting the following exception messages on my screen:

ub13: Exception in thread main java.lang.NoClassDefFoundError:
org/apache/hadoop/util/PlatformName
ub13: Caused by: java.lang.ClassNotFoundException:
org.apache.hadoop.util.PlatformName
ub13:   at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
ub13:   at java.security.AccessController.doPrivileged(Native Method)
ub13:   at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
ub13:   at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
ub13:   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
ub13:   at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
ub13: Could not find the main class: org.apache.hadoop.util.PlatformName.
Program will exit.
ub13: Exception in thread main java.lang.NoClassDefFoundError:
org/apache/hadoop/hdfs/server/datanode/DataNode
ub13: starting secondarynamenode, logging to
/usr/local/hadoop/hadoop/bin/../logs/hadoop-hadoop-secondarynamenode-ub13.out
ub13: Exception in thread main java.lang.NoClassDefFoundError:
org/apache/hadoop/util/PlatformName
ub13: Caused by: java.lang.ClassNotFoundException:
org.apache.hadoop.util.PlatformName
ub13:   at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
ub13:   at java.security.AccessController.doPrivileged(Native Method)
ub13:   at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
ub13:   at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
ub13:   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
ub13:   at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
ub13: Could not find the main class: org.apache.hadoop.util.PlatformName.
Program will exit.
ub13: Exception in thread main java.lang.NoClassDefFoundError:
org/apache/hadoop/hdfs/server/namenode/SecondaryNameNode
Have I done something wrong? Please guide me.

Thanks,
Praveenesh


HBase Web UI showing exception everytime I am running it

2011-06-06 Thread praveenesh kumar
Hello guys..

I am not able to run my HBase 0.90.3 cluster on top of my Hadoop 0.20.2
cluster, and I don't know why it is happening. It ran only once; after
that it does not.

The HBase web UI is showing the following exception.

Why is this happening?

Please help!

Thanks,
Praveenesh



HTTP ERROR 500

Problem accessing /master.jsp. Reason:

Trying to contact region server ub1:60020 for region .META.,,1, row '',
but failed after 10 attempts.
Exceptions:
org.apache.hadoop.hbase.NotServingRegionException:
org.apache.hadoop.hbase.NotServingRegionException: Region is not online:
.META.,,1
 at
org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2318)
 at
org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1771)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:616)
 at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
 at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)

org.apache.hadoop.hbase.NotServingRegionException:
org.apache.hadoop.hbase.NotServingRegionException: Region is not online:
.META.,,1
 at
org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2318)
 at
org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1771)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:616)
 at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
 at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)

org.apache.hadoop.hbase.NotServingRegionException:
org.apache.hadoop.hbase.NotServingRegionException: Region is not online:
.META.,,1
 at
org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2318)
 at
org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1771)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:616)
 at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
 at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)

org.apache.hadoop.hbase.NotServingRegionException:
org.apache.hadoop.hbase.NotServingRegionException: Region is not online:
.META.,,1
 at
org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2318)
 at
org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1771)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:616)
 at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
 at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)

org.apache.hadoop.hbase.NotServingRegionException:
org.apache.hadoop.hbase.NotServingRegionException: Region is not online:
.META.,,1
 at
org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2318)
 at
org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1771)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:616)
 at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
 at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)

org.apache.hadoop.hbase.NotServingRegionException:
org.apache.hadoop.hbase.NotServingRegionException: Region is not online:
.META.,,1
 at
org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2318)
 at
org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1771)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:616)
 at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
 at

Re: Hadoop not working after replacing hadoop-core.jar with hadoop-core-append.jar

2011-06-06 Thread praveenesh kumar
Hello guys..

Changing the name of the hadoop-append core jar file to
hadoop-0.20.2-core.jar did the trick.
It is working now.
But is this the right solution to this problem?

Thanks,
Praveenesh

On Mon, Jun 6, 2011 at 2:18 PM, praveenesh kumar praveen...@gmail.comwrote:


 Hi,

 Not able to see my email in the mail archive..So sending it again...!!!
 Guys.. need your feedback..!!

 Thanks,
 Praveenesh
 -- Forwarded message --
 From: praveenesh kumar praveen...@gmail.com
 Date: Mon, Jun 6, 2011 at 12:09 PM
 Subject: Hadoop is not working after adding
 hadoop-core-0.20-append-r1056497.jar
 To: common-user@hadoop.apache.org, u...@hbase.apache.org


 Hello guys..!!!

 I am currently working on Hbase 0.90.3 and Hadoop 0.20.2

 Since this hadoop version does not support rsync hdfs..
 so I copied the *hadoop-core-append jar*  file from *hbase/lib* folder
 into* hadoop folder* and replaced it with* hadoop-0.20.2-core.jar*
 which was suggested in the following link


 http://www.apacheserver.net/Using-Hadoop-bundled-in-lib-directory-HBase-at1136240.htm

 I guess this is what have been mentioned in the link that I am doing. If I
 am doing somehting wrong, kindly tell me.

 But now after adding that jar file.. I am not able to run my hadoop.. I am
 getting following exception messages on my screen

 ub13: Exception in thread main java.lang.NoClassDefFoundError:
 org/apache/hadoop/util/PlatformName
 ub13: Caused by: java.lang.ClassNotFoundException:
 org.apache.hadoop.util.PlatformName
 ub13:   at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
 ub13:   at java.security.AccessController.doPrivileged(Native Method)
 ub13:   at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
 ub13:   at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
 ub13:   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
 ub13:   at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
 ub13: Could not find the main class: org.apache.hadoop.util.PlatformName.
 Program will exit.
 ub13: Exception in thread main java.lang.NoClassDefFoundError:
 org/apache/hadoop/hdfs/server/datanode/DataNode
 ub13: starting secondarynamenode, logging to
 /usr/local/hadoop/hadoop/bin/../logs/hadoop-hadoop-secondarynamenode-ub13.out
 ub13: Exception in thread main java.lang.NoClassDefFoundError:
 org/apache/hadoop/util/PlatformName
 ub13: Caused by: java.lang.ClassNotFoundException:
 org.apache.hadoop.util.PlatformName
 ub13:   at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
 ub13:   at java.security.AccessController.doPrivileged(Native Method)
 ub13:   at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
 ub13:   at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
 ub13:   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
 ub13:   at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
 ub13: Could not find the main class: org.apache.hadoop.util.PlatformName.
 Program will exit.
 ub13: Exception in thread main java.lang.NoClassDefFoundError:
 org/apache/hadoop/hdfs/server/namenode/SecondaryNameNode
 Have I done something wrong.. Please guide me...!!

 Thanks,
 Praveenesh




Re: Hadoop not working after replacing hadoop-core.jar with hadoop-core-append.jar

2011-06-06 Thread praveenesh kumar
It worked after renaming the hadoop-append*.jar file to hadoop-0.20.2-core.jar;
I don't know why, but it worked.
Also, after this my HBase started fine once, but after
that it is not working; there is some problem in starting the region
servers.
I have sent the exceptions in my other email; I hope it will reach the
mailing list after some time.

Thanks,
Praveenesh

On Mon, Jun 6, 2011 at 8:59 PM, Stack st...@duboce.net wrote:

 On Mon, Jun 6, 2011 at 6:23 AM, praveenesh kumar praveen...@gmail.com
 wrote:
  Changing the name of the hadoop-apppend-core.jar file to
  hadoop-0.20.2-core.jar did the trick..
  Its working now..
  But is this the right solution to this problem ??
 

 It would seem to be.  Did you have two hadoop*jar versions in your lib
 directory by any chance?  You did not remove the first?
 St.Ack
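
For anyone repeating this swap: the bin/hadoop script in 0.20.2 builds its classpath from a hadoop-*-core.jar file name pattern, which is why the rename mattered. A sketch of the whole procedure, assuming the layout under /usr/local/hadoop/hadoop used in this thread (paths are assumptions) and that the same jar is then copied to every node:

cd /usr/local/hadoop/hadoop
mv hadoop-0.20.2-core.jar hadoop-0.20.2-core.jar.orig
cp /usr/local/hadoop/hbase/hbase/lib/hadoop-core-0.20-append-r1056497.jar .
mv hadoop-core-0.20-append-r1056497.jar hadoop-0.20.2-core.jar

# make sure only one hadoop-*-core.jar is left, then push the same file to the
# other nodes so client and server run the identical build
ls hadoop-*-core.jar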



Does Hadoop 0.20.2 and HBase 0.90.3 compatible ??

2011-06-03 Thread praveenesh kumar
Guys,

I am very confused. Please, I really need your feedback and
suggestions.

The scenario is like this:

I set up a *Hadoop 0.20.2 cluster* of *12 nodes*.

Then I set up an *HBase 0.90.3* *12-node cluster* on top of it.

But after all that experimenting and struggling, I read the following
SHOCKING line on my HBase web UI:

---*  You are currently running the HMaster without HDFS append support
enabled. This may result in data loss. Please see the HBase wiki for
details. *

When I searched for more about it, I found Michael G. Noll's article
saying that *Hadoop 0.20.2 and HBase 0.90.2 are not compatible*.

Is *Hadoop 0.20.2 also incompatible with HBase 0.90.3?*

Does that mean I have to reinstall with hadoop-0.20-append if I want to use
HBase?

It took a lot of struggle to reach this stage; do I have to do all of it
again?

Is there any other workaround that avoids reinstalling everything
again?

Please help! :-(

Thanks,
Praveenesh
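
For reference: once the cluster is running an append-capable build (branch-0.20-append, or a distribution that includes it), the HBase documentation for 0.90 also asks for the append flag to be switched on in both hdfs-site.xml and hbase-site.xml. A minimal sketch of that property (only useful on a build that actually implements append):

<property>
  <name>dfs.support.append</name>
  <value>true</value>
</property>

Setting this on a stock 0.20.2 build does not add append support; the warning on the HMaster page only goes away when the underlying hadoop-core jar provides a working sync.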


Fwd: Data node is taking time to start.. Error register getProtocolVersion in namenode..!!

2011-06-02 Thread praveenesh kumar
Hey guys,

Any suggestions?

-- Forwarded message --
From: praveenesh kumar praveen...@gmail.com
Date: Wed, Jun 1, 2011 at 2:48 PM
Subject: Data node is taking time to start.. Error register
getProtocolVersion in namenode..!!
To: common-user@hadoop.apache.org


Hello Hadoop users.!!!

Well, I am doing a simple Hadoop single-node installation, but my datanode
is taking some time to start.

If I go through the namenode logs, I see a strange exception.

2011-06-02 03:59:59,959 INFO
org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
/
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = ub4/162.192.100.44
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build =
https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r
911707; compiled by 'chrisdo' on Fri Feb $
/
2011-06-02 04:00:00,034 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
Initializing RPC Metrics with hostName=NameNode, port=54310
2011-06-02 04:00:00,038 INFO
org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: ub4/
162.192.100.44:54310
2011-06-02 04:00:00,039 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=NameNode, sessionId=nu$
2011-06-02 04:00:00,040 INFO
org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing
NameNodeMeterics using contex$
2011-06-02 04:00:00,074 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hadoop,hadoop
2011-06-02 04:00:00,074 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2011-06-02 04:00:00,074 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
isPermissionEnabled=true
2011-06-02 04:00:00,084 INFO
org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics:
Initializing FSNamesystemMetrics using$
2011-06-02 04:00:00,085 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
FSNamesystemStatusMBean
2011-06-02 04:00:00,109 INFO org.apache.hadoop.hdfs.server.common.Storage:
Number of files = 1
2011-06-02 04:00:00,114 INFO org.apache.hadoop.hdfs.server.common.Storage:
Number of files under construction = 0
2011-06-02 04:00:00,114 INFO org.apache.hadoop.hdfs.server.common.Storage:
Image file of size 96 loaded in 0 seconds.
2011-06-02 04:00:00,550 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Finished loading
FSImage in 489 msecs
2011-06-02 04:00:00,552 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Total number of blocks
= 0
2011-06-02 04:00:00,552 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of invalid
blocks = 0
2011-06-02 04:00:00,552 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of
under-replicated blocks = 0
2011-06-02 04:00:00,552 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of
over-replicated blocks = 0
2011-06-02 04:00:00,552 INFO org.apache.hadoop.hdfs.StateChange: STATE*
Leaving safe mode after 0 secs.
2011-06-02 04:00:00,553 INFO org.apache.hadoop.hdfs.StateChange: STATE*
Network topology has 0 racks and 0 datanodes
2011-06-02 04:00:00,553 INFO org.apache.hadoop.hdfs.StateChange: STATE*
UnderReplicatedBlocks has 0 blocks
2011-06-02 04:00:01,093 INFO org.mortbay.log: Logging to
org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
org.mortbay.log.Slf4jLog
2011-06-02 04:00:01,137 INFO org.apache.hadoop.http.HttpServer: Port
returned by webServer.getConnectors()[0].getLocalPort() before ope$
2011-06-02 04:00:01,138 INFO org.apache.hadoop.http.HttpServer:
listener.getLocalPort() returned 50070 webServer.getConnectors()[0].get$
2011-06-02 04:00:01,138 INFO org.apache.hadoop.http.HttpServer: Jetty bound
to port 50070
2011-06-02 04:00:01,138 INFO org.mortbay.log: jetty-6.1.14
2011-06-02 04:00:48,495 INFO org.mortbay.log: Started
SelectChannelConnector@0.0.0.0:50070
2011-06-02 04:00:48,495 INFO
org.apache.hadoop.hdfs.server.namenode.NameNode: Web-server up at:
0.0.0.0:50070
2011-06-02 04:00:48,501 INFO org.apache.hadoop.ipc.Server: IPC Server
Responder: starting
2011-06-02 04:00:48,501 INFO org.apache.hadoop.ipc.Server: IPC Server
listener on 54310: starting
2011-06-02 04:00:48,501 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 0 on 54310: starting
2011-06-02 04:00:48,501 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 1 on 54310: starting
2011-06-02 04:00:48,502 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 2 on 54310: starting
2011-06-02 04:00:48,502 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 4 on 54310: starting
2011-06-02 04:00:48,502 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 3 on 54310: starting
2011-06-02 04:00:48,502 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 5 on 54310: starting
2011-06-02 04:00:48,502 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 7 on 54310: starting
2011-06-02 04:00:48,503 INFO org.apache.hadoop.ipc.Server

Hbase Web UI Interface on hbase 0.90.3 ?

2011-06-02 Thread praveenesh kumar
Hello guys.

I have just installed HBase on my Hadoop cluster.
HMaster, HRegionServer and HQuorumPeer are all working fine, as I can see
these processes running through jps.

Is there any way to know which region servers are running correctly and which are not?
I mean, is there some kind of HBase web UI or any other way to know the status of
the HBase cluster?

For Hadoop, for example, we can use the hadoop fsck command and the web UIs.

When I used HBase 0.20.6 I could see the HBase web UI hosted on port 60010 by
default, but I cannot see that web UI when I am using HBase 0.90.3.
Also, I cannot see hbase-default.xml in my $HBASE_HOME/conf folder. Is this
the reason for that?
So do I need to set all of those configurations on my own for this new
version?

Thanks,
Praveenesh
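
On the two questions above: in HBase 0.90.x the hbase-default.xml file ships inside the hbase jar rather than under conf/, and site-specific overrides go into conf/hbase-site.xml, so its absence is normal and is not what hides the UI. The master web UI still defaults to port 60010 (region servers use 60030), controlled by these properties; the values shown are the defaults, so they normally do not need to be set at all:

<property>
  <name>hbase.master.info.port</name>
  <value>60010</value>
</property>
<property>
  <name>hbase.regionserver.info.port</name>
  <value>60030</value>
</property>

If http://master:60010/ does not respond even though HMaster shows up in jps, the master has usually failed or is stuck initializing, so the HMaster log is the next place to look.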


Data node is taking time to start.. Error register getProtocolVersion in namenode..!!

2011-06-01 Thread praveenesh kumar
Hello Hadoop users.!!!

Well, I am doing a simple Hadoop single-node installation, but my datanode
is taking some time to start.

If I go through the namenode logs, I see a strange exception.

2011-06-02 03:59:59,959 INFO
org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
/
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = ub4/162.192.100.44
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build =
https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r
911707; compiled by 'chrisdo' on Fri Feb $
/
2011-06-02 04:00:00,034 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
Initializing RPC Metrics with hostName=NameNode, port=54310
2011-06-02 04:00:00,038 INFO
org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: ub4/
162.192.100.44:54310
2011-06-02 04:00:00,039 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=NameNode, sessionId=nu$
2011-06-02 04:00:00,040 INFO
org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing
NameNodeMeterics using contex$
2011-06-02 04:00:00,074 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hadoop,hadoop
2011-06-02 04:00:00,074 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2011-06-02 04:00:00,074 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
isPermissionEnabled=true
2011-06-02 04:00:00,084 INFO
org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics:
Initializing FSNamesystemMetrics using$
2011-06-02 04:00:00,085 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
FSNamesystemStatusMBean
2011-06-02 04:00:00,109 INFO org.apache.hadoop.hdfs.server.common.Storage:
Number of files = 1
2011-06-02 04:00:00,114 INFO org.apache.hadoop.hdfs.server.common.Storage:
Number of files under construction = 0
2011-06-02 04:00:00,114 INFO org.apache.hadoop.hdfs.server.common.Storage:
Image file of size 96 loaded in 0 seconds.
2011-06-02 04:00:00,550 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Finished loading
FSImage in 489 msecs
2011-06-02 04:00:00,552 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Total number of blocks
= 0
2011-06-02 04:00:00,552 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of invalid
blocks = 0
2011-06-02 04:00:00,552 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of
under-replicated blocks = 0
2011-06-02 04:00:00,552 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of
over-replicated blocks = 0
2011-06-02 04:00:00,552 INFO org.apache.hadoop.hdfs.StateChange: STATE*
Leaving safe mode after 0 secs.
2011-06-02 04:00:00,553 INFO org.apache.hadoop.hdfs.StateChange: STATE*
Network topology has 0 racks and 0 datanodes
2011-06-02 04:00:00,553 INFO org.apache.hadoop.hdfs.StateChange: STATE*
UnderReplicatedBlocks has 0 blocks
2011-06-02 04:00:01,093 INFO org.mortbay.log: Logging to
org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
org.mortbay.log.Slf4jLog
2011-06-02 04:00:01,137 INFO org.apache.hadoop.http.HttpServer: Port
returned by webServer.getConnectors()[0].getLocalPort() before ope$
2011-06-02 04:00:01,138 INFO org.apache.hadoop.http.HttpServer:
listener.getLocalPort() returned 50070 webServer.getConnectors()[0].get$
2011-06-02 04:00:01,138 INFO org.apache.hadoop.http.HttpServer: Jetty bound
to port 50070
2011-06-02 04:00:01,138 INFO org.mortbay.log: jetty-6.1.14
2011-06-02 04:00:48,495 INFO org.mortbay.log: Started
SelectChannelConnector@0.0.0.0:50070
2011-06-02 04:00:48,495 INFO
org.apache.hadoop.hdfs.server.namenode.NameNode: Web-server up at:
0.0.0.0:50070
2011-06-02 04:00:48,501 INFO org.apache.hadoop.ipc.Server: IPC Server
Responder: starting
2011-06-02 04:00:48,501 INFO org.apache.hadoop.ipc.Server: IPC Server
listener on 54310: starting
2011-06-02 04:00:48,501 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 0 on 54310: starting
2011-06-02 04:00:48,501 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 1 on 54310: starting
2011-06-02 04:00:48,502 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 2 on 54310: starting
2011-06-02 04:00:48,502 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 4 on 54310: starting
2011-06-02 04:00:48,502 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 3 on 54310: starting
2011-06-02 04:00:48,502 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 5 on 54310: starting
2011-06-02 04:00:48,502 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 7 on 54310: starting
2011-06-02 04:00:48,503 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 8 on 54310: starting
2011-06-02 04:00:48,503 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 6 on 54310: starting
2011-06-02 04:00:48,504 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 9 on 54310: starting
*2011-06-02 04:00:48,532 INFO 
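
When a datanode is slow to show up like this, the quickest checks are whether the DataNode process is actually up and whether it has registered with the namenode yet; nothing below is specific to this particular log:

jps                                      # confirms the DataNode JVM is running
hadoop dfsadmin -report                  # lists the datanodes the namenode knows about
tail -f $HADOOP_HOME/logs/hadoop-*-datanode-*.log   # watch the datanode's own startup

If the datanode log shows it retrying the connection to port 54310, the delay is usually the namenode still coming up (as in the web-server gap visible in the log above) rather than a datanode problem.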

How to compile HBase code ?

2011-05-24 Thread praveenesh kumar
Hello guys,

In case any of you are working on HBase: I just wrote a program by reading
some tutorials,
but nowhere is it mentioned how to run code against HBase. If any of you
have done some coding on HBase, can you please tell me how to run it?

I am able to compile my code by adding hbase-core.jar and hadoop-core.jar to
the classpath while compiling it,
but I am not able to figure out how to run it.

Whenever I run java ExampleClient (which is my HBase program), I get
the following error:

Exception in thread main java.lang.NoClassDefFoundError:
org/apache/hadoop/hbase/HBaseConfiguration
at ExampleClient.main(ExampleClient.java:20)
Caused by: java.lang.ClassNotFoundException:
org.apache.hadoop.hbase.HBaseConfiguration
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
... 1 more
Thanks,
Praveenesh


Re: How to compile HBase code ?

2011-05-24 Thread praveenesh kumar
I am simply using the HBase API, not doing any map-reduce work with it.

Following is the code I have written; it simply creates a table in HBase:

import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ExampleClient {
 public static void main(String args []) throws IOException
 {
  // picks up hbase-site.xml / hbase-default.xml from the classpath
  HBaseConfiguration config = new HBaseConfiguration();

  // create a table called "test" with a single column family "data"
  HBaseAdmin admin = new HBaseAdmin(config);
  HTableDescriptor htd = new HTableDescriptor("test");
  HColumnDescriptor hcd = new HColumnDescriptor("data");
  htd.addFamily(hcd);
  admin.createTable(htd);

  byte [] tablename = htd.getName();
  HTableDescriptor [] tables = admin.listTables();

  if(tables.length != 1 && Bytes.equals(tablename, tables[0].getName()))
  {
   throw new IOException("Failed to create table");
  }

  // put a single cell, then read it back with a Get and a Scan
  HTable table = new HTable(config, tablename);
  byte[] row1 = Bytes.toBytes("row1");
  Put p1 = new Put(row1);
  byte[] databytes = Bytes.toBytes("data");
  p1.add(databytes, Bytes.toBytes("1"), Bytes.toBytes("value1"));
  table.put(p1);

  Get g = new Get(row1);
  Result result = table.get(g);
  System.out.println("Get : " + result);
  Scan scan = new Scan();
  ResultScanner scanner = table.getScanner(scan);
  try
  {
   for(Result scannerResult: scanner)
   {
    System.out.println("Scan : " + scannerResult);
   }
  }catch(Exception e ){
   e.printStackTrace();
  }
  finally{
   scanner.close();
  }
  table.close();
 }
}

Now I have set the classpath variable in /etc/environment as
MYCLASSPATH=/usr/local/hadoop/hadoop/hadoop-0.20.2-core.jar:/usr/local/hadoop/hbase/hbase/hbase-0.20.6.jar:/usr/local/hadoop/hbase/hbase/lib/zookeeper-3.2.2.jar

now I am compiling my code with javac command

*$javac -classpath $MYCLASSPATH ExampleClient.java*

It is working fine.
While running, I am using java command

*$java -classpath $MYCLASSPATH ExampleClient*, then I am getting the
following error :
Exception in thread main java.lang.NoClassDefFoundError: ExampleClient
Caused by: java.lang.ClassNotFoundException: ExampleClient
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
Could not find the main class: ExampleClient. Program will exit.
But I am running the command from the same location, and the
ExampleClient.class file exists at that location.





On Tue, May 24, 2011 at 3:07 PM, Kleegrewe, Christian 
christian.kleegr...@siemens.com wrote:

 How do you execute the client (command line) do you use the java or the
 hadoop command?
 It seems that there is an error in your classpath when running the client
 job. The classpath when compiling classes that implement the client is
 different from the classpath when your client is executed since hadoop and
 hbase carry their own environment. Maybe tha following link helps:


 http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#classpath


 regards
 Christian


 Siemens AG
 Corporate Technology
 Corporate Research and Technologies
 CT T DE IT3
 Otto-Hahn-Ring 6
 81739 München, Deutschland
 Tel.: +49 (89) 636-42722
 Fax: +49 (89) 636-41423
 mailto:christian.kleegr...@siemens.com

 Siemens Aktiengesellschaft: Vorsitzender des Aufsichtsrats: Gerhard Cromme;
 Vorstand: Peter Löscher, Vorsitzender; Wolfgang Dehen, Brigitte Ederer, Joe
 Kaeser, Barbara Kux, Hermann Requardt, Siegfried Russwurm, Peter Y.
 Solmssen; Sitz der Gesellschaft: Berlin und München, Deutschland;
 Registergericht: Berlin Charlottenburg, HRB 12300, München, HRB 6684;
 WEEE-Reg.-Nr. DE 23691322


 -Ursprüngliche Nachricht-
 Von: praveenesh kumar [mailto:praveen...@gmail.com]
 Gesendet: Dienstag, 24. Mai 2011 11:08
 An: common-user@hadoop.apache.org
 Betreff: How to compile HBase code ?

 Hello guys,

 In case any of you are working on HBASE, I just wrote a program by reading
 some tutorials..
 But no where its mentioned how to run codes on HBASE. In case anyone of you
 has done some coding on HBASE , can you please tell me how to run it.

 I am able to compile my code by adding hbase-core.jar and hadoop-core.jar
 in
 classpath while compiling it.
 But not able to figure out how to run it.

 Whenever I am doing java ExampleClient ( which is my

Re: How to compile HBase code ?

2011-05-24 Thread praveenesh kumar
Hey Harsh,

Actually I mailed the HBase mailing list as well, but since I wanted to get
this sorted as soon as possible I mailed this group too;
anyway, I will take care of that in future, although I got more responses
on this mailing list :-)

Anyway, the problem is solved.

What I did is add the folder containing my .class file to the classpath,
along with commons-logging-1.0.4.jar and log4j-1.2.15.jar:

so now the MYCLASSPATH variable looks like:


*
MYCLASSPATH=/usr/local/hadoop/hadoop/hadoop-0.20.2-core.jar:/usr/local/hadoop/hbase/hbase/hbase-0.20.6.jar:/usr/local/hadoop/hbase/hbase/lib/zookeeper-3.2.2.jar::/usr/local/hadoop/hbase/hbase/lib/commons-logging-1.0.4.jar:/usr/local/hadoop/hbase/hbase/lib/log4j-1.2.15.jar:/usr/local/hadoop/hbase/
*

and then I used *java -classpath $MYCLASSPATH ExampleClient*,
and now it is running.


Thanks!
Praveenesh

On Tue, May 24, 2011 at 3:55 PM, Harsh J ha...@cloudera.com wrote:

 Praveenesh,

 HBase has their own user mailing lists where such queries ought to go.
 Am moving the discussion to u...@hbase.apache.org and bcc-ing
 common-user@ here. Also added you to cc.

 Regarding your first error, going forward you can use the useful
 `hbase classpath` to generate a HBase-provided classpath list for you
 automatically. Something like:

 $ MYCLASSPATH=`hbase classpath`

 Regarding the second, latest one as below, your ExampleClient.class
 isn't on the MYCLASSPATH (nor is the directory it is under, i.e. '.')
 so Java can't really find it. This is not a HBase issue.

 HTH.

 On Tue, May 24, 2011 at 3:23 PM, praveenesh kumar praveen...@gmail.com
 wrote:
  I am simply using HBase API, not doing any Map-reduce work on it.
 
  Following is the code I have written , simply creating the file on HBase:
 
  import java.io.IOException;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.HColumnDescriptor;
  import org.apache.hadoop.hbase.HTableDescriptor;
  import org.apache.hadoop.hbase.client.Get;
  import org.apache.hadoop.hbase.client.HBaseAdmin;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.ResultScanner;
  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.util.Bytes;
 
  public class ExampleClient {
   public static void main(String args []) throws IOException
   {
   HBaseConfiguration config = new HBaseConfiguration();
 
   HBaseAdmin admin = new HBaseAdmin(config);
   HTableDescriptor htd = new HTableDescriptor(test);
   HColumnDescriptor hcd = new HColumnDescriptor(data);
   htd.addFamily(hcd);
   admin.createTable(htd);
 
   byte [] tablename = htd.getName();
   HTableDescriptor [] tables = admin.listTables();
 
   if(tables.length !=1  Bytes.equals(tablename, tables[0].getName()))
   {
throw new IOException(Failed to create table);
   }
 
   HTable table = new HTable(config,tablename);
   byte[] row1 = Bytes.toBytes(row1);
   Put p1 = new Put(row1);
   byte[] databytes = Bytes.toBytes(data);
   p1.add(databytes,Bytes.toBytes(1),Bytes.toBytes(value1));
   table.put(p1);
 
   Get g = new Get(row1);
   Result result = table.get(g);
   System.out.println(Get : + result);
   Scan scan = new Scan();
   ResultScanner scanner = table.getScanner(scan);
   try
   {
for(Result scannerResult: scanner)
{
 System.out.println(Scan :  + scannerResult);
}
   }catch(Exception e ){
e.printStackTrace();
   }
   finally{
scanner.close();
   }
   table.close();
   }
  }
 
  Now I have set the classpath variable in /etc/environment as
 
 MYCLASSPATH=/usr/local/hadoop/hadoop/hadoop-0.20.2-core.jar:/usr/local/hadoop/hbase/hbase/hbase-0.20.6.jar:/usr/local/hadoop/hbase/hbase/lib/zookeeper-3.2.2.jar
 
  now I am compiling my code with javac command
 
  *$javac -classpath $MYCLASSPATH ExampleClient.java*
 
  It is working fine.
  While running, I am using java command
 
  *$java -classpath $MYCLASSPATH ExampleClient*, then I am getting the
  following error :
  Exception in thread main java.lang.NoClassDefFoundError: ExampleClient
  Caused by: java.lang.ClassNotFoundException: ExampleClient
 at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
  Could not find the main class: ExampleClient. Program will exit.
  But I am running the code from the same location. and ExampleClient.class
  file exists at that location.
 
 
 
 
 
  On Tue, May 24, 2011 at 3:07 PM, Kleegrewe, Christian 
  christian.kleegr...@siemens.com wrote:
 
  How do you execute the client (command line) do you use

Re: How to compile HBase code ?

2011-05-24 Thread praveenesh kumar
Hey harsh,

I tried that, but it's not working.
I am using HBase 0.20.6;
there is no such command as bin/hbase classpath:

hadoop@ub6:/usr/local/hadoop/hbase$ hbase
Usage: hbase <command>
where <command> is one of:
  shell            run the HBase shell
  master           run an HBase HMaster node
  regionserver     run an HBase HRegionServer node
  rest             run an HBase REST server
  thrift           run an HBase Thrift server
  zookeeper        run a Zookeeper server
  migrate          upgrade an hbase.rootdir
 or
  CLASSNAME        run the class named CLASSNAME
Thanks,
Praveenesh
On Tue, May 24, 2011 at 4:59 PM, Harsh J ha...@cloudera.com wrote:

 Praveenesh,

 On Tue, May 24, 2011 at 4:31 PM, praveenesh kumar praveen...@gmail.com
 wrote:
  Hey Harsh,
 
  Actually I mailed to HBase mailing list also.. but since I wanted to get
  this thing done as soon as possible so I mailed in this group also..
  anyways I will take care of this in future , although  I got more
 responses
  in this mailing list only :-)
 
  Anyways problem is solved..

 Good to know your problem resolved. You can also use the `bin/hbase
 classpath` utility to generate HBase parts of the classpath
 automatically in the future, instead of adding classes manually -
 saves you time.

 --
 Harsh J
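
For reference, a short sketch of that approach on an HBase release new enough to
ship the classpath subcommand (0.20.6, as used above, predates it):

$ HBCP=$(hbase classpath)
$ javac -classpath "$HBCP" ExampleClient.java
$ java -classpath ".:$HBCP" ExampleClient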



Fwd: Hbase question,,!!

2011-05-23 Thread praveenesh kumar
Please any suggestions..!!


-- Forwarded message --
From: praveenesh kumar praveen...@gmail.com
Date: Sun, May 22, 2011 at 2:23 PM
Subject: Hbase question,,!!
To: common-user@hadoop.apache.org


Okay guys.. so I have hadoop cluster of 5 nodes.. the configuration look
like this.

162.192.100.53 -- Master as well as slave

Slave nodes :

162.192.100.52
162.192.100.51
162.192.100.50
162.192.100.49

Now I want to implement HBASE on my hadoop cluster.. What can be the best
configuration for my HBASE  based on my hadoop structure ??

Thanks,
Praveenesh


Re: Installing Hadoop

2011-05-23 Thread praveenesh kumar
Or you can refer to the following tutorial for reference..!!

http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/



On Mon, May 23, 2011 at 11:06 PM, jgroups mohitanch...@gmail.com wrote:


 I am trying to install hadoop in cluster env with multiple nodes. Following
 instructions from

 http://hadoop.apache.org/common/docs/r0.17.0/cluster_setup.html

 That page refers to hadoop-site.xml. But I don't see that in
 /hadoop-0.20.203.0/conf. Are there more upto date installation instructions
 somwhere else?
 --
 View this message in context:
 http://old.nabble.com/Installing-Hadoop-tp31683812p31683812.html
 Sent from the Hadoop core-user mailing list archive at Nabble.com.




Hbase question,,!!

2011-05-22 Thread praveenesh kumar
Okay guys.. so I have hadoop cluster of 5 nodes.. the configuration look
like this.

162.192.100.53 -- Master as well as slave

Slave nodes :

162.192.100.52
162.192.100.51
162.192.100.50
162.192.100.49

Now I want to implement HBASE on my hadoop cluster.. What can be the best
configuration for my HBASE  based on my hadoop structure ??

Thanks,
Praveenesh


Re: Why Only 1 Reducer is running ??

2011-05-22 Thread praveenesh kumar
My program is a basic program like this :

import java.io.IOException;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class WordCount {

public static class Map extends MapReduceBase implements
Mapper<LongWritable, Text, Text, IntWritable> {
 private final static IntWritable one = new IntWritable(1);
 private Text word = new Text();

public void map(LongWritable key, Text value, OutputCollector<Text,
IntWritable> output, Reporter reporter) throws IOException {
  String line = value.toString();
  StringTokenizer tokenizer = new StringTokenizer(line);
   while (tokenizer.hasMoreTokens()) {
   word.set(tokenizer.nextToken());
   output.collect(word, one);
   }
  }
 }

public static class Reduce extends MapReduceBase implements
Reducer<Text, IntWritable, Text, IntWritable> {
  public void reduce(Text key, Iterator<IntWritable> values,
OutputCollector<Text, IntWritable> output, Reporter reporter) throws
IOException {
int sum = 0;
while (values.hasNext()) {
  sum += values.next().get();
}
output.collect(key, new IntWritable(sum));
  }
}

public static void main(String[] args) throws Exception {
  JobConf conf = new JobConf(WordCount.class);
  conf.setJobName("wordcount");

  conf.setOutputKeyClass(Text.class);
  conf.setOutputValueClass(IntWritable.class);

  conf.setMapperClass(Map.class);
  conf.setCombinerClass(Reduce.class);
  conf.setReducerClass(Reduce.class);

  conf.setInputFormat(TextInputFormat.class);
  conf.setOutputFormat(TextOutputFormat.class);

  FileInputFormat.setInputPaths(conf, new Path(args[0]));
  FileOutputFormat.setOutputPath(conf, new Path(args[1]));
   Job.setNumReduceTasks(10);
  JobClient.runJob(conf);
}
 }


How to use Job.setNumReduceTasks(INT) function here.. I am not using any Job
class object here.

Thanks.
Praveenesh


On Fri, May 20, 2011 at 7:07 PM, Evert Lammerts evert.lamme...@sara.nlwrote:

 Hi Praveenesh,

 * You can set the maximum amount of reducers per node in your
 mapred-site.xml using mapred.tasktracker.reduce.tasks.maximum (default set
 to 2).
 * You can set the default number of reduce tasks with mapred.reduce.tasks
 (default set to 1 - this causes your single reducer).
 * Your job can try to override this setting by calling
 Job.setNumReduceTasks(INT) (
 http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/Job.html#setNumReduceTasks(int)
 ).

 Cheers,
 Evert
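
 For reference, a minimal sketch of the mapred-site.xml entries described above
 (the values are the ones mentioned in this thread; tune them to the cluster):

 <property>
   <name>mapred.tasktracker.reduce.tasks.maximum</name>
   <value>2</value>
 </property>
 <property>
   <name>mapred.reduce.tasks</name>
   <value>12</value>
 </property>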


  -Original Message-
  From: modemide [mailto:modem...@gmail.com]
  Sent: vrijdag 20 mei 2011 15:26
  To: common-user@hadoop.apache.org
  Subject: Re: Why Only 1 Reducer is running ??
 
  what does your mapred-site.xml file say?
 
  I've used wordcount and had close to 12 reduces running on a 6
  datanode cluster on a 3 GB file.
 
 
  I have a configuration in there which says:
  mapred.reduce.tasks = 12
 
  The reason I chose 12 was because it was recommended that I choose 2x
  number of tasktrackers.
 
 
 
 
 
  On 5/20/11, praveenesh kumar praveen...@gmail.com wrote:
   Hello everyone,
  
   I am using wordcount application to test on my hadoop cluster of 5
  nodes.
   The file size is around 5 GB.
   Its taking around 2 min - 40 sec for execution.
   But when I am checking the JobTracker web portal, I am seeing only
  one
   reducer is running. Why so  ??
   How can I change the code so that I will run multiple reducers also
  ??
  
   Thanks,
   Praveenesh
  



Re: Why Only 1 Reducer is running ??

2011-05-22 Thread praveenesh kumar
Okie I figured it out.. it was simple..

conf.setNumReduceTasks(10);
my mistake..

Anyhow when I am running 10 reducers for Wordcount problem.. I am seeing
only slight increase in the speed of the program... Why so ??
So more reducers do not guarantee faster execution ??
How can we decide to use how many reducers to make our program run in the
best way possible ??

Thanks,
Praveenesh
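
For reference, a short sketch of the working call with the old JobConf API used
in this WordCount, together with the sizing rule of thumb from the stock Hadoop
documentation (a starting point, not a guarantee of faster runs):

JobConf conf = new JobConf(WordCount.class);
// common guideline: about 0.95 or 1.75 * (number of nodes *
// mapred.tasktracker.reduce.tasks.maximum)
conf.setNumReduceTasks(10);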

On Mon, May 23, 2011 at 10:08 AM, praveenesh kumar praveen...@gmail.comwrote:

 My program is a basic program like this :

 import java.io.IOException;
 import java.util.*;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.conf.*;
 import org.apache.hadoop.io.*;
 import org.apache.hadoop.mapred.*;
 import org.apache.hadoop.util.*;

 public class WordCount {

 public static class Map extends MapReduceBase implements
 Mapper<LongWritable, Text, Text, IntWritable> {
  private final static IntWritable one = new IntWritable(1);
  private Text word = new Text();

 public void map(LongWritable key, Text value, OutputCollector<Text,
 IntWritable> output, Reporter reporter) throws IOException {
   String line = value.toString();
   StringTokenizer tokenizer = new StringTokenizer(line);
    while (tokenizer.hasMoreTokens()) {
    word.set(tokenizer.nextToken());
    output.collect(word, one);
    }
   }
  }

 public static class Reduce extends MapReduceBase implements
 Reducer<Text, IntWritable, Text, IntWritable> {
   public void reduce(Text key, Iterator<IntWritable> values,
 OutputCollector<Text, IntWritable> output, Reporter reporter) throws
 IOException {
 int sum = 0;
 while (values.hasNext()) {
   sum += values.next().get();
 }
 output.collect(key, new IntWritable(sum));
   }
 }

 public static void main(String[] args) throws Exception {
   JobConf conf = new JobConf(WordCount.class);
   conf.setJobName("wordcount");

   conf.setOutputKeyClass(Text.class);
   conf.setOutputValueClass(IntWritable.class);

   conf.setMapperClass(Map.class);
   conf.setCombinerClass(Reduce.class);
   conf.setReducerClass(Reduce.class);

   conf.setInputFormat(TextInputFormat.class);
   conf.setOutputFormat(TextOutputFormat.class);

   FileInputFormat.setInputPaths(conf, new Path(args[0]));
   FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    Job.setNumReduceTasks(10);
   JobClient.runJob(conf);
 }
  }


 How to use Job.setNumReduceTasks(INT) function here.. I am not using any
 Job class object here.

 Thanks.
 Praveenesh


   On Fri, May 20, 2011 at 7:07 PM, Evert Lammerts 
 evert.lamme...@sara.nlwrote:

 Hi Praveenesh,

 * You can set the maximum amount of reducers per node in your
 mapred-site.xml using mapred.tasktracker.reduce.tasks.maximum (default set
 to 2).
 * You can set the default number of reduce tasks with mapred.reduce.tasks
 (default set to 1 - this causes your single reducer).
 * Your job can try to override this setting by calling
 Job.setNumReduceTasks(INT) (
 http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/Job.html#setNumReduceTasks(int)
 ).

 Cheers,
 Evert


  -Original Message-
  From: modemide [mailto:modem...@gmail.com]
  Sent: vrijdag 20 mei 2011 15:26
  To: common-user@hadoop.apache.org
  Subject: Re: Why Only 1 Reducer is running ??
 
  what does your mapred-site.xml file say?
 
  I've used wordcount and had close to 12 reduces running on a 6
  datanode cluster on a 3 GB file.
 
 
  I have a configuration in there which says:
  mapred.reduce.tasks = 12
 
  The reason I chose 12 was because it was recommended that I choose 2x
  number of tasktrackers.
 
 
 
 
 
  On 5/20/11, praveenesh kumar praveen...@gmail.com wrote:
   Hello everyone,
  
   I am using wordcount application to test on my hadoop cluster of 5
  nodes.
   The file size is around 5 GB.
   Its taking around 2 min - 40 sec for execution.
   But when I am checking the JobTracker web portal, I am seeing only
  one
   reducer is running. Why so  ??
   How can I change the code so that I will run multiple reducers also
  ??
  
   Thanks,
   Praveenesh
  





How to see block information on NameNode ?

2011-05-21 Thread praveenesh kumar
hey..!!

I have a question.
If I copy some file on HDFS file system, it will get split into blocks and
Namenode will keep all these meta info with it.
How can I see that info.
I copied a 5 GB file on the NameNode, but I see that file only on the NameNode..
It does not get split into blocks..??
How can I see whether my file is getting split into blocks and which data
node is keeping which block ??

Thanks,
Praveenesh
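
A short sketch of one way to answer this from the command line, assuming a file
already copied into HDFS (the path is illustrative); fsck lists the blocks that
make up the file and the datanodes holding each replica, and the NameNode web UI
on port 50070 shows the same information:

hadoop fsck /user/hadoop/bigfile.txt -files -blocks -locations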


Why Only 1 Reducer is running ??

2011-05-20 Thread praveenesh kumar
Hello everyone,

I am using wordcount application to test on my hadoop cluster of 5 nodes.
The file size is around 5 GB.
Its taking around 2 min - 40 sec for execution.
But when I am checking the JobTracker web portal, I am seeing only one
reducer is running. Why so  ??
How can I change the code so that I will run multiple reducers also ??

Thanks,
Praveenesh


Re: Why Only 1 Reducer is running ??

2011-05-20 Thread praveenesh kumar
I am using the wordcount example that comes along with hadoop.
How can I configure it to make it use multiple reducers.
I guess mutiple reducers will make it run more fast .. Does it ??


On Fri, May 20, 2011 at 6:51 PM, James Seigel Tynt ja...@tynt.com wrote:

 The job could be designed to use one reducer

 On 2011-05-20, at 7:19 AM, praveenesh kumar praveen...@gmail.com wrote:

  Hello everyone,
 
  I am using wordcount application to test on my hadoop cluster of 5 nodes.
  The file size is around 5 GB.
  Its taking around 2 min - 40 sec for execution.
  But when I am checking the JobTracker web portal, I am seeing only one
  reducer is running. Why so  ??
  How can I change the code so that I will run multiple reducers also ??
 
  Thanks,
  Praveenesh



How hadoop parse input files into (Key,Value) pairs ??

2011-05-05 Thread praveenesh kumar
Hi,

As we know, the hadoop mapper takes input as (Key,Value) pairs and generates
intermediate (Key,Value) pairs, and usually we give the input to our Mapper as a
text file.
How does hadoop understand this and parse our input text file into (Key,Value)
pairs?

Usually our mapper looks like  --
*public* *void* map(LongWritable key, Text value, OutputCollector<Text, Text>
outputCollector, Reporter reporter) *throws* IOException {

String word = value.toString();

//Some lines of code

}

So if I pass any text file as input, it is taking every line as VALUE to
Mapper..on which I will do some processing and put it to OutputCollector.
But how does hadoop parse my text file into (Key,Value) pairs, and how can we
tell hadoop what (key,value) it should give to the mapper ??

Thanks.
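
A short sketch of the knob that controls this, using the old-API calls that
already appear in these threads; with the default TextInputFormat the key handed
to map() is the byte offset of the line (LongWritable) and the value is the line
itself (Text):

conf.setInputFormat(TextInputFormat.class);            // key = byte offset, value = the whole line
// conf.setInputFormat(KeyValueTextInputFormat.class); // key/value taken from each line, split on a tab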


Can we access NameNode HDFS from slave Nodes ??

2011-05-05 Thread praveenesh kumar
hey,

Can we access NameNode's hdfs on our slave machines ??

I am just running command hadoop dfs -ls on my slave machine ( running
tasktracker and Datanode), and its giving me the following output :

hadoop@ub12:~$ hadoop dfs -ls
11/05/05 18:31:54 INFO ipc.Client: Retrying connect to server: ub13/
162.192.100.53:54310. Already tried 0 time(s).
11/05/05 18:31:55 INFO ipc.Client: Retrying connect to server: ub13/
162.192.100.53:54310. Already tried 1 time(s).
11/05/05 18:31:56 INFO ipc.Client: Retrying connect to server: ub13/
162.192.100.53:54310. Already tried 2 time(s).
11/05/05 18:31:57 INFO ipc.Client: Retrying connect to server: ub13/
162.192.100.53:54310. Already tried 3 time(s).
11/05/05 18:31:58 INFO ipc.Client: Retrying connect to server: ub13/
162.192.100.53:54310. Already tried 4 time(s).
11/05/05 18:31:59 INFO ipc.Client: Retrying connect to server: ub13/
162.192.100.53:54310. Already tried 5 time(s).
11/05/05 18:32:00 INFO ipc.Client: Retrying connect to server: ub13/
162.192.100.53:54310. Already tried 6 time(s).
11/05/05 18:32:01 INFO ipc.Client: Retrying connect to server: ub13/
162.192.100.53:54310. Already tried 7 time(s).
11/05/05 18:32:02 INFO ipc.Client: Retrying connect to server: ub13/
162.192.100.53:54310. Already tried 8 time(s).
11/05/05 18:32:03 INFO ipc.Client: Retrying connect to server: ub13/
162.192.100.53:54310. Already tried 9 time(s).
Bad connection to FS. command aborted.
I just restarted my Master Node ( and run start-all.sh )

The output on my master node is

hadoop@ub13:/usr/local/hadoop$ start-all.sh
starting namenode, logging to
/usr/local/hadoop/hadoop/bin/../logs/hadoop-hadoop-namenode-ub13.out
ub11: starting datanode, logging to
/usr/local/hadoop/hadoop/bin/../logs/hadoop-hadoop-datanode-ub11.out
ub10: starting datanode, logging to
/usr/local/hadoop/hadoop/bin/../logs/hadoop-hadoop-datanode-ub10.out
ub12: starting datanode, logging to
/usr/local/hadoop/hadoop/bin/../logs/hadoop-hadoop-datanode-ub12.out
ub13: starting datanode, logging to
/usr/local/hadoop/hadoop/bin/../logs/hadoop-hadoop-datanode-ub13.out
ub13: starting secondarynamenode, logging to
/usr/local/hadoop/hadoop/bin/../logs/hadoop-hadoop-secondarynamenode-ub13.out
starting jobtracker, logging to
/usr/local/hadoop/hadoop/bin/../logs/hadoop-hadoop-jobtracker-ub13.out
ub10: starting tasktracker, logging to
/usr/local/hadoop/hadoop/bin/../logs/hadoop-hadoop-tasktracker-ub10.out
ub11: starting tasktracker, logging to
/usr/local/hadoop/hadoop/bin/../logs/hadoop-hadoop-tasktracker-ub11.out
ub12: starting tasktracker, logging to
/usr/local/hadoop/hadoop/bin/../logs/hadoop-hadoop-tasktracker-ub12.out
ub13: starting tasktracker, logging to
/usr/local/hadoop/hadoop/bin/../logs/hadoop-hadoop-tasktracker-ub13.out
hadoop@ub13:/usr/local/hadoop$ jps
6471 NameNode
7070 Jps
6875 JobTracker
6632 DataNode
7030 TaskTracker
6795 SecondaryNameNode
Thanks,
Praveenesh
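
For reference, a minimal sketch of what the slave's core-site.xml needs so that
"hadoop dfs -ls" works from any node; host and port are taken from the log
above:

<property>
  <name>fs.default.name</name>
  <value>hdfs://ub13:54310</value>
</property>

The repeated retries in the log usually mean either this value is missing or
wrong on the slave, or the NameNode was not reachable when the command ran.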


org.apache.hadoop.mapred.InvalidInputException ??

2011-04-25 Thread praveenesh kumar
Hi,

I am new to hadoop and the scenario is like this :

I have hadoop installed on a linux machine having IP as (162.192.100.46)
and I have another window machine with eclipse and hadoop plugin installed..
I am able to connect to linux hadoop machine and can see the dfs location
and mapred folder using my plugin. I copied all the hadoop jar files from
linux to windows and set them in my eclipse.

I am trying to run a sample small code from windows to the linux hadoop
machine

Code :

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class TestDriver {

 public static void main(String[] args) {
  JobClient client = new JobClient();
  JobConf conf = new JobConf(TestDriver.class);

  // TODO: specify output types
  conf.setOutputKeyClass(Text.class);
  conf.setOutputValueClass(IntWritable.class);
  // TODO: specify input and output DIRECTORIES (not files)
  //conf.setInputPath(new Path("src"));
  //conf.setOutputPath(new Path("out"));

  conf.setInputFormat(TextInputFormat.class);
  conf.setOutputFormat(TextOutputFormat.class);



  FileInputFormat.setInputPaths(conf, new Path("In"));
  FileOutputFormat.setOutputPath(conf, new Path("Out"));


  // TODO: specify a mapper
  conf.setMapperClass(org.apache.hadoop.mapred.lib.IdentityMapper.class);

  // TODO: specify a reducer
  conf.setReducerClass(org.apache.hadoop.mapred.lib.IdentityReducer.class);

  client.setConf(conf);
  try {
   JobClient.runJob(conf);
  } catch (Exception e) {
   e.printStackTrace();
  }
 }
}



Whenever I am trying to run the code on eclipse plugin ..
I am getting the following error :


*11/04/25 13:39:16 INFO jvm.JvmMetrics: Initializing JVM Metrics with
processName=JobTracker, sessionId=*

*11/04/25 13:39:16 WARN mapred.JobClient: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the same.*
*

org.apache.hadoop.mapred.InvalidInputException**: Input path does not exist:
hdfs://162.192.100.46:54310/user/hadoop/In*

*at org.apache.hadoop.mapred.FileInputFormat.listStatus(**
FileInputFormat.java:190**)*

*at org.apache.hadoop.mapred.FileInputFormat.getSplits(**
FileInputFormat.java:201**)*

*at org.apache.hadoop.mapred.JobClient.writeOldSplits(**JobClient.java:810**
)*

*at org.apache.hadoop.mapred.JobClient.submitJobInternal(**
JobClient.java:781**)*

*at org.apache.hadoop.mapred.JobClient.submitJob(**JobClient.java:730**)*

*at org.apache.hadoop.mapred.JobClient.runJob(**JobClient.java:1249**)*

*at TestDriver.main(**TestDriver.java:46**)*
I know I am doing something wrong, Can anyone tell me where I am wrong, and
how can I run my code from windows to that linux hadoop machine.

Thanks,
Praveenesh


Re: org.apache.hadoop.mapred.InvalidInputException ??

2011-04-25 Thread praveenesh kumar
Hi,
I am able to run hadoop map-reduce wordcount example on my linux machine..
Means my hadoop settings are correct on my linux machine..
I don't know about the valid path you are talking about ?? Where do I set this
thing ??
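
A short sketch of one way to create and populate that path from the Linux box;
the local file name is illustrative. Relative HDFS paths resolve under
/user/<username>, which is why the error points at /user/hadoop/In:

hadoop dfs -mkdir In
hadoop dfs -copyFromLocal input.txt In
hadoop dfs -ls In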




On Mon, Apr 25, 2011 at 11:58 AM, Harsh J ha...@cloudera.com wrote:

 Do you have a valid path /user/hadoop/In (it must be a file, or a
 directory with files)?

 On Mon, Apr 25, 2011 at 11:32 AM, praveenesh kumar praveen...@gmail.com
 wrote:
  Hi,
 
  I am new to hadoop and the scenario is like this :
 
  I have hadoop installed on a linux machine having IP as (162.192.100.46)
  and I have another window machine with eclipse and hadoop plugin
 installed..
  I am able to connect to linux hadoop machine and can see the dfs location
  and mapred folder using my plugin. I copied all the hadoop jar files from
  linux to windows and set them in my eclipse.
 
  I am trying to run a sample small code from windows to the linux hadoop
  machine
 
  Code :
 
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.FileInputFormat;
  import org.apache.hadoop.mapred.FileOutputFormat;
  import org.apache.hadoop.mapred.JobClient;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.Mapper;
  import org.apache.hadoop.mapred.Reducer;
  import org.apache.hadoop.mapred.TextInputFormat;
  import org.apache.hadoop.mapred.TextOutputFormat;
 
  public class TestDriver {
 
   public static void main(String[] args) {
   JobClient client = new JobClient();
   JobConf conf = new JobConf(TestDriver.class);
 
   // TODO: specify output types
   conf.setOutputKeyClass(Text.class);
   conf.setOutputValueClass(IntWritable.class);
   // TODO: specify input and output DIRECTORIES (not files)
   //conf.setInputPath(new Path("src"));
   //conf.setOutputPath(new Path("out"));
 
   conf.setInputFormat(TextInputFormat.class);
   conf.setOutputFormat(TextOutputFormat.class);
 
 
 
   FileInputFormat.setInputPaths(conf, new Path("In"));
   FileOutputFormat.setOutputPath(conf, new Path("Out"));
 
 
   // TODO: specify a mapper
   conf.setMapperClass(org.apache.hadoop.mapred.lib.IdentityMapper.class);
 
   // TODO: specify a reducer
 
  conf.setReducerClass(org.apache.hadoop.mapred.lib.IdentityReducer.class);
 
   client.setConf(conf);
   try {
JobClient.runJob(conf);
   } catch (Exception e) {
e.printStackTrace();
   }
   }
  }
 
 
 
  Whenever I am trying to run the code on eclipse plugin ..
  I am getting the following error :
 
 
  *11/04/25 13:39:16 INFO jvm.JvmMetrics: Initializing JVM Metrics with
  processName=JobTracker, sessionId=*
 
  *11/04/25 13:39:16 WARN mapred.JobClient: Use GenericOptionsParser for
  parsing the arguments. Applications should implement Tool for the same.*
  *
 
  org.apache.hadoop.mapred.InvalidInputException**: Input path does not
 exist:
  hdfs://162.192.100.46:54310/user/hadoop/In*
 
  *at org.apache.hadoop.mapred.FileInputFormat.listStatus(**
  FileInputFormat.java:190**)*
 
  *at org.apache.hadoop.mapred.FileInputFormat.getSplits(**
  FileInputFormat.java:201**)*
 
  *at
 org.apache.hadoop.mapred.JobClient.writeOldSplits(**JobClient.java:810**
  )*
 
  *at org.apache.hadoop.mapred.JobClient.submitJobInternal(**
  JobClient.java:781**)*
 
  *at org.apache.hadoop.mapred.JobClient.submitJob(**JobClient.java:730**)*
 
  *at org.apache.hadoop.mapred.JobClient.runJob(**JobClient.java:1249**)*
 
  *at TestDriver.main(**TestDriver.java:46**)*
  I know I am doing something wrong, Can anyone tell me where I am wrong,
 and
  how can I run my code from windows to that linux hadoop machine.
 
  Thanks,
  Praveenesh
 



 --
 Harsh J



Error while compiling the program

2011-04-25 Thread praveenesh kumar
Hi,

I am running the following code (Gender.java) on my hadoop .


import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class Gender {

private static String genderCheck = "female";

public static class Map extends MapReduceBase implements Mapper {
private final static IntWritable one = new IntWritable(1);
private Text locText = new Text();

public void map(LongWritable key, Text value, OutputCollector
output, Reporter reporter) throws IOException {
String line = value.toString();
String location = line.split(",")[14] + "," +
line.split(",")[15];
long male = 0L;
long female = 0L;
if (line.split(",")[17].matches("\\d+") &&
line.split(",")[18].matches("\\d+")) {
male = Long.parseLong(line.split(",")[17]);
female = Long.parseLong(line.split(",")[18]);
}
long diff = male - female;
locText.set(location);
if (Gender.genderCheck.toLowerCase().equals("female") && diff <
0) {
output.collect(locText, new LongWritable(diff * -1L));
}
else if (Gender.genderCheck.toLowerCase().equals("male") && diff
> 0) {
output.collect(locText, new
LongWritable(diff));
}
}
}

public static void main(String[] args) throws Exception {
JobConf conf = new JobConf(Gender.class);
conf.setJobName("gender");
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(LongWritable.class);
conf.setMapperClass(Map.class);

if (args.length != 3) {
System.out.println("Usage:");
System.out.println("[male/female] /path/to/2kh/files /path/to/output");
System.exit(1);
}

if (!args[0].equalsIgnoreCase("male") &&
!args[0].equalsIgnoreCase("female")) {
System.out.println("first argument must be male or female");
System.exit(1);
}
Gender.genderCheck = args[0];

conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);
FileInputFormat.setInputPaths(conf, new Path(args[1]));
FileOutputFormat.setOutputPath(conf, new Path(args[2]));
JobClient.runJob(conf);
}

}

I am getting the following exception while compiling this  :

*Gender.java:14: Gender.Map is not abstract and does not override abstract
method
map(java.lang.Object,java.lang.Object,org.apache.hadoop.mapred.OutputCollector,org.apache.hadoop.mapred.Reporter)
in org.apache.hadoop.mapred.Mapper
public static class Map extends MapReduceBase implements Mapper {
  ^
Note: Gender.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Note: Gender.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
*
Anyone suggest me how to debug this error ??


Re: Error while compiling the program

2011-04-25 Thread praveenesh kumar
Thanks Joey..!!
It compiled..

Regards,
Praveenesh

On Mon, Apr 25, 2011 at 3:47 PM, Joey Echeverria j...@cloudera.com wrote:

 Your delcaration of the Map class needs to include the input and
 output types, e.g.:

 public static class Map extends MapReduceBase implements
 Mapper<LongWritable, Text, Text, LongWritable> {
 ...
 }

 -Joey

 On Mon, Apr 25, 2011 at 4:38 AM, praveenesh kumar praveen...@gmail.com
 wrote:
  Hi,
 
  I am running the following code (Gender.java) on my hadoop .
 
 
  import java.io.IOException;
  import java.util.*;
 
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.conf.*;
  import org.apache.hadoop.io.*;
  import org.apache.hadoop.mapred.*;
  import org.apache.hadoop.util.*;
 
  public class Gender {
 
 private static String genderCheck = "female";
 
 public static class Map extends MapReduceBase implements Mapper {
 private final static IntWritable one = new IntWritable(1);
 private Text locText = new Text();
 
 public void map(LongWritable key, Text value, OutputCollector
  output, Reporter reporter) throws IOException {
 String line = value.toString();
 String location = line.split(",")[14] + "," +
  line.split(",")[15];
 long male = 0L;
 long female = 0L;
 if (line.split(",")[17].matches("\\d+") &&
  line.split(",")[18].matches("\\d+")) {
 male = Long.parseLong(line.split(",")[17]);
 female = Long.parseLong(line.split(",")[18]);
 }
 long diff = male - female;
 locText.set(location);
 if (Gender.genderCheck.toLowerCase().equals("female") && diff <
  0) {
 output.collect(locText, new LongWritable(diff * -1L));
 }
 else if (Gender.genderCheck.toLowerCase().equals("male") &&
 diff > 0) {
 output.collect(locText, new
  LongWritable(diff));
 }
 }
 }
 
 public static void main(String[] args) throws Exception {
 JobConf conf = new JobConf(Gender.class);
 conf.setJobName("gender");
 conf.setOutputKeyClass(Text.class);
 conf.setOutputValueClass(LongWritable.class);
 conf.setMapperClass(Map.class);
 
 if (args.length != 3) {
 System.out.println("Usage:");
 System.out.println("[male/female] /path/to/2kh/files /path/to/output");
 System.exit(1);
 }
 
 if (!args[0].equalsIgnoreCase("male") &&
  !args[0].equalsIgnoreCase("female")) {
 System.out.println("first argument must be male or female");
 System.exit(1);
 }
 Gender.genderCheck = args[0];
 
 conf.setInputFormat(TextInputFormat.class);
 conf.setOutputFormat(TextOutputFormat.class);
 FileInputFormat.setInputPaths(conf, new Path(args[1]));
 FileOutputFormat.setOutputPath(conf, new Path(args[2]));
 JobClient.runJob(conf);
 }
 
  }
 
  I am getting the following exception while compiling this  :
 
  *Gender.java:14: Gender.Map is not abstract and does not override
 abstract
  method
 
 map(java.lang.Object,java.lang.Object,org.apache.hadoop.mapred.OutputCollector,org.apache.hadoop.mapred.Reporter)
  in org.apache.hadoop.mapred.Mapper
 public static class Map extends MapReduceBase implements Mapper {
   ^
  Note: Gender.java uses or overrides a deprecated API.
  Note: Recompile with -Xlint:deprecation for details.
  Note: Gender.java uses unchecked or unsafe operations.
  Note: Recompile with -Xlint:unchecked for details.
  *
  Anyone suggest me how to debug this error ??
 



 --
 Joseph Echeverria
 Cloudera, Inc.
 443.305.9434



hadoop dfs -copyFromLocal ??

2011-04-25 Thread praveenesh kumar
Hi,

I am learning hadoop.

Whenever we use hadoop dfs -copyFromLocal <input-file name> <output-file name>
I assume the file is copied from linux file system to hadoop file system

However the output of the command shows us that file is somewhere stored in
/user/hadoop/*

But if we search it from linux, we can not see those files.. why so ???

Can I go to the location of the files which get copied to DFS from linux ??

Suppose I have copied some file  into DFS and now from other system, I want
to give that file as an input.. how can I give that file as input to the
program. I mean how can I remotely access to the files that are copied to
DFS and pass them as input to my programs.??

Thanks,
Praveenesh
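
A short sketch of how this usually plays out: the copied file lives inside HDFS
(as blocks under dfs.data.dir on the datanodes), not as an ordinary file in the
Linux filesystem, so it is only visible through HDFS commands or HDFS URIs. The
jar, class name and host/port below are illustrative:

hadoop dfs -ls /user/hadoop
hadoop jar myjob.jar MyJob hdfs://ub13:54310/user/hadoop/input output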


Hadoop from Windows ??

2011-04-25 Thread praveenesh kumar
The problem I am facing is

1  I have 1 Windows System. I am running eclipse with hadoop - plugin.. Its
not a part of hadoop cluster. I am able to connect to hadoop systems and can
view DFS and MAPRED folders using this plugin. If I am able to view the
contents of the hadoop, so I am assuming that I can connect to the hadoop
system from my windows.

2. Now I am writing some program from my windows machine and try to run it
on hadoop machines.
but whenever I am trying to do that.. I am getting the following error :


*1/04/25 18:24:21 INFO jvm.JvmMetrics: Initializing JVM Metrics with
processName=JobTracker, sessionId=*
*
11/04/25 18:24:21 WARN mapred.JobClient: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the same.
11/04/25 18:24:21 INFO input.FileInputFormat: Total input paths to process :
1
Exception in thread main
java.io.IOException: Cannot run program "chmod": CreateProcess error=2, The
system cannot find the file specified
at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:149)
at org.apache.hadoop.util.Shell.run(Shell.java:134)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:286)

at org.apache.hadoop.util.Shell.execCommand(Shell.java:354)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:337)
at
org.apache.hadoop.fs.RawLocalFileSystem.execCommand(RawLocalFileSystem.java:481)

at
org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:473)

at
org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:280)

at
org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:372)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:484)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:465)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:372)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:208)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:142)
at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1216)
at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1197)
at
org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:92)
at
org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:373)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:800)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
at WordCount.run(WordCount.java:94)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at WordCount.main(WordCount.java:98) *
*Caused by: *
*
java.io.IOException: CreateProcess error=2, The system cannot find the file
specified
at java.lang.ProcessImpl.create(Native Method)
at java.lang.ProcessImpl.<init>(ProcessImpl.java:81)
at java.lang.ProcessImpl.start(ProcessImpl.java:30)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)
... 24 more*
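
Judging from the LocalJobRunner frames in the trace above, one likely cause is
that the job is being run by the local runner on the Windows box (which then
tries to exec chmod) instead of being submitted to the Linux cluster. A minimal
sketch of pointing the client Configuration at the cluster so the local runner
is not used; the host and ports are illustrative, based on addresses appearing
elsewhere in these threads:

Configuration conf = new Configuration();
// point the client at the cluster instead of the local runner
conf.set("fs.default.name", "hdfs://162.192.100.46:54310");   // NameNode
conf.set("mapred.job.tracker", "162.192.100.46:54311");       // JobTracker
Job job = new Job(conf, "WordCount example for hadoop 0.20.1");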


 My program is  :

import java.io.*;
import java.util.*;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.util.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
 public class WordCount extends Configured implements Tool {
public static class MapClass extends Mapper<Object, Text, Text, IntWritable>
{
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(Object key, Text value, Context context) throws IOException,
InterruptedException {
String line = value.toString();
StringTokenizer itr = new StringTokenizer(line);
while (itr.hasMoreTokens()) {
word.set(itr.nextToken());
context.write(word, one);
}
}
}

/**
* A reducer class that just emits the sum of the input values.
*/
public static class Reduce extends Reducer<Text, IntWritable, Text,
IntWritable> {
public void reduce(Text key, Iterable<IntWritable> values, Context context)
throws IOException, InterruptedException {
int sum = 0;
for (IntWritable value : values) {
sum += value.get();
}
context.write(key,
new IntWritable(sum));
}
}
static int printUsage() {
System.out.println("wordcount [-r <reduces>] <input> <output>");
ToolRunner.printGenericCommandUsage(System.out);
return -1;
}
public int run(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = new Job(conf, "WordCount example for hadoop 0.20.1");
job.setJarByClass(WordCount.class);
job.setMapperClass(MapClass.class);
job.setCombinerClass(Reduce.class);
job.setReducerClass(Reduce.class);
// the keys are words (strings)
job.setOutputKeyClass(Text.class);
// the values are counts (ints)
job.setOutputValueClass(IntWritable.class);
 List<String> other_args = new ArrayList<String>();
for (int i = 0; i < args.length; ++i) {
try {
// The number of map tasks was earlier configurable, // But with hadoop
0.20.1, it is decided by the framework.
// Since this heavily 

HBASE on Hadoop

2011-04-25 Thread praveenesh kumar
Hello everyone,

Thanks everyone for guiding me everytime. I am able to setup hadoop cluster
of 10 nodes.
Now comes HBASE..!!!

I am new to all this...
My problem is I have huge data to analyze.
so shall I go for single node Hbase installation on all nodes or go for
distributed Hbase installation.??

How is a distributed installation different from a single node installation ??
Now suppose if I have distributed Hbase...
and If I design some table on my master node.. and then store data on it..
say around 100M. How the data is going to be distributed.. Will HBASE do it
automatically or we have to write codes for getting it distributed ??
Is there any good tutorial that tells us more about HBase and how to work on
it ???

Thanks,
Praveenesh


Re: java.net.ConnectException

2011-04-18 Thread praveenesh kumar
Hi,

Have you checked the ports on which the map-reduce server and HDFS are running?
I guess the plugin uses its own default ports. You have to replace them
with the ports on which you are actually running your map-reduce and HDFS.
I guess that might help you..!!

Thanks,
Praveenesh
On Mon, Apr 18, 2011 at 4:44 PM, RAGHAVENDRA PRASAD 
raghav.npra...@gmail.com wrote:

 I am a newbie into hadoop. We are trying to set up hadoop infrastructure at
 our company. I am not sure whether it is a right forum to ask this
 question.
 Our Application server is windows. I was looking for tutorial by which we
 can connect from windows system to hadoop(on ubuntu)  and have to run MR
 jobs. I downloaded Eclipse plugin(Runs on Windows Server) and i gave ip
 address for Host(hadoop location).When i clicked on finish, i got an error
 -  Failed on Connection Exception java.net.ConnectException. Please let me
 know on how to proceed or any tutorial would be helpful.


 Regards,
 Raghavendra Prasad



Hadoop Speed Efficiency ??

2011-04-18 Thread praveenesh kumar
Hello everyone,

I am new to hadoop...
I set up a  hadoop cluster of 4 ubuntu systems. ( Hadoop 0.20.2)
and I am running the well known word count (gutenberg) example to test how
fast my hadoop is working..

But whenever I am running the wordcount example.. I am not able to see much
difference in processing time..
On a single node the wordcount is taking the same time.. and on the cluster of 4
systems it is also taking almost the same time..

Am I  doing anything wrong here ??
Can anyone explain me why its happening.. and how can I make maximum use of
my cluster ??

Thanks.
Praveenesh


Error : Too many fetch-failures

2011-04-14 Thread praveenesh kumar
Hello,

I am new to hadoop.
I am using hadoop 0.20.2 on ubuntu.

I recently installed and configured hadoop using the available tutorials on
internet.
My hadoop is running properly.

But whenever I am trying to run a wordcount example, the wordcount program
gets stuck at the reduce part. After a long time, I am getting the following
error..

hadoop@50:/usr/local/hadoop/hadoop$ hadoop jar hadoop-0.20.2-examples.jar
wordcount gutenberg gutenberg-output
11/04/14 23:24:20 INFO input.FileInputFormat: Total input paths to process :
3
11/04/14 23:24:25 INFO mapred.JobClient: Running job: job_201104142306_0001
11/04/14 23:24:26 INFO mapred.JobClient:  map 0% reduce 0%
11/04/14 23:24:45 INFO mapred.JobClient:  map 66% reduce 0%
11/04/14 23:24:54 INFO mapred.JobClient:  map 100% reduce 0%
11/04/14 23:32:50 INFO mapred.JobClient: Task Id :
attempt_201104142306_0001_m_00_0, Status : FAILED
Too many fetch-failures
11/04/14 23:32:50 WARN mapred.JobClient: Error reading task outputInvalid
argument or cannot assign requested address
11/04/14 23:32:50 WARN mapred.JobClient: Error reading task outputInvalid
argument or cannot assign requested address
11/04/14 23:32:54 INFO mapred.JobClient:  map 66% reduce 0%
11/04/14 23:33:00 INFO mapred.JobClient:  map 100% reduce 0%

Can somebody help me to solve this issue? It's urgent.. I wasted my whole day
figuring out the problem.

Thanks,
Praveenesh


Re: Error : Too many fetch-failures

2011-04-14 Thread praveenesh kumar
Hi,
From where can I see the logs?
I have done single node cluster installaiton and I am running  hadoop on
single machine only. Both Map and Reduce are running on same machine.
Thanks,
Praveenesh
On Thu, Apr 14, 2011 at 4:43 PM, Harsh J ha...@cloudera.com wrote:

 Hello Praveenesh,

 On Thu, Apr 14, 2011 at 3:42 PM, praveenesh kumar praveen...@gmail.com
 wrote:
  attempt_201104142306_0001_m_00_0, Status : FAILED
  Too many fetch-failures
  11/04/14 23:32:50 WARN mapred.JobClient: Error reading task outputInvalid
  argument or cannot assign requested address
  11/04/14 23:32:50 WARN mapred.JobClient: Error reading task outputInvalid
  argument or cannot assign requested address

 In most cases, this is a simple DNS/hostnames configuration issue. The
 machine on which your Reducer is trying to run on, could be unable to
 contact the HTTP service of the TaskTracker the mappers ran on (one or
 many) due to problems on either end. If this is a single
 pseudo-distributed setup, you may want to verify the contents of your
 /etc/hosts file. A full log output pasted somewhere would also be
 helpful in determining the exact cause.

 --
 Harsh J
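
A minimal sketch of what the /etc/hosts check above usually amounts to on a
single-node Ubuntu box (hostname and IP are illustrative); the common culprit is
the machine's hostname being mapped to 127.0.1.1, which makes the reducer fetch
map output from the wrong address:

127.0.0.1    localhost
# map the real hostname to the machine's actual IP, not to 127.0.1.1
162.192.100.50    hadoop-node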


