Re: Tasktracker fails
Any update on the issue below? Thanks

Adarsh Sharma wrote:
Dear all, today I am trying to configure hadoop-0.20.205.0 on a 4-node cluster. When I start the cluster, all daemons come up except the TaskTracker; I don't know why it fails with the error log below. The cluster is on a private network, and my /etc/hosts file contains IP-to-hostname mappings on all nodes.

2012-02-21 17:48:33,056 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source TaskTrackerMetrics registered.
2012-02-21 17:48:33,094 ERROR org.apache.hadoop.mapred.TaskTracker: Can not start task tracker because java.net.SocketException: Invalid argument
    at sun.nio.ch.Net.bind(Native Method)
    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:119)
    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
    at org.apache.hadoop.ipc.Server.bind(Server.java:225)
    at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:301)
    at org.apache.hadoop.ipc.Server.<init>(Server.java:1483)
    at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:545)
    at org.apache.hadoop.ipc.RPC.getServer(RPC.java:506)
    at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:772)
    at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:1428)
    at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3673)

Any comments on the issue? Thanks
Tasktracker fails
Dear all, today I am trying to configure hadoop-0.20.205.0 on a 4-node cluster. When I start the cluster, all daemons come up except the TaskTracker; I don't know why it fails with the error log below. The cluster is on a private network, and my /etc/hosts file contains IP-to-hostname mappings on all nodes.

2012-02-21 17:48:33,056 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source TaskTrackerMetrics registered.
2012-02-21 17:48:33,094 ERROR org.apache.hadoop.mapred.TaskTracker: Can not start task tracker because java.net.SocketException: Invalid argument
    at sun.nio.ch.Net.bind(Native Method)
    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:119)
    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
    at org.apache.hadoop.ipc.Server.bind(Server.java:225)
    at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:301)
    at org.apache.hadoop.ipc.Server.<init>(Server.java:1483)
    at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:545)
    at org.apache.hadoop.ipc.RPC.getServer(RPC.java:506)
    at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:772)
    at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:1428)
    at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3673)

Any comments on the issue? Thanks
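A "java.net.SocketException: Invalid argument" on bind usually points at the address the TaskTracker is asked to listen on rather than at the daemon itself. A few hedged checks, assuming a stock 0.20.205.0 layout (the property name below is the standard one; whether it is the culprit here is an assumption):

    # Does the local hostname resolve to a real IPv4 address on every node?
    hostname
    getent hosts $(hostname)      # should not come back empty or as ::1 only
    # What address is the TaskTracker told to bind its task-report server to?
    grep -A1 mapred.task.tracker.report.address conf/mapred-site.xml
    # If the JVM is choosing an IPv6 socket on an IPv4-only network, forcing IPv4 often helps
    # (for example in conf/hadoop-env.sh):
    export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"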
Re: HTTP Error
Thanks Devaraj, I am using Hadoop 0.20.2. In the early days the cluster worked properly and I could see all the web UIs through the browser, but suddenly one day this problem appeared. How do I get the compiled JSP files onto my classpath? Thanks

Devaraj K wrote:
Hi Adarsh, which version of Hadoop are you using? If you are using 0.21 or a later version, you need to set the environment variables HADOOP_COMMON_HOME, HADOOP_HDFS_HOME and HADOOP_MAPREDUCE_HOME correctly; otherwise this problem occurs. If you are using a 0.20.* version, this problem occurs when the compiled JSP files are not on the Java classpath. Devaraj K

-Original Message-
From: Adarsh Sharma [mailto:adarsh.sha...@orkash.com]
Sent: Thursday, July 14, 2011 6:32 PM
To: common-user@hadoop.apache.org
Subject: Re: HTTP Error

Any update on the HTTP error? The issue remains, but Hadoop is otherwise functioning properly. Thanks

Adarsh Sharma wrote:
Thanks Joey, I solved the safe-mode problem by manually deleting some files; bin/hadoop dfsadmin -report now shows all 2 nodes and safe mode turns OFF after some time. But I have no idea how to solve the error below. Why does my web UI show:
HTTP ERROR: 404 /dfshealth.jsp RequestURI=/dfshealth.jsp Powered by Jetty:// http://jetty.mortbay.org/
Any views on it? Please help. Thanks

Joey Echeverria wrote:
It looks like both datanodes are trying to serve data out of the same directory. Is there any chance that both datanodes are using the same NFS mount for the dfs.data.dir? If not, what I would do is delete the data from ${dfs.data.dir} and then re-format the namenode. You'll lose all of your data; hopefully that's not a problem at this time. -Joey

On Jul 8, 2011, at 0:40, Adarsh Sharma adarsh.sha...@orkash.com wrote:
Thanks, I still don't understand the issue. My namenode repeatedly shows these logs:
2011-07-08 09:36:31,365 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=hadoop,hadoop ip=/MAster-IP cmd=listStatus src=/home/hadoop/system dst=null perm=null
2011-07-08 09:36:31,367 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 9000, call delete(/home/hadoop/system, true) from Master-IP:53593: error: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete /home/hadoop/system. Name node is in safe mode. The ratio of reported blocks 0.8293 has not reached the threshold 0.9990. Safe mode will be turned off automatically.
org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete /home/hadoop/system. Name node is in safe mode. The ratio of reported blocks 0.8293 has not reached the threshold 0.9990. Safe mode will be turned off automatically.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesys tem.java:1700) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java :1680) at org.apache.hadoop.hdfs.server.namenode.NameNode.delete(NameNode.java:517) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39 ) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl .java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953) And one of my data node shows the below logs : 2011-07-08 09:49:56,967 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeCommand action: DNA_REGISTER 2011-07-08 09:49:59,962 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DataNode is shutting down: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.UnregisteredDatanodeException: Data node 192.168.0.209:50010 is attempting to report storage ID DS-218695497-SLave_IP-50010-1303978807280. Node SLave_IP:50010 is expected to serve this storage. at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDatanode(FSNamesystem .java:3920) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.processReport(FSNamesyst em.java:2891) at org.apache.hadoop.hdfs.server.namenode.NameNode.blockReport(NameNode.java:71 5) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39 ) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl .java:25
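Following Devaraj's hint about the precompiled JSPs: in 0.20.x the web UI pages are compiled into the core jar, so one hedged way to confirm whether dfshealth is actually on the daemon's classpath is to look inside that jar (the jar name below assumes a stock 0.20.2 tarball; adjust it to whatever core jar your installation ships):

    cd $HADOOP_HOME
    # should list a class such as .../namenode/dfshealth_jsp.class
    unzip -l hadoop-0.20.2-core.jar | grep -i dfshealth
    # make sure no older or duplicate core jar in lib/ shadows it
    ls lib/ | grep -i "hadoop.*core"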
Re: Which release to use?
Hadoop releases are issued from time to time. One more thing related to Hadoop usage: there are several providers that offer distributions of Hadoop, e.g. 1. Apache Hadoop 2. Cloudera 3. Yahoo, etc. Which distribution among them is best for production use? I think Cloudera's is the best. Best Regards, Adarsh

Owen O'Malley wrote:
On Jul 14, 2011, at 4:33 PM, Teruhiko Kurosaka wrote:
I'm a newbie and I am confused by the Hadoop releases. I thought 0.21.0 was the latest and greatest release that I should be using, but I noticed 0.20.203 has been released lately, and 0.21.X is marked unstable and unsupported. Should I be using 0.20.203?

Yes. I apologize for the confusing release numbering, but the best release to use is 0.20.203.0. It includes security, job limits, and many other improvements over 0.20.2 and 0.21.0. Unfortunately, it doesn't have the new sync support, so it isn't suitable for use with HBase. Most large clusters use a separate version of HDFS for HBase. -- Owen
Re: RDBMS's support for Hadoop
There are several other ways through which you can integrate column-oriented databases (HBase or Cassandra) with Hive to provide real-time processing. Thanks, Adarsh

Amareshwari Sri Ramadasu wrote:
Hi, none of the RDBMSs support the Hadoop architecture. But you can have a look at Hive (hive.apache.org), a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop-compatible file systems. Thanks, Amareshwari

On 7/13/11 11:34 AM, Raja Nagendra Kumar nagendra.r...@tejasoft.com wrote:
Hi, are there any plans to support Hadoop architectures natively in any of the existing RDBMS databases such as MySQL? My understanding is that Hadoop can be the way to store and compute, which is very much applicable to an RDBMS SQL engine. If this were done well, databases could perform better, scale to commodity hardware, and users would not need to go the Hadoop route when an RDBMS-style structured data store is what they need. Regards, Raja Nagendra Kumar, C.T.O www.tejasoft.com - Hadoop Adoption India Consulting
--
View this message in context: http://old.nabble.com/RDBMS%27s-support-for-Hadoop-tp32051134p32051134.html Sent from the Hadoop core-user mailing list archive at Nabble.com.
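As one illustration of the Hive/HBase route mentioned above, Hive's HBase storage handler lets a Hive table front an HBase table. A minimal sketch (the table and column names are made up, and it assumes the Hive HBase handler and HBase client jars are on Hive's classpath):

    hive -e "
    CREATE TABLE page_hits(key INT, hits STRING)
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf1:hits')
    TBLPROPERTIES ('hbase.table.name' = 'page_hits');
    "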
HTTP Error
Dear all, today I am stuck with a strange problem on a running Hadoop cluster. After starting Hadoop with bin/start-all.sh, all nodes are started. But when I check through the web UI (Master-IP:50070), it shows:
HTTP ERROR: 404 /dfshealth.jsp RequestURI=/dfshealth.jsp Powered by Jetty:// http://jetty.mortbay.org/
I checked on the command line that Hadoop is not able to get out of safe mode. I know the command to leave safe mode manually:
bin/hadoop dfsadmin -safemode leave
But how can I make Hadoop run properly, and what are the reasons for this error? Thanks
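Before forcing the NameNode out of safe mode it is usually worth checking why it is stuck there, typically missing or under-reported blocks from a DataNode that did not come up. A short sequence that often helps (standard 0.20.x commands, run as the hadoop user from $HADOOP_HOME; forcing safe mode off is only a workaround, not a fix):

    bin/hadoop dfsadmin -safemode get     # reports ON/OFF
    bin/hadoop dfsadmin -report           # how many DataNodes actually checked in?
    bin/hadoop fsck / | tail -20          # summary of missing / under-replicated blocks
    # only once the block report looks sane (or the data is known to be expendable):
    bin/hadoop dfsadmin -safemode leave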
Re: HTTP Error
) at sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:84) at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:130) at java.lang.Thread.run(Thread.java:619) 2011-07-08 09:50:00,394 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Exiting DataBlockScanner thread. 2011-07-08 09:50:01,079 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Waiting for threadgroup to exit, active threads is 0 2011-07-08 09:50:01,183 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.0.209:50010, storageID=DS-218695497-192.168.0.209-50010-1303978807280, infoPort=50075, ipcPort=50020):Finishing DataNode in: FSDataset{dirpath='/hdd1-1/data/current'} 2011-07-08 09:50:01,183 INFO org.apache.hadoop.ipc.Server: Stopping server on 50020 2011-07-08 09:50:01,183 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Waiting for threadgroup to exit, active threads is 0 2011-07-08 09:50:01,185 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG: / SHUTDOWN_MSG: Shutting down DataNode at ws14-suru-lin/ Also my dfsdmin report shows : bash-3.2$ bin/hadoop dfsadmin -report Safe mode is ON Configured Capacity: 59069984768 (55.01 GB) Present Capacity: 46471880704 (43.28 GB) DFS Remaining: 45169745920 (42.07 GB) DFS Used: 1302134784 (1.21 GB) DFS Used%: 2.8% Under replicated blocks: 0 Blocks with corrupt replicas: 0 Missing blocks: 0 - Datanodes available: 1 (1 total, 0 dead) Name: IP:50010 Decommission Status : Normal Configured Capacity: 59069984768 (55.01 GB) DFS Used: 1302134784 (1.21 GB) Non DFS Used: 12598104064 (11.73 GB) DFS Remaining: 45169745920(42.07 GB) DFS Used%: 2.2% DFS Remaining%: 76.47% Last contact: Fri Jul 08 10:03:40 IST 2011 But I have 2 datanodes.Safe mode is on from the last 1 hour. I know the command to leave it manually. I think the problem arises due to non start up of one of my datanodes. How could i solve this problem . Also for HTTP ERROR: 404 /dfshealth.jsp RequestURI=/dfshealth.jsp /Powered by Jetty:// http://jetty.mortbay.org/ error, I manually check through below command at all nodes On Master : ash-3.2$ /usr/java/jdk1.6.0_18/bin/jps 7548 SecondaryNameNode 7395 NameNode 7628 JobTracker 7713 Jps And also on slaves : [root@ws33-shiv-lin ~]# /usr/java/jdk1.6.0_20/bin/jps 5696 DataNode 5941 Jps 5818 TaskTracker Thanks jeff.schm...@shell.com wrote: Adarsh, You could also run from command line [root@xxx bin]# ./hadoop dfsadmin -report Configured Capacity: 1151948095488 (1.05 TB) Present Capacity: 1059350446080 (986.6 GB) DFS Remaining: 1056175992832 (983.64 GB) DFS Used: 3174453248 (2.96 GB) DFS Used%: 0.3% Under replicated blocks: 0 Blocks with corrupt replicas: 0 Missing blocks: 0 - Datanodes available: 5 (5 total, 0 dead) -Original Message- From: dhru...@gmail.com [mailto:dhru...@gmail.com] On Behalf Of Dhruv Kumar Sent: Thursday, July 07, 2011 10:01 AM To: common-user@hadoop.apache.org Subject: Re: HTTP Error 1) Check with jps to see if all services are functioning. 2) Have you tried appending dfshealth.jsp at the end of the URL as the 404 says? Try using this: http://localhost:50070/dfshealth.jsp On Thu, Jul 7, 2011 at 7:13 AM, Adarsh Sharma adarsh.sha...@orkash.comwrote: Dear all, Today I am stucked with the strange problem in the running hadoop cluster. After starting hadoop by bin/start-all.sh, all nodes are started. 
But when I check through the web UI (Master-IP:50070), it shows:
HTTP ERROR: 404 /dfshealth.jsp RequestURI=/dfshealth.jsp Powered by Jetty:// http://jetty.mortbay.org/
I checked on the command line that Hadoop is not able to get out of safe mode. I know the command to leave safe mode manually:
bin/hadoop dfsadmin -safemode leave
But how can I make Hadoop run properly, and what are the reasons for this error? Thanks
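The "is attempting to report storage ID ... is expected to serve this storage" message in the DataNode log above usually means two DataNodes are reporting the same storage ID, for example because dfs.data.dir was cloned or sits on a shared NFS mount, as Joey suggested. A hedged way to confirm and recover on the duplicate node (the path follows the log above, and it assumes losing that node's replicas is acceptable since they will be re-replicated):

    # on each DataNode: the storage ID lives in the VERSION file; the two nodes must differ
    grep storageID /hdd1-1/data/current/VERSION
    # if both nodes show the same storageID, stop the duplicate DataNode, wipe its data dir,
    # and restart it so it registers with a fresh ID
    bin/hadoop-daemon.sh stop datanode
    rm -rf /hdd1-1/data/*
    bin/hadoop-daemon.sh start datanode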
Running Back to Back Map-reduce jobs
Dear all, I have run several map-reduce jobs on a 4-node Hadoop cluster. Now I want one map-reduce job to run again right after another finishes. For example, to clarify my point, suppose a wordcount is run on a Gutenberg file in HDFS and completes:
11/06/02 15:14:35 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
11/06/02 15:14:35 INFO mapred.FileInputFormat: Total input paths to process : 3
11/06/02 15:14:36 INFO mapred.JobClient: Running job: job_201106021143_0030
11/06/02 15:14:37 INFO mapred.JobClient: map 0% reduce 0%
11/06/02 15:14:50 INFO mapred.JobClient: map 33% reduce 0%
11/06/02 15:14:59 INFO mapred.JobClient: map 66% reduce 11%
11/06/02 15:15:08 INFO mapred.JobClient: map 100% reduce 22%
11/06/02 15:15:17 INFO mapred.JobClient: map 100% reduce 100%
11/06/02 15:15:25 INFO mapred.JobClient: Job complete: job_201106021143_0030
11/06/02 15:15:25 INFO mapred.JobClient: Counters: 18
Then a second map-reduce job should automatically start on the output (or on the original data), say:
11/06/02 15:14:36 INFO mapred.JobClient: Running job: job_201106021143_0030
11/06/02 15:14:37 INFO mapred.JobClient: map 0% reduce 0%
11/06/02 15:14:50 INFO mapred.JobClient: map 33% reduce 0%
Is this possible, and are there any parameters to achieve it? Please guide. Thanks
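If the two jobs simply need to run one after the other, the simplest approach (besides the Oozie workflow suggested below) is a driver script that waits for the first job and only then launches the second on its output; job submission from bin/hadoop blocks until the job finishes. A sketch using the stock wordcount example (jar name and paths assume a 0.20.2 install; the second pass reusing wordcount is just for illustration):

    #!/bin/sh
    set -e    # stop if the first job fails
    bin/hadoop jar hadoop-0.20.2-examples.jar wordcount gutenberg gutenberg-out1
    # second job consumes the first job's output
    bin/hadoop jar hadoop-0.20.2-examples.jar wordcount gutenberg-out1 gutenberg-out2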
Re: Running Back to Back Map-reduce jobs
Ok, Is it valid for running jobs through Hadoop Pipes too. Thanks Harsh J wrote: Oozie's workflow feature may exactly be what you're looking for. It can also do much more than just chain jobs. Check out additional features at: http://yahoo.github.com/oozie/ On Thu, Jun 2, 2011 at 4:48 PM, Adarsh Sharma adarsh.sha...@orkash.com wrote: Dear all, I ran several map-reduce jobs in Hadoop Cluster of 4 nodes. Now this time I want a map-reduce job to be run again after one. Fore.g to clear my point, suppose a wordcount is run on gutenberg file in HDFS and after completion 11/06/02 15:14:35 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String). 11/06/02 15:14:35 INFO mapred.FileInputFormat: Total input paths to process : 3 11/06/02 15:14:36 INFO mapred.JobClient: Running job: job_201106021143_0030 11/06/02 15:14:37 INFO mapred.JobClient: map 0% reduce 0% 11/06/02 15:14:50 INFO mapred.JobClient: map 33% reduce 0% 11/06/02 15:14:59 INFO mapred.JobClient: map 66% reduce 11% 11/06/02 15:15:08 INFO mapred.JobClient: map 100% reduce 22% 11/06/02 15:15:17 INFO mapred.JobClient: map 100% reduce 100% 11/06/02 15:15:25 INFO mapred.JobClient: Job complete: job_201106021143_0030 11/06/02 15:15:25 INFO mapred.JobClient: Counters: 18 Again a map-reduce job is started on the output or original data say again 1/06/02 15:14:36 INFO mapred.JobClient: Running job: job_201106021143_0030 11/06/02 15:14:37 INFO mapred.JobClient: map 0% reduce 0% 11/06/02 15:14:50 INFO mapred.JobClient: map 33% reduce 0% Is it possible or any parameters to achieve it. Please guide . Thanks
How to run multiple mapreduce in hadoop pipes?
Dear all, how do I run two map-reduce jobs one after another from a single Hadoop map-reduce program? How do I run two map-reduce jobs sequentially in a single Hadoop program (not cascading, but two different map-reduce phases run back to back, i.e. when the first map function finishes the next map function starts, and likewise when the first reduce function finishes the second reduce function starts)? What is the default job configuration file for a Hadoop Pipes job, and how do I set the job conf in Hadoop Pipes? How do I cascade map-reduce jobs in Hadoop Pipes? Thanks
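For Pipes specifically, the same driver-script idea applies, since bin/hadoop pipes blocks until the job finishes and exits non-zero on failure, and a job configuration file can be supplied with -conf. A hedged sketch (the binary names, output paths and the XML file names are placeholders, not anything Hadoop ships):

    #!/bin/sh
    set -e
    bin/hadoop pipes -conf pipes-job1.xml \
      -D hadoop.pipes.java.recordreader=true -D hadoop.pipes.java.recordwriter=true \
      -input gutenberg -output stage1-out -program bin/stage1-binary
    # chained job: consumes the first stage's output
    bin/hadoop pipes -conf pipes-job2.xml \
      -D hadoop.pipes.java.recordreader=true -D hadoop.pipes.java.recordwriter=true \
      -input stage1-out -output stage2-out -program bin/stage2-binary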
Re: Running Back to Back Map-reduce jobs
Thanks a lot, I will let you know after some work on it. :-) Harsh J wrote: Yes, I believe Oozie does have Pipes and Streaming action helpers as well. On Thu, Jun 2, 2011 at 5:05 PM, Adarsh Sharma adarsh.sha...@orkash.com wrote: Ok, Is it valid for running jobs through Hadoop Pipes too. Thanks Harsh J wrote: Oozie's workflow feature may exactly be what you're looking for. It can also do much more than just chain jobs. Check out additional features at: http://yahoo.github.com/oozie/ On Thu, Jun 2, 2011 at 4:48 PM, Adarsh Sharma adarsh.sha...@orkash.com wrote: Dear all, I ran several map-reduce jobs in Hadoop Cluster of 4 nodes. Now this time I want a map-reduce job to be run again after one. Fore.g to clear my point, suppose a wordcount is run on gutenberg file in HDFS and after completion 11/06/02 15:14:35 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String). 11/06/02 15:14:35 INFO mapred.FileInputFormat: Total input paths to process : 3 11/06/02 15:14:36 INFO mapred.JobClient: Running job: job_201106021143_0030 11/06/02 15:14:37 INFO mapred.JobClient: map 0% reduce 0% 11/06/02 15:14:50 INFO mapred.JobClient: map 33% reduce 0% 11/06/02 15:14:59 INFO mapred.JobClient: map 66% reduce 11% 11/06/02 15:15:08 INFO mapred.JobClient: map 100% reduce 22% 11/06/02 15:15:17 INFO mapred.JobClient: map 100% reduce 100% 11/06/02 15:15:25 INFO mapred.JobClient: Job complete: job_201106021143_0030 11/06/02 15:15:25 INFO mapred.JobClient: Counters: 18 Again a map-reduce job is started on the output or original data say again 1/06/02 15:14:36 INFO mapred.JobClient: Running job: job_201106021143_0030 11/06/02 15:14:37 INFO mapred.JobClient: map 0% reduce 0% 11/06/02 15:14:50 INFO mapred.JobClient: map 33% reduce 0% Is it possible or any parameters to achieve it. Please guide . Thanks
Re: Namenode error :- FSNamesystem initialization failed
Thanks Harsh, my dfs.name.dir is /home/hadoop/project/hadoop-0.20.2/name. There is only one file, fsimage, in the image directory, and the current directory is empty. But there should be four files (fsimage, edits, etc.) in the current directory. How did they get deleted when I have not even issued a format command? I think there must be a remedy to get the previous data back.

Harsh J wrote:
Hello Adarsh,
On Thu, Apr 28, 2011 at 11:02 AM, Adarsh Sharma adarsh.sha...@orkash.com wrote:
After correcting my mistake, when I try to run as the hadoop user, my NameNode fails with the exception below:
2011-04-28 10:53:49,608 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed. java.io.IOException: NameNode is not formatted. at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:317)
Do you still have the valid contents in your good ${dfs.name.dir}? You should be able to recover with that.
Re: Namenode error :- FSNamesystem initialization failed
I should correct my posting: this problem arose because of a script that internally issued the commands below as the root user:
bin/hadoop namenode -format
bin/start-all.sh
Now, is it possible to start the previous cluster with the previous data or not? Thanks

Adarsh Sharma wrote:
Thanks Harsh, my dfs.name.dir is /home/hadoop/project/hadoop-0.20.2/name. There is only one file, fsimage, in the image directory, and the current directory is empty. But there should be four files (fsimage, edits, etc.) in the current directory. How did they get deleted when I have not even issued a format command? I think there must be a remedy to get the previous data back.

Harsh J wrote:
Hello Adarsh,
On Thu, Apr 28, 2011 at 11:02 AM, Adarsh Sharma adarsh.sha...@orkash.com wrote:
After correcting my mistake, when I try to run as the hadoop user, my NameNode fails with the exception below:
2011-04-28 10:53:49,608 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed. java.io.IOException: NameNode is not formatted. at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:317)
Do you still have the valid contents in your good ${dfs.name.dir}? You should be able to recover with that.
Re: Namenode error :- FSNamesystem initialization failed
I started the cluster by formatting the NameNode after deleting all the folders (check, name, data, mapred), so the previous data is lost. It's urgent. Now, for my knowledge base: can't we restart the NameNode from the SNN checkpoint directory if we format the NameNode by mistake, accepting that some of the latest data may be missing? If yes, what are the steps to do this? Thanks

Harsh J wrote:
I believe the format must have killed it all away. You might be lucky with the SNN location if you were running it, though - could you check in the SNN's directory for a valid fsimage?

On Thu, Apr 28, 2011 at 12:53 PM, Adarsh Sharma adarsh.sha...@orkash.com wrote:
I should correct my posting: this problem arose because of a script that internally issued the commands below as the root user:
bin/hadoop namenode -format
bin/start-all.sh
Now, is it possible to start the previous cluster with the previous data or not? Thanks

Adarsh Sharma wrote:
Thanks Harsh, my dfs.name.dir is /home/hadoop/project/hadoop-0.20.2/name. There is only one file, fsimage, in the image directory, and the current directory is empty. But there should be four files (fsimage, edits, etc.) in the current directory. How did they get deleted when I have not even issued a format command? I think there must be a remedy to get the previous data back.

Harsh J wrote:
Hello Adarsh,
On Thu, Apr 28, 2011 at 11:02 AM, Adarsh Sharma adarsh.sha...@orkash.com wrote:
After correcting my mistake, when I try to run as the hadoop user, my NameNode fails with the exception below:
2011-04-28 10:53:49,608 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed. java.io.IOException: NameNode is not formatted. at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:317)
Do you still have the valid contents in your good ${dfs.name.dir}? You should be able to recover with that.
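On recovering from the SecondaryNameNode: if a recent checkpoint still exists under the configured fs.checkpoint.dir on the SNN machine, the usual route is to copy it to the NameNode host and start the NameNode with -importCheckpoint. A hedged outline (directory paths are placeholders, and this only recovers up to the last checkpoint, so any edits after that point are gone):

    # on the SNN: the checkpoint lives under the configured fs.checkpoint.dir
    ls /path/to/fs.checkpoint.dir/current/     # should contain an fsimage
    # copy that directory to the NameNode host, point fs.checkpoint.dir there at the copy,
    # leave dfs.name.dir empty (do NOT format), then start the NameNode with:
    bin/hadoop namenode -importCheckpoint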
Re: Running C hdfs Code in Hadoop
Thanks Brian, it works! Looking forward to your help with future problems. Thanks once again.

Brian Bockelman wrote:
Hi Adarsh, it appears you don't have the JVM libraries in your LD_LIBRARY_PATH. Try this:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$JAVA_HOME/jre/lib/amd64:$JAVA_HOME/jre/lib/amd64/server
Brian

On Apr 27, 2011, at 11:31 PM, Adarsh Sharma wrote:
Dear all, today I am trying to run a simple program by following the tutorial below:
http://hadoop.apache.org/hdfs/docs/current/libhdfs.html
I followed these steps:
1. Set LD_LIBRARY_PATH and CLASSPATH as:
export LD_LIBRARY_PATH=/home/hadoop/project/hadoop-0.20.2/c++/Linux-amd64-64/lib:/usr/java/jdk1.6.0_18/jre/lib/amd64/libjava.so
export CLASSPATH=$CLASSPATH:$HADOOP_HOME:$HADOOP_HOME/lib:/home/hadoop/project/hadoop-0.20.2/c++/Linux-amd64-64/lib:/usr/java/jdk1.6.0_18/jre/lib/amd64
2. Wrote the above_sample.c program and put it into the $HADOOP_HOME/src/c++/libhdfs directory.
3. After compiling with the command below I ran into issues:
gcc above_sample.c -I/home/hadoop/project/hadoop-0.20.2/src/c++/libhdfs -L/home/hadoop/project/hadoop-0.20.2/src/c++/libhdfs -L/home/hadoop/project/hadoop-0.20.2/c++/Linux-amd64-64/lib -L$HADOOP_HOME/c++/Linux-amd64-64/lib/libhdfs.so.0 -lhdfs -I$HADOOP_HOME -I$HADOOP_HOME/lib /usr/java/jdk1.6.0_18/jre/lib/amd64/server/libjvm.so /home/hadoop/project/hadoop-0.20.2/c++/Linux-amd64-64/lib/libhdfs.so.0 -o above_sample
bash-3.2$ ./above_sample
Error occurred during initialization of VM
Unable to load native library: /home/hadoop/project/hadoop-0.20.2/c++/Linux-amd64-64/libjava.so: cannot open shared object file: No such file or directory
Now when I try the command below:
gcc above_sample.c -I/home/hadoop/project/hadoop-0.20.2/src/c++/libhdfs -L/home/hadoop/project/hadoop-0.20.2/src/c++/libhdfs -L/home/hadoop/project/hadoop-0.20.2/c++/Linux-amd64-64/lib -L$HADOOP_HOME/c++/Linux-amd64-64/lib/libhdfs.so.0 -lhdfs -I$HADOOP_HOME -I$HADOOP_HOME/lib /usr/java/jdk1.6.0_18/jre/lib/amd64/server/libjvm.so /home/hadoop/project/hadoop-0.20.2/c++/Linux-amd64-64/lib/libhdfs.so.0 /usr/java/jdk1.6.0_18/jre/lib/amd64/libjava.so -o above_sample
/usr/bin/ld: warning: libverify.so, needed by /usr/java/jdk1.6.0_18/jre/lib/amd64/libjava.so, not found (try using -rpath or -rpath-link)
/usr/java/jdk1.6.0_18/jre/lib/amd64/libjava.so: undefined reference to `VerifyClassname@SUNWprivate_1.1'
/usr/java/jdk1.6.0_18/jre/lib/amd64/libjava.so: undefined reference to `VerifyClassForMajorVersion@SUNWprivate_1.1'
/usr/java/jdk1.6.0_18/jre/lib/amd64/libjava.so: undefined reference to `VerifyFixClassname@SUNWprivate_1.1'
/usr/java/jdk1.6.0_18/jre/lib/amd64/libjava.so: undefined reference to `VerifyClass@SUNWprivate_1.1'
collect2: ld returned 1 exit status
bash-3.2$
Can someone guide me through the steps needed to run a simple libhdfs program on a Hadoop cluster? Thanks and best regards, Adarsh Sharma
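For reference, the environment Brian's fix implies looks roughly like the sketch below, assuming a 64-bit JDK under /usr/java/jdk1.6.0_18 and the stock 0.20.2 layout. Note that LD_LIBRARY_PATH takes directories rather than individual .so files, and that libhdfs starts a JVM, so the Hadoop jars (not just directories) must be on CLASSPATH; the exact jar list is an assumption that may need adjusting:

    export HADOOP_HOME=/home/hadoop/project/hadoop-0.20.2
    export JAVA_HOME=/usr/java/jdk1.6.0_18
    export LD_LIBRARY_PATH=$HADOOP_HOME/c++/Linux-amd64-64/lib:$JAVA_HOME/jre/lib/amd64:$JAVA_HOME/jre/lib/amd64/server
    # put the core jar and everything in lib/ on the classpath for the embedded JVM
    for j in $HADOOP_HOME/hadoop-0.20.2-core.jar $HADOOP_HOME/lib/*.jar; do CLASSPATH=$CLASSPATH:$j; done
    export CLASSPATH
    # link against libhdfs and libjvm instead of naming the .so files on the command line
    gcc above_sample.c -I$HADOOP_HOME/src/c++/libhdfs \
        -L$HADOOP_HOME/c++/Linux-amd64-64/lib -lhdfs \
        -L$JAVA_HOME/jre/lib/amd64/server -ljvm -o above_sample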
Running C hdfs Code in Hadoop
Dear all, Today I am trying to run a simple code by following the below tutorial :- http://hadoop.apache.org/hdfs/docs/current/libhdfs.html I followed the below steps :- 1. Set LD_LIBRARY_PATH CLASSPATH as : export LD_LIBRARY_PATH=/home/hadoop/project/hadoop-0.20.2/c++/Linux-amd64-64/lib:/usr/java/jdk1.6.0_18/jre/lib/amd64/libjava.so export CLASSPATH=$CLASSPATH:$HADOOP_HOME:$HADOOP_HOME/lib:/home/hadoop/project/hadoop-0.20.2/c++/Linux-amd64-64/lib:/usr/java/jdk1.6.0_18/jre/lib/amd64 2. write above_sample.c program put it into $HADOOP_HOME/src/c++/libhdfs directory 3. After compiling with the below command I am facing issues as :- gcc above_sample.c -I/home/hadoop/project/hadoop-0.20.2/src/c++/libhdfs -L/home/hadoop/project/hadoop-0.20.2/src/c++/libhdfs -L/home/hadoop/project/hadoop-0.20.2/c++/Linux-amd64-64/lib -L$HADOOP_HOME/c++/Linux-amd64-64/lib/libhdfs.so.0 -lhdfs -I$HADOOP_HOME -I$HADOOP_HOME/lib /usr/java/jdk1.6.0_18/jre/lib/amd64/server/libjvm.so /home/hadoop/project/hadoop-0.20.2/c++/Linux-amd64-64/lib/libhdfs.so.0 -o above_sample bash-3.2$ ./above_sample Error occurred during initialization of VM Unable to load native library: /home/hadoop/project/hadoop-0.20.2/c++/Linux-amd64-64/libjava.so: cannot open shared object file: No such file or directory Now when I try the below command : gcc above_sample.c -I/home/hadoop/project/hadoop-0.20.2/src/c++/libhdfs -L/home/hadoop/project/hadoop-0.20.2/src/c++/libhdfs -L/home/hadoop/project/hadoop-0.20.2/c++/Linux-amd64-64/lib -L$HADOOP_HOME/c++/Linux-amd64-64/lib/libhdfs.so.0 -lhdfs -I$HADOOP_HOME -I$HADOOP_HOME/lib /usr/java/jdk1.6.0_18/jre/lib/amd64/server/libjvm.so /home/hadoop/project/hadoop-0.20.2/c++/Linux-amd64-64/lib/libhdfs.so.0 /usr/java/jdk1.6.0_18/jre/lib/amd64/libjava.so -o above_sample /usr/bin/ld: warning: libverify.so, needed by /usr/java/jdk1.6.0_18/jre/lib/amd64/libjava.so, not found (try using -rpath or -rpath-link) /usr/java/jdk1.6.0_18/jre/lib/amd64/libjava.so: undefined reference to `VerifyClassname@SUNWprivate_1.1' /usr/java/jdk1.6.0_18/jre/lib/amd64/libjava.so: undefined reference to `VerifyClassForMajorVersion@SUNWprivate_1.1' /usr/java/jdk1.6.0_18/jre/lib/amd64/libjava.so: undefined reference to `VerifyFixClassname@SUNWprivate_1.1' /usr/java/jdk1.6.0_18/jre/lib/amd64/libjava.so: undefined reference to `VerifyClass@SUNWprivate_1.1' collect2: ld returned 1 exit status bash-3.2$ Can Someone guide me the steps needed to run a libhdfs simple program in Hadoop Cluster Thanks best Regards Adarsh Sharma
Namenode error :- FSNamesystem initialization failed
Dear all, I have a running 4-node Hadoop cluster and some data stored in HDFS. Today by mistake,I start the hadoop cluster with root user. rootbin/start-all.sh After correcting my mistake , when I try to run with the hadoop user, my Namenode fails with the below exception\ : STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010 / 2011-04-28 10:53:49,516 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=54310 2011-04-28 10:53:49,520 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: ws-test/192.168.0.207:54310 2011-04-28 10:53:49,523 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null 2011-04-28 10:53:49,527 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.ganglia.GangliaContext 2011-04-28 10:53:49,573 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hadoop,hadoop 2011-04-28 10:53:49,573 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup 2011-04-28 10:53:49,573 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=false 2011-04-28 10:53:49,580 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.ganglia.GangliaContext 2011-04-28 10:53:49,581 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean 2011-04-28 10:53:49,608 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed. java.io.IOException: NameNode is not formatted. at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:317) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.init(FSNamesystem.java:292) at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201) at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:279) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965) 2011-04-28 10:53:49,609 INFO org.apache.hadoop.ipc.Server: Stopping server on 54310 2011-04-28 10:53:49,609 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.io.IOException: NameNode is not formatted. 
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:317) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.init(FSNamesystem.java:292) at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201) at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:279) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965) 2011-04-28 10:53:49,610 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG: / I know if i delete the data mapred directory of all node issue the bin/hadoop namenode -format command , my cluster starts again but previous data lost. Is it possible to start cluster without the data loss. If yes, Please let me know Thanks
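A common side effect of starting (or formatting) the cluster as root is that the NameNode's image and edits files, the logs and the pid files end up owned by root, so the hadoop user can no longer use them. A hedged cleanup, assuming the install and dfs.name.dir live under /home/hadoop/project/hadoop-0.20.2 as mentioned elsewhere in the thread; whether this alone recovers the data depends on what the root run actually overwrote:

    # as root: give everything back to the hadoop user
    chown -R hadoop:hadoop /home/hadoop/project/hadoop-0.20.2
    chown -R hadoop:hadoop /home/hadoop/project/hadoop-0.20.2/name   # dfs.name.dir
    # remove any root-owned pid files left by the accidental start (default HADOOP_PID_DIR is /tmp)
    rm -f /tmp/hadoop-root-*.pid
    # then start again as the hadoop user
    su - hadoop -c "cd /home/hadoop/project/hadoop-0.20.2 && bin/start-all.sh"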
Hadoop-Eclipse
Dear all, I followed a blog post to configure Eclipse for running map-reduce jobs:
http://www.harshj.com/2010/07/18/making-the-eclipse-plugin-work-for-hadoop/
I am facing the same issue mentioned in the link below:
https://issues.apache.org/jira/browse/MAPREDUCE-1280
My Hadoop version is 0.20.2, my Java version is 1.6.0_18, and my Eclipse version is 3.5.1; I have tried both Galileo and Helios. After following the post and creating hadoop-0.20.3-dev-eclipse-plugin.jar, when I restart Eclipse the Map/Reduce perspective is gone and all Hadoop-related things disappear. Please suggest how to solve this problem. Thanks and best regards, Adarsh Sharma
Re: Hadoop-Eclipse
Harsh J wrote:
Could you ensure you do not have multiple versions of the plugin installed?

How can I verify this?

You can also try the compiled plugin attached on the ticket: https://issues.apache.org/jira/secure/attachment/12460491/hadoop-eclipse-plugin-0.20.3-SNAPSHOT.jar

I tried this jar in the beginning, but it results in the same problem. Which hadoop-eclipse plugin works with the Hadoop 0.20.2 version, and what steps are needed to remove and install a new plugin in Eclipse? Thanks, Adarsh

On Thu, Apr 21, 2011 at 5:12 PM, Adarsh Sharma adarsh.sha...@orkash.com wrote:
Dear all, I followed a blog post to configure Eclipse for running map-reduce jobs:
http://www.harshj.com/2010/07/18/making-the-eclipse-plugin-work-for-hadoop/
I am facing the same issue mentioned in the link below:
https://issues.apache.org/jira/browse/MAPREDUCE-1280
My Hadoop version is 0.20.2, my Java version is 1.6.0_18, and my Eclipse version is 3.5.1; I have tried both Galileo and Helios. After following the post and creating hadoop-0.20.3-dev-eclipse-plugin.jar, when I restart Eclipse the Map/Reduce perspective is gone and all Hadoop-related things disappear. Please suggest how to solve this problem. Thanks and best regards, Adarsh Sharma
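To answer the "how can I verify this" question concretely: the installed plugins are just files under the Eclipse installation, so duplicates can be checked for and removed from the shell. A sketch assuming Eclipse lives under ~/eclipse (adjust the paths; only one hadoop-eclipse plugin jar should remain):

    # list every copy of the plugin Eclipse can see
    ls ~/eclipse/plugins/ ~/eclipse/dropins/ 2>/dev/null | grep -i hadoop
    # remove them all, then copy exactly one jar back
    rm ~/eclipse/plugins/*hadoop*eclipse*plugin*.jar
    cp /path/to/hadoop-eclipse-plugin-0.20.3-SNAPSHOT.jar ~/eclipse/plugins/
    # restart Eclipse so it rescans its plugin registry
    ~/eclipse/eclipse -clean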
Re: Configuring Hadoop With Eclipse Environment for C++ CDT Code
Thanks Sagar, but is the process the same for the Eclipse Helios that I use for running C++ code? I am not able to locate the Map/Reduce perspective in it. Best regards, Adarsh

Sagar Kohli wrote:
Hi Adarsh, try this link: http://shuyo.wordpress.com/2011/03/08/hadoop-development-environment-with-eclipse/ Regards, Sagar

From: Adarsh Sharma [adarsh.sha...@orkash.com]
Sent: Friday, April 08, 2011 9:45 AM
To: common-user@hadoop.apache.org
Subject: Configuring Hadoop With Eclipse Environment for C++ CDT Code

Dear all, I am following the links below to configure Eclipse with a Hadoop environment, but I am not able to find the Map/Reduce perspective in the Open Perspective > Other option.
http://developer.yahoo.com/hadoop/tutorial/module3.html#eclipse
http://wiki.apache.org/hadoop/EclipseEnvironment
I copied the hadoop-eclipse plugin jar into the plugins sub-directory of Eclipse, but I don't know why there is no Map/Reduce option in the New Project dialog. Please let me know if there is any other useful link on doing this. Thanks and best regards, Adarsh Sharma
Configuring Hadoop With Eclipse Environment for C++ CDT Code
Dear all, I am following the links below to configure Eclipse with a Hadoop environment, but I am not able to find the Map/Reduce perspective in the Open Perspective > Other option.
http://developer.yahoo.com/hadoop/tutorial/module3.html#eclipse
http://wiki.apache.org/hadoop/EclipseEnvironment
I copied the hadoop-eclipse plugin jar into the plugins sub-directory of Eclipse, but I don't know why there is no Map/Reduce option in the New Project dialog. Please let me know if there is any other useful link on doing this. Thanks and best regards, Adarsh Sharma
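One frequent cause of a missing Map/Reduce perspective is that the plugin jar was copied into a different Eclipse installation than the one actually being launched, which is easy to do when both Galileo and Helios are installed. A quick, hedged check from the shell (paths are placeholders; the plugin must sit in the plugins/ or dropins/ folder next to the binary that is really run):

    # which eclipse binary does the launcher actually run?
    readlink -f "$(which eclipse)"
    # is the plugin next to THAT binary?
    ls "$(dirname "$(readlink -f "$(which eclipse)")")"/plugins/ | grep -i hadoop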
Running wordcount-nopipe.cc in Hadoop
Dear all, I am sorry for posting second time but I found it very difficult to run wordcount-nopipe.cc program in* /home/hadoop/project/hadoop-0.20.2/src/examples/pipes/impl *directory in a running Hadoop Cluster. In the beginning I faced the below exception bin/hadoop pipes -D hadoop.pipes.java.recordreader=true -D hadoop.pipes.java.recordwriter=true -input gutenberg -output gutenberg-out11 -program bin/wordcount-nopipe 11/03/30 12:09:02 INFO mapred.JobClient: Task Id : attempt_201103301130_0011_m_00_0, Status : FAILED java.io.IOException: pipe child exception at org.apache.hadoop.mapred.pipes.Application.abort(Application.java:151) at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:101) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.Child.main(Child.java:170) Caused by: java.io.EOFException at java.io.DataInputStream.readByte(DataInputStream.java:250) at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298) at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319) at org.apache.hadoop.mapred.pipes.BinaryProtocol$UplinkReaderThread.run(BinaryProtocol.java:114) attempt_201103301130_0011_m_00_0: Hadoop Pipes Exception: failed to open at wordcount-nopipe.cc:82 in WordCountReader::WordCountReader(HadoopPipes::MapContext) 11/03/30 12:09:02 INFO mapred.JobClient: Task Id : attempt_201103301130_0011_m_01_0, Status : FAILED java.io.IOException: pipe child exception at org.apache.hadoop.mapred.pipes.Application.abort(Application.java:151) at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:101) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.Child.main(Child.java:170) Caused by: java.io.EOFException but after some RD and executing the below command I am facing the below exception : bash-3.2$ bin/hadoop pipes -D hadoop.pipes.java.recordreader=true -D hadoop.pipes.java.recordwriter=true -libjars /home/hadoop/project/hadoop-0.20.2/hadoop-0.20.2-test.jar -inputformat org.apache.hadoop.mapred.pipes.WordCountInputFormat -input gutenberg -output gutenberg-out101 -program bin/wordcount-nopipe 11/03/31 16:36:26 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String). 
Exception in thread main java.lang.IllegalArgumentException: Wrong FS: hdfs://ws-test:54310/user/hadoop/gutenberg, expected: file: at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:310) at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:47) at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:273) at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:721) at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:746) at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:465) at org.apache.hadoop.mapred.pipes.WordCountInputFormat.getSplits(WordCountInputFormat.java:57) at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730) at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249) at org.apache.hadoop.mapred.pipes.Submitter.runJob(Submitter.java:248) at org.apache.hadoop.mapred.pipes.Submitter.run(Submitter.java:479) at org.apache.hadoop.mapred.pipes.Submitter.main(Submitter.java:494) Please Someone who ran it before or any idea about this error, guide me . Best regards, Adarsh
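The "Wrong FS: hdfs://... expected: file:" error is consistent with the note quoted later in the Hadoop Pipes Error thread: WordCountInputFormat and the nopipe reader work against the local file system, not HDFS. A hedged variant of the command above that points the job at local paths instead (whether this fully resolves the issue depends on the cluster; on multiple nodes the local input path must exist everywhere, e.g. via NFS):

    bin/hadoop pipes \
      -D hadoop.pipes.java.recordreader=false -D hadoop.pipes.java.recordwriter=false \
      -libjars /home/hadoop/project/hadoop-0.20.2/hadoop-0.20.2-test.jar \
      -inputformat org.apache.hadoop.mapred.pipes.WordCountInputFormat \
      -input file:///home/hadoop/gutenberg \
      -output file:///home/hadoop/gutenberg-nopipe-out \
      -program bin/wordcount-nopipe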
Re: Hadoop Pipes Error
Hi, Good Morning to all of you.. Any update on the below problem. Thanks best Regards, Adarsh Sharma Amareshwari Sri Ramadasu wrote: You can not run it with TextInputFormat. You should run it with org.apache.hadoop.mapred.pipes .*WordCountInputFormat. *You can pass the input format by passing it in --inputformat option. I did not try it myself, but it should work. -Amareshwari On 3/31/11 12:23 PM, Adarsh Sharma adarsh.sha...@orkash.com wrote: Thanks Amareshwari, here is the posting : The *nopipe* example needs more documentation. It assumes that it is run with the InputFormat from src/test/org/apache/*hadoop*/mapred/*pipes*/ *WordCountInputFormat*.java, which has a very specific input split format. By running with a TextInputFormat, it will send binary bytes as the input split and won't work right. The *nopipe* example should probably be recoded *to* use libhdfs *too*, but that is more complicated *to* get running as a unit test. Also note that since the C++ example is using local file reads, it will only work on a cluster if you have nfs or something working across the cluster. Please need if I'm wrong. I need to run it with TextInputFormat. If posiible Please explain the above post more clearly. Thanks best Regards, Adarsh Sharma Amareshwari Sri Ramadasu wrote: Here is an answer for your question in old mail archive: http://lucene.472066.n3.nabble.com/pipe-application-error-td650185.html Don't understand what is the reason solution of this. On 3/31/11 10:15 AM, Adarsh Sharma adarsh.sha...@orkash.com mailto:adarsh.sha...@orkash.com wrote: Any update on the below error. Please guide. Thanks best Regards, Adarsh Sharma Adarsh Sharma wrote: Dear all, Today I faced a problem while running a map-reduce job in C++. I am not able to understand to find the reason of the below error : 11/03/30 12:09:02 INFO mapred.JobClient: Task Id : attempt_201103301130_0011_m_00_0, Status : FAILED java.io.IOException: pipe child exception at org.apache.hadoop.mapred.pipes.Application.abort(Application.java:151) at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:101) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.Child.main(Child.java:170) Caused by: java.io.EOFException at java.io.DataInputStream.readByte(DataInputStream.java:250) at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298) at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319) at org.apache.hadoop.mapred.pipes.BinaryProtocol$UplinkReaderThread.run(BinaryProtocol.java:114) attempt_201103301130_0011_m_00_0: Hadoop Pipes Exception: failed to open at wordcount-nopipe.cc:82 in WordCountReader::WordCountReader(HadoopPipes::MapContext) 11/03/30 12:09:02 INFO mapred.JobClient: Task Id : attempt_201103301130_0011_m_01_0, Status : FAILED java.io.IOException: pipe child exception at org.apache.hadoop.mapred.pipes.Application.abort(Application.java:151) at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:101) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.Child.main(Child.java:170) Caused by: java.io.EOFException at java.io.DataInputStream.readByte(DataInputStream.java:250) at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298) at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319) at 
org.apache.hadoop.mapred.pipes.BinaryProtocol$UplinkReaderThread.run(BinaryProtocol.java:114) attempt_201103301130_0011_m_01_0: Hadoop Pipes Exception: failed to open at wordcount-nopipe.cc:82 in WordCountReader::WordCountReader(HadoopPipes::MapContext) 11/03/30 12:09:02 INFO mapred.JobClient: Task Id : attempt_201103301130_0011_m_02_0, Status : FAILED java.io.IOException: pipe child exception
Running wordcount-nopipe program
(Application.java:151) at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:101) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.Child.main(Child.java:170) Caused by: java.net.SocketException: Connection reset at java.net.SocketInputStream.read(SocketInputStream.java:168) at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) at java.io.BufferedInputStream.read(BufferedInputStream.java:237) at java.io.DataInputStream.readByte(DataInputStream.java:248) at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298) at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319) at org.apache.hadoop.mapred.pipes.BinaryProtocol$UplinkReaderThread.run(BinaryProtocol.java:114) attempt_201104041057_0002_m_07_1: Hadoop Pipes Exception: failed to open at wordcount-nopipe.cc:82 in WordCountReader::WordCountReader(HadoopPipes::MapContext) 11/04/04 11:00:01 INFO mapred.JobClient: Task Id : attempt_201104041057_0002_m_06_1, Status : FAILED java.io.IOException: pipe child exception at org.apache.hadoop.mapred.pipes.Application.abort(Application.java:151) at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:101) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.Child.main(Child.java:170) Caused by: java.io.EOFException at java.io.DataInputStream.readByte(DataInputStream.java:250) at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298) at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319) at org.apache.hadoop.mapred.pipes.BinaryProtocol$UplinkReaderThread.run(BinaryProtocol.java:114) attempt_201104041057_0002_m_06_1: Hadoop Pipes Exception: failed to open at wordcount-nopipe.cc:82 in WordCountReader::WordCountReader(HadoopPipes::MapContext) 11/04/04 11:00:01 INFO mapred.JobClient: Job complete: job_201104041057_0002 11/04/04 11:00:01 INFO mapred.JobClient: Counters: 4 11/04/04 11:00:01 INFO mapred.JobClient: Job Counters 11/04/04 11:00:01 INFO mapred.JobClient: Rack-local map tasks=8 11/04/04 11:00:01 INFO mapred.JobClient: Launched map tasks=28 11/04/04 11:00:01 INFO mapred.JobClient: Data-local map tasks=20 11/04/04 11:00:01 INFO mapred.JobClient: Failed map tasks=1 Exception in thread main java.io.IOException: Job failed! at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252) at org.apache.hadoop.mapred.pipes.Submitter.runJob(Submitter.java:248) at org.apache.hadoop.mapred.pipes.Submitter.run(Submitter.java:479) at org.apache.hadoop.mapred.pipes.Submitter.main(Submitter.java:494) bash-3.2$ Please help me to find the reasons possible solutions to resolve the above error. Thanks best Regards, Adarsh Sharma
Re: How to apply Patch
Thanks a lot for such a thorough explanation. I have done it now, but it doesn't help with the original problem I was doing this for. Please comment if you have some idea; I attached the problem. Thanks and best regards, Adarsh Sharma

Matthew Foley wrote:
Hi Adarsh, see if the information at http://wiki.apache.org/hadoop/HowToContribute is helpful to you. I'll walk you through the typical process, but first a couple of questions. Did you get a tar file for the whole source tree of Hadoop, or only the binary distribution? To apply patches you must get the source tree (usually from the Apache svn server), apply the patch, then re-build. Also, if you're just starting out, you might consider using a newer version like 0.22 instead of 0.20. And finally, is the patch you are looking at intended to work with v0.20.2? A patch targeted for 0.22 is only maybe 50% likely to work as intended with a v0.20 version.

So, here's a sketch of how to apply a patch (to v0.20.2) and rebuild. Per the instructions at http://wiki.apache.org/hadoop/HowToContribute, create a directory for your Hadoop projects, and from within that directory do:
svn checkout http://svn.apache.org/repos/asf/hadoop/common/tags/release-0.20.2/ hadoop-0.20.2
In the same projects directory create a patches subdirectory (as a sibling to the hadoop-0.20.2 subdirectory) and save the patch into it (perhaps from your browser while viewing the Jira). Now cd into the HADOOP_HOME directory:
cd hadoop-0.20.2
Observe that it has subdirectories like build/, conf/, ivy/, and src/. Now apply the patch:
patch -p 0 < ../patches/fix-test-pipes.patch
About that last line: the path after the "<" is where you stored the patch file. The patch command is funny: you locate yourself at the root of the source tree, then pipe the patch file contents into the patch command; the target source directory name is not an argument. The "-p 0" argument is generally applicable for patches generated the way we like to do patches in the Hadoop dev community. Read the unix man page for patch, and http://wiki.apache.org/hadoop/HowToContribute, if you're interested.

Now, re-build Hadoop:
ant veryclean compile bin-package
Of course, that assumes you have installed ant and ivy. Each of those utilities has a very informative home page, along with the info on the above HowToContribute page. You can also look at the docs at http://hadoop.apache.org/common/docs/current/

You should now have all the pieces necessary to install and run the patched version of Hadoop (the jars are in the build subdirectory in hadoop-0.20.2). In my experience it is always easier to get a vanilla system installed and running, and then overlay that installation with the new jars, rather than try to install from your build directory, but that's just my personal preference. If you want to work in v21 or higher, you'll have to deal with three Hadoop subdirectories instead of one, and use the "ant -Dresolvers=internal mvn-install" invocation to get them to build the patch right across components. But in v20 the above should work fine. If you are going to work with Hadoop long term, you'll want an IDE like Eclipse, so web pages like http://wiki.apache.org/hadoop/EclipseEnvironment may be helpful. Hope this helps, --Matt
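Condensing Matt's walkthrough into one runnable sequence (the patch file name is the same placeholder used above, and ant plus ivy are assumed to be installed):

    mkdir -p ~/hadoop-projects && cd ~/hadoop-projects
    svn checkout http://svn.apache.org/repos/asf/hadoop/common/tags/release-0.20.2/ hadoop-0.20.2
    mkdir -p patches                               # save the patch from the JIRA here
    cd hadoop-0.20.2
    patch -p 0 < ../patches/fix-test-pipes.patch   # note the stdin redirect
    ant veryclean compile bin-package              # rebuilt jars land under build/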
[Fwd: Hadoop Pipe Error]
Sorry, As usual Please find the attachment here. Thanks best Regards, Adarsh Sharma ---BeginMessage--- Dear all, Today I faced a problem while running a map-reduce job in C++. I am not able to understand to find the reason of the below error : 11/03/30 12:09:02 INFO mapred.JobClient: Task Id : attempt_201103301130_0011_m_00_0, Status : FAILED java.io.IOException: pipe child exception at org.apache.hadoop.mapred.pipes.Application.abort(Application.java:151) at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:101) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.Child.main(Child.java:170) Caused by: java.io.EOFException at java.io.DataInputStream.readByte(DataInputStream.java:250) at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298) at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319) at org.apache.hadoop.mapred.pipes.BinaryProtocol$UplinkReaderThread.run(BinaryProtocol.java:114) attempt_201103301130_0011_m_00_0: Hadoop Pipes Exception: failed to open at wordcount-nopipe.cc:82 in WordCountReader::WordCountReader(HadoopPipes::MapContext) 11/03/30 12:09:02 INFO mapred.JobClient: Task Id : attempt_201103301130_0011_m_01_0, Status : FAILED java.io.IOException: pipe child exception at org.apache.hadoop.mapred.pipes.Application.abort(Application.java:151) at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:101) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.Child.main(Child.java:170) Caused by: java.io.EOFException at java.io.DataInputStream.readByte(DataInputStream.java:250) at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298) at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319) at org.apache.hadoop.mapred.pipes.BinaryProtocol$UplinkReaderThread.run(BinaryProtocol.java:114) attempt_201103301130_0011_m_01_0: Hadoop Pipes Exception: failed to open at wordcount-nopipe.cc:82 in WordCountReader::WordCountReader(HadoopPipes::MapContext) 11/03/30 12:09:02 INFO mapred.JobClient: Task Id : attempt_201103301130_0011_m_02_0, Status : FAILED java.io.IOException: pipe child exception at org.apache.hadoop.mapred.pipes.Application.abort(Application.java:151) at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:101) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.Child.main(Child.java:170) Caused by: java.io.EOFException at java.io.DataInputStream.readByte(DataInputStream.java:250) at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298) at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319) at org.apache.hadoop.mapred.pipes.BinaryProtocol$UplinkReaderThread.run(BinaryProtocol.java:114) attempt_201103301130_0011_m_02_1: Hadoop Pipes Exception: failed to open at wordcount-nopipe.cc:82 in WordCountReader::WordCountReader(HadoopPipes::MapContext) 11/03/30 12:09:15 INFO mapred.JobClient: Task Id : attempt_201103301130_0011_m_00_2, Status : FAILED java.io.IOException: pipe child exception at org.apache.hadoop.mapred.pipes.Application.abort(Application.java:151) at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:101) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:35 I tried to run *wordcount-nopipe.cc* 
program in */home/hadoop/project/hadoop-0.20.2/src/examples/pipes/impl* directory. make wordcount-nopipe bin/hadoop fs -put wordcount-nopipe bin/wordcount-nopipe bin/hadoop pipes -D hadoop.pipes.java.recordreader=true -D hadoop.pipes.java.recordwriter=true -input gutenberg -output gutenberg-out11 -program bin/wordcount-nopipe or bin/hadoop pipes -D hadoop.pipes.java.recordreader=false -D hadoop.pipes.java.recordwriter=false -input gutenberg -output gutenberg-out11 -program bin/wordcount-nopipe but error remains the same. I attached my Makefile also. Please have some comments on it. I am able to wun a simple wordcount.cpp program in Hadoop Cluster but don't know why this program fails in Broken Pipe error. Thanks best regards Adarsh Sharma CC = g++ HADOOP_INSTALL =/home/hadoop/project/hadoop-0.20.2 PLATFORM = Linux-amd64-64 CPPFLAGS = -m64 -I/home/hadoop/project/hadoop-0.20.2/c++/Linux-amd64-64/include -I/usr/local/cuda/include wordcount-nopipe
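For completeness, a Pipes binary is normally linked against the Pipes and utils libraries shipped under c++/<platform>/lib as well as pthread. A hedged compile line reusing the Makefile variables shown above (the CUDA include path from the original Makefile is omitted here as it may not be needed for this program):

    g++ -m64 \
      -I$HADOOP_INSTALL/c++/$PLATFORM/include \
      wordcount-nopipe.cc \
      -L$HADOOP_INSTALL/c++/$PLATFORM/lib -lhadooppipes -lhadooputils -lpthread \
      -o wordcount-nopipe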
Re: Hadoop Pipes Error
Thanks Amareshwari, here is the posting : The *nopipe* example needs more documentation. It assumes that it is run with the InputFormat from src/test/org/apache/*hadoop*/mapred/*pipes*/ *WordCountInputFormat*.java, which has a very specific input split format. By running with a TextInputFormat, it will send binary bytes as the input split and won't work right. The *nopipe* example should probably be recoded *to* use libhdfs *too*, but that is more complicated *to* get running as a unit test. Also note that since the C++ example is using local file reads, it will only work on a cluster if you have nfs or something working across the cluster. Please need if I'm wrong. I need to run it with TextInputFormat. If posiible Please explain the above post more clearly. Thanks best Regards, Adarsh Sharma Amareshwari Sri Ramadasu wrote: Here is an answer for your question in old mail archive: http://lucene.472066.n3.nabble.com/pipe-application-error-td650185.html On 3/31/11 10:15 AM, Adarsh Sharma adarsh.sha...@orkash.com wrote: Any update on the below error. Please guide. Thanks best Regards, Adarsh Sharma Adarsh Sharma wrote: Dear all, Today I faced a problem while running a map-reduce job in C++. I am not able to understand to find the reason of the below error : 11/03/30 12:09:02 INFO mapred.JobClient: Task Id : attempt_201103301130_0011_m_00_0, Status : FAILED java.io.IOException: pipe child exception at org.apache.hadoop.mapred.pipes.Application.abort(Application.java:151) at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:101) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.Child.main(Child.java:170) Caused by: java.io.EOFException at java.io.DataInputStream.readByte(DataInputStream.java:250) at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298) at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319) at org.apache.hadoop.mapred.pipes.BinaryProtocol$UplinkReaderThread.run(BinaryProtocol.java:114) attempt_201103301130_0011_m_00_0: Hadoop Pipes Exception: failed to open at wordcount-nopipe.cc:82 in WordCountReader::WordCountReader(HadoopPipes::MapContext) 11/03/30 12:09:02 INFO mapred.JobClient: Task Id : attempt_201103301130_0011_m_01_0, Status : FAILED java.io.IOException: pipe child exception at org.apache.hadoop.mapred.pipes.Application.abort(Application.java:151) at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:101) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.Child.main(Child.java:170) Caused by: java.io.EOFException at java.io.DataInputStream.readByte(DataInputStream.java:250) at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298) at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319) at org.apache.hadoop.mapred.pipes.BinaryProtocol$UplinkReaderThread.run(BinaryProtocol.java:114) attempt_201103301130_0011_m_01_0: Hadoop Pipes Exception: failed to open at wordcount-nopipe.cc:82 in WordCountReader::WordCountReader(HadoopPipes::MapContext) 11/03/30 12:09:02 INFO mapred.JobClient: Task Id : attempt_201103301130_0011_m_02_0, Status : FAILED java.io.IOException: pipe child exception at org.apache.hadoop.mapred.pipes.Application.abort(Application.java:151) at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:101) at 
org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.Child.main(Child.java:170) Caused by: java.io.EOFException at java.io.DataInputStream.readByte(DataInputStream.java:250) at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298) at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319) at org.apache.hadoop.mapred.pipes.BinaryProtocol$UplinkReaderThread.run(BinaryProtocol.java:114) attempt_201103301130_0011_m_02_1: Hadoop Pipes Exception: failed to open at wordcount-nopipe.cc:82 in WordCountReader::WordCountReader(HadoopPipes::MapContext) 11/03/30 12:09:15 INFO mapred.JobClient: Task Id : attempt_201103301130_0011_m_00_2, Status : FAILED java.io.IOException: pipe child exception at org.apache.hadoop.mapred.pipes.Application.abort(Application.java:151) at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:101) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:35 I tried to run *wordcount-nopipe.cc* program in */home/hadoop/project/hadoop-0.20.2/src/examples/pipes/impl* directory. make wordcount
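For reference, here is a rough sketch of what a nopipe-style reader does with its input split. This is an illustration modeled on the wordcount-nopipe.cc code that the error points at, not the exact shipped source; it shows why a binary FileSplit sent by TextInputFormat makes the open fail with "failed to open":

    #include <cstdio>
    #include <string>
    #include "hadoop/Pipes.hh"

    // Illustrative only: treat the raw input split as a plain local file path.
    // WordCountInputFormat writes such a path into the split; TextInputFormat
    // sends a serialized Java FileSplit instead, so fopen() returns NULL and
    // the example's assertion reports "failed to open ...".
    static FILE* openSplitAsLocalFile(HadoopPipes::MapContext& context) {
      const std::string& split = context.getInputSplit();
      return fopen(split.c_str(), "rt");
    }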
Re: Hadoop Pipes Error
What are the steps needed to debug the error make worcount-nopipe.cc running properly. Please if possible guide in steps. Thanks best Regards, Adarsh Sharma Amareshwari Sri Ramadasu wrote: Here is an answer for your question in old mail archive: http://lucene.472066.n3.nabble.com/pipe-application-error-td650185.html On 3/31/11 10:15 AM, Adarsh Sharma adarsh.sha...@orkash.com wrote: Any update on the below error. Please guide. Thanks best Regards, Adarsh Sharma Adarsh Sharma wrote: Dear all, Today I faced a problem while running a map-reduce job in C++. I am not able to understand to find the reason of the below error : 11/03/30 12:09:02 INFO mapred.JobClient: Task Id : attempt_201103301130_0011_m_00_0, Status : FAILED java.io.IOException: pipe child exception at org.apache.hadoop.mapred.pipes.Application.abort(Application.java:151) at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:101) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.Child.main(Child.java:170) Caused by: java.io.EOFException at java.io.DataInputStream.readByte(DataInputStream.java:250) at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298) at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319) at org.apache.hadoop.mapred.pipes.BinaryProtocol$UplinkReaderThread.run(BinaryProtocol.java:114) attempt_201103301130_0011_m_00_0: Hadoop Pipes Exception: failed to open at wordcount-nopipe.cc:82 in WordCountReader::WordCountReader(HadoopPipes::MapContext) 11/03/30 12:09:02 INFO mapred.JobClient: Task Id : attempt_201103301130_0011_m_01_0, Status : FAILED java.io.IOException: pipe child exception at org.apache.hadoop.mapred.pipes.Application.abort(Application.java:151) at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:101) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.Child.main(Child.java:170) Caused by: java.io.EOFException at java.io.DataInputStream.readByte(DataInputStream.java:250) at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298) at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319) at org.apache.hadoop.mapred.pipes.BinaryProtocol$UplinkReaderThread.run(BinaryProtocol.java:114) attempt_201103301130_0011_m_01_0: Hadoop Pipes Exception: failed to open at wordcount-nopipe.cc:82 in WordCountReader::WordCountReader(HadoopPipes::MapContext) 11/03/30 12:09:02 INFO mapred.JobClient: Task Id : attempt_201103301130_0011_m_02_0, Status : FAILED java.io.IOException: pipe child exception at org.apache.hadoop.mapred.pipes.Application.abort(Application.java:151) at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:101) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.Child.main(Child.java:170) Caused by: java.io.EOFException at java.io.DataInputStream.readByte(DataInputStream.java:250) at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298) at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319) at org.apache.hadoop.mapred.pipes.BinaryProtocol$UplinkReaderThread.run(BinaryProtocol.java:114) attempt_201103301130_0011_m_02_1: Hadoop Pipes Exception: failed to open at wordcount-nopipe.cc:82 in WordCountReader::WordCountReader(HadoopPipes::MapContext) 11/03/30 12:09:15 
INFO mapred.JobClient: Task Id : attempt_201103301130_0011_m_00_2, Status : FAILED java.io.IOException: pipe child exception at org.apache.hadoop.mapred.pipes.Application.abort(Application.java:151) at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:101) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:35 I tried to run *wordcount-nopipe.cc* program in */home/hadoop/project/hadoop-0.20.2/src/examples/pipes/impl* directory. make wordcount-nopipe bin/hadoop fs -put wordcount-nopipe bin/wordcount-nopipe bin/hadoop pipes -D hadoop.pipes.java.recordreader=true -D hadoop.pipes.java.recordwriter=true -input gutenberg -output gutenberg-out11 -program bin/wordcount-nopipe or bin/hadoop pipes -D hadoop.pipes.java.recordreader=false -D hadoop.pipes.java.recordwriter=false -input gutenberg -output gutenberg-out11 -program bin/wordcount-nopipe but error remains the same. I attached my Makefile also. Please have some comments on it. I am able to wun a simple wordcount.cpp program in Hadoop Cluster but don't know why this program fails in Broken Pipe error. Thanks
Re: How to apply Patch
Thanks Steve , U helped me to clear my doubts several times. I explain U What my Problem is : I am trying to run *wordcount-nopipe.cc* program in */home/hadoop/project/hadoop-0.20.2/src/examples/pipes/impl* directory. I am able to run a simple wordcount.cpp program in Hadoop Cluster but whebn I am going to run this program, ifaced below exception : *bash-3.2$ bin/hadoop pipes -D hadoop.pipes.java.recordreader=true -D hadoop.pipes.java.recordwriter=true -input gutenberg -output gutenberg-out1101 -program bin/wordcount-nopipe2* 11/03/31 14:59:07 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String). 11/03/31 14:59:07 INFO mapred.FileInputFormat: Total input paths to process : 3 11/03/31 14:59:08 INFO mapred.JobClient: Running job: job_201103310903_0007 11/03/31 14:59:09 INFO mapred.JobClient: map 0% reduce 0% 11/03/31 14:59:18 INFO mapred.JobClient: Task Id : attempt_201103310903_0007_m_00_0, Status : FAILED java.io.IOException: pipe child exception at org.apache.hadoop.mapred.pipes.Application.abort(Application.java:151) at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:101) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.Child.main(Child.java:170) Caused by: java.net.SocketException: Broken pipe at java.net.SocketOutputStream.socketWrite0(Native Method) at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92) at java.net.SocketOutputStream.write(SocketOutputStream.java:136) at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65) at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123) at java.io.DataOutputStream.flush(DataOutputStream.java:106) at org.apache.hadoop.mapred.pipes.BinaryProtocol.flush(BinaryProtocol.java:316) at org.apache.hadoop.mapred.pipes.Application.waitForFinish(Application.java:129) at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:99) ... 3 more attempt_201103310903_0007_m_00_0: Hadoop Pipes Exception: failed to open at wordcount-nopipe2.cc:86 in WordCountReader::WordCountReader(HadoopPipes::MapContext) 11/03/31 14:59:18 INFO mapred.JobClient: Task Id : attempt_201103310903_0007_m_01_0, Status : FAILED java.io.IOException: pipe child exception at org.apache.hadoop.mapred.pipes.Application.abort(Application.java:151) at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:101) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.Child.main(Child.java:170) Caused by: java.net.SocketException: Broken pipe at java.net.SocketOutputStream.socketWrite0(Native Method) at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92) at java.net.SocketOutputStream.write(SocketOutputStream.java:136) at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65) at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123) at java.io.DataOutputStream.flush(DataOutputStream.java:106) at org.apache.hadoop.mapred.pipes.BinaryProtocol.flush(BinaryProtocol.java:316) at org.apache.hadoop.mapred.pipes.Application.waitForFinish(Application.java:129) at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:99) ... 
3 more After some R&D, I found the below links quite useful : http://lucene.472066.n3.nabble.com/pipe-application-error-td650185.html http://stackoverflow.com/questions/4395140/eofexception-thrown-by-a-hadoop-pipes-program But I don't know how to resolve this. I think my program tries to open the file as file://gutenberg but it needs it as hdfs://. Here are the contents of my Makefile : CC = g++ HADOOP_INSTALL =/home/hadoop/project/hadoop-0.20.2 PLATFORM = Linux-amd64-64 CPPFLAGS = -m64 -I/home/hadoop/project/hadoop-0.20.2/c++/Linux-amd64-64/include -I/usr/local/cuda/include wordcount-nopipe2 : wordcount-nopipe2.cc $(CC) $(CPPFLAGS) $< -Wall -L/home/hadoop/project/hadoop-0.20.2/c++/Linux-amd64-64/lib -L/usr/local/cuda/lib64 -lhadooppipes \ -lhadooputils -lpthread -g -O2 -o $@ Could it be a bug in hadoop-0.20.2, and if not, please guide me on how to debug it. Thanks best Regards, Adarsh Sharma Steve Loughran wrote: On 31/03/11 07:37, Adarsh Sharma wrote: Thanks a lot for such a deep explanation : I have done it now, but it doesn't help me in my original problem for which I'm doing this. Please comment on it if you have some idea. I attached the problem. Sadly. Matt's deep explanation is what you need, low-level that it is -patches are designed to be applied to source, so you need
Re: Hadoop Pipes Error
Amareshwari Sri Ramadasu wrote: You can not run it with TextInputFormat. You should run it with org.apache.hadoop.mapred.pipes .*WordCountInputFormat. *You can pass the input format by passing it in --inputformat option. I did not try it myself, but it should work. Here is the command that I am trying and it results in exception: bash-3.2$ bin/hadoop pipes -D hadoop.pipes.java.recordreader=true -D hadoop.pipes.java.recordwriter=true -inputformat org.apache.hadoop.mapred.pipes.WordCountInputFormat -input gutenberg -output gutenberg-out101 -program bin/wordcount-nopipe Exception in thread main java.lang.ClassNotFoundException: org.apache.hadoop.mapred.pipes.WordCountInputFormat at java.net.URLClassLoader$1.run(URLClassLoader.java:200) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:188) at java.lang.ClassLoader.loadClass(ClassLoader.java:307) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) at java.lang.ClassLoader.loadClass(ClassLoader.java:252) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:247) at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:762) at org.apache.hadoop.mapred.pipes.Submitter.getClass(Submitter.java:372) at org.apache.hadoop.mapred.pipes.Submitter.run(Submitter.java:421) at org.apache.hadoop.mapred.pipes.Submitter.main(Submitter.java:494) Thanks , Adarsh
Re: Hadoop Pipes Error
Thanks Amareshwari, I find it I'm sorry it results in another error: bash-3.2$ bin/hadoop pipes -D hadoop.pipes.java.recordreader=true -D hadoop.pipes.java.recordwriter=true -libjars /home/hadoop/project/hadoop-0.20.2/hadoop-0.20.2-test.jar -inputformat org.apache.hadoop.mapred.pipes.WordCountInputFormat -input gutenberg -output gutenberg-out101 -program bin/wordcount-nopipe 11/03/31 16:36:26 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String). Exception in thread main java.lang.IllegalArgumentException: Wrong FS: hdfs://ws-test:54310/user/hadoop/gutenberg, expected: file: at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:310) at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:47) at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:273) at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:721) at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:746) at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:465) at org.apache.hadoop.mapred.pipes.WordCountInputFormat.getSplits(WordCountInputFormat.java:57) at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730) at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249) at org.apache.hadoop.mapred.pipes.Submitter.runJob(Submitter.java:248) at org.apache.hadoop.mapred.pipes.Submitter.run(Submitter.java:479) at org.apache.hadoop.mapred.pipes.Submitter.main(Submitter.java:494) Best regards, Adarsh Amareshwari Sri Ramadasu wrote: Adarsh, The inputformat is present in test jar. So, pass -libjars full path to testjar to your command. libjars option should be passed before program specific options. So, it should be just after your -D parameters. -Amareshwari On 3/31/11 3:45 PM, Adarsh Sharma adarsh.sha...@orkash.com wrote: Amareshwari Sri Ramadasu wrote: Re: Hadoop Pipes Error You can not run it with TextInputFormat. You should run it with org.apache.hadoop.mapred.pipes .WordCountInputFormat. You can pass the input format by passing it in -inputformat option. I did not try it myself, but it should work. Here is the command that I am trying and it results in exception: bash-3.2$ bin/hadoop pipes -D hadoop.pipes.java.recordreader=true -D hadoop.pipes.java.recordwriter=true -inputformat org.apache.hadoop.mapred.pipes.WordCountInputFormat -input gutenberg -output gutenberg-out101 -program bin/wordcount-nopipe Exception in thread main java.lang.ClassNotFoundException: org.apache.hadoop.mapred.pipes.WordCountInputFormat at java.net.URLClassLoader$1.run(URLClassLoader.java:200) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:188) at java.lang.ClassLoader.loadClass(ClassLoader.java:307) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) at java.lang.ClassLoader.loadClass(ClassLoader.java:252) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:247) at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:762) at org.apache.hadoop.mapred.pipes.Submitter.getClass(Submitter.java:372) at org.apache.hadoop.mapred.pipes.Submitter.run(Submitter.java:421) at org.apache.hadoop.mapred.pipes.Submitter.main(Submitter.java:494) Thanks , Adarsh
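One thing to try (untested here, but consistent with the patch that appears later in this digest, which makes the test use file: paths): WordCountInputFormat lists files on the local filesystem, so point -input at an explicit file:/// path instead of the HDFS one, for example:

    bin/hadoop pipes -D hadoop.pipes.java.recordreader=true -D hadoop.pipes.java.recordwriter=true \
      -libjars /home/hadoop/project/hadoop-0.20.2/hadoop-0.20.2-test.jar \
      -inputformat org.apache.hadoop.mapred.pipes.WordCountInputFormat \
      -input file:///home/hadoop/gutenberg -output gutenberg-out101 \
      -program bin/wordcount-nopipe

The local path /home/hadoop/gutenberg is a placeholder for wherever the input files actually live on each node.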
Hadoop Pipes Error
Dear all, Today I faced a problem while running a map-reduce job in C++. I am not able to understand to find the reason of the below error : 11/03/30 12:09:02 INFO mapred.JobClient: Task Id : attempt_201103301130_0011_m_00_0, Status : FAILED java.io.IOException: pipe child exception at org.apache.hadoop.mapred.pipes.Application.abort(Application.java:151) at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:101) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.Child.main(Child.java:170) Caused by: java.io.EOFException at java.io.DataInputStream.readByte(DataInputStream.java:250) at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298) at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319) at org.apache.hadoop.mapred.pipes.BinaryProtocol$UplinkReaderThread.run(BinaryProtocol.java:114) attempt_201103301130_0011_m_00_0: Hadoop Pipes Exception: failed to open at wordcount-nopipe.cc:82 in WordCountReader::WordCountReader(HadoopPipes::MapContext) 11/03/30 12:09:02 INFO mapred.JobClient: Task Id : attempt_201103301130_0011_m_01_0, Status : FAILED java.io.IOException: pipe child exception at org.apache.hadoop.mapred.pipes.Application.abort(Application.java:151) at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:101) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.Child.main(Child.java:170) Caused by: java.io.EOFException at java.io.DataInputStream.readByte(DataInputStream.java:250) at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298) at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319) at org.apache.hadoop.mapred.pipes.BinaryProtocol$UplinkReaderThread.run(BinaryProtocol.java:114) attempt_201103301130_0011_m_01_0: Hadoop Pipes Exception: failed to open at wordcount-nopipe.cc:82 in WordCountReader::WordCountReader(HadoopPipes::MapContext) 11/03/30 12:09:02 INFO mapred.JobClient: Task Id : attempt_201103301130_0011_m_02_0, Status : FAILED java.io.IOException: pipe child exception at org.apache.hadoop.mapred.pipes.Application.abort(Application.java:151) at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:101) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.Child.main(Child.java:170) Caused by: java.io.EOFException at java.io.DataInputStream.readByte(DataInputStream.java:250) at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298) at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319) at org.apache.hadoop.mapred.pipes.BinaryProtocol$UplinkReaderThread.run(BinaryProtocol.java:114) attempt_201103301130_0011_m_02_1: Hadoop Pipes Exception: failed to open at wordcount-nopipe.cc:82 in WordCountReader::WordCountReader(HadoopPipes::MapContext) 11/03/30 12:09:15 INFO mapred.JobClient: Task Id : attempt_201103301130_0011_m_00_2, Status : FAILED java.io.IOException: pipe child exception at org.apache.hadoop.mapred.pipes.Application.abort(Application.java:151) at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:101) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:35 I tried to run *wordcount-nopipe.cc* program in */home/hadoop/project/hadoop-0.20.2/src/examples/pipes/impl* directory. 
make wordcount-nopipe bin/hadoop fs -put wordcount-nopipe bin/wordcount-nopipe bin/hadoop pipes -D hadoop.pipes.java.recordreader=true -D hadoop.pipes.java.recordwriter=true -input gutenberg -output gutenberg-out11 -program bin/wordcount-nopipe or bin/hadoop pipes -D hadoop.pipes.java.recordreader=false -D hadoop.pipes.java.recordwriter=false -input gutenberg -output gutenberg-out11 -program bin/wordcount-nopipe but the error remains the same. I attached my Makefile also. Please share your comments on it. I am able to run a simple wordcount.cpp program in the Hadoop Cluster but don't know why this program fails with a Broken Pipe error. Thanks best regards Adarsh Sharma CC = g++ HADOOP_INSTALL =/home/hadoop/project/hadoop-0.20.2 PLATFORM = Linux-amd64-64 CPPFLAGS = -m64 -I/home/hadoop/project/hadoop-0.20.2/c++/Linux-amd64-64/include -I/usr/local/cuda/include wordcount-nopipe : wordcount-nopipe.cc $(CC) $(CPPFLAGS) $< -Wall -L/home/hadoop/project/hadoop-0.20.2/c++/Linux-amd64-64/lib -L/usr/local/cuda/lib64 -lhadooppipes \ -lhadooputils
How to apply Patch
Dear all, Can someone please tell me how to apply a patch to the hadoop-0.20.2 package? I attached the patch. Please find the attachment. I just follow the steps below for Hadoop : 1. Download Hadoop-0.20.2.tar.gz 2. Extract the file. 3. Set configurations in the site.xml files Thanks best Regards, Adarsh Sharma
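For the record, the usual way to apply such a patch to a source tree is a sketch like the following (the patch file name is a placeholder; -p0 assumes the paths in the patch are relative to the source root, as they are in the diff attached later in this thread):

    cd /home/hadoop/project/hadoop-0.20.2
    patch -p0 --dry-run < /path/to/fix.patch   # check that all hunks apply cleanly
    patch -p0 < /path/to/fix.patch             # apply for real
    ant                                        # rebuild (hadoop-0.20.2 builds with Ant) so the change takes effect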
Re: How to apply Patch
Sorry, Just check the attachment now. Adarsh Sharma wrote: Dear all, Can someone please tell me how to apply a patch to the hadoop-0.20.2 package? I attached the patch. Please find the attachment. I just follow the steps below for Hadoop : 1. Download Hadoop-0.20.2.tar.gz 2. Extract the file. 3. Set configurations in the site.xml files Thanks best Regards, Adarsh Sharma

Index: src/test/org/apache/hadoop/mapred/pipes/TestPipes.java
===
--- src/test/org/apache/hadoop/mapred/pipes/TestPipes.java (revision 565616)
+++ src/test/org/apache/hadoop/mapred/pipes/TestPipes.java (working copy)
@@ -150,7 +150,8 @@
    JobConf job = mr.createJobConf();
    job.setInputFormat(WordCountInputFormat.class);
    FileSystem local = FileSystem.getLocal(job);
-   Path testDir = new Path(System.getProperty("test.build.data"), "pipes");
+   Path testDir = new Path("file:" + System.getProperty("test.build.data"),
+                           "pipes");
    Path inDir = new Path(testDir, "input");
    Path outDir = new Path(testDir, "output");
    Path wordExec = new Path("/testing/bin/application");
Index: src/test/org/apache/hadoop/mapred/pipes/WordCountInputFormat.java
===
--- src/test/org/apache/hadoop/mapred/pipes/WordCountInputFormat.java (revision 565616)
+++ src/test/org/apache/hadoop/mapred/pipes/WordCountInputFormat.java (working copy)
@@ -35,7 +35,7 @@
    private String filename;
    WordCountInputSplit() { }
    WordCountInputSplit(Path filename) {
-     this.filename = filename.toString();
+     this.filename = filename.toUri().getPath();
    }
    public void write(DataOutput out) throws IOException {
      Text.writeString(out, filename);
Index: src/examples/pipes/impl/wordcount-nopipe.cc
===
--- src/examples/pipes/impl/wordcount-nopipe.cc (revision 565616)
+++ src/examples/pipes/impl/wordcount-nopipe.cc (working copy)
@@ -87,9 +87,15 @@
    const HadoopPipes::JobConf* job = context.getJobConf();
    int part = job->getInt("mapred.task.partition");
    std::string outDir = job->get("mapred.output.dir");
+   // remove the file: schema substring
+   std::string::size_type posn = outDir.find(":");
+   HADOOP_ASSERT(posn != std::string::npos,
+                 "no schema found in output dir: " + outDir);
+   outDir.erase(0, posn+1);
    mkdir(outDir.c_str(), 0777);
    std::string outFile = outDir + "/part-" + HadoopUtils::toString(part);
    file = fopen(outFile.c_str(), "wt");
+   HADOOP_ASSERT(file != NULL, "can't open file for writing: " + outFile);
  } ~WordCountWriter() {
Re: Hadoop Pipes Error
Any update on the below error. Please guide. Thanks best Regards, Adarsh Sharma Adarsh Sharma wrote: Dear all, Today I faced a problem while running a map-reduce job in C++. I am not able to understand to find the reason of the below error : 11/03/30 12:09:02 INFO mapred.JobClient: Task Id : attempt_201103301130_0011_m_00_0, Status : FAILED java.io.IOException: pipe child exception at org.apache.hadoop.mapred.pipes.Application.abort(Application.java:151) at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:101) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.Child.main(Child.java:170) Caused by: java.io.EOFException at java.io.DataInputStream.readByte(DataInputStream.java:250) at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298) at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319) at org.apache.hadoop.mapred.pipes.BinaryProtocol$UplinkReaderThread.run(BinaryProtocol.java:114) attempt_201103301130_0011_m_00_0: Hadoop Pipes Exception: failed to open at wordcount-nopipe.cc:82 in WordCountReader::WordCountReader(HadoopPipes::MapContext) 11/03/30 12:09:02 INFO mapred.JobClient: Task Id : attempt_201103301130_0011_m_01_0, Status : FAILED java.io.IOException: pipe child exception at org.apache.hadoop.mapred.pipes.Application.abort(Application.java:151) at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:101) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.Child.main(Child.java:170) Caused by: java.io.EOFException at java.io.DataInputStream.readByte(DataInputStream.java:250) at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298) at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319) at org.apache.hadoop.mapred.pipes.BinaryProtocol$UplinkReaderThread.run(BinaryProtocol.java:114) attempt_201103301130_0011_m_01_0: Hadoop Pipes Exception: failed to open at wordcount-nopipe.cc:82 in WordCountReader::WordCountReader(HadoopPipes::MapContext) 11/03/30 12:09:02 INFO mapred.JobClient: Task Id : attempt_201103301130_0011_m_02_0, Status : FAILED java.io.IOException: pipe child exception at org.apache.hadoop.mapred.pipes.Application.abort(Application.java:151) at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:101) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.Child.main(Child.java:170) Caused by: java.io.EOFException at java.io.DataInputStream.readByte(DataInputStream.java:250) at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298) at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319) at org.apache.hadoop.mapred.pipes.BinaryProtocol$UplinkReaderThread.run(BinaryProtocol.java:114) attempt_201103301130_0011_m_02_1: Hadoop Pipes Exception: failed to open at wordcount-nopipe.cc:82 in WordCountReader::WordCountReader(HadoopPipes::MapContext) 11/03/30 12:09:15 INFO mapred.JobClient: Task Id : attempt_201103301130_0011_m_00_2, Status : FAILED java.io.IOException: pipe child exception at org.apache.hadoop.mapred.pipes.Application.abort(Application.java:151) at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:101) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:35 I tried to run *wordcount-nopipe.cc* program 
in */home/hadoop/project/hadoop-0.20.2/src/examples/pipes/impl* directory. make wordcount-nopipe bin/hadoop fs -put wordcount-nopipe bin/wordcount-nopipe bin/hadoop pipes -D hadoop.pipes.java.recordreader=true -D hadoop.pipes.java.recordwriter=true -input gutenberg -output gutenberg-out11 -program bin/wordcount-nopipe or bin/hadoop pipes -D hadoop.pipes.java.recordreader=false -D hadoop.pipes.java.recordwriter=false -input gutenberg -output gutenberg-out11 -program bin/wordcount-nopipe but error remains the same. I attached my Makefile also. Please have some comments on it. I am able to wun a simple wordcount.cpp program in Hadoop Cluster but don't know why this program fails in Broken Pipe error. Thanks best regards Adarsh Sharma
Hadoop Pipe Error
Dear all, Today I faced a problem while running a map-reduce job in C++. I am not able to understand to find the reason of the below error : 11/03/30 12:09:02 INFO mapred.JobClient: Task Id : attempt_201103301130_0011_m_00_0, Status : FAILED java.io.IOException: pipe child exception at org.apache.hadoop.mapred.pipes.Application.abort(Application.java:151) at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:101) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.Child.main(Child.java:170) Caused by: java.io.EOFException at java.io.DataInputStream.readByte(DataInputStream.java:250) at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298) at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319) at org.apache.hadoop.mapred.pipes.BinaryProtocol$UplinkReaderThread.run(BinaryProtocol.java:114) attempt_201103301130_0011_m_00_0: Hadoop Pipes Exception: failed to open at wordcount-nopipe.cc:82 in WordCountReader::WordCountReader(HadoopPipes::MapContext) 11/03/30 12:09:02 INFO mapred.JobClient: Task Id : attempt_201103301130_0011_m_01_0, Status : FAILED java.io.IOException: pipe child exception at org.apache.hadoop.mapred.pipes.Application.abort(Application.java:151) at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:101) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.Child.main(Child.java:170) Caused by: java.io.EOFException at java.io.DataInputStream.readByte(DataInputStream.java:250) at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298) at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319) at org.apache.hadoop.mapred.pipes.BinaryProtocol$UplinkReaderThread.run(BinaryProtocol.java:114) attempt_201103301130_0011_m_01_0: Hadoop Pipes Exception: failed to open at wordcount-nopipe.cc:82 in WordCountReader::WordCountReader(HadoopPipes::MapContext) 11/03/30 12:09:02 INFO mapred.JobClient: Task Id : attempt_201103301130_0011_m_02_0, Status : FAILED java.io.IOException: pipe child exception at org.apache.hadoop.mapred.pipes.Application.abort(Application.java:151) at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:101) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.Child.main(Child.java:170) Caused by: java.io.EOFException at java.io.DataInputStream.readByte(DataInputStream.java:250) at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298) at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319) at org.apache.hadoop.mapred.pipes.BinaryProtocol$UplinkReaderThread.run(BinaryProtocol.java:114) attempt_201103301130_0011_m_02_1: Hadoop Pipes Exception: failed to open at wordcount-nopipe.cc:82 in WordCountReader::WordCountReader(HadoopPipes::MapContext) 11/03/30 12:09:15 INFO mapred.JobClient: Task Id : attempt_201103301130_0011_m_00_2, Status : FAILED java.io.IOException: pipe child exception at org.apache.hadoop.mapred.pipes.Application.abort(Application.java:151) at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:101) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:35 I tried to run *wordcount-nopipe.cc* program in */home/hadoop/project/hadoop-0.20.2/src/examples/pipes/impl* directory. 
make wordcount-nopipe bin/hadoop fs -put wordcount-nopipe bin/wordcount-nopipe bin/hadoop pipes -D hadoop.pipes.java.recordreader=true -D hadoop.pipes.java.recordwriter=true -input gutenberg -output gutenberg-out11 -program bin/wordcount-nopipe or bin/hadoop pipes -D hadoop.pipes.java.recordreader=false -D hadoop.pipes.java.recordwriter=false -input gutenberg -output gutenberg-out11 -program bin/wordcount-nopipe but the error remains the same. I attached my Makefile also. Please share your comments on it. I am able to run a simple wordcount.cpp program in the Hadoop Cluster but don't know why this program fails with a Broken Pipe error. Thanks best regards Adarsh Sharma CC = g++ HADOOP_INSTALL =/home/hadoop/project/hadoop-0.20.2 PLATFORM = Linux-amd64-64 CPPFLAGS = -m64 -I/home/hadoop/project/hadoop-0.20.2/c++/Linux-amd64-64/include -I/usr/local/cuda/include wordcount-nopipe : wordcount-nopipe.cc $(CC) $(CPPFLAGS) $< -Wall -L/home/hadoop/project/hadoop-0.20.2/c++/Linux
Re: Change Map-Red Parameters
Sreekanth Ramakrishnan wrote: Hi Adarsh, I am still not clear about your persistant change requirement. The job client looks at mapred-site.xml which is present in classpath and then submits the job. Now if you have a string of jobs which you have to submit: Series-1 of job which requires configuration k1=v1 Series-2 of job which requires configuration k1=v2 You can do this in two ways, have two separate resources for the configuration file, or submit one wave of Series-1, modify your new resource externally and submit Series-2 job. Thanks, Sir but how can we provide new resource ( new parameters ) externally to Hadoop and use it. So that the Hadoop Cluster after the changes remains in the changed state. Or you can do the same programmatically if you have predetermined the v2 value, So you can do something like {code} Configuration conf = new Configuration(); If(job.belongsToSeries(series-1) conf.set(k1,v1) Else conf.set(k1,v2) {code} This way the code make the job uses new parameters but new jobs uses the environment previously set in mapred-site.xml. Best regards, Adarsh The reason why I said don't modify your client side mapred-site.xml is because, mapred-site.xml can be used across jobs and typically would contain jobtracker url and other cluster wide configuration, there is no issues for you to put your specific configuration variable in it. So that will make that single submission box is only used by a single client. Btw, if you are looking at spawning extra maps, you cannot control it mapreduce subsystem spawns it based on your file input format. HTH On 3/18/11 10:47 AM, Adarsh Sharma adarsh.sha...@orkash.com wrote: Thanks Sreekanth, your reply correspond me to do following steps : 1. Make my own myjobs-site.xml. 2. Use Configuration class to override the properties of mapred-site.xml. So, it means it is impossible to make changes reflect through a program permanently , this way you are just using another file to have another predefined set of parameters, we can achieved it through run-time parameters too, but this may causes less pain to give them each time. I am looking for an idea to make the changes permanent for e.g a job on 100 GB requires more map tasks as compared with a 2 Gb task. If somehow we change them dynamically, how could be make them to reflect. Thanks best regards, Adarsh Sharma Sreekanth Ramakrishnan wrote: Re: Change Map-Red Parameters Hi Adarsh, Configuration class supports addition of external resources other than default Hadoop MapRed resources. You can define your own resource file. For instance like mapred-site.xml you can have myjobs-site.xml. You can define your job specific parameters in the same. In general mapred-site.xml contains common set of configurations which a job client uses to submit the given job. Changing it dynamically can potentially cause debugging pains later. So an ideal way would be defining new configuration file, adding the same to classpath and programmatically setting the same as resource while creating your job configuration. You can periodically make changes to the newly defined resource. So if you want to override some mapred specific parameters in the same, just see to that you load the newly created resources after mapred resource files are loaded. HTH Sreekanth On 3/18/11 10:03 AM, Adarsh Sharma adarsh.sha...@orkash.com wrote: Dear all, Is it possible in Hadoop Map-Reduce Framework to change the job execution parameters during runtime. 
I know we can set them through job.setNumTasks and many other parameters, but this lasts only for that job. In Hadoop Pipes we can set them through the -D parameter. But the problem is different. What I want is to run a Map-Reduce program that changes the mapred-site.xml parameters and has the changes reflected in the following jobs. Please guide me if there is a way to do this. Thanks best Regards Adarsh Sharma -- Sreekanth Ramakrishnan
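A minimal sketch of the separate-resource approach described above (the file name myjobs-site.xml and the property values are assumptions, not anything shipped with Hadoop):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapred.JobConf;

    public class SeriesJobConf {
      public static JobConf buildConf(boolean isSeries1) {
        Configuration conf = new Configuration();   // loads the usual *-site.xml files from the classpath
        conf.addResource("myjobs-site.xml");        // extra resource with job-specific overrides (assumed name)
        // per-series override, done programmatically as in the {code} snippet above
        conf.set("mapred.reduce.tasks", isSeries1 ? "4" : "16");
        return new JobConf(conf, SeriesJobConf.class);
      }
    }

The point is that mapred-site.xml itself stays untouched; each job picks up the extra resource and overrides at submission time.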
Read/Write xml file in Hadoop
Dear all, I am researching how to read/write an XML file through a C++ program in Hadoop Pipes. I need to achieve this as it is a requirement. Please guide me if there is a trick to do this. Thanks best Regards, Adarsh
Job Configuration in Hadoop Pipes
Dear all, Today I wrote a Java program that specifies the NumMapTasks and NumReduceTasks and performs a wordcount on HDFS data. I want the same functionality through Hadoop Pipes but I am not able to get it working properly. Is it possible in Hadoop Pipes C++ code to change the job configuration and other parameters at run-time? Please guide me on how to do this. Thanks best regards, Adarsh Sharma
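For what it's worth, a small sketch of the two halves of this: task counts and similar parameters are normally fixed at submission time with -D on the pipes command line, while inside the C++ task the configuration can only be read, through context.getJobConf() (the property names below are just examples):

    // Reading job configuration inside a Pipes task (illustrative sketch).
    #include <string>
    #include "hadoop/Pipes.hh"

    static void inspectConf(HadoopPipes::TaskContext& context) {
      const HadoopPipes::JobConf* job = context.getJobConf();
      int part = job->getInt("mapred.task.partition");   // partition handled by this task
      std::string out = job->get("mapred.output.dir");   // job output directory
      (void)part; (void)out;                             // use them as needed
    }

    // Submission-time parameters are passed on the command line, e.g.:
    //   bin/hadoop pipes -D mapred.map.tasks=8 -D mapred.reduce.tasks=2 \
    //     -input gutenberg -output out -program bin/wordcount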
Change Map-Red Parameters
Dear all, Is it possible in the Hadoop Map-Reduce framework to change the job execution parameters at runtime? I know we can set them through job.setNumTasks and many other parameters, but this lasts only for that job. In Hadoop Pipes we can set them through the -D parameter. But the problem is different. What I want is to run a Map-Reduce program that changes the mapred-site.xml parameters and has the changes reflected in the following jobs. Please guide me if there is a way to do this. Thanks best Regards Adarsh Sharma
Re: Not able to Run C++ code in Hadoop Cluster
Is it possible to run C++/GPU code in the Map-Reduce framework through Hadoop Streaming? If there is a simple example, please let me know. Thanks best Regards, Adarsh Sharma He Chen wrote: Agree with Keith Wiley, we use streaming also. On Mon, Mar 14, 2011 at 11:40 AM, Keith Wiley kwi...@keithwiley.com wrote: Not to speak against pipes because I don't have much experience with it, but I eventually abandoned my pipes efforts and went with streaming. If you don't get pipes to work, you might take a look at streaming as an alternative. Cheers! Keith Wiley kwi...@keithwiley.com keithwiley.com music.keithwiley.com I used to be with it, but then they changed what it was. Now, what I'm with isn't it, and what's it seems weird and scary to me. -- Abe (Grandpa) Simpson
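As a starting point, a minimal streaming invocation might look like the following. This is a sketch, not tested here: mapper_gpu is an assumed name for a compiled C++ binary that reads lines on stdin and writes tab-separated key/value pairs on stdout, and the jar is matched with a wildcard because the exact file name varies by release:

    bin/hadoop jar contrib/streaming/hadoop-*streaming*.jar \
      -input gutenberg -output gutenberg-stream-out \
      -mapper mapper_gpu -reducer /bin/cat \
      -file /home/hadoop/mapper_gpu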
Not able to Run C++ code in Hadoop Cluster
Dear all, I am puzzled around the error occured in running a C++ program to run through Hadoop Pipes. Below exception occurs while running the code. The error occurs in reduce phase : [hadoop@ws37-mah-lin hadoop-0.20.2]$ bin/hadoop pipes -D hadoop.pipes.java.recordreader=true -D hadoop.pipes.java.recordwriter=true -input gutenberg -output gutenberg_cuda_output_final -program bin/wordcount1 11/03/14 17:27:29 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String). 11/03/14 17:27:29 INFO mapred.FileInputFormat: Total input paths to process : 3 11/03/14 17:27:30 INFO mapred.JobClient: Running job: job_201103141407_0003 11/03/14 17:27:31 INFO mapred.JobClient: map 0% reduce 0% 11/03/14 17:27:46 INFO mapred.JobClient: map 100% reduce 0% 11/03/14 17:27:54 INFO mapred.JobClient: map 100% reduce 33% 11/03/14 17:27:56 INFO mapred.JobClient: Task Id : attempt_201103141407_0003_r_00_0, Status : FAILED java.net.SocketException: Broken pipe at java.net.SocketOutputStream.socketWrite0(Native Method) at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92) at java.net.SocketOutputStream.write(SocketOutputStream.java:136) at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65) at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109) at java.io.DataOutputStream.write(DataOutputStream.java:90) at org.apache.hadoop.mapred.pipes.BinaryProtocol.writeObject(BinaryProtocol.java:333) at org.apache.hadoop.mapred.pipes.BinaryProtocol.reduceValue(BinaryProtocol.java:302) at org.apache.hadoop.mapred.pipes.PipesReducer.reduce(PipesReducer.java:66) at org.apache.hadoop.mapred.pipes.PipesReducer.reduce(PipesReducer.java:37) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:463) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411) at org.apache.hadoop.mapred.Child.main(Child.java:170) 11/03/14 17:27:57 INFO mapred.JobClient: map 100% reduce 0% 11/03/14 17:28:07 INFO mapred.JobClient: map 100% reduce 33% 11/03/14 17:28:09 INFO mapred.JobClient: Task Id : attempt_201103141407_0003_r_00_1, Status : FAILED java.net.SocketException: Broken pipe at java.net.SocketOutputStream.socketWrite0(Native Method) at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92) at java.net.SocketOutputStream.write(SocketOutputStream.java:136) at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65) at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109) at java.io.DataOutputStream.write(DataOutputStream.java:90) at org.apache.hadoop.mapred.pipes.BinaryProtocol.writeObject(BinaryProtocol.java:333) at org.apache.hadoop.mapred.pipes.BinaryProtocol.reduceValue(BinaryProtocol.java:302) at org.apache.hadoop.mapred.pipes.PipesReducer.reduce(PipesReducer.java:66) at org.apache.hadoop.mapred.pipes.PipesReducer.reduce(PipesReducer.java:37) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:463) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411) at org.apache.hadoop.mapred.Child.main(Child.java:170) I attached the code. Please find the attachment. Thanks best Regards, Adarsh Sharma #include algorithm #include limits #include string #include stdio.h #includestdlib.h #include cuda.h #include stdint.h // --- to prevent uint64_t errors! 
#include "hadoop/Pipes.hh"
#include "hadoop/TemplateFactory.hh"
#include "hadoop/StringUtils.hh"
//#include "/usr/local/cuda/include/shrUtils.h"
#include "/usr/local/cuda/include/cuda_runtime_api.h"

using namespace std;

class WordCountMapper : public HadoopPipes::Mapper {
public:
  // constructor: does nothing
  WordCountMapper( HadoopPipes::TaskContext& context ) {
  }
  void map( HadoopPipes::MapContext& context ) {
    cudaDeviceProp prop;
    std::string sProfileString = prop.name;
    std::string s2 = "abc";
    context.emit(sProfileString, s2);
  }
};

class WordCountReducer : public HadoopPipes::Reducer {
public:
  // constructor: does nothing
  WordCountReducer(HadoopPipes::TaskContext& context) {
  }
  // reduce function
  void reduce( HadoopPipes::ReduceContext& context ) {
    context.emit(context.getInputKey(), context.getInputValue());
  }
};

int main(int argc, char *argv[]) {
  return HadoopPipes::runTask(HadoopPipes::TemplateFactory<WordCountMapper, WordCountReducer>());
}
Running Cuda Program through Hadoop Pipes
Dear all, Today I am trying to run a Cuda program on a Hadoop cluster of 2 nodes with GPU enabled. But I am facing some issues while running the cuda code through Hadoop Pipes. I read the whole Hadoop wiki page but it doesn't provide information on how to run third-party programs in it, in the way Jcuda provides a way to include all the cuda jars and binaries to run on a Hadoop cluster. I attached the code of the program. Please find the attachment. I am able to run it through Java JNI but I want to achieve it through Hadoop Pipes. Is it possible? Any parameters to achieve this would definitely help me. Thanks best Regards, Adarsh Sharma

#include <algorithm>
#include <limits>
#include <string>
#include <cuda.h>
#include <stdint.h> // --- to prevent uint64_t errors!
#include "hadoop/Pipes.hh"
#include "hadoop/TemplateFactory.hh"
#include "hadoop/StringUtils.hh"
#include "5.cu"

using namespace std;

class WordCountMapper : public HadoopPipes::Mapper {
private:
  float *a_h, *a_d;
  const int N = 10;
  int block_size;
  int n_blocks;
  size_t size;
public:
  // constructor: does nothing
  WordCountMapper( HadoopPipes::TaskContext& context ) {
    N = 10;
    size = N * sizeof(float);
    a_h = (float *)malloc(size);
    cudaMalloc((void **) &a_d, size);
    for (int i=0; i<N; i++) a_h[i] = (float)i;
    cudaMemcpy(a_d, a_h, size, cudaMemcpyHostToDevice);
    block_size = 4;
    n_blocks = N/block_size + (N%block_size == 0 ? 0:1);
    free(a_h);
    cudaFree(a_d);
  }
  // map function: receives a line, outputs (word,1)
  // to reducer.
  void map( HadoopPipes::MapContext& context ) {
    //--- get line of text ---
    string line = context.getInputValue();
    /* Adarsh's cuda specific code */
    square_array<<<n_blocks, block_size>>>(a_d, N);
    cudaMemcpy(a_h, a_d, sizeof(float)*N, cudaMemcpyDeviceToHost);
    /*
    //--- split it into words ---
    vector<string> words = HadoopUtils::splitString( line, " " );
    */
    //--- emit each word tuple (word, 1 ) ---
    for ( unsigned int i=0; i < size; i++ ) {
      context.emit( HadoopUtils::toString(a_h[i]), HadoopUtils::toString( 1 ) );
    }
  }
};

class WordCountReducer : public HadoopPipes::Reducer {
public:
  // constructor: does nothing
  WordCountReducer(HadoopPipes::TaskContext& context) {
  }
  // reduce function
  void reduce( HadoopPipes::ReduceContext& context ) {
    int count = 0;
    //--- get all tuples with the same key, and count their numbers ---
    while ( context.nextValue() ) {
      count += HadoopUtils::toInt( context.getInputValue() );
    }
    //--- emit (word, count) ---
    context.emit(context.getInputKey(), HadoopUtils::toString( count ));
  }
};

int main(int argc, char *argv[]) {
  return HadoopPipes::runTask(HadoopPipes::TemplateFactory<WordCountMapper, WordCountReducer>());
}
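The mapper above launches a square_array kernel pulled in from 5.cu, which is not shown in this message. A kernel with the shape that the <<<n_blocks, block_size>>>(a_d, N) call site expects would look roughly like this (an assumption about the attachment, offered only for illustration):

    // Hypothetical contents of 5.cu: squares N floats in place on the device.
    __global__ void square_array(float *a, int N) {
      int idx = blockIdx.x * blockDim.x + threadIdx.x;
      if (idx < N) {
        a[idx] = a[idx] * a[idx];
      }
    }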
Re: Cuda Program in Hadoop Cluster
So, it means it is impossible to run GPU code ( myfile.cu ) through Hadoop Pipes . It's the requirement to run a C++ code that includes some Cuda code ( Cuda libraries _global_ function ) in a Hadoop Cluster. Thanks best Regards, Adarsh Sharma Lance Norskog wrote: One of the Python CUDA packages translates Python byte-code into CUDA-style C and runs the toolchain. If this actually works, you can just do Python apps under Hadoop Streaming. On Wed, Mar 9, 2011 at 8:39 PM, Adarsh Sharma adarsh.sha...@orkash.com wrote: Thanks Chen for your guidance, To translate CUDA in Hadoop environment, following are the options : -- Hadoop Streaming: Standard I/O ( Recommended by you ) -- Hadoop Pipes: C++ library, Socket connection, ( I'm looking for ) -- JNI, JNI-based CUDA wrapper (JCUDA) ( Done and able to run a Jcuda Program in Hadoop Cluster ) We use Hadoop Pipes for our proposed technique . MapReduce applications/CUDA kernel ? written in C++ So i.e why I am trying a C Cuda code to run through Hadoop Pipes, is it difficult or not possible. I am looking for a demo Cuda program that is up and running in Hadoop Cluster that clarifies my basic concepts so that I program accordingly in future. Looking forward for some more guidance. Thanks once again for your wishes . With best Regards, Adarsh Sharma He Chen wrote: Hi, Adarsh Sharma For C code My friend employ hadoop streaming to run CUDA C code. You can send email to him. p...@cse.unl.edu. Best wishes! Chen On Thu, Mar 3, 2011 at 11:18 PM, Adarsh Sharma adarsh.sha...@orkash.comwrote: Dear all, I followed a fantastic tutorial and able to run the Wordcont C++ Program in Hadoop Cluster. http://cs.smith.edu/dftwiki/index.php/Hadoop_Tutorial_2.2_--_Running_C%2B%2B_Programs_on_Hadoop But know I want to run a Cuda Program in the Hadoop Cluster but results in errors. Is anyone has done it before and guide me how to do this. I attached the both files. Please find the attachment. Thanks best Regards, Adarsh Sharma
Reason of Formatting Namenode
Dear all, I have configured a Hadoop cluster of 2, 3, 5, and 8 nodes several times, but one doubt always occurs to me. Why is it necessary to format the Hadoop namenode with the *bin/hadoop namenode -format* command? What is the reason and logic behind this? Please explain if someone knows. Thanks best Regards, Adarsh Sharma
Re: Cuda Program in Hadoop Cluster
Thanks Chen for your guidance, To translate CUDA in Hadoop environment, following are the options : -- Hadoop Streaming: Standard I/O ( Recommended by you ) -- Hadoop Pipes: C++ library, Socket connection, ( I'm looking for ) -- JNI, JNI-based CUDA wrapper (JCUDA) ( Done and able to run a Jcuda Program in Hadoop Cluster ) We use Hadoop Pipes for our proposed technique . MapReduce applications/CUDA kernel ? written in C++ So i.e why I am trying a C Cuda code to run through Hadoop Pipes, is it difficult or not possible. I am looking for a demo Cuda program that is up and running in Hadoop Cluster that clarifies my basic concepts so that I program accordingly in future. Looking forward for some more guidance. Thanks once again for your wishes . With best Regards, Adarsh Sharma He Chen wrote: Hi, Adarsh Sharma For C code My friend employ hadoop streaming to run CUDA C code. You can send email to him. p...@cse.unl.edu. Best wishes! Chen On Thu, Mar 3, 2011 at 11:18 PM, Adarsh Sharma adarsh.sha...@orkash.comwrote: Dear all, I followed a fantastic tutorial and able to run the Wordcont C++ Program in Hadoop Cluster. http://cs.smith.edu/dftwiki/index.php/Hadoop_Tutorial_2.2_--_Running_C%2B%2B_Programs_on_Hadoop But know I want to run a Cuda Program in the Hadoop Cluster but results in errors. Is anyone has done it before and guide me how to do this. I attached the both files. Please find the attachment. Thanks best Regards, Adarsh Sharma
Re: Reason of Formatting Namenode
Thanks Harsh, so that is why, if we format the namenode again after loading some data, an INCOMPATIBLE NAMESPACE IDs error occurs. Best Regards, Adarsh Sharma Harsh J wrote: Formatting the NameNode initializes the FSNameSystem in the dfs.name.dir directories, to prepare for use. The format command typically writes a VERSION file that specifies what the NamespaceID for this FS instance is, what was its ctime, and what is the version (of the file's layout) in use. This is helpful in making every NameNode instance unique, among other things. DataNode blocks carry the namespace-id information that lets them relate blocks to a NameNode (and thereby validate, etc.).
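As an illustration of what the format step writes, the VERSION file under ${dfs.name.dir}/current looks roughly like this (the field values below are made up):

    #Thu Mar 03 11:18:00 IST 2011
    namespaceID=1152988753
    cTime=0
    storageType=NAME_NODE
    layoutVersion=-18

Datanodes record the same namespaceID in their own VERSION files, which is why reformatting the namenode while old datanode data is still on disk produces the incompatible namespace ID error mentioned above.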
Re: Unable to use hadoop cluster on the cloud
praveen.pe...@nokia.com wrote: Thanks Adarsh for the reply. Just to clarify the issue a bit, I am able to do all operations (-copyFromLocal, -get -rmr etc) from the master node. So I am confident that the communication between all hadoop machines is fine. But when I do the same operation from another machine that also has same hadoop config, I get below errors. However I can do -lsr and it lists the files correctly. Praveen, Your error is due to communication problem between your datanodes i.e Datanode1 couldn't able to place the replica of a block into coresponding datanode2. U mention the HDFS commands. Simply check from datanode1 as ssh datanode2_ip or ping datanode2_ip Best Rgds, Adarsh Praveen -Original Message- From: ext Adarsh Sharma [mailto:adarsh.sha...@orkash.com] Sent: Friday, March 04, 2011 12:12 AM To: common-user@hadoop.apache.org Subject: Re: Unable to use hadoop cluster on the cloud Hi Praveen, Check through ssh ping that your datanodes are communicating with each other or not. Cheers, Adarsh praveen.pe...@nokia.com wrote: Hello all, I installed hadoop0.20.2 on physical machines and everything works like a charm. Now I installed hadoop using the same hadoop-install gz file on the cloud. Installation seems fine. I can even copy files to hdfs from master machine. But when I try to do it from another non hadoop machine, I get following error. I did googling and lot of people got this error but could not find any solution. Also I didn't see any exceptions in the hadoop logs. Any thoughts? $ /usr/local/hadoop-0.20.2/bin/hadoop fs -copyFromLocal Merchandising-ear.tar.gz /tmp/hadoop-test/Merchandising-ear.tar.gz 11/03/03 21:58:50 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.net.ConnectException: Connection timed out 11/03/03 21:58:50 INFO hdfs.DFSClient: Abandoning block blk_-8243207628973732008_1005 11/03/03 21:58:50 INFO hdfs.DFSClient: Waiting to find target node: xx.xx.12:50010 11/03/03 21:59:17 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.net.ConnectException: Connection timed out 11/03/03 21:59:17 INFO hdfs.DFSClient: Abandoning block blk_2852127666568026830_1005 11/03/03 21:59:17 INFO hdfs.DFSClient: Waiting to find target node: xx.xx.16.12:50010 11/03/03 21:59:44 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.net.ConnectException: Connection timed out 11/03/03 21:59:44 INFO hdfs.DFSClient: Abandoning block blk_2284836193463265901_1005 11/03/03 21:59:44 INFO hdfs.DFSClient: Waiting to find target node: xx.xx.16.12:50010 11/03/03 22:00:11 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.net.ConnectException: Connection timed out 11/03/03 22:00:11 INFO hdfs.DFSClient: Abandoning block blk_-5600915414055250488_1005 11/03/03 22:00:11 INFO hdfs.DFSClient: Waiting to find target node: xx.xx.16.11:50010 11/03/03 22:00:17 WARN hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block. at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2845) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSC lient.java:2288) 11/03/03 22:00:17 WARN hdfs.DFSClient: Error Recovery for block blk_-5600915414055250488_1005 bad datanode[0] nodes == null 11/03/03 22:00:17 WARN hdfs.DFSClient: Could not get block locations. Source file /tmp/hadoop-test/Merchandising-ear.tar.gz - Aborting... 
copyFromLocal: Connection timed out 11/03/03 22:00:17 ERROR hdfs.DFSClient: Exception closing file /tmp/hadoop-test/Merchandising-ear.tar.gz : java.net.ConnectException: Connection timed out java.net.ConnectException: Connection timed out at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567) at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:404) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2870) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2826) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSC lient.java:2288) [C4554954_admin@c4554954vl03 relevancy]$
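A quick way to act on the connectivity check suggested above (hostnames are placeholders; 50010 is the datanode data-transfer port that appears in the log):

    # from datanode1, confirm the other datanode answers
    ping -c 3 datanode2
    ssh datanode2 hostname
    # from the machine running copyFromLocal, confirm the datanode port is reachable
    telnet datanode2 50010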
Re: Unable to use hadoop cluster on the cloud
Hi Praveen, Check through ssh ping that your datanodes are communicating with each other or not. Cheers, Adarsh praveen.pe...@nokia.com wrote: Hello all, I installed hadoop0.20.2 on physical machines and everything works like a charm. Now I installed hadoop using the same hadoop-install gz file on the cloud. Installation seems fine. I can even copy files to hdfs from master machine. But when I try to do it from another non hadoop machine, I get following error. I did googling and lot of people got this error but could not find any solution. Also I didn't see any exceptions in the hadoop logs. Any thoughts? $ /usr/local/hadoop-0.20.2/bin/hadoop fs -copyFromLocal Merchandising-ear.tar.gz /tmp/hadoop-test/Merchandising-ear.tar.gz 11/03/03 21:58:50 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.net.ConnectException: Connection timed out 11/03/03 21:58:50 INFO hdfs.DFSClient: Abandoning block blk_-8243207628973732008_1005 11/03/03 21:58:50 INFO hdfs.DFSClient: Waiting to find target node: xx.xx.12:50010 11/03/03 21:59:17 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.net.ConnectException: Connection timed out 11/03/03 21:59:17 INFO hdfs.DFSClient: Abandoning block blk_2852127666568026830_1005 11/03/03 21:59:17 INFO hdfs.DFSClient: Waiting to find target node: xx.xx.16.12:50010 11/03/03 21:59:44 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.net.ConnectException: Connection timed out 11/03/03 21:59:44 INFO hdfs.DFSClient: Abandoning block blk_2284836193463265901_1005 11/03/03 21:59:44 INFO hdfs.DFSClient: Waiting to find target node: xx.xx.16.12:50010 11/03/03 22:00:11 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.net.ConnectException: Connection timed out 11/03/03 22:00:11 INFO hdfs.DFSClient: Abandoning block blk_-5600915414055250488_1005 11/03/03 22:00:11 INFO hdfs.DFSClient: Waiting to find target node: xx.xx.16.11:50010 11/03/03 22:00:17 WARN hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block. at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2845) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288) 11/03/03 22:00:17 WARN hdfs.DFSClient: Error Recovery for block blk_-5600915414055250488_1005 bad datanode[0] nodes == null 11/03/03 22:00:17 WARN hdfs.DFSClient: Could not get block locations. Source file /tmp/hadoop-test/Merchandising-ear.tar.gz - Aborting... copyFromLocal: Connection timed out 11/03/03 22:00:17 ERROR hdfs.DFSClient: Exception closing file /tmp/hadoop-test/Merchandising-ear.tar.gz : java.net.ConnectException: Connection timed out java.net.ConnectException: Connection timed out at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567) at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:404) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2870) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2826) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288) [C4554954_admin@c4554954vl03 relevancy]$
Re: Library Issue
Harsh J wrote: You're facing a permissions issue with a device, not a Hadoop-related issue. Find a way to let users access the required devices (/dev/nvidiactl is what's reported in your ST, for starters). On Mon, Feb 28, 2011 at 12:05 PM, Adarsh Sharma adarsh.sha...@orkash.com wrote: Greetings to all, Today i came across a strange problem about non-root users in Linux ( CentOS ). I am able to compile run a Java Program through below commands properly : [root@cuda1 hadoop-0.20.2]# javac EnumDevices.java [root@cuda1 hadoop-0.20.2]# java EnumDevices Total number of devices: 1 Name: Tesla C1060 Version: 1.3 Clock rate: 1296000 MHz Threads per block: 512 But I need to run it through other user [B]hadoop[/B] in CentOS [hadoop@ws37-mah-lin hadoop-0.20.2]$ javac EnumDevices.java [hadoop@ws37-mah-lin hadoop-0.20.2]$ java EnumDevices NVIDIA: could not open the device file /dev/nvidiactl (Permission denied). Exception in thread main CUDA Driver error: 100 at jcuda.CUDA.setError(CUDA.java:1874) at jcuda.CUDA.init(CUDA.java:62) at jcuda.CUDA.init(CUDA.java:42) at EnumDevices.main(EnumDevices.java:20) *I settled the above issue by setting permission to /dev/nvidia files to Hadoop user group. Now, I am able to compile programs from command [hadoop@cuda1 hadoop-0.20.2]# javac EnumDevices.java [hadoop@cuda1 hadoop-0.20.2]# java EnumDevices Total number of devices: 1 Name: Tesla C1060 Version: 1.3 Clock rate: 1296000 MHz Threads per block: 512 But Still don't know why it fails in Map-reduce job. [hadoop@ws37-mah-lin hadoop-0.20.2]$ bin/hadoop jar wordcount1.jar org.myorg.WordCount /user/hadoop/gutenberg /user/hadoop/output1 11/02/28 15:01:45 INFO input.FileInputFormat: Total input paths to process : 3 11/02/28 15:01:45 INFO mapred.JobClient: Running job: job_201102281104_0006 11/02/28 15:01:46 INFO mapred.JobClient: map 0% reduce 0% 11/02/28 15:01:56 INFO mapred.JobClient: Task Id : attempt_201102281104_0006_m_00_0, Status : FAILED java.lang.RuntimeException: java.lang.reflect.InvocationTargetException at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:569) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305) at org.apache.hadoop.mapred.Child.main(Child.java:170) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:113) ... 3 more Caused by: java.lang.UnsatisfiedLinkError: no jcuda in java.library.path at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1734) at java.lang.Runtime.loadLibrary0(Runtime.java:823) at java.lang.System.loadLibrary(System.java:1028) at jcuda.driver.CUDADriver.clinit(CUDADriver.java:909) at jcuda.CUDA.init(CUDA.java:62) at jcuda.CUDA.init(CUDA.java:42) at org.myorg.WordCount$TokenizerMapper.init(WordCount.java:28) ... 
8 more 11/02/28 15:01:56 INFO mapred.JobClient: Task Id : attempt_201102281104_0006_m_01_0, Status : FAILED java.lang.RuntimeException: java.lang.reflect.InvocationTargetException at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:569) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305) at org.apache.hadoop.mapred.Child.main(Child.java:170) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:113) ... 3 more Caused by: java.lang.UnsatisfiedLinkError: no jcuda in java.library.path at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1734) at java.lang.Runtime.loadLibrary0(Runtime.java:823) at java.lang.System.loadLibrary(System.java:1028) at jcuda.driver.CUDADriver.clinit(CUDADriver.java:909) at jcuda.CUDA.init(CUDA.java:62) at jcuda.CUDA.init(CUDA.java:42) at org.myorg.WordCount$TokenizerMapper.init(WordCount.java:28) ... 8 more 11/02/28 15:02:05 INFO mapred.JobClient: Task Id : attempt_201102281104_0006_m_02_1, Status : FAILED
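The remaining failure above is the task JVMs not finding the JCuda native library, not the device permissions. One common way to address it, sketched here under the assumption that the CUDA/JCuda native libraries sit at the same local path on every TaskTracker node (paths taken from later messages in these threads), is to pass java.library.path to the child JVMs in conf/mapred-site.xml and resubmit the job:
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx200m -Djava.library.path=/usr/local/cuda/lib:/home/hadoop/project/hadoop-0.20.2/jcuda_1.1_linux64</value>
</property>
-Xmx200m is the 0.20 default child heap and is kept only so the heap setting is not silently dropped; adjust it to whatever the cluster already uses.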
Setting java.library.path for map-reduce job
Dear all, I want to add some extra jars to java.library.path for a map-reduce program running on a Hadoop cluster. I get an exception, no jcuda in java.library.path, in each map task. I build and run my map-reduce code with the commands below: javac -classpath /home/hadoop/project/hadoop-0.20.2/hadoop-0.20.2-core.jar://home/hadoop/project/hadoop-0.20.2/jcuda_1.1_linux64/jcuda.jar:/home/hadoop/project/hadoop-0.20.2/lib/commons-cli-1.2.jar -d wordcount_classes1/ WordCount.java jar -cvf wordcount1.jar -C wordcount_classes1/ . bin/hadoop jar wordcount1.jar org.myorg.WordCount /user/hadoop/gutenberg /user/hadoop/output1 Please guide me on how to achieve this. Thanks and best regards, Adarsh Sharma
Re: Setting java.library.path for map-reduce job
Thanks Sanjay, it seems i found the root cause. But I result in following error: [hadoop@ws37-mah-lin hadoop-0.20.2]$ bin/hadoop jar wordcount1.jar org.myorg.WordCount /user/hadoop/gutenberg /user/hadoop/output1 Exception in specified URI's java.net.URISyntaxException: Illegal character in path at index 36: hdfs://192.168.0.131:54310/jcuda.jar at java.net.URI$Parser.fail(URI.java:2809) at java.net.URI$Parser.checkChars(URI.java:2982) at java.net.URI$Parser.parseHierarchical(URI.java:3066) at java.net.URI$Parser.parse(URI.java:3014) at java.net.URI.init(URI.java:578) at org.apache.hadoop.util.StringUtils.stringToURI(StringUtils.java:204) at org.apache.hadoop.filecache.DistributedCache.getCacheFiles(DistributedCache.java:593) at org.apache.hadoop.mapred.JobClient.configureCommandLineOptions(JobClient.java:638) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:761) at org.apache.hadoop.mapreduce.Job.submit(Job.java:432) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447) at org.myorg.WordCount.main(WordCount.java:59) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) Exception in thread main java.lang.NullPointerException at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:176) at org.apache.hadoop.filecache.DistributedCache.getTimestamp(DistributedCache.java:506) at org.apache.hadoop.mapred.JobClient.configureCommandLineOptions(JobClient.java:640) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:761) at org.apache.hadoop.mapreduce.Job.submit(Job.java:432) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447) at org.myorg.WordCount.main(WordCount.java:59) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) Please check my attached mapred-site.xml Thanks best regards, Adarsh Sharma Kaluskar, Sanjay wrote: You will probably have to use distcache to distribute your jar to all the nodes too. Read the distcache documentation; Then on each node you can add the new jar to the java.library.path through mapred.child.java.opts. You need to do something like the following in mapred-site.xml, where fs-uri is the URI of the file system (something like host.mycompany.com:54310). property namemapred.cache.files/name valuehdfs://fs-uri/jcuda/jcuda.jar#jcuda.jar /value /property property namemapred.create.symlink/name valueyes/value /property property namemapred.child.java.opts/name value-Djava.library.path=jcuda.jar/value /property -Original Message- From: Adarsh Sharma [mailto:adarsh.sha...@orkash.com] Sent: 28 February 2011 16:03 To: common-user@hadoop.apache.org Subject: Setting java.library.path for map-reduce job Dear all, I want to set some extra jars in java.library.path , used while running map-reduce program in Hadoop Cluster. I got a exception entitled no jcuda in java.library.path in each map task. 
I run my map-reduce code by below commands : javac -classpath /home/hadoop/project/hadoop-0.20.2/hadoop-0.20.2-core.jar://home/hadoop/project/hadoop-0.20.2/jcuda_1.1_linux64/jcuda.jar:/home/hadoop/project/hadoop-0.20.2/lib/commons-cli-1.2.jar -d wordcount_classes1/ WordCount.java jar -cvf wordcount1.jar -C wordcount_classes1/ . bin/hadoop jar wordcount1.jar org.myorg.WordCount /user/hadoop/gutenberg /user/hadoop/output1 Please guide how to achieve this. Thanks best Regards, Adarsh Sharma
Attached mapred-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>192.168.0.131:54311</value>
<description>The host and port that the MapReduce job tracker runs at. If local, then jobs are run in-process as a single map and reduce task.</description>
</property>
<property>
<name>mapred.local.dir</name>
<value>/hdd-1/mapred/local</value>
<description>The local directory where MapReduce stores intermediate data files. May be a comma-separated list of directories on different devices in order to spread disk i/o. Directories that do not exist are ignored.</description>
</property>
<property>
<name>mapred.system.dir</name>
<value>/home/mapred/system</value>
<description>The shared directory where MapReduce
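For reference, index 36 in hdfs://192.168.0.131:54310/jcuda.jar is the position just past .jar, i.e. a trailing character inside the <value> element; Sonal identifies it as a stray space in the follow-up below. A sketch of the cache properties with no surrounding whitespace, assuming the jar really was uploaded to the HDFS root as /jcuda.jar:
<property>
  <name>mapred.cache.files</name>
  <value>hdfs://192.168.0.131:54310/jcuda.jar#jcuda.jar</value>
</property>
<property>
  <name>mapred.create.symlink</name>
  <value>yes</value>
</property>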
Re: Setting java.library.path for map-reduce job
Sonal Goyal wrote: Hi Adarsh, I think your mapred.cache.files property has an extra space at the end. Try removing that and let us know how it goes. Thanks and Regards, Sonal https://github.com/sonalgoyal/hihoHadoop ETL and Data Integrationhttps://github.com/sonalgoyal/hiho Nube Technologies http://www.nubetech.co http://in.linkedin.com/in/sonalgoyal Thanks a Lot Sonal but it doesn't succeed. Please if possible tell me the proper steps that are need to be followed after Configuring Hadoop Cluster. I don't believe that a simple commands succeeded as [root@cuda1 hadoop-0.20.2]# javac EnumDevices.java [root@cuda1 hadoop-0.20.2]# java EnumDevices Total number of devices: 1 Name: Tesla C1060 Version: 1.3 Clock rate: 1296000 MHz Threads per block: 512 but in Map-reduce job it fails : 11/02/28 18:42:47 INFO mapred.JobClient: Task Id : attempt_201102281834_0001_m_01_2, Status : FAILED java.lang.RuntimeException: java.lang.reflect.InvocationTargetException at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:569) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305) at org.apache.hadoop.mapred.Child.main(Child.java:170) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:113) ... 3 more Caused by: java.lang.UnsatisfiedLinkError: no jcuda in java.library.path at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1734) at java.lang.Runtime.loadLibrary0(Runtime.java:823) at java.lang.System.loadLibrary(System.java:1028) at jcuda.driver.CUDADriver.clinit(CUDADriver.java:909) at jcuda.CUDA.init(CUDA.java:62) at jcuda.CUDA.init(CUDA.java:42) Thanks best Regards, Adarsh Sharma On Mon, Feb 28, 2011 at 5:06 PM, Adarsh Sharma adarsh.sha...@orkash.comwrote: Thanks Sanjay, it seems i found the root cause. 
But I result in following error: [hadoop@ws37-mah-lin hadoop-0.20.2]$ bin/hadoop jar wordcount1.jar org.myorg.WordCount /user/hadoop/gutenberg /user/hadoop/output1 Exception in specified URI's java.net.URISyntaxException: Illegal character in path at index 36: hdfs://192.168.0.131:54310/jcuda.jar at java.net.URI$Parser.fail(URI.java:2809) at java.net.URI$Parser.checkChars(URI.java:2982) at java.net.URI$Parser.parseHierarchical(URI.java:3066) at java.net.URI$Parser.parse(URI.java:3014) at java.net.URI.init(URI.java:578) at org.apache.hadoop.util.StringUtils.stringToURI(StringUtils.java:204) at org.apache.hadoop.filecache.DistributedCache.getCacheFiles(DistributedCache.java:593) at org.apache.hadoop.mapred.JobClient.configureCommandLineOptions(JobClient.java:638) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:761) at org.apache.hadoop.mapreduce.Job.submit(Job.java:432) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447) at org.myorg.WordCount.main(WordCount.java:59) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) Exception in thread main java.lang.NullPointerException at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:176) at org.apache.hadoop.filecache.DistributedCache.getTimestamp(DistributedCache.java:506) at org.apache.hadoop.mapred.JobClient.configureCommandLineOptions(JobClient.java:640) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:761) at org.apache.hadoop.mapreduce.Job.submit(Job.java:432) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447) at org.myorg.WordCount.main(WordCount.java:59) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) Please check my attached mapred-site.xml Thanks best regards, Adarsh Sharma Kaluskar, Sanjay wrote: You will probably have to use distcache to distribute your jar to all the nodes too. Read
Re: Setting java.library.path for map-reduce job
Sonal Goyal wrote: Adarsh, Are you trying to distribute both the native library and the jcuda.jar? Could you please explain your job's dependencies? Yes Of course , I am trying to run a Juda program in Hadoop Cluster as I am able to run it simple through simple javac java commands at standalone machine by setting PATH LD_LIBRARY_PATH varibale to */usr/local/cuda/lib* */home/hadoop/project/jcuda_1.1_linux *folder. I listed the contents jars in these directories : [hadoop@cuda1 lib]$ pwd /usr/local/cuda/lib [hadoop@cuda1 lib]$ ls -ls total 158036 4 lrwxrwxrwx 1 root root 14 Feb 23 19:37 libcublas.so - libcublas.so.3 4 lrwxrwxrwx 1 root root 19 Feb 23 19:37 libcublas.so.3 - libcublas.so.3.2.16 81848 -rwxrwxrwx 1 root root 83720712 Feb 23 19:37 libcublas.so.3.2.16 4 lrwxrwxrwx 1 root root 14 Feb 23 19:37 libcudart.so - libcudart.so.3 4 lrwxrwxrwx 1 root root 19 Feb 23 19:37 libcudart.so.3 - libcudart.so.3.2.16 424 -rwxrwxrwx 1 root root 423660 Feb 23 19:37 libcudart.so.3.2.16 4 lrwxrwxrwx 1 root root 13 Feb 23 19:37 libcufft.so - libcufft.so.3 4 lrwxrwxrwx 1 root root 18 Feb 23 19:37 libcufft.so.3 - libcufft.so.3.2.16 27724 -rwxrwxrwx 1 root root 28351780 Feb 23 19:37 libcufft.so.3.2.16 4 lrwxrwxrwx 1 root root 14 Feb 23 19:37 libcurand.so - libcurand.so.3 4 lrwxrwxrwx 1 root root 19 Feb 23 19:37 libcurand.so.3 - libcurand.so.3.2.16 4120 -rwxrwxrwx 1 root root 4209384 Feb 23 19:37 libcurand.so.3.2.16 4 lrwxrwxrwx 1 root root 16 Feb 23 19:37 libcusparse.so - libcusparse.so.3 4 lrwxrwxrwx 1 root root 21 Feb 23 19:37 libcusparse.so.3 - libcusparse.so.3.2.16 43048 -rwxrwxrwx 1 root root 44024836 Feb 23 19:37 libcusparse.so.3.2.16 172 -rwxrwxrwx 1 root root 166379 Nov 25 11:29 libJCublas-linux-x86_64.so 152 -rwxrwxrwx 1 root root 144179 Nov 25 11:29 libJCudaDriver-linux-x86_64.so 16 -rwxrwxrwx 1 root root 8474 Mar 31 2009 libjcudafft.so 136 -rwxrwxrwx 1 root root 128672 Nov 25 11:29 libJCudaRuntime-linux-x86_64.so 80 -rwxrwxrwx 1 root root70381 Mar 31 2009 libjcuda.so 44 -rwxrwxrwx 1 root root38039 Nov 25 11:29 libJCudpp-linux-x86_64.so 44 -rwxrwxrwx 1 root root38383 Nov 25 11:29 libJCufft-linux-x86_64.so 48 -rwxrwxrwx 1 root root43706 Nov 25 11:29 libJCurand-linux-x86_64.so 140 -rwxrwxrwx 1 root root 133280 Nov 25 11:29 libJCusparse-linux-x86_64.so And the second folder as : [hadoop@cuda1 jcuda_1.1_linux64]$ pwd /home/hadoop/project/hadoop-0.20.2/jcuda_1.1_linux64 [hadoop@cuda1 jcuda_1.1_linux64]$ ls -ls total 200 8 drwxrwxrwx 6 hadoop hadoop 4096 Feb 24 01:44 doc 8 drwxrwxrwx 3 hadoop hadoop 4096 Feb 24 01:43 examples 32 -rwxrwxr-x 1 hadoop hadoop 28484 Feb 24 01:43 jcuda.jar 4 -rw-rw-r-- 1 hadoop hadoop 0 Mar 1 21:27 libcublas.so.3 4 -rw-rw-r-- 1 hadoop hadoop 0 Mar 1 21:27 libcublas.so.3.2.16 4 -rw-rw-r-- 1 hadoop hadoop 0 Mar 1 21:27 libcudart.so.3 4 -rw-rw-r-- 1 hadoop hadoop 0 Mar 1 21:27 libcudart.so.3.2.16 4 -rw-rw-r-- 1 hadoop hadoop 0 Mar 1 21:27 libcufft.so.3 4 -rw-rw-r-- 1 hadoop hadoop 0 Mar 1 21:27 libcufft.so.3.2.16 4 -rw-rw-r-- 1 hadoop hadoop 0 Mar 1 21:27 libcurand.so.3 4 -rw-rw-r-- 1 hadoop hadoop 0 Mar 1 21:27 libcurand.so.3.2.16 4 -rw-rw-r-- 1 hadoop hadoop 0 Mar 1 21:27 libcusparse.so.3 4 -rw-rw-r-- 1 hadoop hadoop 0 Mar 1 21:27 libcusparse.so.3.2.16 16 -rwxr-xr-x 1 hadoop hadoop 8474 Mar 1 04:12 libjcudafft.so 80 -rwxr-xr-x 1 hadoop hadoop 70381 Mar 1 04:11 libjcuda.so 8 -rwxrwxr-x 1 hadoop hadoop 811 Feb 24 01:43 README.txt 8 drwxrwxrwx 2 hadoop hadoop 4096 Feb 24 01:43 resources [hadoop@cuda1 jcuda_1.1_linux64]$ I think Hadoop would not able to recognize *jcuda.jar* in 
Tasktracker process. Please guide me how to make it available in it. Thanks best Regards, Adrash Sharma Thanks and Regards, Sonal https://github.com/sonalgoyal/hihoHadoop ETL and Data Integrationhttps://github.com/sonalgoyal/hiho Nube Technologies http://www.nubetech.co http://in.linkedin.com/in/sonalgoyal On Mon, Feb 28, 2011 at 6:54 PM, Adarsh Sharma adarsh.sha...@orkash.comwrote: Sonal Goyal wrote: Hi Adarsh, I think your mapred.cache.files property has an extra space at the end. Try removing that and let us know how it goes. Thanks and Regards, Sonal https://github.com/sonalgoyal/hihoHadoop ETL and Data Integrationhttps://github.com/sonalgoyal/hiho Nube Technologies http://www.nubetech.co http://in.linkedin.com/in/sonalgoyal Thanks a Lot Sonal but it doesn't succeed. Please if possible tell me the proper steps that are need to be followed after Configuring Hadoop Cluster. I don't believe that a simple commands succeeded as [root@cuda1 hadoop-0.20.2]# javac EnumDevices.java [root@cuda1 hadoop-0.20.2]# java EnumDevices Total number of devices: 1 Name: Tesla C1060 Version: 1.3 Clock rate: 1296000 MHz Threads per block: 512 but in Map-reduce job
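Note that jcuda.jar itself is evidently reaching the tasks (the jcuda classes load); what fails is the lookup of the native libjcuda.so. One hedged alternative to installing CUDA on every node, sketched with hypothetical HDFS paths, is to push the native libraries through the DistributedCache and point java.library.path at the task working directory, where the symlinks appear:
bin/hadoop fs -put jcuda_1.1_linux64/libjcuda.so /libs/libjcuda.so
bin/hadoop fs -put jcuda_1.1_linux64/libjcudafft.so /libs/libjcudafft.so
<property>
  <name>mapred.cache.files</name>
  <value>hdfs://192.168.0.131:54310/libs/libjcuda.so#libjcuda.so,hdfs://192.168.0.131:54310/libs/libjcudafft.so#libjcudafft.so</value>
</property>
<property>
  <name>mapred.create.symlink</name>
  <value>yes</value>
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx200m -Djava.library.path=.</value>
</property>
This only works if those two .so files have no further dependencies on the full CUDA runtime; the zero-byte libcublas/libcudart placeholders in the listing above suggest checking that first.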
Library Issue
Greetings to all, Today i came across a strange problem about non-root users in Linux ( CentOS ). I am able to compile run a Java Program through below commands properly : [root@cuda1 hadoop-0.20.2]# javac EnumDevices.java [root@cuda1 hadoop-0.20.2]# java EnumDevices Total number of devices: 1 Name: Tesla C1060 Version: 1.3 Clock rate: 1296000 MHz Threads per block: 512 But I need to run it through other user [B]hadoop[/B] in CentOS [hadoop@ws37-mah-lin hadoop-0.20.2]$ javac EnumDevices.java [hadoop@ws37-mah-lin hadoop-0.20.2]$ java EnumDevices NVIDIA: could not open the device file /dev/nvidiactl (Permission denied). Exception in thread main CUDA Driver error: 100 at jcuda.CUDA.setError(CUDA.java:1874) at jcuda.CUDA.init(CUDA.java:62) at jcuda.CUDA.init(CUDA.java:42) at EnumDevices.main(EnumDevices.java:20) [hadoop@ws37-mah-lin hadoop-0.20.2]$ Actually I need to run a map-reduce code but first if it runs through simple then I will go for it. Please guide me how to solve this issue as CLASSPATH is same through all users. Thanks best Regards, Adarsh Sharma
Re: Library Issue
Thanks Harsh, I am not much of a Linux expert, but I know a little bit. Do I have to dig into my application or the *nvidiactl* driver level, or are there just simple commands to let the hadoop user access /dev/nvidiactl ( driver file libcuddpp.so )? Best Regards, Adarsh Harsh J wrote: You're facing a permissions issue with a device, not a Hadoop-related issue. Find a way to let users access the required devices (/dev/nvidiactl is what's reported in your ST, for starters). On Mon, Feb 28, 2011 at 12:05 PM, Adarsh Sharma adarsh.sha...@orkash.com wrote: Greetings to all, Today i came across a strange problem about non-root users in Linux ( CentOS ). I am able to compile run a Java Program through below commands properly : [root@cuda1 hadoop-0.20.2]# javac EnumDevices.java [root@cuda1 hadoop-0.20.2]# java EnumDevices Total number of devices: 1 Name: Tesla C1060 Version: 1.3 Clock rate: 1296000 MHz Threads per block: 512 But I need to run it through other user [B]hadoop[/B] in CentOS [hadoop@ws37-mah-lin hadoop-0.20.2]$ javac EnumDevices.java [hadoop@ws37-mah-lin hadoop-0.20.2]$ java EnumDevices NVIDIA: could not open the device file /dev/nvidiactl (Permission denied). Exception in thread main CUDA Driver error: 100 at jcuda.CUDA.setError(CUDA.java:1874) at jcuda.CUDA.init(CUDA.java:62) at jcuda.CUDA.init(CUDA.java:42) at EnumDevices.main(EnumDevices.java:20) [hadoop@ws37-mah-lin hadoop-0.20.2]$ Actually I need to run a map-reduce code but first if it runs through simple then I will go for it. Please guide me how to solve this issue as CLASSPATH is same through all users. Thanks best Regards, Adarsh Sharma
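A minimal sketch of the kind of simple commands being asked about, run as root on the GPU node; the group name and the /dev/nvidia0 node are illustrative, not taken from the thread:
ls -l /dev/nvidia*                         # inspect current owner, group and mode
chgrp hadoop /dev/nvidiactl /dev/nvidia0   # or add the hadoop user to whatever group already owns them
chmod 660 /dev/nvidiactl /dev/nvidia0
Udev typically recreates these device nodes with the old permissions after a reboot, so making the change persistent usually also requires a udev rule or an init script.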
Library Issues
) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:113) ... 3 more *Caused by: java.lang.UnsatisfiedLinkError: no jcuda in java.library.path* at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1709) at java.lang.Runtime.loadLibrary0(Runtime.java:823) at java.lang.System.loadLibrary(System.java:1028) at jcuda.driver.CUDADriver.clinit(CUDADriver.java:909) at jcuda.CUDA.init(CUDA.java:62) at jcuda.CUDA.init(CUDA.java:42) at org.myorg.WordCount$TokenizerMapper.init(WordCount.java:28) ... 8 more 11/02/22 09:59:42 INFO mapred.JobClient: Task Id : attempt_201102220937_0001_m_02_1, Status : FAILED java.lang.RuntimeException: java.lang.reflect.InvocationTargetException at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:569) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305) at org.apache.hadoop.mapred.Child.main(Child.java:170) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:113) ... 3 more Caused by: java.lang.UnsatisfiedLinkError: no jcuda in java.library.path at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1734) at java.lang.Runtime.loadLibrary0(Runtime.java:823) at java.lang.System.loadLibrary(System.java:1028) at jcuda.driver.CUDADriver.clinit(CUDADriver.java:909) at jcuda.CUDA.init(CUDA.java:62) at jcuda.CUDA.init(CUDA.java:42) at org.myorg.WordCount$TokenizerMapper.init(WordCount.java:28) ... 8 more 11/02/22 09:59:57 INFO mapred.JobClient: Job complete: job_201102220937_0001 11/02/22 09:59:57 INFO mapred.JobClient: Counters: 3 11/02/22 09:59:57 INFO mapred.JobClient: Job Counters 11/02/22 09:59:57 INFO mapred.JobClient: Launched map tasks=12 11/02/22 09:59:57 INFO mapred.JobClient: Data-local map tasks=12 11/02/22 09:59:57 INFO mapred.JobClient: Failed map tasks=1 [hadoop@ws37-mah-lin hadoop-0.20.2]$ My PATH Variable shows that it includes all libraries as [hadoop@cuda1 ~]$ echo $PATH /usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/home/hadoop/project/hadoop-0.20.2/jcuda.jar:/usr/local/cuda/lib:/home/hadoop/bin [hadoop@cuda1 ~]$ I don't how to resolve this error. Please help Thanks best Regards, Adarsh Sharma
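One observation on the output above (not something stated in the thread): the java launcher never consults PATH for jars or native libraries, so adding jcuda.jar and /usr/local/cuda/lib to PATH has no effect. For a standalone check, the usual split is roughly:
export CLASSPATH=/home/hadoop/project/hadoop-0.20.2/jcuda_1.1_linux64/jcuda.jar:$CLASSPATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib:$LD_LIBRARY_PATH   # or: java -Djava.library.path=/usr/local/cuda/lib ...
java EnumDevices
For map tasks the same two things have to reach the child JVM rather than the login shell, which is what the mapred.child.java.opts and DistributedCache settings discussed in the java.library.path threads above are for.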
Re: Hadoop in Real time applications
I think Facebook uses Hadoop and Cassandra for their analytics purposes. Thanks, Adarsh Michael Segel wrote: Uhm... 'Realtime' is relative. Facebook uses HBase for e-mail, right? Now isn't that a 'realtime' application? ;-) If you're talking about realtime as in like a controller? Or a system of record for a stock exchange? That wouldn't be a good fit. Date: Thu, 17 Feb 2011 17:26:04 +0530 Subject: Re: Hadoop in Real time applications From: karthik84ku...@gmail.com To: common-user@hadoop.apache.org Hi, Thanks for the clarification. On Thu, Feb 17, 2011 at 2:09 PM, Niels Basjes ni...@basjes.nl wrote: 2011/2/17 Karthik Kumar karthik84ku...@gmail.com: Can Hadoop be used for Real time Applications such as banking solutions... Hadoop consists of several components. Components like HDFS and HBase are quite suitable for interactive solutions (as in: I usually get an answer within 0.x seconds). If you really need realtime (as in: I want a guarantee that I have an answer within 0.x seconds) the answer is: No, HDFS/HBase cannot guarantee that. Other components like MapReduce (and Hive, which runs on top of MapReduce) are purely batch oriented. -- With kind regards, Niels Basjes -- With Regards, Karthik
Re: CUDA on Hadoop
Steve Loughran wrote: On 09/02/11 17:31, He Chen wrote: Hi Sharma, I shared our slides about CUDA performance on Hadoop clusters. Feel free to modify them, but please mention the copyright! This is nice. If you stick it up online you should link to it from the Hadoop wiki pages - maybe start a hadoop+cuda page and refer to it. Yes, this will be very helpful for others too. But this much information is not sufficient; I need more. Best Regards Adarsh Sharma
CUDA on Hadoop
Dear all, I am going to work on a Project that includes Working on CUDA in Hadoop Environment . I work on Hadoop Platforms ( Hadoop, Hive, Hbase, Map-Reduce ) from the past 8 months. If anyone has some working experience or some pointers to basic steps includes Basic Introduction, Configuring Running CUDA programs in Hadoop Cluster , any White Papers or any sort of helpful information, Please let me know through links or materials. I shall be grateful for any kindness. Thanks Best Regards Adarsh Sharma
Re: CUDA on Hadoop
Thanks Harsh, I found the link below to start with some practical knowledge. http://cs.smith.edu/dftwiki/index.php/Hadoop_Tutorial_2.2_--_Running_C%2B%2B_Programs_on_Hadoop But does the HAMA project have any use for building a sort of analysis engine that analyses TBs of data in Hadoop HDFS? Best Regards Adarsh Sharma Harsh J wrote: You can check-out this project which did some work for Hama+CUDA: http://code.google.com/p/mrcl/ On Wed, Feb 9, 2011 at 6:38 PM, Adarsh Sharma adarsh.sha...@orkash.com wrote: Dear all, I am going to work on a Project that includes Working on CUDA in Hadoop Environment . I work on Hadoop Platforms ( Hadoop, Hive, Hbase, Map-Reduce ) from the past 8 months. If anyone has some working experience or some pointers to basic steps includes Basic Introduction, Configuring Running CUDA programs in Hadoop Cluster , any White Papers or any sort of helpful information, Please let me know through links or materials. I shall be grateful for any kindness. Thanks Best Regards Adarsh Sharma
Re: CUDA on Hadoop
He Chen wrote: Hi sharma I shared our slides about CUDA performance on Hadoop clusters. Feel free to modified it, please mention the copyright! Chen On Wed, Feb 9, 2011 at 11:13 AM, He Chen airb...@gmail.com mailto:airb...@gmail.com wrote: Hi Sharma I have some experiences on working Hybrid Hadoop with GPU. Our group has tested CUDA performance on Hadoop clusters. We obtain 20 times speedup and save up to 95% power consumption in some computation-intensive test case. You can parallel your Java code by using JCUDA which is a kind of API to help you call CUDA in your Java code. Chen On Wed, Feb 9, 2011 at 8:45 AM, Steve Loughran ste...@apache.org mailto:ste...@apache.org wrote: On 09/02/11 13:58, Harsh J wrote: You can check-out this project which did some work for Hama+CUDA: http://code.google.com/p/mrcl/ Amazon let you bring up a Hadoop cluster on machines with GPUs you can code against, but I haven't heard of anyone using it. The big issue is bandwidth; it just doesn't make sense for a classic scan through the logs kind of problem as the disk:GPU bandwidth ratio is even worse than disk:CPU. That said, if you were doing something that involved a lot of compute on a block of data (e.g. rendering tiles in a map), this could work. Thanks Chen , I am looking for some White-Papers on the mentioned topic or concerning. I think no one has write any white paper on this topic Or I'm wrong. However U'r Ppt is very nice. Thanx Once again . Adarsh
Re: When applying a patch, which attachment should I use?
Thanx Edward, Today I look upon your considerations and start working : edward choi wrote: Dear Adarsh, I have a single machine running Namenode/JobTracker/Hbase Master. There are 17 machines running Datanode/TaskTracker Among those 17 machines, 14 are running Hbase Regionservers. The other 3 machines are running Zookeeper. I have 10 servers and a single machine running Namenode/JobTracker/Hbase Master. There are 9 machines running Datanode/TaskTracker Among those 9 machines, 6 are running Hbase Regionservers. The other 3 machines are running Zookeeper. I'm using hadoop-0.20.2, hbase-0.20.3 And about the Zookeeper, Hbase comes with its own Zookeeper so you don't need to install a new Zookeeper. (except for the special occasion, which I'll explain later) I assigned 14 machines as regionservers using $HBASE_HOME/conf/regionservers. I assigned 3 machines as Zookeeperss using hbase.zookeeper.quorum property in $HBASE_HOME/conf/hbase-site.xml. Don't forget to set export HBASE_MANAGES_ZK=true I think bydefault it takes true anyways I set export HBASE_MANAGES_ZK=true in hbase-env.sh in $HBASE_HOME/conf/hbase-env.sh. (This is where you announce that you will be using Zookeeper that comes with HBase) This way, when you execute $HBASE_HOME/bin/start-hbase.sh, HBase will automatically start Zookeeper first, then start HBase daemons. But perhaps I found my Hbase Master is running through Web UI. But there are exceptions in my Zookeeper Logs. I am also able to create table in hbase and view it. The onle thing I don't do is apply the *hdfs-630-0.20-append.patch* to each hadoop package in each node. As I don't know how to apply it. If this is the problem Please guide me the steps to apply it. I also attached my Zookeeper Logs of my Zookeeper Servers. Please find the attachment. Also, you can install your own Zookeeper and tell HBase to use it instead of its own. I read it on the internet that Zookeeper that comes with HBase does not work properly on Windows 7 64bit. ( http://alans.se/blog/2010/hadoop-hbase-cygwin-windows-7-x64/) So in that case you need to install your own Zookeeper, set it up properly, and tell HBase to use it instead of its own. All you need to do is configure zoo.cfg and add it to the HBase CLASSPATH. And don't forget to set export HBASE_MANAGES_ZK=false in $HBASE_HOME/conf/hbase-env.sh. This way, HBase will not start Zookeeper automatically. About the separation of Zookeepers from regionservers, Yes, it is recommended to separate Zookeepers from regionservers. But that won't be necessary unless your clusters are very heavily loaded. They also suggest that you give Zookeeper its own hard disk. But I haven't done that myself yet. (Hard disks cost money you know) So I'd say your cluster seems fine. But when you want to expand your cluster, you'd need some changes. I suggest you take a look at Hadoop: The Definitive Guide. Thanks Best Regards Adarsh Sharma Regards, Edward 2011/1/13 Adarsh Sharma adarsh.sha...@orkash.com Thanks Edward, Can you describe me the architecture used in your configuration. Fore.g I have a cluster of 10 servers and 1 node act as ( Namenode, Jobtracker, Hmaster ). Remainning 9 nodes act as ( Slaves, datanodes, Tasktracker, Hregionservers ). Among these 9 nodes I also set 3 nodes in zookeeper.quorum.property. I want to know that is it necessary to configure zookeeper separately with the zookeeper-3.2.2 package or just have some IP's listed in zookeeper.quorum.property and Hbase take care of it. 
Can we specify IP's of Hregionservers used before as zookeeper servers ( HQuorumPeer ) or we must need separate servers for it. My problem arises in running zookeeper. My Hbase is up and running in fully distributed mode too. With Best Regards Adarsh Sharma edward choi wrote: Dear Adarsh, My situation is somewhat different from yours as I am only running Hadoop and Hbase (as opposed to Hadoop/Hive/Hbase). But I hope my experience could be of help to you somehow. I applied the hdfs-630-0.20-append.patch to every single Hadoop node. (including master and slaves) Then I followed exactly what they told me to do on http://hbase.apache.org/docs/current/api/overview-summary.html#overview_description . I didn't get a single error message and successfully started HBase in a fully distributed mode. I am not using Hive so I can't tell what caused the MasterNotRunningException, but the patch above is meant to allow DFSClients pass NameNode lists of known dead Datanodes. I doubt that the patch has anything to do with MasterNotRunningException. Hope this helps. Regards, Ed 2011/1/13 Adarsh Sharma adarsh.sha...@orkash.com I am also facing some issues and i think applying hdfs-630-0.20-append.patch https://issues.apache.org/jira/secure/attachment/12446812/hdfs-630-0.20-append.patch would solve my problem. I try to run Hadoop/Hive/Hbase integration in fully Distributed mode. But I am facing
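Since the question of how to apply hdfs-630-0.20-append.patch keeps recurring, a rough sketch, assuming a 0.20.2 source tree and the Ant build used by that generation of Hadoop (the exact -p level depends on how the patch paths were generated; --dry-run shows whether it applies cleanly):
cd /path/to/hadoop-0.20.2-source            # the source tree, not the binary install directory
patch -p0 --dry-run < hdfs-630-0.20-append.patch
patch -p0 < hdfs-630-0.20-append.patch
ant jar                                      # or the packaging target your build uses; then roll the rebuilt jar out to every node and restart the daemons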
Re: When applying a patch, which attachment should I use?
Extremely Sorry, Forgot to attach logs : Here they are : Adarsh Sharma wrote: Thanx Edward, Today I look upon your considerations and start working : edward choi wrote: Dear Adarsh, I have a single machine running Namenode/JobTracker/Hbase Master. There are 17 machines running Datanode/TaskTracker Among those 17 machines, 14 are running Hbase Regionservers. The other 3 machines are running Zookeeper. I have 10 servers and a single machine running Namenode/JobTracker/Hbase Master. There are 9 machines running Datanode/TaskTracker Among those 9 machines, 6 are running Hbase Regionservers. The other 3 machines are running Zookeeper. I'm using hadoop-0.20.2, hbase-0.20.3 And about the Zookeeper, Hbase comes with its own Zookeeper so you don't need to install a new Zookeeper. (except for the special occasion, which I'll explain later) I assigned 14 machines as regionservers using $HBASE_HOME/conf/regionservers. I assigned 3 machines as Zookeeperss using hbase.zookeeper.quorum property in $HBASE_HOME/conf/hbase-site.xml. Don't forget to set export HBASE_MANAGES_ZK=true I think bydefault it takes true anyways I set export HBASE_MANAGES_ZK=true in hbase-env.sh in $HBASE_HOME/conf/hbase-env.sh. (This is where you announce that you will be using Zookeeper that comes with HBase) This way, when you execute $HBASE_HOME/bin/start-hbase.sh, HBase will automatically start Zookeeper first, then start HBase daemons. But perhaps I found my Hbase Master is running through Web UI. But there are exceptions in my Zookeeper Logs. I am also able to create table in hbase and view it. The onle thing I don't do is apply the *hdfs-630-0.20-append.patch* to each hadoop package in each node. As I don't know how to apply it. If this is the problem Please guide me the steps to apply it. I also attached my Zookeeper Logs of my Zookeeper Servers. Please find the attachment. Also, you can install your own Zookeeper and tell HBase to use it instead of its own. I read it on the internet that Zookeeper that comes with HBase does not work properly on Windows 7 64bit. ( http://alans.se/blog/2010/hadoop-hbase-cygwin-windows-7-x64/) So in that case you need to install your own Zookeeper, set it up properly, and tell HBase to use it instead of its own. All you need to do is configure zoo.cfg and add it to the HBase CLASSPATH. And don't forget to set export HBASE_MANAGES_ZK=false in $HBASE_HOME/conf/hbase-env.sh. This way, HBase will not start Zookeeper automatically. About the separation of Zookeepers from regionservers, Yes, it is recommended to separate Zookeepers from regionservers. But that won't be necessary unless your clusters are very heavily loaded. They also suggest that you give Zookeeper its own hard disk. But I haven't done that myself yet. (Hard disks cost money you know) So I'd say your cluster seems fine. But when you want to expand your cluster, you'd need some changes. I suggest you take a look at Hadoop: The Definitive Guide. Thanks Best Regards Adarsh Sharma Regards, Edward 2011/1/13 Adarsh Sharma adarsh.sha...@orkash.com Thanks Edward, Can you describe me the architecture used in your configuration. Fore.g I have a cluster of 10 servers and 1 node act as ( Namenode, Jobtracker, Hmaster ). Remainning 9 nodes act as ( Slaves, datanodes, Tasktracker, Hregionservers ). Among these 9 nodes I also set 3 nodes in zookeeper.quorum.property. 
I want to know that is it necessary to configure zookeeper separately with the zookeeper-3.2.2 package or just have some IP's listed in zookeeper.quorum.property and Hbase take care of it. Can we specify IP's of Hregionservers used before as zookeeper servers ( HQuorumPeer ) or we must need separate servers for it. My problem arises in running zookeeper. My Hbase is up and running in fully distributed mode too. With Best Regards Adarsh Sharma edward choi wrote: Dear Adarsh, My situation is somewhat different from yours as I am only running Hadoop and Hbase (as opposed to Hadoop/Hive/Hbase). But I hope my experience could be of help to you somehow. I applied the hdfs-630-0.20-append.patch to every single Hadoop node. (including master and slaves) Then I followed exactly what they told me to do on http://hbase.apache.org/docs/current/api/overview-summary.html#overview_description . I didn't get a single error message and successfully started HBase in a fully distributed mode. I am not using Hive so I can't tell what caused the MasterNotRunningException, but the patch above is meant to allow DFSClients pass NameNode lists of known dead Datanodes. I doubt that the patch has anything to do with MasterNotRunningException. Hope this helps. Regards, Ed 2011/1/13 Adarsh Sharma adarsh.sha...@orkash.com I am also facing some issues and i think applying hdfs-630-0.20-append.patch https://issues.apache.org/jira/secure/attachment/12446812/hdfs-630-0.20-append.patch
Re: No locks available
Benjamin Gufler wrote: Hi, On 2011-01-17 14:28, Adarsh Sharma wrote: Edward Capriolo wrote: No locks available can mean that you are trying to use hadoop on a filesystem that does not support file level locking. Are you trying to run your name node storage in NFS space? Yes Edward, you're absolutely right. I mount a hard disk path of a physical machine into the datanode VM as dfs.data.dir. But it causes no problem on the other nodes. Are the NFS locking services up and running? Thanks Benjamin, I am sorry but I don't know how to check that. Could you please tell me how to check it, so I can answer your question? I think this might be the problem, but the other datanodes don't have this issue. Please note that my name node storage (dfs.name.dir) is not on NFS. Best Regards Adarsh Sharma Benjamin
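A hedged way to check, assuming a CentOS/RHEL-style system like those described in these threads (run on both the NFS server and the VM acting as the client):
rpcinfo -p | grep -E 'nlockmgr|status'   # the NFS lock manager and rpc.statd must be registered
service nfslock status                   # RHEL/CentOS init script for the locking daemons
mount | grep nfs                         # inspect the mount options; a 'nolock' mount disables file locking entirely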
Re: Why Hadoop is slow in Cloud
Marc Farnum Rendino wrote: Virtualization != Emulation Yes, virtualization does have its own costs (as does running directly on hardware) - depending on the specifics of both the virtualization *and* the task at hand. Absolutely right, and for this I perform the initial testing. I want to know *AT WHAT COSTS *it comes. 10-15% is tolerable but at this rate, it needs some work. As Steve rightly suggest , I am in some CPU bound testing work to know the exact stats. I let you know after the work. If my task (in the general sense) is CPU bound, it doesn't matter (to me) if the virtualization has a disk I/O penalty. But is it possible to perform some tuning in the work-flow of the VM's to increase some performance or not. If on the other hand, my task is limited by a disk I/O penalty, I'll weigh that into the *total* cost/benefit, and virtualization may not - or may still - be an advantageous choice. Some reasons of slowness will highly helpful. Any guidance is appreciable. Context is king. Thanks best Regards Adarsh Sharma On Mon, Jan 17, 2011 at 10:41 AM, Edward Capriolo edlinuxg...@gmail.com wrote: Everything you emulate you cut X% performance right off the top...
No locks available
Dear all, I know this a silly mistake but not able to find the reason of the exception that causes one datanode to fail to start. I mount /hdd2-1 of a phsical machine into this VM and start datanode,tasktracker. Datanode fails after few seconds. Can someone tell me the root cause. Below is the exception : 2011-01-17 18:01:08,199 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG: / STARTUP_MSG: Starting DataNode STARTUP_MSG: host = hadoop7/172.16.1.8 STARTUP_MSG: args = [] STARTUP_MSG: version = 0.20.2 STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010 / 2011-01-17 18:03:36,391 INFO org.apache.hadoop.hdfs.server.common.Storage: java.io.IOException: No locks available at sun.nio.ch.FileChannelImpl.lock0(Native Method) at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:881) at java.nio.channels.FileChannel.tryLock(FileChannel.java:962) at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.tryLock(Storage.java:527) at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:505) at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:363) at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:112) at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:298) at org.apache.hadoop.hdfs.server.datanode.DataNode.init(DataNode.java:216) at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1283) at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1238) at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1246) at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1368) 2011-01-17 18:03:36,393 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: No locks available at sun.nio.ch.FileChannelImpl.lock0(Native Method) at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:881) at java.nio.channels.FileChannel.tryLock(FileChannel.java:962) at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.tryLock(Storage.java:527) at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:505) at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:363) at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:112) at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:298) at org.apache.hadoop.hdfs.server.datanode.DataNode.init(DataNode.java:216) at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1283) at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1238) ~/project/hadoop-0.20.2/logs/hadoop-hadoop-datanode-hadoop7.log 42L, 3141C 1,1 Top Thanks Adarsh
Re: No locks available
xiufeng liu wrote: did you format the namenode before you start? try to format it and start: 1) go to HADOOP_HOME/bin 2) ./hadoop namenode -format I format the namenode and then issue the command : bin/start-all.sh this results 2 of my datanodes to run properly but causes the below exception for one datanode. Can i know why it occurs. Thanx On Mon, Jan 17, 2011 at 1:43 PM, Adarsh Sharma adarsh.sha...@orkash.comwrote: Dear all, I know this a silly mistake but not able to find the reason of the exception that causes one datanode to fail to start. I mount /hdd2-1 of a phsical machine into this VM and start datanode,tasktracker. Datanode fails after few seconds. Can someone tell me the root cause. Below is the exception : 2011-01-17 18:01:08,199 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG: / STARTUP_MSG: Starting DataNode STARTUP_MSG: host = hadoop7/172.16.1.8 STARTUP_MSG: args = [] STARTUP_MSG: version = 0.20.2 STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010 / 2011-01-17 18:03:36,391 INFO org.apache.hadoop.hdfs.server.common.Storage: java.io.IOException: No locks available at sun.nio.ch.FileChannelImpl.lock0(Native Method) at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:881) at java.nio.channels.FileChannel.tryLock(FileChannel.java:962) at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.tryLock(Storage.java:527) at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:505) at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:363) at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:112) at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:298) at org.apache.hadoop.hdfs.server.datanode.DataNode.init(DataNode.java:216) at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1283) at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1238) at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1246) at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1368) 2011-01-17 18:03:36,393 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: No locks available at sun.nio.ch.FileChannelImpl.lock0(Native Method) at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:881) at java.nio.channels.FileChannel.tryLock(FileChannel.java:962) at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.tryLock(Storage.java:527) at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:505) at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:363) at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:112) at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:298) at org.apache.hadoop.hdfs.server.datanode.DataNode.init(DataNode.java:216) at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1283) at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1238) ~/project/hadoop-0.20.2/logs/hadoop-hadoop-datanode-hadoop7.log 42L, 3141C 1,1 Top Thanks Adarsh
Re: No locks available
Harsh J wrote: Could you re-check your permissions on the $(dfs.data.dir)s for your failing DataNode versus the user that runs it? On Mon, Jan 17, 2011 at 6:33 PM, Adarsh Sharma adarsh.sha...@orkash.com wrote: Can I know why it occurs. Thanks Harsh, I know about this issue and I have cross-checked the permissions of all dirs (dfs.name.dir, dfs.data.dir, mapred.local.dir) several times. They are 755 and owned by the hadoop user and group. I found that on the failed datanode it is unable to create 5 files in dfs.data.dir, whereas on a successful datanode it creates the following: current, tmp, storage, in_use.lock. Does that help? Thanks
Re: When applying a patch, which attachment should I use?
Thanx a Lot Edward, This information is very helpful to me. With Best Regards Adarsh Sharma edward choi wrote: Dear Adarsh, I have a single machine running Namenode/JobTracker/Hbase Master. There are 17 machines running Datanode/TaskTracker Among those 17 machines, 14 are running Hbase Regionservers. The other 3 machines are running Zookeeper. And about the Zookeeper, Hbase comes with its own Zookeeper so you don't need to install a new Zookeeper. (except for the special occasion, which I'll explain later) I assigned 14 machines as regionservers using $HBASE_HOME/conf/regionservers. I assigned 3 machines as Zookeeperss using hbase.zookeeper.quorum property in $HBASE_HOME/conf/hbase-site.xml. Don't forget to set export HBASE_MANAGES_ZK=true in $HBASE_HOME/conf/hbase-env.sh. (This is where you announce that you will be using Zookeeper that comes with HBase) This way, when you execute $HBASE_HOME/bin/start-hbase.sh, HBase will automatically start Zookeeper first, then start HBase daemons. Also, you can install your own Zookeeper and tell HBase to use it instead of its own. I read it on the internet that Zookeeper that comes with HBase does not work properly on Windows 7 64bit. ( http://alans.se/blog/2010/hadoop-hbase-cygwin-windows-7-x64/) So in that case you need to install your own Zookeeper, set it up properly, and tell HBase to use it instead of its own. All you need to do is configure zoo.cfg and add it to the HBase CLASSPATH. And don't forget to set export HBASE_MANAGES_ZK=false in $HBASE_HOME/conf/hbase-env.sh. This way, HBase will not start Zookeeper automatically. About the separation of Zookeepers from regionservers, Yes, it is recommended to separate Zookeepers from regionservers. But that won't be necessary unless your clusters are very heavily loaded. They also suggest that you give Zookeeper its own hard disk. But I haven't done that myself yet. (Hard disks cost money you know) So I'd say your cluster seems fine. But when you want to expand your cluster, you'd need some changes. I suggest you take a look at Hadoop: The Definitive Guide. Regards, Edward 2011/1/13 Adarsh Sharma adarsh.sha...@orkash.com Thanks Edward, Can you describe me the architecture used in your configuration. Fore.g I have a cluster of 10 servers and 1 node act as ( Namenode, Jobtracker, Hmaster ). Remainning 9 nodes act as ( Slaves, datanodes, Tasktracker, Hregionservers ). Among these 9 nodes I also set 3 nodes in zookeeper.quorum.property. I want to know that is it necessary to configure zookeeper separately with the zookeeper-3.2.2 package or just have some IP's listed in zookeeper.quorum.property and Hbase take care of it. Can we specify IP's of Hregionservers used before as zookeeper servers ( HQuorumPeer ) or we must need separate servers for it. My problem arises in running zookeeper. My Hbase is up and running in fully distributed mode too. With Best Regards Adarsh Sharma edward choi wrote: Dear Adarsh, My situation is somewhat different from yours as I am only running Hadoop and Hbase (as opposed to Hadoop/Hive/Hbase). But I hope my experience could be of help to you somehow. I applied the hdfs-630-0.20-append.patch to every single Hadoop node. (including master and slaves) Then I followed exactly what they told me to do on http://hbase.apache.org/docs/current/api/overview-summary.html#overview_description . I didn't get a single error message and successfully started HBase in a fully distributed mode. 
I am not using Hive so I can't tell what caused the MasterNotRunningException, but the patch above is meant to allow DFSClients pass NameNode lists of known dead Datanodes. I doubt that the patch has anything to do with MasterNotRunningException. Hope this helps. Regards, Ed 2011/1/13 Adarsh Sharma adarsh.sha...@orkash.com I am also facing some issues and i think applying hdfs-630-0.20-append.patch https://issues.apache.org/jira/secure/attachment/12446812/hdfs-630-0.20-append.patch would solve my problem. I try to run Hadoop/Hive/Hbase integration in fully Distributed mode. But I am facing master Not Running Exception mentioned in http://wiki.apache.org/hadoop/Hive/HBaseIntegration. My Hadoop Version= 0.20.2, Hive =0.6.0 , Hbase=0.20.6. What you think Edward. Thanks Adarsh edward choi wrote: I am not familiar with this whole svn and patch stuff, so please understand my asking. I was going to apply hdfs-630-0.20-append.patch https://issues.apache.org/jira/secure/attachment/12446812/hdfs-630-0.20-append.patch only because I wanted to install HBase and the installation guide told me to. The append branch you mentioned, does that include hdfs-630-0.20-append.patch https://issues.apache.org/jira/secure/attachment/12446812/hdfs-630-0.20-append.patch as well? Is it like the latest patch with all the good stuff packed in one? Regards, Ed 2011/1/12 Ted Dunning tdunn...@maprtech.com You may
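A condensed sketch of the configuration Edward describes, with illustrative hostnames (zk1, zk2, zk3 stand in for the three Zookeeper machines):
# $HBASE_HOME/conf/hbase-env.sh
export HBASE_MANAGES_ZK=true
# $HBASE_HOME/conf/hbase-site.xml
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>zk1,zk2,zk3</value>
</property>
# $HBASE_HOME/conf/regionservers: one regionserver hostname per line
With HBASE_MANAGES_ZK=true, bin/start-hbase.sh starts the quorum peers on those hosts before the HBase daemons, as described above.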
Re: No locks available
Edward Capriolo wrote: On Mon, Jan 17, 2011 at 8:13 AM, Adarsh Sharma adarsh.sha...@orkash.com wrote: Harsh J wrote: Could you re-check your permissions on the $(dfs.data.dir)s for your failing DataNode versus the user that runs it? On Mon, Jan 17, 2011 at 6:33 PM, Adarsh Sharma adarsh.sha...@orkash.com wrote: Can I know why it occurs? Thanks Harsh, I know this issue and I cross-checked the permissions of all dirs ( dfs.name.dir, dfs.data.dir, mapred.local.dir ) several times. They are 755 and owned by the hadoop user and group. I found that on the failed datanode it is unable to create the 5 files in dfs.data.dir, whereas on a successful datanode it creates the following files: current, tmp, storage, in_use.lock. Does that help? Thanks. No locks available can mean that you are trying to use hadoop on a filesystem that does not support file level locking. Are you trying to run your name node storage in NFS space? Yes Edward, you're absolutely right. I mount a hard disk path into the Datanode ( VM ) dfs.data.dir. But it causes no problem on the other nodes. Thanks
Re: No locks available
Edward Capriolo wrote: On Mon, Jan 17, 2011 at 8:13 AM, Adarsh Sharma adarsh.sha...@orkash.com wrote: Harsh J wrote: Could you re-check your permissions on the $(dfs.data.dir)s for your failing DataNode versus the user that runs it? On Mon, Jan 17, 2011 at 6:33 PM, Adarsh Sharma adarsh.sha...@orkash.com wrote: Can I know why it occurs? Thanks Harsh, I know this issue and I cross-checked the permissions of all dirs ( dfs.name.dir, dfs.data.dir, mapred.local.dir ) several times. They are 755 and owned by the hadoop user and group. I found that on the failed datanode it is unable to create the 5 files in dfs.data.dir, whereas on a successful datanode it creates the following files: current, tmp, storage, in_use.lock. Does that help? Thanks. No locks available can mean that you are trying to use hadoop on a filesystem that does not support file level locking. Are you trying to run your name node storage in NFS space? I am sorry, but my Namenode is on a separate machine outside the cloud. Its path is /home/hadoop/project/hadoop-0.20.2/name and it is running properly. I find this puzzling because I followed the same steps on the other 2 VM's and they are running. How could I debug this one exceptional case where it is failing? Thanks Regards Adarsh Sharma
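One way to narrow down the failing node is to look at what filesystem actually backs its dfs.data.dir. A rough sketch, with /path/to/dfs/data standing in for whatever dfs.data.dir is set to on that node (the path is a placeholder, not from the thread):

df -T /path/to/dfs/data      # filesystem type backing the data dir (nfs vs ext3, etc.)
mount | grep /path/to/dfs    # is the directory on a network mount?
ls -ld /path/to/dfs/data     # owner and mode (should be the hadoop user, 755)

If the type shows up as nfs on the failing VM but as a local filesystem on the working ones, that would match the "No locks available" symptom, since file-level locking of in_use.lock is what fails on lock-less mounts.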
Why Hadoop is slow in Cloud
Dear all, Yesterday I performed a comparison test between *Hadoop on Standalone Servers* and *Hadoop in the Cloud*. I set up a Hadoop cluster of 4 nodes ( standalone machines ) in which one node acts as Master ( Namenode, Jobtracker ) and the remaining nodes act as slaves ( Datanodes, Tasktrackers ). On the other hand, for testing Hadoop in the *Cloud* ( Eucalyptus ), I made one standalone machine the *Hadoop Master* and the slaves are configured on the VM's in the cloud. I am confused by the stats obtained from the testing. What I concluded is that the VM's give about half the performance of the standalone servers. I expected some slowdown, but never at this level. Is this genuine, or could there be a configuration problem? I am using a 1 Gb (10-1000 Mb/s) LAN for the VM machines and 100 Mb/s for the standalone servers. Please have a look at the results and, if interested, comment on them. Thanks Regards Adarsh Sharma hadoop_testing_new.ods Description: application/vnd.oasis.opendocument.spreadsheet
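Before blaming Hadoop itself, it can help to compare raw disk and network throughput on one VM against one standalone node; the OS-level checks below are only a suggested sketch (tools, sizes and the host name are assumptions, and iperf has to be installed separately):

# sequential write/read throughput of the disk backing dfs.data.dir
dd if=/dev/zero of=/tmp/ddtest bs=1M count=1024 oflag=direct
dd if=/tmp/ddtest of=/dev/null bs=1M iflag=direct

# point-to-point bandwidth between two nodes
iperf -s                 # run on one node
iperf -c other-node      # run on the other node

If the VM's show roughly half the dd or iperf numbers of the physical machines, the Hadoop slowdown is largely explained by virtualization/storage overhead rather than by the Hadoop configuration.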
Re: When applying a patch, which attachment should I use?
I am also facing some issues and i think applying hdfs-630-0.20-append.patchhttps://issues.apache.org/jira/secure/attachment/12446812/hdfs-630-0.20-append.patch would solve my problem. I try to run Hadoop/Hive/Hbase integration in fully Distributed mode. But I am facing master Not Running Exception mentioned in http://wiki.apache.org/hadoop/Hive/HBaseIntegration. My Hadoop Version= 0.20.2, Hive =0.6.0 , Hbase=0.20.6. What you think Edward. Thanks Adarsh edward choi wrote: I am not familiar with this whole svn and patch stuff, so please understand my asking. I was going to apply hdfs-630-0.20-append.patchhttps://issues.apache.org/jira/secure/attachment/12446812/hdfs-630-0.20-append.patch only because I wanted to install HBase and the installation guide told me to. The append branch you mentioned, does that include hdfs-630-0.20-append.patchhttps://issues.apache.org/jira/secure/attachment/12446812/hdfs-630-0.20-append.patch as well? Is it like the latest patch with all the good stuff packed in one? Regards, Ed 2011/1/12 Ted Dunning tdunn...@maprtech.com You may also be interested in the append branch: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-append/ On Tue, Jan 11, 2011 at 3:12 AM, edward choi mp2...@gmail.com wrote: Thanks for the info. I am currently using Hadoop 0.20.2, so I guess I only need apply hdfs-630-0.20-append.patch https://issues.apache.org/jira/secure/attachment/12446812/hdfs-630-0.20-append.patch . I wasn't familiar with the term trunk. I guess it means the latest development. Thanks again. Best Regards, Ed 2011/1/11 Konstantin Boudnik c...@apache.org Yeah, that's pretty crazy all right. In your case looks like that 3 patches on the top are the latest for 0.20-append branch, 0.21 branch and trunk (which perhaps 0.22 branch at the moment). It doesn't look like you need to apply all of them - just try the latest for your particular branch. The mess is caused by the fact the ppl are using different names for consequent patches (as in file.1.patch, file.2.patch etc) This is _very_ confusing indeed, especially when different contributors work on the same fix/feature. -- Take care, Konstantin (Cos) Boudnik On Mon, Jan 10, 2011 at 01:10, edward choi mp2...@gmail.com wrote: Hi, For the first time I am about to apply a patch to HDFS. https://issues.apache.org/jira/browse/HDFS-630 Above is the one that I am trying to do. But there are like 15 patches and I don't know which one to use. Could anyone tell me if I need to apply them all or just the one at the top? The whole patching process is just so confusing :-( Ed
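For reference, applying that single patch to a 0.20.2 source tree is usually just the following (a sketch; adjust the -p level if the paths inside the patch do not line up, and rebuild with ant afterwards):

cd hadoop-0.20.2
wget https://issues.apache.org/jira/secure/attachment/12446812/hdfs-630-0.20-append.patch
patch -p0 --dry-run < hdfs-630-0.20-append.patch   # check that all hunks apply cleanly
patch -p0 < hdfs-630-0.20-append.patch
ant jar                                            # rebuild the core jar and redeploy it to every node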
No locks available
Dear all, Yesterday I was working on a cluster of 6 Hadoop nodes ( Load data, perform some jobs ). But today when I start my cluster I came across a problem on one of my datanodes. Datanodes fails to start due to following error :- 2011-01-11 12:54:10,367 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG: / STARTUP_MSG: Starting DataNode STARTUP_MSG: host = hadoop3/172.16.1.4 STARTUP_MSG: args = [] STARTUP_MSG: version = 0.20.2 STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010 / 2011-01-11 12:55:57,031 INFO org.apache.hadoop.hdfs.server.common.Storage: java.io.IOException: No locks available at sun.nio.ch.FileChannelImpl.lock0(Native Method) at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:881) at java.nio.channels.FileChannel.tryLock(FileChannel.java:962) at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.tryLock(Storage.java:527) at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:505) at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:363) at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:112) at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:298) at org.apache.hadoop.hdfs.server.datanode.DataNode.init(DataNode.java:216) at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1283) at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1238) at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1246) at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1368) 2011-01-11 12:55:57,043 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: No locks available at sun.nio.ch.FileChannelImpl.lock0(Native Method) at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:881) at java.nio.channels.FileChannel.tryLock(FileChannel.java:962) at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.tryLock(Storage.java:527) at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:505) at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:363) at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:112) at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:298) at org.apache.hadoop.hdfs.server.datanode.DataNode.init(DataNode.java:216) at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1283) at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1238) hadoop-hadoop-datanode-hadoop3.log 1775L, 210569C 1,1 Top Can Please is familiar with this issue. Please help. Thanks Regards Adarsh Sharma
Re: No locks available
Allen Wittenauer wrote: On Jan 11, 2011, at 2:39 AM, Adarsh Sharma wrote: Dear all, Yesterday I was working on a cluster of 6 Hadoop nodes ( load data, perform some jobs ). But today when I start my cluster I came across a problem on one of my datanodes. Are you running this on NFS? No Sir, I am running this on 3 servers with local filesystems. Each server contains 2 hard disks ( /hdd2-1, /hdd1-1 ), and on each server there run 2 VM's, one occupying /hdd2-1 and the other /hdd1-1. My Namenode configuration contains the predefined IPs of all the VM's. Thanks 2011-01-11 12:55:57,031 INFO org.apache.hadoop.hdfs.server.common.Storage: java.io.IOException: No locks available
Re: Too-many fetch failure Reduce Error
Any update on this error. Thanks Adarsh Sharma wrote: Esteban Gutierrez Moguel wrote: Adarsh, Dou you have in /etc/hosts the hostnames for masters and slaves? Yes I know this issue. But did you think the error occurs while reading the output of map. I want to know the proper reason of below lines : org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_201101071129_0001/attempt_201101071129_0001_m_12_0/output/file.out.index esteban. On Fri, Jan 7, 2011 at 06:47, Adarsh Sharma adarsh.sha...@orkash.comwrote: Dear all, I am researching about the below error and could not able to find the reason : Data Size : 3.4 GB Hadoop-0.20.0 had...@ws32-test-lin:~/project/hadoop-0.20.2$ bin/hadoop jar hadoop-0.20.2-examples.jar wordcount /user/hadoop/page_content.txt page_content_output.txt 11/01/07 16:11:14 INFO input.FileInputFormat: Total input paths to process : 1 11/01/07 16:11:15 INFO mapred.JobClient: Running job: job_201101071129_0001 11/01/07 16:11:16 INFO mapred.JobClient: map 0% reduce 0% 11/01/07 16:11:41 INFO mapred.JobClient: map 1% reduce 0% 11/01/07 16:11:45 INFO mapred.JobClient: map 2% reduce 0% 11/01/07 16:11:48 INFO mapred.JobClient: map 3% reduce 0% 11/01/07 16:11:52 INFO mapred.JobClient: map 4% reduce 0% 11/01/07 16:11:56 INFO mapred.JobClient: map 5% reduce 0% 11/01/07 16:12:00 INFO mapred.JobClient: map 6% reduce 0% 11/01/07 16:12:05 INFO mapred.JobClient: map 7% reduce 0% 11/01/07 16:12:08 INFO mapred.JobClient: map 8% reduce 0% 11/01/07 16:12:11 INFO mapred.JobClient: map 9% reduce 0% 11/01/07 16:12:14 INFO mapred.JobClient: map 10% reduce 0% 11/01/07 16:12:17 INFO mapred.JobClient: map 11% reduce 0% 11/01/07 16:12:21 INFO mapred.JobClient: map 12% reduce 0% 11/01/07 16:12:24 INFO mapred.JobClient: map 13% reduce 0% 11/01/07 16:12:27 INFO mapred.JobClient: map 14% reduce 0% 11/01/07 16:12:30 INFO mapred.JobClient: map 15% reduce 0% 11/01/07 16:12:33 INFO mapred.JobClient: map 16% reduce 0% 11/01/07 16:12:36 INFO mapred.JobClient: map 17% reduce 0% 11/01/07 16:12:40 INFO mapred.JobClient: map 18% reduce 0% 11/01/07 16:12:45 INFO mapred.JobClient: map 19% reduce 0% 11/01/07 16:12:48 INFO mapred.JobClient: map 20% reduce 0% 11/01/07 16:12:54 INFO mapred.JobClient: map 21% reduce 0% 11/01/07 16:13:00 INFO mapred.JobClient: map 22% reduce 0% 11/01/07 16:13:04 INFO mapred.JobClient: map 22% reduce 1% 11/01/07 16:13:13 INFO mapred.JobClient: map 23% reduce 1% 11/01/07 16:13:19 INFO mapred.JobClient: map 24% reduce 1% 11/01/07 16:13:25 INFO mapred.JobClient: map 25% reduce 1% 11/01/07 16:13:30 INFO mapred.JobClient: map 26% reduce 1% 11/01/07 16:13:34 INFO mapred.JobClient: map 26% reduce 3% 11/01/07 16:13:36 INFO mapred.JobClient: map 27% reduce 3% 11/01/07 16:13:37 INFO mapred.JobClient: map 27% reduce 4% 11/01/07 16:13:39 INFO mapred.JobClient: map 28% reduce 4% 11/01/07 16:13:43 INFO mapred.JobClient: map 29% reduce 4% 11/01/07 16:13:46 INFO mapred.JobClient: map 30% reduce 4% 11/01/07 16:13:49 INFO mapred.JobClient: map 31% reduce 4% 11/01/07 16:13:52 INFO mapred.JobClient: map 32% reduce 4% 11/01/07 16:13:55 INFO mapred.JobClient: map 33% reduce 4% 11/01/07 16:13:58 INFO mapred.JobClient: map 34% reduce 4% 11/01/07 16:14:02 INFO mapred.JobClient: map 35% reduce 4% 11/01/07 16:14:05 INFO mapred.JobClient: map 36% reduce 4% 11/01/07 16:14:08 INFO mapred.JobClient: map 37% reduce 4% 11/01/07 16:14:11 INFO mapred.JobClient: map 38% reduce 4% 11/01/07 16:14:15 INFO mapred.JobClient: map 39% reduce 4% 11/01/07 16:14:19 INFO mapred.JobClient: map 
40% reduce 4% 11/01/07 16:14:20 INFO mapred.JobClient: map 40% reduce 5% 11/01/07 16:14:25 INFO mapred.JobClient: map 41% reduce 5% 11/01/07 16:14:32 INFO mapred.JobClient: map 42% reduce 5% 11/01/07 16:14:38 INFO mapred.JobClient: map 43% reduce 5% 11/01/07 16:14:41 INFO mapred.JobClient: map 43% reduce 6% 11/01/07 16:14:43 INFO mapred.JobClient: map 44% reduce 6% 11/01/07 16:14:47 INFO mapred.JobClient: map 45% reduce 6% 11/01/07 16:14:50 INFO mapred.JobClient: map 46% reduce 6% 11/01/07 16:14:54 INFO mapred.JobClient: map 47% reduce 7% 11/01/07 16:14:59 INFO mapred.JobClient: map 48% reduce 7% 11/01/07 16:15:02 INFO mapred.JobClient: map 49% reduce 7% 11/01/07 16:15:05 INFO mapred.JobClient: map 50% reduce 7% 11/01/07 16:15:11 INFO mapred.JobClient: map 51% reduce 7% 11/01/07 16:15:14 INFO mapred.JobClient: map 52% reduce 7% 11/01/07 16:15:16 INFO mapred.JobClient: map 52% reduce 8% 11/01/07 16:15:20 INFO mapred.JobClient: map 53% reduce 8% 11/01/07 16:15:25 INFO mapred.JobClient: map 54% reduce 8% 11/01/07 16:15:29 INFO mapred.JobClient: map 55% reduce 8% 11/01/07 16:15:31 INFO mapred.JobClient: map 55% reduce 9% 11/01/07 16:15:33 INFO mapred.JobClient: map 56% reduce 9% 11/01/07 16:15:38 INFO mapred.JobClient: map 57% reduce 9% 11/01/07 16:15:42 INFO mapred.JobClient: map 58% reduce 9% 11/01/07 16:15:43 INFO
Re: TeraSort question.
If possible, please also post your configuration parameters like *dfs.data.dir*, *mapred.local.dir*, map and reduce parameters, Java settings, etc. Thanks bharath vissapragada wrote: Ravi, Please post the figures and graphs .. Figures for large clusters ( 200 nodes) are certainly interesting .. Thanks On Tue, Jan 11, 2011 at 10:36 AM, Raj V rajv...@yahoo.com wrote: All, I have been running terasort on a 480 node hadoop cluster. I have also collected cpu, memory, disk and network statistics during this run. The system stats are quite interesting. I can post them when I have put them together in some presentable format ( if there is interest ). However, while looking at the data, I noticed something interesting. I thought, intuitively, that all the systems in the cluster would have more or less similar behaviour ( time translation was possible ) and the overall graph would look the same. Just to confirm it, I took 5 random nodes and looked at the CPU, disk, network etc. activity while the sort was running. Strangely enough, it was not so. Two of the 5 systems were seriously busy, with big IO and lots of disk and network activity. On the other three systems, the CPU was more or less 100% idle, with slight network and I/O activity. Is that normal and/or expected? Shouldn't all the nodes be utilized more or less evenly over the length of the run? I generated the data for the sort using teragen ( 128MB block size, replication = 3 ). I would also be interested in other people's sort timings. Is there some place where people can post sort numbers ( not just the record )? I will post the actual graphs of the 5 nodes, if there is interest, tomorrow. ( Some logistical issues abt. posting them tonight. ) I am using CDH3B3, even though I think this is not specific to CDH3B3. Sorry for the cross post. Raj
Re: Too-many fetch failure Reduce Error
Esteban Gutierrez Moguel wrote: Adarsh, Dou you have in /etc/hosts the hostnames for masters and slaves? Yes I know this issue. But did you think the error occurs while reading the output of map. I want to know the proper reason of below lines : org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_201101071129_0001/attempt_201101071129_0001_m_12_0/output/file.out.index esteban. On Fri, Jan 7, 2011 at 06:47, Adarsh Sharma adarsh.sha...@orkash.comwrote: Dear all, I am researching about the below error and could not able to find the reason : Data Size : 3.4 GB Hadoop-0.20.0 had...@ws32-test-lin:~/project/hadoop-0.20.2$ bin/hadoop jar hadoop-0.20.2-examples.jar wordcount /user/hadoop/page_content.txt page_content_output.txt 11/01/07 16:11:14 INFO input.FileInputFormat: Total input paths to process : 1 11/01/07 16:11:15 INFO mapred.JobClient: Running job: job_201101071129_0001 11/01/07 16:11:16 INFO mapred.JobClient: map 0% reduce 0% 11/01/07 16:11:41 INFO mapred.JobClient: map 1% reduce 0% 11/01/07 16:11:45 INFO mapred.JobClient: map 2% reduce 0% 11/01/07 16:11:48 INFO mapred.JobClient: map 3% reduce 0% 11/01/07 16:11:52 INFO mapred.JobClient: map 4% reduce 0% 11/01/07 16:11:56 INFO mapred.JobClient: map 5% reduce 0% 11/01/07 16:12:00 INFO mapred.JobClient: map 6% reduce 0% 11/01/07 16:12:05 INFO mapred.JobClient: map 7% reduce 0% 11/01/07 16:12:08 INFO mapred.JobClient: map 8% reduce 0% 11/01/07 16:12:11 INFO mapred.JobClient: map 9% reduce 0% 11/01/07 16:12:14 INFO mapred.JobClient: map 10% reduce 0% 11/01/07 16:12:17 INFO mapred.JobClient: map 11% reduce 0% 11/01/07 16:12:21 INFO mapred.JobClient: map 12% reduce 0% 11/01/07 16:12:24 INFO mapred.JobClient: map 13% reduce 0% 11/01/07 16:12:27 INFO mapred.JobClient: map 14% reduce 0% 11/01/07 16:12:30 INFO mapred.JobClient: map 15% reduce 0% 11/01/07 16:12:33 INFO mapred.JobClient: map 16% reduce 0% 11/01/07 16:12:36 INFO mapred.JobClient: map 17% reduce 0% 11/01/07 16:12:40 INFO mapred.JobClient: map 18% reduce 0% 11/01/07 16:12:45 INFO mapred.JobClient: map 19% reduce 0% 11/01/07 16:12:48 INFO mapred.JobClient: map 20% reduce 0% 11/01/07 16:12:54 INFO mapred.JobClient: map 21% reduce 0% 11/01/07 16:13:00 INFO mapred.JobClient: map 22% reduce 0% 11/01/07 16:13:04 INFO mapred.JobClient: map 22% reduce 1% 11/01/07 16:13:13 INFO mapred.JobClient: map 23% reduce 1% 11/01/07 16:13:19 INFO mapred.JobClient: map 24% reduce 1% 11/01/07 16:13:25 INFO mapred.JobClient: map 25% reduce 1% 11/01/07 16:13:30 INFO mapred.JobClient: map 26% reduce 1% 11/01/07 16:13:34 INFO mapred.JobClient: map 26% reduce 3% 11/01/07 16:13:36 INFO mapred.JobClient: map 27% reduce 3% 11/01/07 16:13:37 INFO mapred.JobClient: map 27% reduce 4% 11/01/07 16:13:39 INFO mapred.JobClient: map 28% reduce 4% 11/01/07 16:13:43 INFO mapred.JobClient: map 29% reduce 4% 11/01/07 16:13:46 INFO mapred.JobClient: map 30% reduce 4% 11/01/07 16:13:49 INFO mapred.JobClient: map 31% reduce 4% 11/01/07 16:13:52 INFO mapred.JobClient: map 32% reduce 4% 11/01/07 16:13:55 INFO mapred.JobClient: map 33% reduce 4% 11/01/07 16:13:58 INFO mapred.JobClient: map 34% reduce 4% 11/01/07 16:14:02 INFO mapred.JobClient: map 35% reduce 4% 11/01/07 16:14:05 INFO mapred.JobClient: map 36% reduce 4% 11/01/07 16:14:08 INFO mapred.JobClient: map 37% reduce 4% 11/01/07 16:14:11 INFO mapred.JobClient: map 38% reduce 4% 11/01/07 16:14:15 INFO mapred.JobClient: map 39% reduce 4% 11/01/07 16:14:19 INFO mapred.JobClient: map 40% reduce 4% 11/01/07 16:14:20 INFO mapred.JobClient: 
map 40% reduce 5% 11/01/07 16:14:25 INFO mapred.JobClient: map 41% reduce 5% 11/01/07 16:14:32 INFO mapred.JobClient: map 42% reduce 5% 11/01/07 16:14:38 INFO mapred.JobClient: map 43% reduce 5% 11/01/07 16:14:41 INFO mapred.JobClient: map 43% reduce 6% 11/01/07 16:14:43 INFO mapred.JobClient: map 44% reduce 6% 11/01/07 16:14:47 INFO mapred.JobClient: map 45% reduce 6% 11/01/07 16:14:50 INFO mapred.JobClient: map 46% reduce 6% 11/01/07 16:14:54 INFO mapred.JobClient: map 47% reduce 7% 11/01/07 16:14:59 INFO mapred.JobClient: map 48% reduce 7% 11/01/07 16:15:02 INFO mapred.JobClient: map 49% reduce 7% 11/01/07 16:15:05 INFO mapred.JobClient: map 50% reduce 7% 11/01/07 16:15:11 INFO mapred.JobClient: map 51% reduce 7% 11/01/07 16:15:14 INFO mapred.JobClient: map 52% reduce 7% 11/01/07 16:15:16 INFO mapred.JobClient: map 52% reduce 8% 11/01/07 16:15:20 INFO mapred.JobClient: map 53% reduce 8% 11/01/07 16:15:25 INFO mapred.JobClient: map 54% reduce 8% 11/01/07 16:15:29 INFO mapred.JobClient: map 55% reduce 8% 11/01/07 16:15:31 INFO mapred.JobClient: map 55% reduce 9% 11/01/07 16:15:33 INFO mapred.JobClient: map 56% reduce 9% 11/01/07 16:15:38 INFO mapred.JobClient: map 57% reduce 9% 11/01/07 16:15:42 INFO mapred.JobClient: map 58% reduce 9% 11/01/07 16:15:43 INFO mapred.JobClient: map 58% reduce 10% 11/01/07 16:15:46 INFO
Re: ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Datanode state: LV = -19 CTime = 1294051643891 is newer than the namespace state: LV = -19 CTime = 0
Shuja Rehman wrote: hi i have format the name node and now when i restart the cluster, i am getting the strange error. kindly let me know how to fix it. thnx / STARTUP_MSG: Starting DataNode STARTUP_MSG: host = hadoop.zoniversal.com/10.0.3.85 STARTUP_MSG: args = [] STARTUP_MSG: version = 0.20.2+737 STARTUP_MSG: build = -r 98c55c28258aa6f42250569bd7fa431ac657bdbd; compiled by 'root' on Mon Oct 11 13:14:05 EDT 2010 / 2011-01-08 12:55:58,586 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.0.3.85:8020. Already tried 0 time(s). 2011-01-08 12:55:59,598 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.0.3.85:8020. Already tried 1 time(s). 2011-01-08 12:56:00,608 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.0.3.85:8020. Already tried 2 time(s). 2011-01-08 12:56:01,618 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.0.3.85:8020. Already tried 3 time(s). 2011-01-08 12:56:03,540 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Datanode state: LV = -19 CTime = 1294051643891 is newer than the namespace state: LV = -19 CTime = 0 at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:249) at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:148) at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:356) at org.apache.hadoop.hdfs.server.datanode.DataNode.init(DataNode.java:272) at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1492) at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1432) at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1450) at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1575) at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1585) 2011-01-08 12:56:03,541 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG: / SHUTDOWN_MSG: Shutting down DataNode at hadoop.zoniversal.com/10.0.3.85 / 2011-01-08 13:04:17,579 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG: / STARTUP_MSG: Starting DataNode STARTUP_MSG: host = hadoop.zoniversal.com/10.0.3.85 STARTUP_MSG: args = [] STARTUP_MSG: version = 0.20.2+737 STARTUP_MSG: build = -r 98c55c28258aa6f42250569bd7fa431ac657bdbd; compiled by 'root' on Mon Oct 11 13:14:05 EDT 2010 / 2011-01-08 13:04:19,028 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.0.3.85:8020. Already tried 0 time(s). 2011-01-08 13:04:20,038 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.0.3.85:8020. Already tried 1 time(s). 2011-01-08 13:04:21,049 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.0.3.85:8020. Already tried 2 time(s). 2011-01-08 13:04:22,060 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.0.3.85:8020. Already tried 3 time(s). 
2011-01-08 13:04:24,601 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /var/lib/hadoop-0.20/cache/hdfs/dfs/data: namenode namespaceID = 125812142; datanode namespaceID = 1083940884 at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:233) at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:148) at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:356) at org.apache.hadoop.hdfs.server.datanode.DataNode.init(DataNode.java:272) at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1492) at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1432) at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1450) at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1575) at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1585) 2011-01-08 13:04:24,602 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG: / SHUTDOWN_MSG: Shutting down DataNode at hadoop.zoniversal.com/10.0.3.85 Manually delete /var/lib/hadoop-0.20/cache/hdfs/dfs/data diretcory of all nodes and then format and start the cluster. This error occurs due to incompatibily in the metadata. Best Regards Adarsh Sharma
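A concrete sketch of that recovery; note it is destructive and wipes all HDFS data, and the data path below is the one from the log above (substitute whatever dfs.data.dir is on your datanodes):

# on the master
bin/stop-all.sh
# on every datanode
rm -rf /var/lib/hadoop-0.20/cache/hdfs/dfs/data
# back on the master
bin/hadoop namenode -format
bin/start-all.sh

The namespaceID mismatch arises because reformatting the namenode assigns a new ID while the datanodes still carry the old one in their storage directories; removing (or, if the data must be kept, editing) the datanode storage brings them back in line.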
Too-many fetch failure Reduce Error
) at org.mortbay.jetty.Server.handle(Server.java:324) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403) at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522) Let's have some discussion. Thanks Regards Adarsh Sharma
Hive/Hbase Integration Error
) at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1004) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970) 2011-01-05 15:20:12,621 WARN zookeeper.ClientCnxn (ClientCnxn.java:run(967)) - Exception closing session 0x0 to sun.nio.ch.selectionkeyi...@799dbc3b Please help me, as I am not able to solve this problem. I also want to add that my Hadoop cluster has 9 nodes, of which 8 act as Datanodes, Tasktrackers and Regionservers. Among these nodes I set the zookeeper quorum property to include 5 Datanodes. I don't know the number of servers needed for Zookeeper in fully distributed mode. Best Regards Adarsh Sharma
Re: How to Achieve TaskTracker Decommission
sandeep wrote: Hi, Can any one of you let me know what command I need to execute for decommissioning a TaskTracker? Datanode decommissioning I have achieved using hadoop dfsadmin -refreshNodes. Similar to HDFS, is there any command for MapReduce decommissioning? I have gone through the defect [HADOOP-5643 ] but I was unable to find what command I need to execute. I have tried ./mapred jobtracker -decommission but it is not working. There is no command like this. You must specify the hosts to exclude via the mapred.hosts.exclude parameter in mapred-site.xml, and hadoop dfsadmin -refreshNodes is sufficient to do the work. Please help me. Thanks sandeep
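A sketch of the MapReduce side of this, assuming an exclude file at /home/hadoop/excludes (the path is only an example):

<!-- mapred-site.xml on the JobTracker -->
<property>
  <name>mapred.hosts.exclude</name>
  <value>/home/hadoop/excludes</value>
</property>

Add the TaskTracker hostnames to that file, one per line. Depending on the version, the JobTracker may need a restart to pick the list up; later releases added a hadoop mradmin -refreshNodes command for refreshing it without a restart.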
Re-Master Not Running Exception ( Hive/Hbase Integration )
From that wiki page: If you are not using hbase-0.20.3, you will need to rebuild the handler with the HBase jar matching your version, and change the --auxpath above accordingly. Failure to use matching versions will lead to misleading connection failures such as MasterNotRunningException since the HBase RPC protocol changes often. Thanks I think this is the correct problem. So I configure hbase-0.20.3 to be comapatible with hive-0.6.0 . But the problem remains the same. I am able to create tables in Hive and Hbase but when i try to create Hive/Hbase integrated table , it throws the below exception. I checked the size of hive_hbase_handler.jar, hbase-0.20.3.jar , hbase-0.20.3.test.jar. It's same. Please help. Best Regards Adarsh Sharma JVS On Dec 29, 2010, at 5:20 AM, Adarsh Sharma wrote: Dear all, I am following all wiki tutorial for configurin Hive with Hbase Integrated. For this I use : Hadoop-0.20.2 Hive-0.6.0 ( By Default Metastore derby) Hbase-0.20.6 Java 1.6_20 I established successfully a Hadoop Cluster of 4 servers and run Hbase on them. I check all Web UI's ( Namenode , Jobtracker , Hbase Master, RegionServers ) etc. All is working fine. But when I issued the below command for creating Hive/Hbase created table I got this exception : had...@s2-ratw-1:~/project/hive-0.6.0$ bin/hive --auxpath /home/hadoop/project/hive-0.6.0/lib/hive_hbase-handler.jar,/home/hadoop/project/hive-0.6.0/lib/hbase-0.20.3.jar,/home/hadoop/project/hive-0.6.0/lib/zookeeper-3.2.2.jar -hiveconf hbase.zookeeper.quorum=192.168.1.103,192.168.1.114,192.168.1.115 Hive history file=/tmp/hadoop/hive_job_log_hadoop_201012291831_937563117.txt hive show tables; OK Time taken: 3.445 seconds hive CREATE TABLE hive_hbasetable_k(key int, value string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES (hbase.columns.mapping = :key,cf1:val) TBLPROPERTIES (hbase.table.name = hivehbasek); FAILED: Error in metadata: MetaException(message:org.apache.hadoop.hbase.MasterNotRunningException at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getMaster(HConnectionManager.java:374) at org.apache.hadoop.hbase.client.HBaseAdmin.init(HBaseAdmin.java:72) at org.apache.hadoop.hive.hbase.HBaseStorageHandler.getHBaseAdmin(HBaseStorageHandler.java:64) at org.apache.hadoop.hive.hbase.HBaseStorageHandler.preCreateTable(HBaseStorageHandler.java:159) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:275) at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:394) at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:2126) at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:166) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:107) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:633) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:506) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:384) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:302) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at 
org.apache.hadoop.util.RunJar.main(RunJar.java:156) ) FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask hive
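Before rebuilding the handler, it may be worth confirming that the Hive client machine can reach the running master with the same HBase version it puts on --auxpath; a rough sketch:

# run from the Hive client machine, using the same HBase install/version as the cluster
echo "status" | $HBASE_HOME/bin/hbase shell

# and double-check which hbase/zookeeper jar versions actually end up on the Hive auxpath
ls $HIVE_HOME/lib | grep -iE 'hbase|zookeeper'

If the shell itself reports MasterNotRunningException from this machine, the problem is connectivity or a version/RPC mismatch rather than anything Hive-specific.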
Re: Re-Master Not Running Exception ( Hive/Hbase Integration )
Adarsh Sharma wrote: From that wiki page: If you are not using hbase-0.20.3, you will need to rebuild the handler with the HBase jar matching your version, and change the --auxpath above accordingly. Failure to use matching versions will lead to misleading connection failures such as MasterNotRunningException since the HBase RPC protocol changes often. Thanks I think this is the correct problem. So I configure hbase-0.20.3 to be comapatible with hive-0.6.0 . But the problem remains the same. I am able to create tables in Hive and Hbase but when i try to create Hive/Hbase integrated table , it throws the below exception. I checked the size of hive_hbase_handler.jar, hbase-0.20.3.jar , hbase-0.20.3.test.jar. It's same. hive CREATE TABLE hive_hbasetable_k(key int, value string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES (hbase.columns.mapping = :key,cf1:val) TBLPROPERTIES (hbase.table.name = hivehbasek); FAILED: Error in metadata: MetaException(message:org.apache.hadoop.hbase.MasterNotRunningException at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getMaster(HConnectionManager.java:374) at org.apache.hadoop.hbase.client.HBaseAdmin.init(HBaseAdmin.java:72) at org.apache.hadoop.hive.hbase.HBaseStorageHandler.getHBaseAdmin(HBaseStorageHandler.java:64) at org.apache.hadoop.hive.hbase.HBaseStorageHandler.preCreateTable(HBaseStorageHandler.java:159) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:275) at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:394) at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:2126) at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:166) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:107) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:633) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:506) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:384) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:302) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) ) FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask Please help. Best Regards Adarsh Sharma JVS On Dec 29, 2010, at 5:20 AM, Adarsh Sharma wrote: Dear all, I am following all wiki tutorial for configurin Hive with Hbase Integrated. For this I use : Hadoop-0.20.2 Hive-0.6.0 ( By Default Metastore derby) Hbase-0.20.6 Java 1.6_20 I established successfully a Hadoop Cluster of 4 servers and run Hbase on them. I check all Web UI's ( Namenode , Jobtracker , Hbase Master, RegionServers ) etc. All is working fine. 
But when I issued the below command for creating Hive/Hbase created table I got this exception : had...@s2-ratw-1:~/project/hive-0.6.0$ bin/hive --auxpath /home/hadoop/project/hive-0.6.0/lib/hive_hbase-handler.jar,/home/hadoop/project/hive-0.6.0/lib/hbase-0.20.3.jar,/home/hadoop/project/hive-0.6.0/lib/zookeeper-3.2.2.jar -hiveconf hbase.zookeeper.quorum=192.168.1.103,192.168.1.114,192.168.1.115 Hive history file=/tmp/hadoop/hive_job_log_hadoop_201012291831_937563117.txt hive show tables; OK Time taken: 3.445 seconds hive CREATE TABLE hive_hbasetable_k(key int, value string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES (hbase.columns.mapping = :key,cf1:val) TBLPROPERTIES (hbase.table.name = hivehbasek); FAILED: Error in metadata: MetaException(message:org.apache.hadoop.hbase.MasterNotRunningException at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getMaster(HConnectionManager.java:374) at org.apache.hadoop.hbase.client.HBaseAdmin.init(HBaseAdmin.java:72) at org.apache.hadoop.hive.hbase.HBaseStorageHandler.getHBaseAdmin(HBaseStorageHandler.java:64) at org.apache.hadoop.hive.hbase.HBaseStorageHandler.preCreateTable(HBaseStorageHandler.java:159) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:275) at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:394) at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:2126) at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:166
Error in metadata: javax.jdo.JDOFatalDataStoreException
Dear all, I am trying Hive/Hbase Integration from the past 2 days. I am facing the below issue while creating external table in Hive. *Command-Line Error :- *had...@s2-ratw-1:~/project/hive-0.6.0/build/dist$ bin/hive --auxpath /home/hadoop/project/hive-0.6.0/build/dist/lib/hive_hbase-handler.jar,/home/hadoop/project/hive-0.6.0/build/dist/lib/hbase-0.20.3.jar,/home/hadoop/project/hive-0.6.0/build/dist/lib/zookeeper-3.2.2.jar -hiveconf hbase.zookeeper.quorum=192.168.1.103,192.168.1.114,192.168.1.115,192.168.1.104,192.168.1.107 Hive history file=/tmp/hadoop/hive_job_log_hadoop_201101051527_1728376885.txt hive show tables; FAILED: Error in metadata: javax.jdo.JDOFatalDataStoreException: Communications link failure The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server. NestedThrowables: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server. FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask hive exit; had...@s2-ratw-1:~/project/hive-0.6.0/build/dist$ *My hive.log file says :* 2011-01-05 15:19:36,783 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle org.eclipse.jdt.core requires org.eclipse.core.resources but it cannot be resolved. 2011-01-05 15:19:36,783 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle org.eclipse.jdt.core requires org.eclipse.core.resources but it cannot be resolved. 2011-01-05 15:19:36,785 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle org.eclipse.jdt.core requires org.eclipse.core.runtime but it cannot be resolved. 2011-01-05 15:19:36,785 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle org.eclipse.jdt.core requires org.eclipse.core.runtime but it cannot be resolved. 2011-01-05 15:19:36,786 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle org.eclipse.jdt.core requires org.eclipse.text but it cannot be resolved. 2011-01-05 15:19:36,786 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle org.eclipse.jdt.core requires org.eclipse.text but it cannot be resolved. 
2011-01-05 15:20:12,185 WARN zookeeper.ClientCnxn (ClientCnxn.java:run(967)) - Exception closing session 0x0 to sun.nio.ch.selectionkeyi...@561279c8 java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:933) 2011-01-05 15:20:12,188 WARN zookeeper.ClientCnxn (ClientCnxn.java:cleanup(1001)) - Ignoring exception during shutdown input java.nio.channels.ClosedChannelException at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:638) at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360) at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:999) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970) 2011-01-05 15:20:12,188 WARN zookeeper.ClientCnxn (ClientCnxn.java:cleanup(1006)) - Ignoring exception during shutdown output java.nio.channels.ClosedChannelException at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:649) at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368) at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1004) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970) 2011-01-05 15:20:12,621 WARN zookeeper.ClientCnxn (ClientCnxn.java:run(967)) - Exception closing session 0x0 to sun.nio.ch.selectionkeyi...@799dbc3b I overcomed from the previous issue of MasterNotRunning Exception which occured due to incompatibilities in hive_hbase jars. Now I'm using Hadoop-0.20.2, Hive-0.6.0 ( Bydefault Derby metastore ) and Hbase-0.20.3. Please tell how this could be resolved. Also I want to add one more thing that my hadoop Cluster is of 9 nodes and 8 nodes act as Datanodes,Tasktrackers and Regionservers. Among these nodes is set zookeeper.quorum.property to have 5 Datanodes. Would this is the issue. I don't know the number of servers needed for Zookeeper in fully distributed mode. Best Regards Adarsh Sharma
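Since the stack trace mentions com.mysql.jdbc, the metastore in use is evidently not the default Derby one; a quick sanity check of the JDO settings and of the MySQL server itself might look like this (the property name is the standard Hive one, the host and user are placeholders):

# what hive-site.xml actually points at
grep -A1 "javax.jdo.option.ConnectionURL" $HIVE_HOME/conf/hive-site.xml

# can this machine reach that MySQL server at all?
mysql -h metastore-host -u hiveuser -p -e "select 1"

A "Communications link failure" with "0 milliseconds ago" usually just means the configured metastore database is down or unreachable from the Hive client.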
Data for Testing in Hadoop
Dear all, Designing the architecture is very important for Hadoop in production clusters. We are researching running Hadoop both on individual nodes and in a cloud environment ( VM's ). For this, I require some data for testing. Would anyone send me some links to datasets of different sizes ( 10GB, 20GB, 30GB, 50GB )? I shall be grateful for this kindness. Thanks Regards Adarsh Sharma
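If downloadable datasets are hard to find, the examples jar that ships with Hadoop can also generate arbitrary amounts of synthetic input; for instance (teragen writes 100-byte rows, so the row counts below give roughly 10 GB and 50 GB; the output paths are placeholders):

bin/hadoop jar hadoop-0.20.2-examples.jar teragen 100000000 /user/hadoop/tera-10g
bin/hadoop jar hadoop-0.20.2-examples.jar teragen 500000000 /user/hadoop/tera-50g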
Re: Retrying connect to server
Cavus,M.,Fa. Post Direkt wrote: I process this ./hadoop jar ../../hadoopjar/hd.jar org.postdirekt.hadoop.WordCount gutenberg gutenberg-output I get this Dıd anyone know why I get this Error? 10/12/30 16:48:59 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=30 10/12/30 16:49:01 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9001. Already tried 0 time(s). 10/12/30 16:49:02 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9001. Already tried 1 time(s). 10/12/30 16:49:03 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9001. Already tried 2 time(s). 10/12/30 16:49:04 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9001. Already tried 3 time(s). 10/12/30 16:49:05 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9001. Already tried 4 time(s). 10/12/30 16:49:06 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9001. Already tried 5 time(s). 10/12/30 16:49:07 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9001. Already tried 6 time(s). 10/12/30 16:49:08 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9001. Already tried 7 time(s). 10/12/30 16:49:09 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9001. Already tried 8 time(s). 10/12/30 16:49:10 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9001. Already tried 9 time(s). Exception in thread main java.net.ConnectException: Call to localhost/127.0.0.1:9001 failed on connection exception: java.net.ConnectException: Connection refused at org.apache.hadoop.ipc.Client.wrapException(Client.java:932) at org.apache.hadoop.ipc.Client.call(Client.java:908) at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:198) at $Proxy0.getProtocolVersion(Unknown Source) at org.apache.hadoop.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:228) at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:224) at org.apache.hadoop.mapreduce.Cluster.createRPCProxy(Cluster.java:82) at org.apache.hadoop.mapreduce.Cluster.createClient(Cluster.java:94) at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:70) at org.apache.hadoop.mapreduce.Job.init(Job.java:129) at org.apache.hadoop.mapreduce.Job.init(Job.java:134) at org.postdirekt.hadoop.WordCount.main(WordCount.java:19) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:192) Caused by: java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574) at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:373) at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:417) at org.apache.hadoop.ipc.Client$Connection.access$1900(Client.java:207) at org.apache.hadoop.ipc.Client.getConnection(Client.java:1025) at org.apache.hadoop.ipc.Client.call(Client.java:885) ... 15 more This is the most common issue occured after configuring Hadoop Cluster. Reason : 1. Your NameNode, JobTracker is not running. Verify through Web UI and jps commands. 2. DNS Resolution. 
You must have IP-to-hostname entries for all nodes in the /etc/hosts file. Best Regards Adarsh Sharma
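A sketch of those two checks (host names and addresses below are placeholders):

# 1. are the daemons actually up on the master?
jps                      # should list NameNode and JobTracker

# 2. consistent name resolution on every node
cat /etc/hosts
#   192.168.1.10   master
#   192.168.1.11   slave1
#   192.168.1.12   slave2

Also make sure fs.default.name and mapred.job.tracker point at a hostname the slaves can resolve, not at localhost/127.0.0.1, which is what the client in the log above is trying to reach.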
Re: UI doesn't work
maha wrote: Hi, I get Error 404 when I try to use the hadoop UI to monitor my job execution. I'm using Hadoop-0.20.2 and the following are parts of my configuration files. In core-site.xml: <name>fs.default.name</name> <value>hdfs://speed.cs.ucsb.edu:9000</value> In mapred-site.xml: <name>mapred.job.tracker</name> <value>speed.cs.ucsb.edu:9001</value> When I try to open http://speed.cs.ucsb.edu:50070/ I get the 404 Error. Any ideas? Thank you, Maha Check the logs of the namenode and jobtracker and post their listings. Best Regards Adarsh
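As a rough way to gather that information (log file names follow the usual hadoop-<user>-<daemon>-<host>.log pattern, and $HADOOP_HOME is wherever the install lives):

tail -n 100 $HADOOP_HOME/logs/hadoop-*-namenode-*.log
tail -n 100 $HADOOP_HOME/logs/hadoop-*-jobtracker-*.log

# is anything actually listening on the web UI ports?
netstat -tlnp | grep -E '50070|50030'

If the ports are open but the page is 404, the daemon is up and the problem is usually the web application (missing compiled JSPs on the classpath, as discussed earlier in this thread) rather than the configuration above.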
Thrift Error
Hi all, I am googled a lot about the below error but can't able to find the root cause. I am selecting data from Hive table website_master but it results in below error : Hibernate: select website_ma0_.s_no as col_0_0_ from website_master1 website_ma0_ org.apache.thrift.TApplicationException: Invalid method name: 'getThriftSchema' at org.apache.thrift.TApplicationException.read(TApplicationException.java:107) at org.apache.hadoop.hive.service.ThriftHive$Client.recv_getThriftSchema(ThriftHive.java:247) at org.apache.hadoop.hive.service.ThriftHive$Client.getThriftSchema(ThriftHive.java:231) at org.apache.hadoop.hive.jdbc.HiveQueryResultSet.initDynamicSerde(HiveQueryResultSet.java:76) at org.apache.hadoop.hive.jdbc.HiveQueryResultSet.init(HiveQueryResultSet.java:57) at org.apache.hadoop.hive.jdbc.HiveQueryResultSet.init(HiveQueryResultSet.java:48) at org.apache.hadoop.hive.jdbc.HivePreparedStatement.executeImmediate(HivePreparedStatement.java:194) at org.apache.hadoop.hive.jdbc.HivePreparedStatement.executeQuery(HivePreparedStatement.java:151) at org.hibernate.jdbc.AbstractBatcher.getResultSet(AbstractBatcher.java:107) at org.hibernate.loader.Loader.getResultSet(Loader.java:1183) at org.hibernate.loader.hql.QueryLoader.iterate(QueryLoader.java:381) at org.hibernate.hql.ast.QueryTranslatorImpl.iterate(QueryTranslatorImpl.java:278) at org.hibernate.impl.SessionImpl.iterate(SessionImpl.java:865) at org.hibernate.impl.QueryImpl.iterate(QueryImpl.java:41) at SelectClauseExample.main(SelectClauseExample.java:25) 10/12/16 14:06:55 WARN jdbc.AbstractBatcher: exception clearing maxRows/queryTimeout java.sql.SQLException: Method not supported at org.apache.hadoop.hive.jdbc.HivePreparedStatement.getQueryTimeout(HivePreparedStatement.java:926) at org.hibernate.jdbc.AbstractBatcher.closeQueryStatement(AbstractBatcher.java:185) at org.hibernate.jdbc.AbstractBatcher.closeQueryStatement(AbstractBatcher.java:123) at org.hibernate.loader.Loader.getResultSet(Loader.java:1191) at org.hibernate.loader.hql.QueryLoader.iterate(QueryLoader.java:381) at org.hibernate.hql.ast.QueryTranslatorImpl.iterate(QueryTranslatorImpl.java:278) at org.hibernate.impl.SessionImpl.iterate(SessionImpl.java:865) at org.hibernate.impl.QueryImpl.iterate(QueryImpl.java:41) at SelectClauseExample.main(SelectClauseExample.java:25) 10/12/16 14:06:55 WARN util.JDBCExceptionReporter: SQL Error: 0, SQLState: null 10/12/16 14:06:55 ERROR util.JDBCExceptionReporter: Could not create ResultSet: Invalid method name: 'getThriftSchema' could not execute query using iterate Can someone Please tell me why this occurs and how to resolve it. Thanks Regards Adarsh Sharma
Re: Thrift Error
Viral Bajaria wrote: Adarsh, hive and hadoop both ship with the libthrift.jar and libfb303.jar, you should locate the 1's shipped with hadoop and move them to some other folder or rename them. for me the location for this libraries were as follows libthrift.jar : /usr/lib/hadoop/lib/ libfb303.jar : /usr/lib/hive/lib/ See if this issue solves the problem. I have faced this issue earlier when accessing hive over a thrift server. Thanks, Viral On Thu, Dec 16, 2010 at 2:12 AM, Adarsh Sharma adarsh.sha...@orkash.comwrote: Hi all, I am googled a lot about the below error but can't able to find the root cause. I am selecting data from Hive table website_master but it results in below error : Hibernate: select website_ma0_.s_no as col_0_0_ from website_master1 website_ma0_ org.apache.thrift.TApplicationException: Invalid method name: 'getThriftSchema' at org.apache.thrift.TApplicationException.read(TApplicationException.java:107) at org.apache.hadoop.hive.service.ThriftHive$Client.recv_getThriftSchema(ThriftHive.java:247) at org.apache.hadoop.hive.service.ThriftHive$Client.getThriftSchema(ThriftHive.java:231) at org.apache.hadoop.hive.jdbc.HiveQueryResultSet.initDynamicSerde(HiveQueryResultSet.java:76) at org.apache.hadoop.hive.jdbc.HiveQueryResultSet.init(HiveQueryResultSet.java:57) at org.apache.hadoop.hive.jdbc.HiveQueryResultSet.init(HiveQueryResultSet.java:48) at org.apache.hadoop.hive.jdbc.HivePreparedStatement.executeImmediate(HivePreparedStatement.java:194) at org.apache.hadoop.hive.jdbc.HivePreparedStatement.executeQuery(HivePreparedStatement.java:151) at org.hibernate.jdbc.AbstractBatcher.getResultSet(AbstractBatcher.java:107) at org.hibernate.loader.Loader.getResultSet(Loader.java:1183) at org.hibernate.loader.hql.QueryLoader.iterate(QueryLoader.java:381) at org.hibernate.hql.ast.QueryTranslatorImpl.iterate(QueryTranslatorImpl.java:278) at org.hibernate.impl.SessionImpl.iterate(SessionImpl.java:865) at org.hibernate.impl.QueryImpl.iterate(QueryImpl.java:41) at SelectClauseExample.main(SelectClauseExample.java:25) 10/12/16 14:06:55 WARN jdbc.AbstractBatcher: exception clearing maxRows/queryTimeout java.sql.SQLException: Method not supported at org.apache.hadoop.hive.jdbc.HivePreparedStatement.getQueryTimeout(HivePreparedStatement.java:926) at org.hibernate.jdbc.AbstractBatcher.closeQueryStatement(AbstractBatcher.java:185) at org.hibernate.jdbc.AbstractBatcher.closeQueryStatement(AbstractBatcher.java:123) at org.hibernate.loader.Loader.getResultSet(Loader.java:1191) at org.hibernate.loader.hql.QueryLoader.iterate(QueryLoader.java:381) at org.hibernate.hql.ast.QueryTranslatorImpl.iterate(QueryTranslatorImpl.java:278) at org.hibernate.impl.SessionImpl.iterate(SessionImpl.java:865) at org.hibernate.impl.QueryImpl.iterate(QueryImpl.java:41) at SelectClauseExample.main(SelectClauseExample.java:25) 10/12/16 14:06:55 WARN util.JDBCExceptionReporter: SQL Error: 0, SQLState: null 10/12/16 14:06:55 ERROR util.JDBCExceptionReporter: Could not create ResultSet: Invalid method name: 'getThriftSchema' could not execute query using iterate Can someone Please tell me why this occurs and how to resolve it. Thanks Regards Adarsh Sharma Thanks a Lot Viral ! -Adarsh
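A sketch of locating and sidelining the duplicate jars Viral mentions; the paths assume a packaged install like his (/usr/lib/hadoop, /usr/lib/hive), so adjust HADOOP_HOME/HIVE_HOME for a tarball layout:

find $HADOOP_HOME/lib $HIVE_HOME/lib \( -name 'libthrift*.jar' -o -name 'libfb303*.jar' \)

# keep the copies shipped with Hive; move the Hadoop-side duplicates out of the way
mkdir -p $HADOOP_HOME/lib/disabled
mv $HADOOP_HOME/lib/libthrift*.jar $HADOOP_HOME/lib/disabled/
mv $HADOOP_HOME/lib/libfb303*.jar  $HADOOP_HOME/lib/disabled/

After moving the jars, restart the Hive Thrift server so the old copies are no longer on its classpath.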
Re: Hadoop upgrade [Do we need to have same value for dfs.name.dir ] while upgrading
sandeep wrote: Hi, I am trying to upgrade hadoop. As part of this I have set two environment variables, NEW_HADOOP_INSTALL and OLD_HADOOP_INSTALL. After this I executed the following command: % NEW_HADOOP_INSTALL/bin/start-dfs -upgrade But the namenode did not start, as it was throwing an Inconsistent state exception because dfs.name.dir is not present. My question is: while upgrading, do we need to keep the same old configuration values like dfs.name.dir etc., or do I need to format the namenode first and then start upgrading? Please let me know. Thanks sandeep Sandeep, this error occurs due to a namespace issue in Hadoop. Did you carry over dfs.name.dir and the fs.checkpoint dir to the new Hadoop configuration? A namenode format would cause you to lose all previous data. Best Regards Adarsh Sharma
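A rough outline of the upgrade path being discussed (a sketch only: back up dfs.name.dir first, keep the same dfs.name.dir value in the new conf, and do not format):

# stop the old cluster
$OLD_HADOOP_INSTALL/bin/stop-dfs.sh

# start the new version against the same dfs.name.dir, in upgrade mode
$NEW_HADOOP_INSTALL/bin/start-dfs.sh -upgrade

# watch progress, then finalize once everything checks out
$NEW_HADOOP_INSTALL/bin/hadoop dfsadmin -upgradeProgress status
$NEW_HADOOP_INSTALL/bin/hadoop dfsadmin -finalizeUpgrade

The "Inconsistent state" error typically means the new configuration does not point at the existing dfs.name.dir, so the namenode finds no image to upgrade.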
Re: How to Speed Up Decommissioning progress of a datanode.
sravankumar wrote: Hi, Does anyone know how to speed up datanode decommissioning and what all the configurations related to decommissioning are? How to speed up data transfer from the datanode getting decommissioned. Thanks Regards, Sravan kumar. Check the attachment --Adarsh Balancing Data among Datanodes : HDFS will not move blocks to new nodes automatically. However, newly created files will likely have their blocks placed on the new nodes. There are several ways to rebalance the cluster manually. -Select a subset of files that take up a good percentage of your disk space; copy them to new locations in HDFS; remove the old copies of the files; rename the new copies to their original names. -A simpler way, with no interruption of service, is to turn up the replication of files, wait for transfers to stabilize, and then turn the replication back down. -Yet another way to re-balance blocks is to turn off the data node that is full, wait until its blocks are replicated, and then bring it back again. The over-replicated blocks will be randomly removed from different nodes, so you really get them rebalanced, not just removed from the current node. -Finally, you can use the bin/start-balancer.sh command to run a balancing process that moves blocks around the cluster automatically. bash-3.2$ bin/start-balancer.sh or $ bin/hadoop balancer -threshold 10 starting balancer, logging to /home/hadoop/project/hadoop-0.20.2/bin/../logs/hadoop-hadoop-balancer-ws-test.out Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved The cluster is balanced. Exiting... Balancing took 350.0 milliseconds A cluster is balanced iff there are no under-capacity or over-capacity data nodes in the cluster. An under-capacity data node is a node whose %used space is less than avg_%used_space - threshold. An over-capacity data node is a node whose %used space is greater than avg_%used_space + threshold. The threshold is user configurable; a default value could be 20% of used space.
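If the balancer approach in the note above is too slow, one knob worth knowing is its per-datanode bandwidth cap, which defaults to roughly 1 MB/s; raising it might look like the sketch below (value in bytes per second; on 0.20 the datanodes have to be restarted for it to take effect):

<!-- hdfs-site.xml -->
<property>
  <name>dfs.balance.bandwidthPerSec</name>
  <value>10485760</value>  <!-- ~10 MB/s instead of the 1 MB/s default -->
</property>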
Libfb303.jar
Dear all, I am using Hadoop-0.20.2 and HadoopDB Hive on a 5 node cluster. I am connecting to Hive through Eclipse but I get the error below : Hive history file=/tmp/hadoop/hive_job_log_hadoop_201012141618_1092196256.txt 10/12/14 16:18:37 INFO exec.HiveHistory: Hive history file=/tmp/hadoop/hive_job_log_hadoop_201012141618_1092196256.txt Exception in thread pool-1-thread-1 java.lang.NoSuchMethodError: com.facebook.fb303.FacebookService$Processor$ProcessFunction.process(ILcom/facebook/thrift/protocol/TProtocol;Lcom/facebook/thrift/protocol/TProtocol;)V at org.apache.hadoop.hive.service.ThriftHive$Processor.process(ThriftHive.java:249) at com.facebook.thrift.server.TThreadPoolServer$WorkerProcess.run(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) I know this error occurs because libfb303.jar is present in both the Hadoop and Hive lib directories. Can someone please tell me how to resolve this error? Thanks Regards Adarsh Sharma
Re: exceptions copying files into HDFS
org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-1618752214-127.0.0.2-50010-1292091159510, infoPort=50075, ipcPort=50020)In DataNode.run, data = FSDataset{dirpath='/tmp/hadoop-rock/dfs/data/current'} 2010-12-11 21:02:47,816 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting 2010-12-11 21:02:47,818 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: using BLOCKREPORT_INTERVAL of 360msec Initial delay: 0msec 2010-12-11 21:02:47,819 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 50020: starting 2010-12-11 21:02:47,819 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 50020: starting 2010-12-11 21:02:47,819 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 50020: starting 2010-12-11 21:02:47,819 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 50020: starting 2010-12-11 21:02:47,827 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 0 blocks got processed in 6 msecs 2010-12-11 21:02:47,827 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting Periodic block scanner. 2010-12-11 21:04:41,371 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-1618752214-127.0.0.2-50010-1292091159510, infoPort=50075, ipcPort=50020):DataXceiver java.net.SocketException: Operation not supported at sun.nio.ch.Net.getIntOption0(Native Method) at sun.nio.ch.Net.getIntOption(Net.java:181) at sun.nio.ch.SocketChannelImpl$1.getInt(SocketChannelImpl.java:419) at sun.nio.ch.SocketOptsImpl.getInt(SocketOptsImpl.java:60) at sun.nio.ch.SocketOptsImpl.receiveBufferSize(SocketOptsImpl.java:142) at sun.nio.ch.SocketOptsImpl$IP$TCP.receiveBufferSize(SocketOptsImpl.java:286) at sun.nio.ch.OptionAdaptor.getReceiveBufferSize(OptionAdaptor.java:148) at sun.nio.ch.SocketAdaptor.getReceiveBufferSize(SocketAdaptor.java:336) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:255) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:122) === CONFIG FILES === r...@ritter:~/programs/hadoop-0.20.2+737/conf cat core-site.xml ?xml version=1.0? ?xml-stylesheet type=text/xsl href=configuration.xsl? !-- Put site-specific property overrides in this file. -- configuration property namefs.default.name/name valuehdfs://localhost/value !-- default port 8020 -- /property /configuration r...@ritter:~/programs/hadoop-0.20.2+737/conf cat hdfs-site.xml ?xml version=1.0? ?xml-stylesheet type=text/xsl href=configuration.xsl? !-- Put site-specific property overrides in this file. -- configuration property namedfs.replication/name value1/value /property /configuration Simply Check through ssh that your slaves are connecting to each other. ssh from 1 slave to another. Best Regards Adarsh Sharma
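A quick way to do the connectivity check suggested above, assuming slave hostnames like slave1 and slave2 (placeholders for whatever is in conf/slaves):

for h in slave1 slave2; do
  ssh $h "hostname; jps"    # passwordless ssh should work and DataNode should appear in the jps output
done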
Re: Reduce Error
Ted Yu wrote: From Raj earlier: I have seen this error from time to time, and it has been either due to space, missing directories, or disk errors. The space issue was caused by the fact that I had mounted /dev/sdc on /hadoop-dsk and the mount had failed. And in another case I had accidentally deleted hadoop.tmp.dir on a node, and whenever the reduce job was scheduled on that node that attempt would fail. On Wed, Dec 8, 2010 at 8:21 PM, Adarsh Sharma adarsh.sha...@orkash.com wrote: Raj V wrote: Go through the jobtracker, find the relevant node that handled attempt_201012061426_0001_m_000292_0 and figure out if there are FS or permission problems. Raj From: Adarsh Sharma adarsh.sha...@orkash.com To: common-user@hadoop.apache.org Sent: Wed, December 8, 2010 7:48:47 PM Subject: Re: Reduce Error Ted Yu wrote: Any chance mapred.local.dir is under /tmp and part of it got cleaned up? On Wed, Dec 8, 2010 at 4:17 AM, Adarsh Sharma adarsh.sha...@orkash.com wrote: Dear all, Did anyone encounter the below error while running a job in Hadoop? It occurs in the reduce phase of the job. attempt_201012061426_0001_m_000292_0: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for taskTracker/jobcache/job_201012061426_0001/attempt_201012061426_0001_m_000292_0/output/file.out It states that it is not able to locate a file that is created in mapred.local.dir of Hadoop. Thanks in advance for any sort of information regarding this. Best Regards Adarsh Sharma Hi Ted, My mapred.local.dir is in the /home/hadoop directory. I also checked it within the /hdd2-2 directory, where we have lots of space. Would mapred.map.tasks affect this? I checked with the default and also with 80 maps and 16 reduces, as I have 8 slaves.
<property>
  <name>mapred.local.dir</name>
  <value>/home/hadoop/mapred/local</value>
  <description>The local directory where MapReduce stores intermediate data files. May be a comma-separated list of directories on different devices in order to spread disk i/o. Directories that do not exist are ignored.</description>
</property>
<property>
  <name>mapred.system.dir</name>
  <value>/home/hadoop/mapred/system</value>
  <description>The shared directory where MapReduce stores control files.</description>
</property>
Let me know if you need any further information. Thanks Regards Adarsh Sharma Sir, I read the tasktracker logs several times but was not able to find any reason, as they are not very useful. I attached the tasktracker log with the mail; however, I listed the main portion below.
2010-12-06 15:27:04,228 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_201012061426_0001_m_00_1' to tip task_201012061426_0001_m_00, for tracker 'tracker_ws37-user-lin: 127.0.0.1/127.0.0.1:60583' 2010-12-06 15:27:04,228 INFO org.apache.hadoop.mapred.JobInProgress: Choosing rack-local task task_201012061426_0001_m_00 2010-12-06 15:27:04,229 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_201012061426_0001_m_00_0' from 'tracker_ws37-user-lin:127.0.0.1/127.0.0.1:60583' 2010-12-06 15:27:07,235 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201012061426_0001_m_000328_0: java.io.IOException: Spill failed at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:860) at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:541) at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80) at org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:30) at org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:19) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305) at org.apache.hadoop.mapred.Child.main(Child.java:170) Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for taskTracker/jobcache/job_201012061426_0001/attempt_201012061426_0001_m_000328_0/output/spill16.out at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:343) at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124) at org.apache.hadoop.mapred.MapOutputFile.getSpillFileForWrite(MapOutputFile.java:107) at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1221) at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:686) at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1173) 2010-12-06 15:27:07,236 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201012061426_0001_m_00_1: Error initializing attempt_201012061426_0001_m_00_1: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local
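Given the "Could not find any valid local directory" errors above, one quick check (in line with the space/missing-directory/permission causes mentioned earlier in the thread, not a definitive fix) is to verify on each tasktracker node that every directory in mapred.local.dir exists, is writable by the user running the tasktracker, and has free space. A rough sketch, assuming the single directory from the configuration quoted above and a tasktracker user named hadoop:

# Run on each tasktracker node; the path and user are assumptions taken
# from the config quoted earlier -- adjust for your cluster.
LOCAL_DIR=/home/hadoop/mapred/local

ls -ld "$LOCAL_DIR"                 # does the directory exist, and who owns it?
sudo -u hadoop touch "$LOCAL_DIR/.probe" && sudo -u hadoop rm "$LOCAL_DIR/.probe"   # is it writable?
df -h "$LOCAL_DIR"                  # is there enough free space for map spills?

If any of these checks fails on a node, map attempts scheduled there will keep failing with the DiskChecker error even though the rest of the cluster is healthy.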
Hadoop on Cloud or Not
Hello, I have Eucalyptus 1.6.2 installed on Ubuntu 10.04 using a source installation with KVM. Currently I have ten nodes in my cloud in a single-cluster architecture. Also, I have tested Hadoop on VMs and run several jobs. I am now trying to run Hadoop in a cloud environment, so I will launch Hadoop instances on the cloud. Now, there is huge data on each Hadoop node, so for now I am planning to use volumes to store the data of each instance, i.e. each Hadoop node. But since volumes are stored at the Storage Controller, this means there is continuous movement of data (lots of GBs) across the cloud network from the SC to the node, and the response time of work done on the Hadoop instances will also be slow due to the time taken by data to travel over the network. So, is it possible to store volumes (or use any other way) on the nodes themselves so that the above problem can be resolved? Second case: I can store data on the hard disks attached to the nodes, and Hadoop instances can access that data easily, but for that I would need to start the instances on the node where the data is stored. So, can I, by using any hack or other means, decide which node an instance is started on? Can anyone who has some working experience with Hadoop in a cloud environment give me any pointers? I will really appreciate any sort of support on this. Finally, is it worthwhile to do this, as I previously received a response like this: Is it possible to run Hadoop in VMs on production clusters so that we have 1s of nodes on 100s of servers to achieve high performance through cloud computing? You don't achieve performance that way. You are better off with 1 VM per physical host, and you will need to talk to a persistent filestore for the data you want to retain. Running more than 1 VM per physical host just creates conflict for things like disk, ethernet and CPU that the virtual OS won't be aware of. Also, VM-to-disk performance is pretty bad right now, though that's improving. Thanks Regards Adarsh Sharma
Re: Reduce Error
Ted Yu wrote: Any chance mapred.local.dir is under /tmp and part of it got cleaned up? On Wed, Dec 8, 2010 at 4:17 AM, Adarsh Sharma adarsh.sha...@orkash.com wrote: Dear all, Did anyone encounter the below error while running a job in Hadoop? It occurs in the reduce phase of the job. attempt_201012061426_0001_m_000292_0: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for taskTracker/jobcache/job_201012061426_0001/attempt_201012061426_0001_m_000292_0/output/file.out It states that it is not able to locate a file that is created in mapred.local.dir of Hadoop. Thanks in advance for any sort of information regarding this. Best Regards Adarsh Sharma Hi Ted, My mapred.local.dir is in the /home/hadoop directory. I also checked it within the /hdd2-2 directory, where we have lots of space. Would mapred.map.tasks affect this? I checked with the default and also with 80 maps and 16 reduces, as I have 8 slaves.
<property>
  <name>mapred.local.dir</name>
  <value>/home/hadoop/mapred/local</value>
  <description>The local directory where MapReduce stores intermediate data files. May be a comma-separated list of directories on different devices in order to spread disk i/o. Directories that do not exist are ignored.</description>
</property>
<property>
  <name>mapred.system.dir</name>
  <value>/home/hadoop/mapred/system</value>
  <description>The shared directory where MapReduce stores control files.</description>
</property>
Let me know if you need any further information. Thanks Regards Adarsh Sharma
Re: Running not as hadoop user
Todd Lipcon wrote: The user who started the NN has superuser privileges on HDFS. You can also configure a supergroup by setting dfs.permissions.supergroup (default supergroup) -Todd On Wed, Dec 8, 2010 at 9:34 PM, Mark Kerzner markkerz...@gmail.com wrote: Hi, the hadoop user has some advantages for running Hadoop. For example, if HDFS is mounted as a local file system, then only the user hadoop has write/delete permissions. Can this privilege be given to another user? In other words, is this hadoop user hard-coded, or can another be used in its stead? Thank you, Mark You may also set dfs.permissions = false, or grant separate groups access to HDFS through properties in hdfs-site.xml. --Adarsh
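As an illustration of Todd's point (a sketch, not an authoritative recipe), one way to give another user superuser-level access on a 0.20.x cluster is to put that user in the group named by dfs.permissions.supergroup. The user name mark and the default group name supergroup below are assumptions:

# Add the user to the HDFS supergroup (default group name: supergroup).
# 0.20.x resolves a user's groups via the shell, so the membership has to
# exist on the machine where that user's identity is resolved (typically
# where the hadoop commands are run).
sudo groupadd -f supergroup          # create the group if it does not already exist
sudo usermod -a -G supergroup mark   # "mark" is a hypothetical user name

# Verify the membership is visible to the shell.
id -Gn mark

Alternatively, the supergroup name itself can be changed with the dfs.permissions.supergroup property that Todd mentions.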