AvatarDataNode Error

2012-02-14 Thread bourne1900
Hi, all.
When I start an AvatarDataNode, it shows the error below:
2012-02-14 17:33:50,719 ERROR org.apache.hadoop.hdfs.server.datanode.AvatarDataNode: java.lang.IllegalArgumentException: not a proxy instance
at java.lang.reflect.Proxy.getInvocationHandler(Proxy.java:637)
at org.apache.hadoop.ipc.RPC.stopProxy(RPC.java:393)
at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:603)
at org.apache.hadoop.hdfs.server.datanode.AvatarDataNode.shutdown(AvatarDataNode.java:576)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:218)
at org.apache.hadoop.hdfs.server.datanode.AvatarDataNode.<init>(AvatarDataNode.java:119)
at org.apache.hadoop.hdfs.server.datanode.AvatarDataNode.makeInstance(AvatarDataNode.java:691)
at org.apache.hadoop.hdfs.server.datanode.AvatarDataNode.instantiateDataNode(AvatarDataNode.java:715)
at org.apache.hadoop.hdfs.server.datanode.AvatarDataNode.createDataNode(AvatarDataNode.java:720)
at org.apache.hadoop.hdfs.server.datanode.AvatarDataNode.main(AvatarDataNode.java:728)
Does anybody know the reason?
Thank you,
Bourne

Re: ERROR: org.apache.hadoop.hbase.MasterNotRunningException: Retried 7 Times

2012-02-14 Thread Bing Li
Dear Jimmy,

After changing my Linux from RedHat 9 to Ubuntu 10, all the problems were
solved.

Maybe RedHat 9 is too old for HBase and Hadoop?

Thanks,
Bing

On Tue, Feb 14, 2012 at 5:23 AM, Bing Li lbl...@gmail.com wrote:

 Dear Jimmy,

 I noticed that my name node was not started. That might be the reason? I
 still tried to figure out why the name node was not started.

 Thanks so much!

 Bing


 On Tue, Feb 14, 2012 at 5:11 AM, Jimmy Xiang jxi...@cloudera.com wrote:

 Which HDFS/Hadoop are you using?  The name node configuration for
 fs.default.name should be hdfs://localhost:9000 if you want your
 hbase.rootdir to be hdfs://localhost:9000/hbase.  They need to match.
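
For reference, the matching setting lives in the name node's core-site.xml; a minimal sketch for a single-node setup (values illustrative):

  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>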




 On Mon, Feb 13, 2012 at 11:58 AM, Bing Li lbl...@gmail.com wrote:

 Dear Jimmy,

 Thanks so much for your instant reply!

 My hbase-site.xml is like the following.

   <property>
     <name>hbase.rootdir</name>
     <value>hdfs://localhost:9000/hbase</value>
   </property>
   <property>
     <name>dfs.replication</name>
     <value>1</value>
   </property>
   <property>
     <name>hbase.master</name>
     <value>localhost:6</value>
   </property>
   <property>
     <name>hbase.cluster.distributed</name>
     <value>true</value>
   </property>
   <property>
     <name>hbase.zookeeper.quorum</name>
     <value>localhost</value>
   </property>
   /property

 When I run hadoop fs -ls /, the directories and files under the Linux
 root are displayed.

 Best,
 Bing

 On Tue, Feb 14, 2012 at 3:48 AM, Jimmy Xiang jxi...@cloudera.comwrote:

 Which port does your HDFS listen to? It is not 9000, right?

 <name>hbase.rootdir</name>
 <value>hdfs://localhost:9000/hbase</value>

 You need to fix this and make sure your HDFS is working, for example,
 the following command should work for you.

 hadoop fs -ls /



 On Mon, Feb 13, 2012 at 11:44 AM, Bing Li lbl...@gmail.com wrote:

 Dear Jimmy,

 I configured the standalone mode successfully. But I wonder why the
 pseudo-distributed one doesn't work.

 I checked in logs and got the following exceptions. Does the
 information give you some hints?

 Thanks so much for your help again!

 Best,
 Bing

 2012-02-13 18:25:49,782 FATAL org.apache.hadoop.hbase.master.HMaster:
 Unhandled exception. Starting shutdown.
 java.net.ConnectException: Call to localhost/127.0.0.1:9000 failed on connection exception: java.net.ConnectException: Connection refused
 at org.apache.hadoop.ipc.Client.wrapException(Client.java:1095)
 at org.apache.hadoop.ipc.Client.call(Client.java:1071)
 at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
 at $Proxy10.getProtocolVersion(Unknown Source)
 at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
 at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
 at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:119)
 at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:238)
 at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:203)
 at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
 at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386)
 at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
 at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404)
 at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
 at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
 at org.apache.hadoop.hbase.util.FSUtils.getRootDir(FSUtils.java:471)
 at org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:94)
 at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:448)
 at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:326)
 at java.lang.Thread.run(Thread.java:662)
 Caused by: java.net.ConnectException: Connection refused
 at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
 at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
 at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
 at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:656)
 at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:434)
 at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:560)
 at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:184)
 at org.apache.hadoop.ipc.Client.getConnection(Client.java:1202)
 at org.apache.hadoop.ipc.Client.call(Client.java:1046)
 ... 18 more
 2012-02-13 18:25:49,787 INFO org.apache.hadoop.hbase.master.HMaster:
 Aborting
 2012-02-13 18:25:49,787 DEBUG org.apache.hadoop.hbase.master.HMaster:
 Stopping service threads


 Thanks so much!
 Bing


 On Tue, Feb 14, 2012 at 3:35 AM, Jimmy Xiang jxi...@cloudera.comwrote:

 In this case, you may just use the standalone mode.  You can follow
 the quick start step by step.

 The default zookeeper port is 2181, you don't need to configure it.



 On Mon, Feb 13, 2012 at 11:28 AM, Bing Li lbl...@gmail.com wrote:

 Dear Jimmy,

 I am a new user of HBase. My experience with HBase and Hadoop is very
 limited. I just tried to follow some books, such as 

Re: Hadoop scripting when to use dfs -put

2012-02-14 Thread Harsh J
For the sake of http://xkcd.com/979/, and since this was cross posted,
Håvard managed to solve this specific issue via Joey's response at
https://groups.google.com/a/cloudera.org/group/cdh-user/msg/c55760868efa32e2

2012/2/14 Håvard Wahl Kongsgård haavard.kongsga...@gmail.com:
 My environment heap size varies from 18GB to 2GB;
 in mapred-site.xml, mapred.child.java.opts = -Xmx512M

 System: Ubuntu 10.04 LTS, java-6-sun-1.6.0.26, latest Cloudera version of
 Hadoop


 This is the log from the tasklog:
 Original exception was:
 java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:376)
        at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:572)
        at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:136)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
        at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
        at org.apache.hadoop.mapred.Child.main(Child.java:264)
 Caused by: java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.typedbytes.TypedBytesInput.readRawBytes(TypedBytesInput.java:212)
        at org.apache.hadoop.typedbytes.TypedBytesInput.readRaw(TypedBytesInput.java:152)
        at org.apache.hadoop.streaming.io.TypedBytesOutputReader.readKeyValue(TypedBytesOutputReader.java:51)
        at org.apache.hadoop.streaming.PipeMapRed$MROutputThread.run(PipeMapRed.java:418)


 I don't have a recursive loop, a while loop, or anything like that.

 my dumbo code

 multi_tree() is just a simple function,
 and its error handling is simply:

     try:
         ...
     except:
         pass

 def mapper(key, value):
     v = value.split(" ")[0]
     yield multi_tree(v), 1


 if __name__ == "__main__":
     import dumbo
     dumbo.run(mapper)


 -Håvard


 On Mon, Feb 13, 2012 at 8:52 PM, Rohit ro...@hortonworks.com wrote:
 Hi,

 What threw the heap error? Was it the Java VM, or the shell environment?

 It would be good to look at free RAM memory on your system before and after 
 you ran the script as well, to see if your system is running low on memory.

 Are you using a recursive loop in your script?

 Thanks,
 Rohit


 Rohit Bakhshi





 www.hortonworks.com (http://www.hortonworks.com/)





 On Monday, February 13, 2012 at 10:39 AM, Håvard Wahl Kongsgård wrote:

 Hi, I originally posted this on the dumbo forum, but it's more a
 general Hadoop scripting issue.

 When testing a simple script that created some local files
 and then copied them to HDFS
 with os.system("hadoop dfs -put /home/havard/bio_sci/file.json
 /tmp/bio_sci/file.json"),

 the tasks fail with out of heap memory. The files are tiny, and I have
 tried increasing the
 heap size. When skipping the hadoop dfs -put, the tasks do not fail.

 Is it wrong to use hadoop dfs -put inside a script that is run by
 hadoop? Should I instead
 transfer the files at the end with a combiner, or simply mount HDFS
 locally and write directly to it? Any general suggestions?
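
As a general aside (this is not the fix referenced above, just an illustration): code running on the JVM can do the same copy in-process through the FileSystem API instead of shelling out to "hadoop dfs -put". A minimal sketch, with illustrative paths:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PutToHdfs {
  public static void main(String[] args) throws Exception {
    // Picks up fs.default.name from the core-site.xml on the classpath.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Copy a local file into HDFS without spawning a separate
    // "hadoop dfs -put" client process.
    fs.copyFromLocalFile(new Path("/home/havard/bio_sci/file.json"),
                         new Path("/tmp/bio_sci/file.json"));
    fs.close();
  }
}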


 --
 Håvard Wahl Kongsgård
 NTNU

 http://havard.security-review.net/




 --
 Håvard Wahl Kongsgård
 NTNU

 http://havard.security-review.net/



-- 
Harsh J
Customer Ops. Engineer
Cloudera | http://tiny.cloudera.com/about


iterator method in Configuration class doesn't interpret expression in a property

2012-02-14 Thread Kousuke Saruta
Hi all,

I use Hadoop-1.0 or Hadoop-0.20.2 and Pig 0.8.
I encountered a strange behavior of Configuration class.

The Configuration#get method can interpret expressions like ${foo.bar} in a
property's value, but the Configuration#iterator method doesn't interpret
expressions, although the method returns an Iterator containing pairs of
property name and value as Map.Entry<String, String> objects.

Do you know whether this behavior of the Configuration class is by design
or a defect?

Because of this, I can't use expressions for properties used internally by
Pig, e.g. pig.logfile, pig.temp.dir and so on, even if I set those
properties in *-site.xml.

Best Regards,
Kousuke
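
A minimal sketch of the behavior described above (assuming the Hadoop 1.0/0.20.2 Configuration class; the property names are only illustrative):

import java.util.Map;

import org.apache.hadoop.conf.Configuration;

public class ConfExpansionDemo {
  public static void main(String[] args) {
    Configuration conf = new Configuration(false);
    conf.set("foo.bar", "/tmp/base");
    conf.set("pig.logfile", "${foo.bar}/pig.log");

    // get() resolves the expression: prints /tmp/base/pig.log
    System.out.println(conf.get("pig.logfile"));

    // iterator() hands back the raw, unexpanded value: prints ${foo.bar}/pig.log
    for (Map.Entry<String, String> e : conf) {
      if (e.getKey().equals("pig.logfile")) {
        System.out.println(e.getValue());
      }
    }
  }
}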


Re: Processing small xml files

2012-02-14 Thread W.P. McNeill
I'm not sure what you mean by "flat format" here.

In my scenario, I have a file input.xml that looks like this.

<myfile>
  <section>
    <value>1</value>
  </section>
  <section>
    <value>2</value>
  </section>
</myfile>

input.xml is a plain text file. Not a sequence file. If I read it with the
XMLInputFormat my mapper gets called with (key, value) pairs that look like
this:

(, <section><value>1</value></section>)
(, <section><value>2</value></section>)

Where the keys are numerical offsets into the file. I then use this
information to write a sequence file with these (key, value) pairs. So my
Hadoop job that uses XMLInputFormat takes a text file as input and produces
a sequence file as output.

I don't know a rule of thumb for how many small files is too many. Maybe
someone else on the list can chime in. I just know that when your
throughput gets slow that's one possible cause to investigate.
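
A minimal sketch of the job wiring described above (old mapred API; XMLInputFormat here stands in for whichever XML input format implementation is in use, assumed to emit LongWritable offsets and Text fragments; paths are illustrative):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;
import org.apache.hadoop.mapred.lib.IdentityMapper;

public class XmlToSequenceFile {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(XmlToSequenceFile.class);
    conf.setJobName("xml-to-seqfile");

    // XMLInputFormat is not part of core Hadoop; it is whichever
    // implementation produces the (offset, fragment) pairs above.
    conf.setInputFormat(XMLInputFormat.class);
    conf.setOutputFormat(SequenceFileOutputFormat.class);

    conf.setOutputKeyClass(LongWritable.class);
    conf.setOutputValueClass(Text.class);

    // Identity mapper, no reduces: the (offset, fragment) pairs are
    // written straight into the sequence file.
    conf.setMapperClass(IdentityMapper.class);
    conf.setNumReduceTasks(0);

    FileInputFormat.setInputPaths(conf, new Path("input.xml"));
    FileOutputFormat.setOutputPath(conf, new Path("xml-seqfile-out"));

    JobClient.runJob(conf);
  }
}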


Re: Processing small xml files

2012-02-14 Thread Rohit
Hi Mohit,

 How many are too many for namenode? We have around 100M files and 100M
 files every year
  
  
  

The name-node stores file and block metadata in RAM.  

This is an estimate of memory utilization per file and block:
"Estimates show that the name-node uses fewer than 200 bytes to store a single
metadata object (a file inode or a block). According to statistics on our
clusters, a file on average consists of 1.5 blocks, which means that it takes
600 bytes (1 file object + 2 block objects) to store an average file in the
name-node's RAM."

http://www.usenix.org/publications/login/2010-04/openpdfs/shvachko.pdf
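
At that estimate, the roughly 100M files mentioned above would already need on the order of 100,000,000 × 600 bytes, about 60 GB, of name-node heap, and would grow by a similar amount each year; that is why the file count matters more than the total data size here.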

Next generation Hadoop (Hadoop 0.23) brings HDFS Federation, which will improve 
scalability of the name-node. You can read more about that here:
http://hortonworks.com/an-introduction-to-hdfs-federation/


Rohit Bakhshi





www.hortonworks.com (http://www.hortonworks.com/)






On Tuesday, February 14, 2012 at 10:56 AM, W.P. McNeill wrote:

 I'm not sure what you mean by "flat format" here.

 In my scenario, I have a file input.xml that looks like this.
  
 <myfile>
 <section>
 <value>1</value>
 </section>
 <section>
 <value>2</value>
 </section>
 </myfile>
  
 input.xml is a plain text file. Not a sequence file. If I read it with the
 XMLInputFormat my mapper gets called with (key, value) pairs that look like
 this:
  
 (, <section><value>1</value></section>)
 (, <section><value>2</value></section>)
  
 Where the keys are numerical offsets into the file. I then use this
 information to write a sequence file with these (key, value) pairs. So my
 Hadoop job that uses XMLInputFormat takes a text file as input and produces
 a sequence file as output.
  
 I don't know a rule of thumb for how many small files is too many. Maybe
 someone else on the list can chime in. I just know that when your
 throughput gets slow that's one possible cause to investigate.
  
  




Secondarynamenode fail to do checkpoint

2012-02-14 Thread johnson
When I tried to run the secondarynamenode on a separate machine, it threw the
following exception:

2012-02-10 00:25:46,551 ERROR
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Throwable
Exception in doCheckpoint:
2012-02-10 00:25:46,551 ERROR
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode:
java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1091)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1103)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addNode(FSDirectory.java:1006)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedAddFile(FSDirectory.java:205)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:613)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1008)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.doMerge(SecondaryNameNode.java:672)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.access$500(SecondaryNameNode.java:571)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doMerge(SecondaryNameNode.java:448)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:412)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:312)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:275)
at java.lang.Thread.run(Thread.java:722)

Can anyone give me some advice?

--
View this message in context: 
http://hadoop-common.472056.n3.nabble.com/Secondarynamenode-fail-to-do-checkpoint-tp3745662p3745662.html
Sent from the Users mailing list archive at Nabble.com.


Secondarynamenode fail to do checkpoint

2012-02-14 Thread johnson
I have tried to run the secondarynamenode on a separate machine for a
while, with dfs.checkpoint.period set to 15 minutes. It works well if I
don't put any data into HDFS, but if I do put data into HDFS, the
secondarynamenode goes down and throws the following exception:

2012-02-10 00:25:46,551 ERROR
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Throwable
Exception in doCheckpoint:
2012-02-10 00:25:46,551 ERROR
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode:
java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1091)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1103)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addNode(FSDirectory.java:1006)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedAddFile(FSDirectory.java:205)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:613)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1008)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.doMerge(SecondaryNameNode.java:672)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.access$500(SecondaryNameNode.java:571)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doMerge(SecondaryNameNode.java:448)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:412)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:312)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:275)
at java.lang.Thread.run(Thread.java:722)

Can anyone give me some help?
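
Not necessarily the cause of the exception above, but when the secondarynamenode runs on a separate machine it at least needs to know where to reach the namenode's HTTP server, from which it pulls the image and edits, and where to keep its checkpoint. A hedged configuration sketch for the secondary host (host name and paths are illustrative):

  <property>
    <name>dfs.http.address</name>
    <value>namenode-host:50070</value>
  </property>
  <property>
    <name>fs.checkpoint.dir</name>
    <value>/data/hdfs/namesecondary</value>
  </property>
  <property>
    <name>fs.checkpoint.period</name>
    <value>900</value>
  </property>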


--
View this message in context: 
http://hadoop-common.472056.n3.nabble.com/Secondarynamenode-fail-to-do-checkpoint-tp3745671p3745671.html
Sent from the Users mailing list archive at Nabble.com.


when hadoop report job finished?

2012-02-14 Thread Jinyan Xu

Hi all,

I ran a terasort test in pseudo-distributed mode and found that the reported
finish time of the one-reduce job is shorter than that of the four-reduce job,
so I used a tool to investigate why.
Using nmon to monitor disk read/write activity, I noticed a phenomenon: when
the one-reduce job reports finished, disk writes still continue for a long
time. For the four-reduce job, the disk read/write activity is already done
when the job reports finished.

In a nutshell, the one-reduce job's finish time plus the remaining disk-write
time equals the four-reduce job's finish time.

So I think that is why the one-reduce job's time is shorter than the
four-reduce job's. Finally, I want to ask why this happens. What marks a
reduce job as finished?



Extending pipes to support binary data

2012-02-14 Thread Charles Earl
Hi,
I'm trying to extend the pipes interface as defined in Pipes.hh to
support the read of binary input data.
I believe that would mean extending the getInputValue() method of the
context to return char *, which would then be memcpy'd to the appropriate
type inside the C++ pipes program.
I'm guessing the best way to do this would be to use a custom
InputFormat on the Java side that would have a BytesWritable value.
Is this the correct approach?

-- 
- Charles
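
A custom InputFormat with a BytesWritable value is a reasonable way to get whole binary files to the Java side. A minimal sketch (old mapred API, which Pipes submits through; class names are hypothetical, and the Pipes.hh/C++ changes are not shown):

import java.io.IOException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

public class WholeFileInputFormat extends FileInputFormat<NullWritable, BytesWritable> {

  @Override
  protected boolean isSplitable(FileSystem fs, Path file) {
    return false; // each binary file is handed over as a single record
  }

  @Override
  public RecordReader<NullWritable, BytesWritable> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) throws IOException {
    return new WholeFileRecordReader((FileSplit) split, job);
  }

  static class WholeFileRecordReader implements RecordReader<NullWritable, BytesWritable> {
    private final FileSplit split;
    private final JobConf job;
    private boolean processed = false;

    WholeFileRecordReader(FileSplit split, JobConf job) {
      this.split = split;
      this.job = job;
    }

    public boolean next(NullWritable key, BytesWritable value) throws IOException {
      if (processed) {
        return false;
      }
      // Read the whole file into the BytesWritable value.
      byte[] contents = new byte[(int) split.getLength()];
      Path file = split.getPath();
      FileSystem fs = file.getFileSystem(job);
      FSDataInputStream in = fs.open(file);
      try {
        IOUtils.readFully(in, contents, 0, contents.length);
      } finally {
        IOUtils.closeStream(in);
      }
      value.set(contents, 0, contents.length);
      processed = true;
      return true;
    }

    public NullWritable createKey() { return NullWritable.get(); }
    public BytesWritable createValue() { return new BytesWritable(); }
    public long getPos() { return processed ? split.getLength() : 0; }
    public float getProgress() { return processed ? 1.0f : 0.0f; }
    public void close() { }
  }
}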


Re: Does Hadoop 0.20.205 and Ganglia 3.1.7 compatible with each other ?

2012-02-14 Thread mete
Hello Varun,
I have patched and recompiled Ganglia from source, but it still cores after
the patch.

Here are some logs:
Feb 15 09:39:14 master gmetad[16487]: RRD_update
(/var/lib/ganglia/rrds/hadoop/slave4/metricssystem.MetricsSystem.publish_max_time.rrd):
/var/lib/ganglia/rrds/hadoop/slave4/metricssystem.MetricsSystem.publish_max_time.rrd:
converting '4.9E-324' to float: Numerical result out of range
Feb 15 09:39:14 master gmetad[16487]: RRD_update
(/var/lib/ganglia/rrds/hadoop/master/metricssystem.MetricsSystem.publish_imax_time.rrd):
/var/lib/ganglia/rrds/hadoop/master/metricssystem.MetricsSystem.publish_imax_time.rrd:
converting '4.9E-324' to float: Numerical result out of range
Feb 15 09:39:14 master gmetad[16487]: RRD_update
(/var/lib/ganglia/rrds/hadoop/slave1/metricssystem.MetricsSystem.publish_imax_time.rrd):
/var/lib/ganglia/rrds/hadoop/slave1/metricssystem.MetricsSystem.publish_imax_time.rrd:
converting '4.9E-324' to float: Numerical result out of range
Feb 15 09:39:14 master gmetad[16487]: RRD_update
(/var/lib/ganglia/rrds/hadoop/slave1/metricssystem.MetricsSystem.snapshot_imax_time.rrd):
/var/lib/ganglia/rrds/hadoop/slave1/metricssystem.MetricsSystem.snapshot_imax_time.rrd:
converting '4.9E-324' to float: Numerical result out of range
Feb 15 09:39:14 master gmetad[16487]: RRD_update
(/var/lib/ganglia/rrds/hadoop/slave1/metricssystem.MetricsSystem.publish_max_time.rrd):
/var/lib/ganglia/rrds/hadoop/slave1/metricssystem.MetricsSystem.publish_max_time.rrd:
converting '4.9E-324' to float: Numerical result out of range
Feb 15 09:39:14 master gmetad[16487]: *** buffer overflow detected ***:
gmetad terminated

I am using the Hadoop 1.0.0 and Ganglia 3.20 tarballs.

Cheers
Mete

On Sat, Feb 11, 2012 at 2:19 AM, Merto Mertek masmer...@gmail.com wrote:

 Varun, unfortunately I have had some problems with deploying the new version
 on the cluster. Hadoop is not picking up the new build in the lib folder despite
 the classpath being set to it. The new build is picked up only if I put it in
 $HD_HOME/share/hadoop/, which is very strange. I've done this on all nodes
 and can access the web UI, but all tasktrackers are being stopped because of an
 error:

 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Cleanup...
  java.lang.InterruptedException: sleep interrupted
  at java.lang.Thread.sleep(Native Method)
  at org.apache.hadoop.filecache.TrackerDistributedCacheManager$CleanupThread.run(TrackerDistributedCacheManager.java:926)
 


 Probably the error is the consequence of an inadequate deploy of a jar. I
 will ask the dev list how they do it; or do you maybe have any other
 idea?



 On 10 February 2012 17:10, Varun Kapoor rez...@hortonworks.com wrote:

  Hey Merto,
 
  Any luck getting the patch running on your cluster?
 
  In case you're interested, there's now a JIRA for this:
  https://issues.apache.org/jira/browse/HADOOP-8052.
 
  Varun
 
  On Wed, Feb 8, 2012 at 7:45 PM, Varun Kapoor rez...@hortonworks.com
  wrote:
 
   Your general procedure sounds correct (i.e. dropping your newly built
  .jar
   into $HD_HOME/lib/), but to make sure it's getting picked up, you
 should
   explicitly add $HD_HOME/lib/ to your exported HADOOP_CLASSPATH
  environment
   variable; here's mine, as an example:
  
   export HADOOP_CLASSPATH=.:./build/*.jar
  
   About your second point, you certainly need to copy this newly patched
   .jar to every node in your cluster, because my patch changes the value
  of a
   couple metrics emitted TO gmetad (FROM all the nodes in the cluster),
 so
   without copying it over to every node in the cluster, gmetad will still
   likely receive some bad metrics.
  
   Varun
  
  
   On Wed, Feb 8, 2012 at 6:19 PM, Merto Mertek masmer...@gmail.com
  wrote:
  
   I will need your help. Please confirm whether the following procedure is
   right. I have a dev environment where I pimp my scheduler (no hadoop running)
   and a small cluster environment where the changes (jars) are deployed with
   some scripts; however, I have never compiled the whole hadoop from source,
   so I do not know if I am doing it right. I've done it as follows:
  
   a) apply a patch
   b) cd $HD_HOME; ant
   c) copy $HD_HOME/*build*/patched-core-hadoop.jar -> cluster:/$HD_HOME/*lib*
   d) run $HD_HOME/bin/start-all.sh
  
   Is this enough? When I tried to test hadoop dfs -ls / I could see that a
   new jar was not loaded and instead a jar from
   $HD_HOME/*share*/hadoop-20.205.0.jar was taken.
   Should I copy the entire hadoop folder to all nodes and reconfigure the
   entire cluster for the new build, or is it enough if I configure it just on
   the node where gmetad will run?
  
  
  
  
  
  
   On 8 February 2012 06:33, Varun Kapoor rez...@hortonworks.com
 wrote:
  
I'm so sorry, Merto - like a silly goose, I attached the 2 patches
 to
  my
reply, and of course the mailing list did not accept the attachment.
   
I plan on opening JIRAs for this tomorrow, but till then, here are