Successful hadoop/pig jobs didn't remove the _temporary folder
Dear all,

We have been hitting this issue intermittently: some of our jobs occasionally leave the _temporary folder behind in the output directory. Any job that depends on that output then fails, because the input format cannot process the _temporary directory correctly. We are using hadoop 0.20.3 with some patches and pig 0.8.1. Has anyone met the same issue?

Best wishes,
Stanley Xu
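If upgrading is not an option, one workaround is to make the downstream job ignore the leftover directory. Newer FileInputFormat releases skip entries whose names start with "_" or "."; if the 0.20.3 build in use does not, a custom PathFilter installed with FileInputFormat.setInputPathFilter can apply the same rule. The name test alone, as plain Java (class and method names here are mine, not Hadoop's):

```java
// Sketch of the hidden-name rule used to skip _temporary and other
// bookkeeping entries when listing an input directory.
public class HiddenPathFilter {

    // Returns true when a path name should be handed to the job,
    // false for commit/staging leftovers like _temporary or .staging.
    public static boolean isVisible(String name) {
        return !name.startsWith("_") && !name.startsWith(".");
    }

    public static void main(String[] args) {
        System.out.println(isVisible("_temporary"));
        System.out.println(isVisible("part-00000"));
    }
}
```

In a real job you would wrap this check in an org.apache.hadoop.fs.PathFilter and register it on the downstream job's input format.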
Is there any way I could use ClusterMapReduceTestCase with the new API in 0.20?
Dear All,

I am trying to write a test case for my map-reduce job, which uses the new API in 0.20 (the mapreduce package rather than the mapred package). Since the mapper uses the distributed cache, I cannot test it with MRUnit. I thought I could use ClusterMapReduceTestCase to set up a mini cluster for testing, but it looks like if I just call job.waitForCompletion, the job tries to find the input path on the local file system rather than on the HDFS created by the mini cluster. Is there anything I could do to run the hadoop 0.20 new API against the cluster created by ClusterMapReduceTestCase? Thanks in advance.

Best wishes,
Stanley Xu
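One thing that may explain the local-file-system behaviour: ClusterMapReduceTestCase exposes the mini cluster's settings through createJobConf(), and a Job built from a fresh Configuration() knows nothing about the MiniDFS. A hedged sketch of the pattern, assuming the 0.20 test jars are on the classpath (the test class name is mine):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapred.ClusterMapReduceTestCase;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.fs.Path;

public class MyJobClusterTest extends ClusterMapReduceTestCase {
    public void testJobOnMiniCluster() throws Exception {
        // createJobConf() carries fs.default.name / mapred.job.tracker
        // pointing at the mini cluster started by setUp().
        Configuration conf = createJobConf();
        Job job = new Job(conf);        // NOT new Job(new Configuration())
        job.setJarByClass(MyJobClusterTest.class);
        FileInputFormat.addInputPath(job, new Path("/test/input"));
        FileOutputFormat.setOutputPath(job, new Path("/test/output"));
        assertTrue(job.waitForCompletion(true));
    }
}
```

With the configuration taken from createJobConf(), the input and output paths are resolved against the mini cluster's HDFS instead of the local file system.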
How does the region server know if a block is moved from one datanode to another?
Dear all,

We were tracing an issue with our HBase cluster. We are almost sure it is a network issue, since the problem seems to have disappeared after we disabled ip_forward on all the machines and configured the same route on each of them, but we don't really know how those settings might impact the cluster. The problem we met is described in the following thread (the title does not really fit the issue, in fact): http://search-hadoop.com/m/ZpgJ623GoyU1/.META.+inconsistencysubj=The+META+data+inconsistency+issue

While tracing the logs from the region server, data node and name node, I also found something doubtful in the window after we thought the issue was fixed and before it reappeared. On a region server, I could still find log entries showing that the RegionServer tried to fetch a block from a data node that no longer serves that block.

Region server log for block 5056551999889621449: http://pastebin.com/epEt37JK
Log from the data node the region server tried to get the block from: http://pastebin.com/pnif75rX
Name node log telling that data node to delete the block: http://pastebin.com/rQ4QjUcS

If I run fsck on the file on HDFS, it reports 4 replicas, including the data node that should have deleted the block: http://pastebin.com/2DecD9GD. Yet if I check that data node's local file system, the block no longer exists there. After 6-7 hours, when I re-ran fsck, the data node that deleted the block was no longer listed: http://pastebin.com/014h3qNE

Is this correct behavior for hadoop and hbase? I am using hadoop branch-0.20-append and hbase 0.20.6. Apart from reading all the code, is there a document or tutorial that describes how hadoop and hbase keep the data synchronized, in more detail than the HBase book or the official documentation?

Best wishes,
Stanley Xu
Re: c++ program
The famous word count example in C++ can be found on the hadoop wiki: http://wiki.apache.org/hadoop/C%2B%2BWordCount

Best wishes,
Stanley Xu

On Tue, Mar 15, 2011 at 7:09 PM, Manish Yadav manish.ya...@orkash.com wrote:
Hi, can anyone tell me how to run a simple hello world program written in C++ on hadoop? I know that hadoop is for map-reduce and that hadoop uses Pipes for C++, but just for experimentation, can anybody tell me how to do this? Can anyone tell me how to run a simple Hello World program in C or C++?
Re: setJarByClass question
The jar on the command line may be only the jar that submits the map-reduce job, not necessarily the jar containing the Mapper and Reducer that gets shipped to the different nodes. What `hadoop jar your-jar` really does is set up the classpath and related environment and run the main method in your-jar. You might have a different map-reduce jar on the classpath that contains the actual mapper and reducer used for the job, and setJarByClass tells Hadoop which one to ship.

Best wishes,
Stanley Xu

On Fri, Feb 25, 2011 at 7:23 AM, Mark Kerzner markkerz...@gmail.com wrote:
Hi, this call, job.setJarByClass, tells Hadoop which jar to use. But we also tell Hadoop which jar to use on the command line: hadoop jar your-jar parameters. Why do we need this in both places? Thank you, Mark
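A minimal sketch of that split (class and jar names are mine): the driver lives in driver.jar, which is what you pass to `hadoop jar`, while setJarByClass picks out whichever jar on the classpath contains the mapper class, so that jar is the one distributed to the task nodes.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Packaged in driver.jar and launched with:
//   hadoop jar driver.jar com.example.Driver <in> <out>
public class Driver {
    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "example");
        // Hadoop ships the jar that CONTAINS MyMapper to the task nodes;
        // that can be a second jar (say mr-classes.jar) on the classpath,
        // not driver.jar itself.
        job.setJarByClass(MyMapper.class);
        job.setMapperClass(MyMapper.class);
        // ... input/output paths, then job.waitForCompletion(true)
    }
}
```

When the driver and the map-reduce classes live in the same jar, the two settings happen to point at the same file, which is why the duplication is usually invisible.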
Re: JobConf.setQueueName(xxx) with the new api using hadoop 0.20.2
Set mapred.job.queue.name in the Configuration object the job uses.

On 2011-2-22 at 11:42 PM, Marc Sturlese marc.sturl...@gmail.com wrote:
I'm trying to use the fair scheduler. I have jobs written using the new API and hadoop 0.20.2. I've seen that to associate a job with a queue you have to call JobConf.setQueueName(). The Job class of the new API does not have this method. How can I do that? Thanks in advance.
--
View this message in context: http://lucene.472066.n3.nabble.com/JobConf-setQueueName-xxx-with-the-new-api-using-hadoop-0-20-2-tp2553042p2553042.html
Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
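In code with the new API, that means writing the property before constructing the Job. A short sketch (the queue name is a placeholder; on 0.20.2, the old API's JobConf.setQueueName writes this same key):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

Configuration conf = new Configuration();
// Same key that JobConf.setQueueName() sets under the old API.
conf.set("mapred.job.queue.name", "my-queue");
Job job = new Job(conf, "queued job");
```

The Job copies the Configuration at construction time, so the property has to be set before new Job(conf) is called.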
How could I let my map-reduce job use the log4j.properties configuration in the jar file that contains the map-reduce classes?
Dear Buddies,

I am running a map-reduce job from a jar file through a shell script like the following:

#! /bin/sh
export HADOOP_CLASSPATH=/home/xuwh/log-fetcher.jar
export CLASSPATH=/home/xuwh/:$CLASSPATH
/opt/hadoop/bin/hadoop com.companyname.context.processor.log.preprocessor.LogCleaner

In the LogCleaner class, besides submitting the map-reduce job, I wait for the job to complete and send the result to a server for further processing. I added some log4j logging to the upload part and wanted to receive the error logs through an SMTP appender, so I created my own log4j.properties file in the jar that contains LogCleaner, but it didn't work. I don't want to change the log configuration in hadoop itself, since I would have to change it on all nodes, and different map-reduce jars might need different log4j configurations. Is there any way to make the log4j code in the jar file use the log4j.properties inside the jar? Not the code in the map-reduce tasks, but the code that sets up the job and the code that runs after the job completes. Thanks.

Best wishes,
Stanley Xu
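One likely cause, offered as a guess: the hadoop launcher puts its conf directory (which ships its own log4j.properties) ahead of user jars on the classpath, so log4j's automatic configuration finds hadoop's file before the one bundled in log-fetcher.jar. For the driver-side code you can load the bundled file explicitly at the top of LogCleaner.main with PropertyConfigurator.configure(LogCleaner.class.getResource("/log4j.properties")) (log4j 1.2 API). A minimal properties sketch for the SMTP appender; the host, addresses and logger name below are placeholders:

```
# log4j.properties bundled in log-fetcher.jar
log4j.rootLogger=INFO, console
log4j.logger.com.companyname.context.processor.log.preprocessor=INFO, mail

log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d %-5p %c - %m%n

# SMTPAppender buffers recent events and mails them when an ERROR arrives
log4j.appender.mail=org.apache.log4j.net.SMTPAppender
log4j.appender.mail.SMTPHost=smtp.example.com
log4j.appender.mail.From=hadoop-jobs@example.com
log4j.appender.mail.To=oncall@example.com
log4j.appender.mail.Subject=LogCleaner errors
log4j.appender.mail.BufferSize=50
log4j.appender.mail.layout=org.apache.log4j.PatternLayout
log4j.appender.mail.layout.ConversionPattern=%d %-5p %c - %m%n
```

Configuring log4j programmatically this way only affects the driver JVM, so the per-node hadoop configuration stays untouched and each job jar can carry its own properties file.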