Re: Cannot access Jobtracker and namenode
It's normal that they are all empty. Look at the files with the .log extension.

On Sunday, April 12, 2009 at 23:30, halilibrahimcakir <halilibrahimca...@mynet.com> wrote:

I followed these steps:

$ bin/stop-all.sh
$ rm -ri /tmp/hadoop-root
$ bin/hadoop namenode -format
$ bin/start-all.sh

and looked at localhost:50070 and localhost:50030 in my browser, but the result was no different: again Error 404. I looked at these files:

$ gedit hadoop-0.19.0/logs/hadoop-root-namenode-debian.out1
$ gedit hadoop-0.19.0/logs/hadoop-root-namenode-debian.out2
$ gedit hadoop-0.19.0/logs/hadoop-root-namenode-debian.out3
$ gedit hadoop-0.19.0/logs/hadoop-root-namenode-debian.out4

The 4th file is the last one related to namenode logs in the logs directory. All of them are empty. I don't understand what is wrong.

- Original Message -
From: core-user@hadoop.apache.org
To: core-user@hadoop.apache.org
Sent: 12/04/2009 22:56
Subject: Re: Cannot access Jobtracker and namenode

Try looking at the namenode logs (under the logs directory). There should be an exception. Paste it here if you don't understand what it means.

On Sunday, April 12, 2009 at 22:22, halilibrahimcakir <halilibrahimca...@mynet.com> wrote:

> I typed:
>
> $ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
> $ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
>
> Deleted this directory:
>
> $ rm -ri /tmp/hadoop-root
>
> Formatted the namenode again:
>
> $ /bin/hadoop namenode -format
>
> Stopped:
>
> $ /bin/stop-all.sh
>
> then typed:
>
> $ ssh localhost
>
> and it didn't ask me for a password. I started:
>
> $ /bin/start-all.sh
>
> But nothing changed :(
>
> - Original Message -
> From: core-user@hadoop.apache.org
> To: core-user@hadoop.apache.org
> Sent: 12/04/2009 21:33
> Subject: Re: Cannot access Jobtracker and namenode
>
> There are two commands in the Hadoop quick start, used for passwordless ssh.
> Try those.
>
> $ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
> $ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
>
> http://hadoop.apache.org/core/docs/current/quickstart.html
>
> --
> M. Raşit ÖZDAŞ
>
> Halil İbrahim ÇAKIR
> Dumlupınar Üniversitesi Bilgisayar Mühendisliği
> http://cakirhal.blogspot.com

--
M. Raşit ÖZDAŞ

Halil İbrahim ÇAKIR
Dumlupınar Üniversitesi Bilgisayar Mühendisliği
http://cakirhal.blogspot.com

--
M. Raşit ÖZDAŞ
Re: Cannot access Jobtracker and namenode
Sorry, the log file (hadoop-root-namenode-debian.log) content:

2009-04-12 16:27:22,762 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = debian/127.0.1.1
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.19.0
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.19 -r 713890; compiled by 'ndaley' on Fri Nov 14 03:12:29 UTC 2008
************************************************************/
2009-04-12 16:27:22,943 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=9000
2009-04-12 16:27:22,954 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: localhost/127.0.0.1:9000
2009-04-12 16:27:22,958 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
2009-04-12 16:27:22,964 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
2009-04-12 16:27:23,037 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=root,root
2009-04-12 16:27:23,038 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2009-04-12 16:27:23,038 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=true
2009-04-12 16:27:23,048 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
2009-04-12 16:27:23,050 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean
2009-04-12 16:27:23,096 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 0
2009-04-12 16:27:23,096 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 0
2009-04-12 16:27:23,096 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 94 loaded in 0 seconds.
2009-04-12 16:27:23,096 INFO org.apache.hadoop.hdfs.server.common.Storage: Edits file /tmp/hadoop-root/dfs/name/current/edits of size 4 edits # 0 loaded in 0 seconds.
2009-04-12 16:27:23,127 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 94 saved in 0 seconds.
2009-04-12 16:27:23,150 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Finished loading FSImage in 134 msecs
2009-04-12 16:27:23,152 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Total number of blocks = 0
2009-04-12 16:27:23,152 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of invalid blocks = 0
2009-04-12 16:27:23,152 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of under-replicated blocks = 0
2009-04-12 16:27:23,152 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of over-replicated blocks = 0
2009-04-12 16:27:23,152 INFO org.apache.hadoop.hdfs.StateChange: STATE* Leaving safe mode after 0 secs.
2009-04-12 16:27:23,153 INFO org.apache.hadoop.hdfs.StateChange: STATE* Network topology has 0 racks and 0 datanodes
2009-04-12 16:27:23,153 INFO org.apache.hadoop.hdfs.StateChange: STATE* UnderReplicatedBlocks has 0 blocks
2009-04-12 16:27:36,531 INFO org.mortbay.http.HttpServer: Version Jetty/5.1.4
2009-04-12 16:27:36,545 INFO org.mortbay.util.Credential: Checking Resource aliases
2009-04-12 16:27:37,029 INFO org.mortbay.util.Container: Started org.mortbay.jetty.servlet.WebApplicationHandler@6fa9fc
2009-04-12 16:27:37,085 INFO org.mortbay.util.Container: Started WebApplicationContext[/static,/static]
2009-04-12 16:27:37,181 INFO org.mortbay.util.Container: Started org.mortbay.jetty.servlet.WebApplicationHandler@56860b
2009-04-12 16:27:37,182 INFO org.mortbay.util.Container: Started WebApplicationContext[/logs,/logs]
2009-04-12 16:27:37,343 INFO org.mortbay.util.Container: Started org.mortbay.jetty.servlet.WebApplicationHandler@16614e7
2009-04-12 16:27:37,353 INFO org.mortbay.util.Container: Started WebApplicationContext[/,/]
2009-04-12 16:27:37,356 INFO org.mortbay.http.SocketListener: Started SocketListener on 0.0.0.0:50070
2009-04-12 16:27:37,357 INFO org.mortbay.util.Container: Started org.mortbay.jetty.Server@dda25b
2009-04-12 16:27:37,357 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Web-server up at: 0.0.0.0:50070
2009-04-12 16:27:37,388 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 9000: starting
2009-04-12 16:27:37,388 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2009-04-12 16:27:37,391 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 9000: starting
2009-04-12 16:27:37,392 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 9000: starting
2009-04-12 16:27:37,392 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 9000: starting
2009-04-12 16:27:37,392 INFO org.apache.hadoop.ipc.Server: IPC Server
Re: Interesting Hadoop/FUSE-DFS access patterns
On Mon, Apr 13, 2009 at 1:07 AM, Todd Lipcon <t...@cloudera.com> wrote:

Hey Brian,

This is really interesting stuff. I'm curious - have you tried these same experiments using the Java API? I'm wondering whether this is FUSE-specific or inherent to all HDFS reads. I'll try to reproduce this over here as well.

I just tried this on a localhost single-node cluster with the following test program:

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.conf.Configuration;
import java.io.IOException;
import java.net.URI;

public class Test {
  public static void main(String[] args) throws Exception {
    URI uri = new URI("hdfs://localhost:8020/");
    FileSystem fs = FileSystem.get(uri, new Configuration());
    Path path = new Path("/testfile");
    FSDataInputStream dis = fs.open(path);
    for (int size = 0; size < 1024*1024; size += 4096) {
      for (int i = 0; i < 100; i++) {
        long st = System.currentTimeMillis();
        byte buf[] = new byte[size];
        dis.read(0, buf, 0, size);
        long et = System.currentTimeMillis();
        System.out.println(String.valueOf(size) + "\t" + String.valueOf(et - st));
      }
    }
    fs.close();
  }
}

I didn't see the same behavior as you're reporting. Can you give this a try on your cluster and see if it shows the 128K jump?

-Todd
Doubt regarding permissions
Hey,
I tried the following:
- created a dir temp for user A with permission 733
- created a dir temp/test for user B with permission 722
- created a file temp/test/test.txt for user B with permission 722

Now in HDFS, user A can list as well as read the contents of the file temp/test/test.txt, while on my RHEL box I can't. Is this a feature or a bug? Can someone please try this out and confirm?

Thanks
Amar
Re: Reduce task attempt retry strategy
Currently, only failed tasks are attempted on a node other than the one where they failed. For killed tasks, there is no such retry policy. "Failed to report status" usually indicates that the task did not report sufficient progress. However, it is possible that the task itself was not progressing fast enough because the machine where it ran had problems.

On 4/8/09 12:33 AM, Stefan Will <stefan.w...@gmx.net> wrote:

My cluster has 27 nodes with a total reduce task capacity of 54. The job had 31 reducers. I actually had a task today that showed the behavior you're describing: 3 tries on one machine, and then the 4th on a different one.

As for the particular job I was talking about before, here are the stats:

Kind     Total Tasks (successful+failed+killed)   Successful   Failed   Killed   Start Time            Finish Time
Setup    1                                        1            0        0        4-Apr-2009 00:30:16   4-Apr-2009 00:30:33 (17sec)
Map      64                                       49           12       3        4-Apr-2009 00:30:33   4-Apr-2009 01:11:15 (40mins, 41sec)
Reduce   34                                       30           4        0        4-Apr-2009 00:30:44   4-Apr-2009 04:31:36 (4hrs, 52sec)
Cleanup  4                                        0            4        0        4-Apr-2009 04:31:36   4-Apr-2009 06:32:00 (2hrs, 24sec)

Not sure what to look for in the jobtracker log. All it shows for that particular failed task is that it assigned it to the same machine 4 times and then eventually failed. Perhaps something to note is that the 4 failures were all due to timeouts:

Task attempt_200904031942_0002_r_13_3 failed to report status for 1802 seconds. Killing!

Also, looking at the logs, there was a map task too that was retried on that particular box 4 times without going to a different one. Perhaps it had something to do with the way this machine failed: the jobtracker still considered it live, while all actual tasks assigned to it timed out.

-- Stefan

From: Amar Kamat <ama...@yahoo-inc.com>
Reply-To: core-user@hadoop.apache.org
Date: Tue, 07 Apr 2009 10:05:16 +0530
To: core-user@hadoop.apache.org
Subject: Re: Reduce task attempt retry strategy

Stefan Will wrote:
Hi,
I had a flaky machine the other day that was still accepting jobs and sending heartbeats, but caused all reduce task attempts to fail. This in turn caused the whole job to fail because the same reduce task was retried 3 times on that particular machine.

What is your cluster size? If a task fails on a machine then it is re-tried on some other machine (based on the number of good machines left in the cluster). After a certain number of failures, the machine will be blacklisted (again based on the number of machines left in the cluster). 3 different reducers might be scheduled on that machine, but that should not lead to job failure. Can you explain in detail what exactly happened? Find out where the attempts got scheduled from the jobtracker's log.
Amar

Perhaps I'm confusing this with the block placement strategy in HDFS, but I always thought that the framework would retry jobs on a different machine if retries on the original machine keep failing. E.g. I would have expected it to retry once or twice on the same machine, but then switch to a different one to minimize the likelihood of getting stuck on a bad machine.

What is the expected behavior in 0.19.1 (which I'm running)? Any plans for improving on this in the future?

Thanks,
Stefan
Re: HDFS as a logfile ??
Chukwa is a Hadoop subproject aiming to do something similar, though particularly for the case of Hadoop logs. You may find it useful.

Hadoop unfortunately does not support concurrent appends. As a result, the Chukwa project found itself creating a whole new daemon, the Chukwa collector, precisely to merge the event streams and write them out just once. We're set to do a release within the next week or two, but in the meantime you can check it out from SVN at https://svn.apache.org/repos/asf/hadoop/chukwa/trunk

--Ari

On Fri, Apr 10, 2009 at 12:06 AM, Ricky Ho <r...@adobe.com> wrote:
I want to analyze the traffic pattern and statistics of a distributed application. I am thinking of having the application write the events as log entries into HDFS, and then later I can use a Map/Reduce task to do the analysis in parallel. Is this a good approach?

In this case, does HDFS support concurrent write (append) to a file? Another question is whether the write API is thread-safe.

Rgds, Ricky

-- Ari Rabkin asrab...@gmail.com UC Berkeley Computer Science Department
Re: Modeling WordCount in a different way
Hey,

Did you find any class or way for storing the results of Job1's map/reduce in memory and using that as an input to Job2's map/reduce? I am facing a situation where I need to do a similar thing. If anyone can help me out...

Pankil

On Wed, Apr 8, 2009 at 12:51 AM, Sharad Agarwal <shara...@yahoo-inc.com> wrote:

> I have confusion how would I start the next job after finishing the one, could you just make it clear by some rough example.

See the JobControl class to chain the jobs. You can specify dependencies as well. You can check out the TestJobControl class for example code.

> Also do I need to use SequenceFileInputFormat to maintain the results in the memory and then accessing it.

Not really. You have to use the corresponding reader to read the data. For example, if you have written it using TextOutputFormat (the default), you can then read it using TextInputFormat. The reader can be created in the reducer initialization code. In the new API (org.apache.hadoop.mapreduce.Reducer) it can be done in the setup method. Here you can load the (word, count) mappings in a HashMap. In case you don't want to load all the data in memory, you can create the reader in the setup method and keep calling next (LineRecordReader#nextKeyValue()) in the reduce function while the reduce key is greater than the current key from the reader.

- Sharad
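[A minimal sketch of the JobControl chaining suggested above, against the 0.19-era org.apache.hadoop.mapred.jobcontrol API. The two JobConfs are assumed to be configured elsewhere (input/output paths, mapper/reducer classes); the class name is made up.]

import java.util.ArrayList;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.jobcontrol.Job;
import org.apache.hadoop.mapred.jobcontrol.JobControl;

public class ChainJobs {
  public static void main(String[] args) throws Exception {
    JobConf firstConf = new JobConf();    // job 1: e.g. the word count
    JobConf secondConf = new JobConf();   // job 2: reads job 1's output dir

    Job first = new Job(firstConf, new ArrayList());
    ArrayList deps = new ArrayList();
    deps.add(first);
    Job second = new Job(secondConf, deps);  // starts only after 'first' succeeds

    JobControl control = new JobControl("chain");
    control.addJob(first);
    control.addJob(second);

    // JobControl implements Runnable; run it in its own thread and poll.
    Thread runner = new Thread(control);
    runner.start();
    while (!control.allFinished()) {
      Thread.sleep(1000);
    }
    control.stop();
  }
}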
Re: Multithreaded Reducer
On Apr 10, 2009, at 11:12 AM, Sagar Naik wrote:

> Hi,
> I would like to implement a multi-threaded reducer. As per my understanding, the system does not have one because we expect the output to be sorted. However, in my case I don't need the output sorted.

You'd probably want to make a blocking concurrent queue of the key-value pairs that are given to the reducer. Then have a pool of reducers that pull from the queue. It can be modeled on the multi-threaded map runner. Do be aware that you'll need to clone the keys and values that are given to the reduce.

-- Owen
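[A rough sketch of the design Owen describes; this is a hypothetical class, not anything in Hadoop. The key/value types, the summing workload, the pool size of 4, and the queue bound are all arbitrary placeholders. Note the clones in reduce(): the framework reuses the key and value objects it hands in, and the shutdown in close() is deliberately crude.]

import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class ThreadedReducer extends MapReduceBase
    implements Reducer<Text, LongWritable, Text, LongWritable> {

  private static class Work {
    final Text key;
    final List<LongWritable> values;
    final OutputCollector<Text, LongWritable> out;
    Work(Text k, List<LongWritable> v, OutputCollector<Text, LongWritable> o) {
      key = k; values = v; out = o;
    }
  }

  private final BlockingQueue<Work> queue = new ArrayBlockingQueue<Work>(64);
  private final List<Thread> workers = new ArrayList<Thread>();

  public void configure(JobConf job) {
    for (int i = 0; i < 4; i++) {               // pool size chosen arbitrarily
      Thread t = new Thread() {
        public void run() {
          try {
            while (true) {
              Work w = queue.take();            // blocks until work arrives
              long sum = 0;
              for (LongWritable v : w.values) sum += v.get();
              synchronized (w.out) {            // serialize access to the shared collector
                w.out.collect(w.key, new LongWritable(sum));
              }
            }
          } catch (InterruptedException e) {
            // close() interrupts us when the queue has drained
          } catch (IOException e) {
            throw new RuntimeException(e);
          }
        }
      };
      t.start();
      workers.add(t);
    }
  }

  public void reduce(Text key, Iterator<LongWritable> values,
      OutputCollector<Text, LongWritable> out, Reporter reporter)
      throws IOException {
    // Clone the key and values: the framework reuses these objects.
    List<LongWritable> vals = new ArrayList<LongWritable>();
    while (values.hasNext()) vals.add(new LongWritable(values.next().get()));
    try {
      queue.put(new Work(new Text(key), vals, out));  // blocks if workers fall behind
    } catch (InterruptedException e) {
      throw new IOException("interrupted while queueing reduce work");
    }
  }

  public void close() throws IOException {
    while (!queue.isEmpty()) Thread.yield();    // crude drain before shutdown
    for (Thread t : workers) {
      t.interrupt();
      try { t.join(); } catch (InterruptedException e) { /* ignore */ }
    }
  }
}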
RE: HDFS as a logfile ??
Ari, thanks for your note. I'd like to understand more about how Chukwa groups log entries...

If I have appA running on machines X, Y and appB running on machines Y, Z, each of them calling the Chukwa log API, do all entries go into the same HDFS file? Or into 4 separate HDFS files based on the app/machine combination?

If the answer to the first question is yes, then what if appA and appB have different log entry formats? If the answer to the second question is yes, then are all these HDFS files cut at the same time boundary?

It looks like in Chukwa, the application first logs to a daemon, which buffer-writes the log entries into a local file, and a separate process ships these data to a remote collector daemon which issues the actual HDFS writes. I observe the following overheads...

1) The overhead of the extra write to local disk and of shipping the data over to the collector. If HDFS supported append, the application could go directly to HDFS.
2) The centralized collector establishes a bottleneck in the otherwise perfectly parallel HDFS architecture.

Am I missing something here?

Rgds, Ricky

-----Original Message-----
From: Ariel Rabkin [mailto:asrab...@gmail.com]
Sent: Monday, April 13, 2009 7:38 AM
To: core-user@hadoop.apache.org
Subject: Re: HDFS as a logfile ??

Chukwa is a Hadoop subproject aiming to do something similar, though particularly for the case of Hadoop logs. You may find it useful.

Hadoop unfortunately does not support concurrent appends. As a result, the Chukwa project found itself creating a whole new daemon, the Chukwa collector, precisely to merge the event streams and write them out just once. We're set to do a release within the next week or two, but in the meantime you can check it out from SVN at https://svn.apache.org/repos/asf/hadoop/chukwa/trunk

--Ari

On Fri, Apr 10, 2009 at 12:06 AM, Ricky Ho <r...@adobe.com> wrote:
I want to analyze the traffic pattern and statistics of a distributed application. I am thinking of having the application write the events as log entries into HDFS, and then later I can use a Map/Reduce task to do the analysis in parallel. Is this a good approach?

In this case, does HDFS support concurrent write (append) to a file? Another question is whether the write API is thread-safe.

Rgds, Ricky

-- Ari Rabkin asrab...@gmail.com UC Berkeley Computer Science Department
Re: Interesting Hadoop/FUSE-DFS access patterns
Hey Todd,

Been playing more this morning after thinking about it for the night -- I think the culprit is not the network, but actually the cache. Here's the output of your script adjusted to do the same calls as I was doing (you had left out the random I/O part):

[br...@red tmp]$ java hdfs_tester
Mean value for reads of size 0: 0.0447
Mean value for reads of size 16384: 10.4872
Mean value for reads of size 32768: 10.82925
Mean value for reads of size 49152: 6.2417
Mean value for reads of size 65536: 7.0511003
Mean value for reads of size 81920: 9.411599
Mean value for reads of size 98304: 9.378799
Mean value for reads of size 114688: 8.99065
Mean value for reads of size 131072: 5.1378503
Mean value for reads of size 147456: 6.1324
Mean value for reads of size 163840: 17.1187
Mean value for reads of size 180224: 6.5492
Mean value for reads of size 196608: 8.45695
Mean value for reads of size 212992: 7.4292
Mean value for reads of size 229376: 10.7843
Mean value for reads of size 245760: 9.29095
Mean value for reads of size 262144: 6.57865

Copy of the script below. So, without the FUSE layer, we don't see much (if any) pattern here. The overhead of randomly skipping through the file is higher than the overhead of reading out the data.

Upon further inspection, the biggest factor affecting the FUSE layer is actually the Linux VFS caching -- if you notice, the bandwidth in the given graph for larger read sizes is *higher* than 1Gbps, which is the limit of the network on that particular node. If I go in the opposite direction - starting with the largest reads first, then going down to the smallest reads - the graph entirely smooths out for the small values: everything is read from the filesystem cache in the client RAM. Graph attached.

So, on the upside, mounting through FUSE gives us the opportunity to speed up reads for very complex, non-sequential patterns - for free, thanks to the hardworking Linux kernel. On the downside, it's incredibly difficult to come up with simple cases to demonstrate performance for an application -- the cache performance and size depend on how much activity there is on the client, the previous file system activity of the application, and the amount of concurrent activity on the server. I can give you results for performance, but it's not going to be the performance you see in real life. (Gee, if only file systems were easy...)

Ok, sorry for the list noise -- it seems I'm going to have to think more about this problem before I can come up with something coherent.
Brian

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.conf.Configuration;
import java.io.IOException;
import java.net.URI;
import java.util.Random;

public class hdfs_tester {
  public static void main(String[] args) throws Exception {
    URI uri = new URI("hdfs://hadoop-name:9000/");
    FileSystem fs = FileSystem.get(uri, new Configuration());
    Path path = new Path("/user/uscms01/pnfs/unl.edu/data4/cms/store/phedex_monarctest/Nebraska/LoadTest07_Nebraska_33");
    FSDataInputStream dis = fs.open(path);
    Random rand = new Random();
    FileStatus status = fs.getFileStatus(path);
    long file_len = status.getLen();
    int iters = 20;
    for (int size = 0; size < 1024*1024; size += 4*4096) {
      long csum = 0;
      for (int i = 0; i < iters; i++) {
        int pos = rand.nextInt((int)((file_len-size-1)/8))*8;
        byte buf[] = new byte[size];
        if (pos < 0) pos = 0;
        long st = System.nanoTime();
        dis.read(pos, buf, 0, size);
        long et = System.nanoTime();
        csum += et-st;
        //System.out.println(String.valueOf(size) + "\t" + String.valueOf(pos) + "\t" + String.valueOf(et - st));
      }
      float csum2 = csum;
      csum2 /= iters;
      System.out.println("Mean value for reads of size " + size + ": " + (csum2/1000/1000));
    }
    fs.close();
  }
}

On Apr 13, 2009, at 3:14 AM, Todd Lipcon wrote:

On Mon, Apr 13, 2009 at 1:07 AM, Todd Lipcon <t...@cloudera.com> wrote:

Hey Brian,

This is really interesting stuff. I'm curious - have you tried these same experiments using the Java API? I'm wondering whether this is FUSE-specific or inherent to all HDFS reads. I'll try to reproduce this over here as well.

This smells sort of nagle-related to me... if you get a chance, you may want to edit DFSClient.java and change TCP_WINDOW_SIZE to 256 * 1024, and see if the magic number jumps up to 256KB. If so, I think it should be a pretty easy bugfix.

Oops - spoke too fast there... looks like TCP_WINDOW_SIZE isn't actually used for any socket configuration, so I don't think that will make a difference... still think networking might be the culprit, though.

-Todd

On Sun, Apr 12, 2009 at 9:41 PM, Brian Bockelman <bbock...@cse.unl.edu> wrote:
Ok, here's something perhaps
Re: Extending ClusterMapReduceTestCase
Hey all,

I'm also extending the ClusterMapReduceTestCase and having a bit of trouble as well. Currently I'm getting:

Starting DataNode 0 with dfs.data.dir: build/test/data/dfs/data/data1,build/test/data/dfs/data/data2
Starting DataNode 1 with dfs.data.dir: build/test/data/dfs/data/data3,build/test/data/dfs/data/data4
Generating rack names for tasktrackers
Generating host names for tasktrackers

And then nothing... it just spins on that forever. Any ideas? I have all the jetty and jetty-ext libs in the classpath, and I set hadoop.log.dir and the SAX parser correctly. This is all I have for my test class so far; I'm not even doing anything yet:

public class TestDoop extends ClusterMapReduceTestCase {

  @Test
  public void testDoop() throws Exception {
    // Note: Java does not expand "~", so this points at a literal "~" directory.
    System.setProperty("hadoop.log.dir", "~/test-logs");
    System.setProperty("javax.xml.parsers.SAXParserFactory",
        "com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl");
    setUp();
    System.out.println("done.");
  }
}

Thanks!
bc
Re: Extending ClusterMapReduceTestCase
Sorry, I forgot to include the not-IntelliJ-console output :)

09/04/13 12:07:14 ERROR mapred.MiniMRCluster: Job tracker crashed
java.lang.NullPointerException
    at java.io.File.<init>(File.java:222)
    at org.apache.hadoop.mapred.JobHistory.init(JobHistory.java:143)
    at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:1110)
    at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:143)
    at org.apache.hadoop.mapred.MiniMRCluster$JobTrackerRunner.run(MiniMRCluster.java:96)
    at java.lang.Thread.run(Thread.java:637)

I managed to pick up the chapter in the Hadoop book that Jason mentions that deals with unit testing (great chapter btw), and it looks like everything is in order. He points out that this error is typically caused by a bad hadoop.log.dir or a missing log4j.properties, but I verified that my dir is ok and my hadoop-0.19.1-core.jar has the log4j.properties in it. I also tried running the same test with hadoop-core/test 0.19.0 - same thing.

Thanks again,
bc

czero wrote:

Hey all,

I'm also extending the ClusterMapReduceTestCase and having a bit of trouble as well. Currently I'm getting:

Starting DataNode 0 with dfs.data.dir: build/test/data/dfs/data/data1,build/test/data/dfs/data/data2
Starting DataNode 1 with dfs.data.dir: build/test/data/dfs/data/data3,build/test/data/dfs/data/data4
Generating rack names for tasktrackers
Generating host names for tasktrackers

And then nothing... it just spins on that forever. Any ideas? I have all the jetty and jetty-ext libs in the classpath, and I set hadoop.log.dir and the SAX parser correctly. This is all I have for my test class so far; I'm not even doing anything yet:

public class TestDoop extends ClusterMapReduceTestCase {

  @Test
  public void testDoop() throws Exception {
    System.setProperty("hadoop.log.dir", "~/test-logs");
    System.setProperty("javax.xml.parsers.SAXParserFactory",
        "com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl");
    setUp();
    System.out.println("done.");
  }
}

Thanks!
bc
Re: DataXceiver Errors in 0.19.1
It need not be anything to worry about. Do you see anything at the user level (task, job, copy, or script) fail because of this?

On a distributed system with many nodes, there will be some errors on some of the nodes for various reasons (load, hardware, reboot, etc). HDFS usually works around them (because of multiple replicas). In this particular case, a client is trying to write some data and one of the DataNodes writing a replica might have gone down. HDFS should recover from it and write to the rest of the nodes. Please check if the write actually succeeded.

Raghu.

Tamir Kamara wrote:
Hi,
I've recently upgraded to 0.19.1 and now there are some DataXceiver errors in the datanode logs. There are also messages about interruption while waiting for IO. Both messages are below. Can I do something to fix this?

Thanks,
Tamir

2009-04-13 09:57:20,334 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.14.3:50010, storageID=DS-727246419-127.0.0.1-50010-1234873914501, infoPort=50075, ipcPort=50020):DataXceiver
java.io.EOFException: while trying to read 65557 bytes
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:264)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:308)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:372)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:524)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
    at java.lang.Thread.run(Unknown Source)

2009-04-13 09:57:20,333 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder blk_8486030874928774495_54856 1 Exception java.io.InterruptedIOException: Interruped while waiting for IO on channel java.nio.channels.SocketChannel[connected local=/192.168.14.3:50439 remote=/192.168.14.7:50010]. 58972 millis timeout left.
    at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:277)
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:155)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:150)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:123)
    at java.io.DataInputStream.readFully(Unknown Source)
    at java.io.DataInputStream.readLong(Unknown Source)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:853)
    at java.lang.Thread.run(Unknown Source)
Map Rendering
We're looking into power grid visualization and were wondering if anyone could recommend a good Java native lib (that plays nice with Hadoop) to render layers of geospatial data. At this point we have the cluster crunching our test data, formats, and data structures, and we're now looking at producing indexes and visualizations. We'd like to be able to watch the power grid over time (with a 'time slider') over the map and load the tiles in OpenLayers, OpenStreetMap, or VirtualEarth, so the engineers could go back and replay large amounts of high-resolution PMU smart grid data, then zoom in/out and use the time slider to replay it. So essentially we'll need to render the grid graph as a layer in tiles, and then each tile (at each level) through time. I'm hoping someone has done some work with Hadoop and map tile generation and can save me some time in finding the right Java lib. Suggestions?

Josh Patterson
TVA
Re: Reduce task attempt retry strategy
Jothi, thanks for the explanation. One question though: why shouldn't timed-out tasks be retried on a different machine? As you pointed out, it could very well have been due to the machine having problems. To me, a timeout is just like any other kind of failure.

-- Stefan

From: Jothi Padmanabhan <joth...@yahoo-inc.com>
Reply-To: core-user@hadoop.apache.org
Date: Mon, 13 Apr 2009 19:00:38 +0530
To: core-user@hadoop.apache.org
Subject: Re: Reduce task attempt retry strategy

Currently, only failed tasks are attempted on a node other than the one where they failed. For killed tasks, there is no such retry policy. "Failed to report status" usually indicates that the task did not report sufficient progress. However, it is possible that the task itself was not progressing fast enough because the machine where it ran had problems.

On 4/8/09 12:33 AM, Stefan Will <stefan.w...@gmx.net> wrote:

My cluster has 27 nodes with a total reduce task capacity of 54. The job had 31 reducers. I actually had a task today that showed the behavior you're describing: 3 tries on one machine, and then the 4th on a different one.

As for the particular job I was talking about before, here are the stats:

Kind     Total Tasks (successful+failed+killed)   Successful   Failed   Killed   Start Time            Finish Time
Setup    1                                        1            0        0        4-Apr-2009 00:30:16   4-Apr-2009 00:30:33 (17sec)
Map      64                                       49           12       3        4-Apr-2009 00:30:33   4-Apr-2009 01:11:15 (40mins, 41sec)
Reduce   34                                       30           4        0        4-Apr-2009 00:30:44   4-Apr-2009 04:31:36 (4hrs, 52sec)
Cleanup  4                                        0            4        0        4-Apr-2009 04:31:36   4-Apr-2009 06:32:00 (2hrs, 24sec)

Not sure what to look for in the jobtracker log. All it shows for that particular failed task is that it assigned it to the same machine 4 times and then eventually failed. Perhaps something to note is that the 4 failures were all due to timeouts:

Task attempt_200904031942_0002_r_13_3 failed to report status for 1802 seconds. Killing!

Also, looking at the logs, there was a map task too that was retried on that particular box 4 times without going to a different one. Perhaps it had something to do with the way this machine failed: the jobtracker still considered it live, while all actual tasks assigned to it timed out.

-- Stefan

From: Amar Kamat <ama...@yahoo-inc.com>
Reply-To: core-user@hadoop.apache.org
Date: Tue, 07 Apr 2009 10:05:16 +0530
To: core-user@hadoop.apache.org
Subject: Re: Reduce task attempt retry strategy

Stefan Will wrote:
Hi,
I had a flaky machine the other day that was still accepting jobs and sending heartbeats, but caused all reduce task attempts to fail. This in turn caused the whole job to fail because the same reduce task was retried 3 times on that particular machine.

What is your cluster size? If a task fails on a machine then it is re-tried on some other machine (based on the number of good machines left in the cluster). After a certain number of failures, the machine will be blacklisted (again based on the number of machines left in the cluster). 3 different reducers might be scheduled on that machine, but that should not lead to job failure. Can you explain in detail what exactly happened? Find out where the attempts got scheduled from the jobtracker's log.
Amar

Perhaps I'm confusing this with the block placement strategy in HDFS, but I always thought that the framework would retry jobs on a different machine if retries on the original machine keep failing. E.g. I would have expected it to retry once or twice on the same machine, but then switch to a different one to minimize the likelihood of getting stuck on a bad machine.

What is the expected behavior in 0.19.1 (which I'm running)? Any plans for improving on this in the future?

Thanks,
Stefan
raw files become zero bytes when mapreduce job hit outofmemory error
I'm running some MapReduce jobs, and some of them hit out-of-memory errors. I find that the raw input data itself also got corrupted: it became zero bytes. This is very strange to me. I have not looked into it in much detail, but just wanted to check quickly with someone who has had such an experience. I'm running 0.18.3.

thanks
Re: Doubt regarding permissions
Hi Amar,

I just tried it. Everything worked as expected. I guess user A in your experiment was a superuser, so that he could read anything.

Nicholas Sze

/// permission testing //
drwx-wx-wx   - nicholas supergroup          0 2009-04-13 10:55 /temp
drwx-w--w-   - tsz      supergroup          0 2009-04-13 10:58 /temp/test
-rw-r--r--   3 tsz      supergroup       1366 2009-04-13 10:58 /temp/test/r.txt

// login as nicholas (non-superuser)
$ whoami
nicholas
$ ./bin/hadoop fs -lsr /temp
drwx-w--w-   - tsz supergroup          0 2009-04-13 10:58 /temp/test
lsr: could not get listing for 'hdfs://:9000/temp/test' : org.apache.hadoop.security.AccessControlException: Permission denied: user=nicholas, access=READ_EXECUTE, inode="test":tsz:supergroup:rwx-w--w-
$ ./bin/hadoop fs -cat /temp/test/r.txt
cat: org.apache.hadoop.security.AccessControlException: Permission denied: user=nicholas, access=EXECUTE, inode="test":tsz:supergroup:rwx-w--w-

- Original Message -
From: Amar Kamat <ama...@yahoo-inc.com>
To: core-user@hadoop.apache.org
Sent: Monday, April 13, 2009 2:02:24 AM
Subject: Doubt regarding permissions

Hey,
I tried the following:
- created a dir temp for user A with permission 733
- created a dir temp/test for user B with permission 722
- created a file temp/test/test.txt for user B with permission 722

Now in HDFS, user A can list as well as read the contents of the file temp/test/test.txt, while on my RHEL box I can't. Is this a feature or a bug? Can someone please try this out and confirm?

Thanks
Amar
Re: Map-Reduce Slow Down
in hadoop-*-examples.jar, use randomwriter to generate the data and sort to sort it.
- Aaron

On Sun, Apr 12, 2009 at 9:33 PM, Pankil Doshi <forpan...@gmail.com> wrote:

Your data is too small, I guess, for 15 nodes, so the overhead of these nodes might be making your total MR jobs more time consuming. I guess you will have to try with a larger set of data.

Pankil

On Sun, Apr 12, 2009 at 6:54 PM, Mithila Nagendra <mnage...@asu.edu> wrote:

Aaron
That could be the issue; my data is just 516MB - wouldn't this see a bit of speed up? Could you guide me to the example? I'll run my cluster on it and see what I get. Also, for my program I had a Java timer running to record the time taken to complete execution. Does Hadoop have an inbuilt timer?
Mithila

On Mon, Apr 13, 2009 at 1:13 AM, Aaron Kimball <aa...@cloudera.com> wrote:

Virtually none of the examples that ship with Hadoop are designed to showcase its speed. Hadoop's speedup comes from its ability to process very large volumes of data (starting around, say, tens of GB per job, and going up in orders of magnitude from there). So if you are timing the pi calculator (or something like that), its results won't necessarily be very consistent. If a job doesn't have enough fragments of data to allocate one per each node, some of the nodes will also just go unused.

The best example for you to run is to use randomwriter to fill up your cluster with several GB of random data and then run the sort program. If that doesn't scale up performance from 3 nodes to 15, then you've definitely got something strange going on.
- Aaron

On Sun, Apr 12, 2009 at 8:39 AM, Mithila Nagendra <mnage...@asu.edu> wrote:

Hey all
I recently set up a three node Hadoop cluster and ran an example on it. It was pretty fast, and all three nodes were being used (I checked the log files to make sure the slaves were utilized). Now I've set up another cluster consisting of 15 nodes. I ran the same example, but instead of speeding up, the map-reduce task seems to take forever! The slaves are not being used for some reason. This second cluster has lower per-node processing power, but should that make any difference? How can I ensure that the data is being mapped to all the nodes? Presently, the only node that seems to be doing all the work is the master node. Does 15 nodes in a cluster increase the network cost? What can I do to set up the cluster to function more efficiently?
Thanks!
Mithila Nagendra
Arizona State University
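[Concretely, the randomwriter/sort benchmark above looks roughly like this; the output paths are arbitrary and the examples jar name depends on your Hadoop version. The sort job's completion time across cluster sizes is the number to compare.]

$ bin/hadoop jar hadoop-*-examples.jar randomwriter /bench/unsorted
$ bin/hadoop jar hadoop-*-examples.jar sort /bench/unsorted /bench/sorted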
Grouping Values for Reducer Input
Hi Everyone,

I'm working on a relatively simple MapReduce job with a slight complication with regards to the ordering of my key/values heading into the reducer. The output from the mapper might be something like:

cat -> doc5, 1
cat -> doc1, 1
cat -> doc5, 3
...

Here, 'cat' is my key and the value is the document ID and the count (my own WritableComparable). Originally I was going to create a HashMap in the reduce method and add an entry for each document ID and sum the counts for each. I realized the method would be better if the values were in order like so:

cat -> doc1, 1
cat -> doc5, 1
cat -> doc5, 3
...

Using this style I can continue summing until I reach a new document ID and just collect the output at that point, thus avoiding data structures and object creation costs. I tried setting JobConf.setOutputValueGroupingComparator() but this didn't seem to do anything. In fact, I threw an exception from the Comparator I supplied but this never showed up when running the job. My map output value consists of a UTF and a Long, so perhaps the Comparator I'm using (identical to Text.Comparator) is incorrect:

public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
  int n1 = WritableUtils.decodeVIntSize(b1[s1]);
  int n2 = WritableUtils.decodeVIntSize(b2[s2]);
  return compareBytes(b1, s1 + n1, l1 - n1, b2, s2 + n2, l2 - n2);
}

In my final output I'm basically running into the same word -> documentID being output multiple times. So for the above example I have multiple lines with cat -> doc5, X.

Reducer method just in case:

public void reduce(Text key, Iterator<TermFrequencyWritable> values,
    OutputCollector<Text, TermFrequencyWritable> output, Reporter reporter)
    throws IOException {
  long sum = 0;
  String lastDocID = null;

  // Iterate through all values
  while (values.hasNext()) {
    TermFrequencyWritable value = values.next();

    // Encountered new document ID => record and reset
    if (!value.getDocumentID().equals(lastDocID)) {
      // Ignore first go through
      if (sum != 0) {
        termFrequency.setDocumentID(lastDocID);
        termFrequency.setFrequency(sum);
        output.collect(key, termFrequency);
      }
      sum = 0;
      lastDocID = value.getDocumentID();
    }
    sum += value.getFrequency();
  }

  // Record last one
  termFrequency.setDocumentID(lastDocID);
  termFrequency.setFrequency(sum);
  output.collect(key, termFrequency);
}

Any ideas (using Hadoop 0.19.1)?

Thanks,
- Bill
Re: Map-Reduce Slow Down
Mithila,

You said all the slaves were being utilized in the 3 node cluster. Which application did you run to test that, and what was your input size? If you tried the word count application on a 516 MB input file on both cluster setups, then some of your nodes in the 15 node cluster may not be running at all. Generally, one map task is assigned to each input split, and if you are running your cluster with the defaults, the splits are 64 MB each. I got confused when you said the Namenode seemed to do all the work. Can you check conf/slaves and make sure you put the names of all task trackers there? I also suggest comparing both clusters with a larger input size, say at least 5 GB, to really see a difference.

Jim

On Mon, Apr 13, 2009 at 4:17 PM, Aaron Kimball <aa...@cloudera.com> wrote:

in hadoop-*-examples.jar, use randomwriter to generate the data and sort to sort it.
- Aaron

On Sun, Apr 12, 2009 at 9:33 PM, Pankil Doshi <forpan...@gmail.com> wrote:

Your data is too small, I guess, for 15 nodes, so the overhead of these nodes might be making your total MR jobs more time consuming. I guess you will have to try with a larger set of data.

Pankil

On Sun, Apr 12, 2009 at 6:54 PM, Mithila Nagendra <mnage...@asu.edu> wrote:

Aaron
That could be the issue; my data is just 516MB - wouldn't this see a bit of speed up? Could you guide me to the example? I'll run my cluster on it and see what I get. Also, for my program I had a Java timer running to record the time taken to complete execution. Does Hadoop have an inbuilt timer?
Mithila

On Mon, Apr 13, 2009 at 1:13 AM, Aaron Kimball <aa...@cloudera.com> wrote:

Virtually none of the examples that ship with Hadoop are designed to showcase its speed. Hadoop's speedup comes from its ability to process very large volumes of data (starting around, say, tens of GB per job, and going up in orders of magnitude from there). So if you are timing the pi calculator (or something like that), its results won't necessarily be very consistent. If a job doesn't have enough fragments of data to allocate one per each node, some of the nodes will also just go unused.

The best example for you to run is to use randomwriter to fill up your cluster with several GB of random data and then run the sort program. If that doesn't scale up performance from 3 nodes to 15, then you've definitely got something strange going on.
- Aaron

On Sun, Apr 12, 2009 at 8:39 AM, Mithila Nagendra <mnage...@asu.edu> wrote:

Hey all
I recently set up a three node Hadoop cluster and ran an example on it. It was pretty fast, and all three nodes were being used (I checked the log files to make sure the slaves were utilized). Now I've set up another cluster consisting of 15 nodes. I ran the same example, but instead of speeding up, the map-reduce task seems to take forever! The slaves are not being used for some reason. This second cluster has lower per-node processing power, but should that make any difference? How can I ensure that the data is being mapped to all the nodes? Presently, the only node that seems to be doing all the work is the master node. Does 15 nodes in a cluster increase the network cost? What can I do to set up the cluster to function more efficiently?
Thanks!
Mithila Nagendra
Arizona State University
RE: Grouping Values for Reducer Input
I'm not familiar with setOutputValueGroupingComparator. What about adding the doc# to the key and using your own hashing/Partitioner? So doing something like:

cat_doc5 -> 1
cat_doc1 -> 1
cat_doc5 -> 3

The hashing method would take everything before the '_' as the hash, so the shuffle would still put the cat* keys together using your hashing, but sort them like you need:

cat_doc5 -> 1
cat_doc5 -> 3
cat_doc1 -> 1

Then the reduce task can count for each doc# in a cat.

From: Streckfus, William [USA] [mailto:streckfus_will...@bah.com]
Sent: Monday, April 13, 2009 2:53 PM
To: core-user@hadoop.apache.org
Subject: Grouping Values for Reducer Input

Hi Everyone,

I'm working on a relatively simple MapReduce job with a slight complication with regards to the ordering of my key/values heading into the reducer. The output from the mapper might be something like:

cat -> doc5, 1
cat -> doc1, 1
cat -> doc5, 3
...

Here, 'cat' is my key and the value is the document ID and the count (my own WritableComparable). Originally I was going to create a HashMap in the reduce method and add an entry for each document ID and sum the counts for each. I realized the method would be better if the values were in order like so:

cat -> doc1, 1
cat -> doc5, 1
cat -> doc5, 3
...

Using this style I can continue summing until I reach a new document ID and just collect the output at that point, thus avoiding data structures and object creation costs. I tried setting JobConf.setOutputValueGroupingComparator() but this didn't seem to do anything. In fact, I threw an exception from the Comparator I supplied but this never showed up when running the job. My map output value consists of a UTF and a Long, so perhaps the Comparator I'm using (identical to Text.Comparator) is incorrect:

public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
  int n1 = WritableUtils.decodeVIntSize(b1[s1]);
  int n2 = WritableUtils.decodeVIntSize(b2[s2]);
  return compareBytes(b1, s1 + n1, l1 - n1, b2, s2 + n2, l2 - n2);
}

In my final output I'm basically running into the same word -> documentID being output multiple times. So for the above example I have multiple lines with cat -> doc5, X.

Reducer method just in case:

public void reduce(Text key, Iterator<TermFrequencyWritable> values,
    OutputCollector<Text, TermFrequencyWritable> output, Reporter reporter)
    throws IOException {
  long sum = 0;
  String lastDocID = null;

  // Iterate through all values
  while (values.hasNext()) {
    TermFrequencyWritable value = values.next();

    // Encountered new document ID => record and reset
    if (!value.getDocumentID().equals(lastDocID)) {
      // Ignore first go through
      if (sum != 0) {
        termFrequency.setDocumentID(lastDocID);
        termFrequency.setFrequency(sum);
        output.collect(key, termFrequency);
      }
      sum = 0;
      lastDocID = value.getDocumentID();
    }
    sum += value.getFrequency();
  }

  // Record last one
  termFrequency.setDocumentID(lastDocID);
  termFrequency.setFrequency(sum);
  output.collect(key, termFrequency);
}

Any ideas (using Hadoop 0.19.1)?

Thanks,
- Bill
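[A sketch of the partitioner piece of this composite-key suggestion, against the 0.19 mapred API. The class name is made up, and the key/value types (a Text key like "cat_doc5" carrying a LongWritable count) are assumptions about how the job would be rewritten.]

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Partitioner;

// Partitions on the term before '_' so all "cat_*" keys reach the same
// reducer, while the full composite key still controls the sort order.
public class TermPartitioner implements Partitioner<Text, LongWritable> {
  public void configure(JobConf job) {}

  public int getPartition(Text key, LongWritable value, int numPartitions) {
    String s = key.toString();
    int idx = s.indexOf('_');
    String term = (idx == -1) ? s : s.substring(0, idx);
    return (term.hashCode() & Integer.MAX_VALUE) % numPartitions;
  }
}

[It would be registered with conf.setPartitionerClass(TermPartitioner.class). Each distinct composite key such as "cat_doc5" then forms its own reduce group, which is exactly what the summing loop above relies on.]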
Re: Grouping Values for Reducer Input
I'm not sure if this is exactly what you want, but can you emit map records as:

cat, doc5 -> 3
cat, doc1 -> 1
cat, doc5 -> 1

and so on? This way, your reducers will get the intermediate key/value pairs as:

cat, doc5 -> 3
cat, doc5 -> 1
cat, doc1 -> 1

Then you can split the keys (cat, doc*) inside the reducer and perform your additions.

-Jim

On Mon, Apr 13, 2009 at 4:53 PM, Streckfus, William [USA] <streckfus_will...@bah.com> wrote:

Hi Everyone,

I'm working on a relatively simple MapReduce job with a slight complication with regards to the ordering of my key/values heading into the reducer. The output from the mapper might be something like:

cat -> doc5, 1
cat -> doc1, 1
cat -> doc5, 3
...

Here, 'cat' is my key and the value is the document ID and the count (my own WritableComparable). Originally I was going to create a HashMap in the reduce method and add an entry for each document ID and sum the counts for each. I realized the method would be better if the values were in order like so:

cat -> doc1, 1
cat -> doc5, 1
cat -> doc5, 3
...

Using this style I can continue summing until I reach a new document ID and just collect the output at that point, thus avoiding data structures and object creation costs. I tried setting JobConf.setOutputValueGroupingComparator() but this didn't seem to do anything. In fact, I threw an exception from the Comparator I supplied but this never showed up when running the job. My map output value consists of a UTF and a Long, so perhaps the Comparator I'm using (identical to Text.Comparator) is incorrect:

public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
  int n1 = WritableUtils.decodeVIntSize(b1[s1]);
  int n2 = WritableUtils.decodeVIntSize(b2[s2]);
  return compareBytes(b1, s1 + n1, l1 - n1, b2, s2 + n2, l2 - n2);
}

In my final output I'm basically running into the same word -> documentID being output multiple times. So for the above example I have multiple lines with cat -> doc5, X.

Reducer method just in case:

public void reduce(Text key, Iterator<TermFrequencyWritable> values,
    OutputCollector<Text, TermFrequencyWritable> output, Reporter reporter)
    throws IOException {
  long sum = 0;
  String lastDocID = null;

  // Iterate through all values
  while (values.hasNext()) {
    TermFrequencyWritable value = values.next();

    // Encountered new document ID => record and reset
    if (!value.getDocumentID().equals(lastDocID)) {
      // Ignore first go through
      if (sum != 0) {
        termFrequency.setDocumentID(lastDocID);
        termFrequency.setFrequency(sum);
        output.collect(key, termFrequency);
      }
      sum = 0;
      lastDocID = value.getDocumentID();
    }
    sum += value.getFrequency();
  }

  // Record last one
  termFrequency.setDocumentID(lastDocID);
  termFrequency.setFrequency(sum);
  output.collect(key, termFrequency);
}

Any ideas (using Hadoop 0.19.1)?

Thanks,
- Bill
Re: Grouping Values for Reducer Input
Oh, I forgot to tell you that you should change your partitioner to send all the keys of the form (cat, *) to the same reducer, but it seems Jeremy has been much faster than me :)

-Jim

On Mon, Apr 13, 2009 at 5:24 PM, Jim Twensky <jim.twen...@gmail.com> wrote:

I'm not sure if this is exactly what you want, but can you emit map records as:

cat, doc5 -> 3
cat, doc1 -> 1
cat, doc5 -> 1

and so on? This way, your reducers will get the intermediate key/value pairs as:

cat, doc5 -> 3
cat, doc5 -> 1
cat, doc1 -> 1

Then you can split the keys (cat, doc*) inside the reducer and perform your additions.

-Jim

On Mon, Apr 13, 2009 at 4:53 PM, Streckfus, William [USA] <streckfus_will...@bah.com> wrote:

Hi Everyone,

I'm working on a relatively simple MapReduce job with a slight complication with regards to the ordering of my key/values heading into the reducer. The output from the mapper might be something like:

cat -> doc5, 1
cat -> doc1, 1
cat -> doc5, 3
...

Here, 'cat' is my key and the value is the document ID and the count (my own WritableComparable). Originally I was going to create a HashMap in the reduce method and add an entry for each document ID and sum the counts for each. I realized the method would be better if the values were in order like so:

cat -> doc1, 1
cat -> doc5, 1
cat -> doc5, 3
...

Using this style I can continue summing until I reach a new document ID and just collect the output at that point, thus avoiding data structures and object creation costs. I tried setting JobConf.setOutputValueGroupingComparator() but this didn't seem to do anything. In fact, I threw an exception from the Comparator I supplied but this never showed up when running the job. My map output value consists of a UTF and a Long, so perhaps the Comparator I'm using (identical to Text.Comparator) is incorrect:

public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
  int n1 = WritableUtils.decodeVIntSize(b1[s1]);
  int n2 = WritableUtils.decodeVIntSize(b2[s2]);
  return compareBytes(b1, s1 + n1, l1 - n1, b2, s2 + n2, l2 - n2);
}

In my final output I'm basically running into the same word -> documentID being output multiple times. So for the above example I have multiple lines with cat -> doc5, X.

Reducer method just in case:

public void reduce(Text key, Iterator<TermFrequencyWritable> values,
    OutputCollector<Text, TermFrequencyWritable> output, Reporter reporter)
    throws IOException {
  long sum = 0;
  String lastDocID = null;

  // Iterate through all values
  while (values.hasNext()) {
    TermFrequencyWritable value = values.next();

    // Encountered new document ID => record and reset
    if (!value.getDocumentID().equals(lastDocID)) {
      // Ignore first go through
      if (sum != 0) {
        termFrequency.setDocumentID(lastDocID);
        termFrequency.setFrequency(sum);
        output.collect(key, termFrequency);
      }
      sum = 0;
      lastDocID = value.getDocumentID();
    }
    sum += value.getFrequency();
  }

  // Record last one
  termFrequency.setDocumentID(lastDocID);
  termFrequency.setFrequency(sum);
  output.collect(key, termFrequency);
}

Any ideas (using Hadoop 0.19.1)?

Thanks,
- Bill
[ANNOUNCE] hamake-1.0
HAMAKE is a make-like utility for Hadoop. More information at the project page:

http://code.google.com/p/hamake/

Documentation is still quite poor, but the core functionality is working and I plan on improving it further.

Sincerely,
Vadim
Re: Map-Reduce Slow Down
Thanks Aaron.

Jim: The three nodes I set up had Ubuntu running on them, and the DFS was accessed at port 54310. The new cluster which I've set up has Red Hat Linux release 7.2 (Enigma) running on it. Now when I try to access the DFS from one of the slaves, I get the following response: "dfs cannot be accessed". When I access the DFS through the master there's no problem. So I feel there's a problem with the port. Any ideas? I did check the list of slaves; it looks fine to me.

Mithila

On Mon, Apr 13, 2009 at 2:58 PM, Jim Twensky <jim.twen...@gmail.com> wrote:

Mithila,

You said all the slaves were being utilized in the 3 node cluster. Which application did you run to test that, and what was your input size? If you tried the word count application on a 516 MB input file on both cluster setups, then some of your nodes in the 15 node cluster may not be running at all. Generally, one map task is assigned to each input split, and if you are running your cluster with the defaults, the splits are 64 MB each. I got confused when you said the Namenode seemed to do all the work. Can you check conf/slaves and make sure you put the names of all task trackers there? I also suggest comparing both clusters with a larger input size, say at least 5 GB, to really see a difference.

Jim

On Mon, Apr 13, 2009 at 4:17 PM, Aaron Kimball <aa...@cloudera.com> wrote:

in hadoop-*-examples.jar, use randomwriter to generate the data and sort to sort it.
- Aaron

On Sun, Apr 12, 2009 at 9:33 PM, Pankil Doshi <forpan...@gmail.com> wrote:

Your data is too small, I guess, for 15 nodes, so the overhead of these nodes might be making your total MR jobs more time consuming. I guess you will have to try with a larger set of data.

Pankil

On Sun, Apr 12, 2009 at 6:54 PM, Mithila Nagendra <mnage...@asu.edu> wrote:

Aaron
That could be the issue; my data is just 516MB - wouldn't this see a bit of speed up? Could you guide me to the example? I'll run my cluster on it and see what I get. Also, for my program I had a Java timer running to record the time taken to complete execution. Does Hadoop have an inbuilt timer?
Mithila

On Mon, Apr 13, 2009 at 1:13 AM, Aaron Kimball <aa...@cloudera.com> wrote:

Virtually none of the examples that ship with Hadoop are designed to showcase its speed. Hadoop's speedup comes from its ability to process very large volumes of data (starting around, say, tens of GB per job, and going up in orders of magnitude from there). So if you are timing the pi calculator (or something like that), its results won't necessarily be very consistent. If a job doesn't have enough fragments of data to allocate one per each node, some of the nodes will also just go unused.

The best example for you to run is to use randomwriter to fill up your cluster with several GB of random data and then run the sort program. If that doesn't scale up performance from 3 nodes to 15, then you've definitely got something strange going on.
- Aaron

On Sun, Apr 12, 2009 at 8:39 AM, Mithila Nagendra <mnage...@asu.edu> wrote:

Hey all
I recently set up a three node Hadoop cluster and ran an example on it. It was pretty fast, and all three nodes were being used (I checked the log files to make sure the slaves were utilized). Now I've set up another cluster consisting of 15 nodes. I ran the same example, but instead of speeding up, the map-reduce task seems to take forever! The slaves are not being used for some reason. This second cluster has lower per-node processing power, but should that make any difference? How can I ensure that the data is being mapped to all the nodes? Presently, the only node that seems to be doing all the work is the master node. Does 15 nodes in a cluster increase the network cost? What can I do to set up the cluster to function more efficiently?
Thanks!
Mithila Nagendra
Arizona State University
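[One quick check for the "slaves can't reach the DFS" symptom above, run from the master, is dfsadmin, which lists the datanodes that actually registered:]

$ bin/hadoop dfsadmin -report

[If slaves are missing from the report, the first things to look at are their datanode logs and the fs.default.name setting in each slave's hadoop-site.xml, which should point at the master's hostname and port, not localhost.]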
Re: Map-Reduce Slow Down
Can you ssh between the nodes? -jim On Mon, Apr 13, 2009 at 6:49 PM, Mithila Nagendra mnage...@asu.edu wrote: Thanks Aaron. Jim: The three clusters I setup had ubuntu running on them and the dfs was accessed at port 54310. The new cluster which I ve setup has Red Hat Linux release 7.2 (Enigma)running on it. Now when I try to access the dfs from one of the slaves i get the following response: dfs cannot be accessed. When I access the DFS throught the master there s no problem. So I feel there a problem with the port. Any ideas? I did check the list of slaves, it looks fine to me. Mithila On Mon, Apr 13, 2009 at 2:58 PM, Jim Twensky jim.twen...@gmail.com wrote: Mithila, You said all the slaves were being utilized in the 3 node cluster. Which application did you run to test that and what was your input size? If you tried the word count application on a 516 MB input file on both cluster setups, than some of your nodes in the 15 node cluster may not be running at all. Generally, one map job is assigned to each input split and if you are running your cluster with the defaults, the splits are 64 MB each. I got confused when you said the Namenode seemed to do all the work. Can you check conf/slaves and make sure you put the names of all task trackers there? I also suggest comparing both clusters with a larger input size, say at least 5 GB, to really see a difference. Jim On Mon, Apr 13, 2009 at 4:17 PM, Aaron Kimball aa...@cloudera.com wrote: in hadoop-*-examples.jar, use randomwriter to generate the data and sort to sort it. - Aaron On Sun, Apr 12, 2009 at 9:33 PM, Pankil Doshi forpan...@gmail.com wrote: Your data is too small I guess for 15 clusters ..So it might be overhead time of these clusters making your total MR jobs more time consuming. I guess you will have to try with larger set of data.. Pankil On Sun, Apr 12, 2009 at 6:54 PM, Mithila Nagendra mnage...@asu.edu wrote: Aaron That could be the issue, my data is just 516MB - wouldn't this see a bit of speed up? Could you guide me to the example? I ll run my cluster on it and see what I get. Also for my program I had a java timer running to record the time taken to complete execution. Does Hadoop have an inbuilt timer? Mithila On Mon, Apr 13, 2009 at 1:13 AM, Aaron Kimball aa...@cloudera.com wrote: Virtually none of the examples that ship with Hadoop are designed to showcase its speed. Hadoop's speedup comes from its ability to process very large volumes of data (starting around, say, tens of GB per job, and going up in orders of magnitude from there). So if you are timing the pi calculator (or something like that), its results won't necessarily be very consistent. If a job doesn't have enough fragments of data to allocate one per each node, some of the nodes will also just go unused. The best example for you to run is to use randomwriter to fill up your cluster with several GB of random data and then run the sort program. If that doesn't scale up performance from 3 nodes to 15, then you've definitely got something strange going on. - Aaron On Sun, Apr 12, 2009 at 8:39 AM, Mithila Nagendra mnage...@asu.edu wrote: Hey all I recently setup a three node hadoop cluster and ran an examples on it. It was pretty fast, and all the three nodes were being used (I checked the log files to make sure that the slaves are utilized). Now I ve setup another cluster consisting of 15 nodes. I ran the same example, but instead of speeding up, the map-reduce task seems to take forever! The slaves are not being used for some reason. 
I ran the same example, but instead of speeding up, the map-reduce task seems to take forever! The slaves are not being used for some reason. This second cluster has lower per-node processing power, but should that make any difference? How can I ensure that the data is being mapped to all the nodes? Presently, the only node that seems to be doing all the work is the master node. Do 15 nodes in a cluster increase the network cost? What can I do to set up the cluster to function more efficiently? Thanks! Mithila Nagendra Arizona State University
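To make Jim's split arithmetic concrete (the numbers below are implied by the thread, not stated in it): with the default 64 MB split size, a 516 MB input produces ceil(516/64) = 9 splits, hence at most 9 map tasks - so at least 6 of the 15 nodes get no map work at all, and fixed per-job startup overhead dominates at this input size.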
Using 3rd party API in Map class
Hello, I am trying to use the Pellet library for some OWL inferencing in my mapper class, but I can't find a way to bundle the library's jar files in my job jar file. I am exporting my project as a jar file from the Eclipse IDE. Will it work if I create the jar manually and include all the jar files the Pellet library has? Is there any simpler way to include 3rd-party library jar files in a Hadoop job jar? Without being able to include the library jars, I am getting a ClassNotFoundException. Thanks, Farhan
Re: Using 3rd party API in Map class
Create a directory called 'lib' in your project's root dir, then put all the 3rd-party jars in it. 2009/4/14 Farhan Husain russ...@gmail.com -- http://daily.appspot.com/food/
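In case it's useful, a minimal packaging sketch from the command line (the directory layout and jar names are illustrative):

$ mkdir -p jobjar/lib
$ cp -r classes/* jobjar/          # your compiled classes
$ cp pellet/*.jar jobjar/lib/      # the 3rd-party jars
$ jar cf myjob.jar -C jobjar .

Hadoop unpacks the job jar on each task node and adds every jar found under its lib/ directory to the task classpath, which is what makes the ClassNotFoundException go away.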
Java compile-time warning while using MultipleOutputs
Hello, the Java compiler generates the following warning when I use the MultipleOutputs class, since mos.getCollector(String, Reporter) (where mos is of type MultipleOutputs) returns a raw OutputCollector instead of OutputCollector<K,V>:

warning: [unchecked] unchecked call to collect(K,V) as a member of the raw type org.apache.hadoop.mapred.OutputCollector

Yes, I can live with this warning, but it really makes me uneasy. Any suggestions to remove this warning? -seunghwa
Re: Map-Reduce Slow Down
Yes I can. On Mon, Apr 13, 2009 at 5:12 PM, Jim Twensky jim.twen...@gmail.com wrote: Can you ssh between the nodes? -jim
bzip2 input format
Does anyone have an input format for bzip2? -- Best Regards, Edward J. Yoon edwardy...@apache.org http://blog.udanax.org
Re: Modeling WordCount in a different way
Pankil Doshi wrote: Hey, did you find any class or way out for storing the results of job 1's map/reduce in memory and using that as the input to job 2's map/reduce? I am facing a situation where I need to do a similar thing. If anyone can help me out...

Normally you would write the job output to a file and input that to the next job. Any reason why you want to store the map reduce output in memory? If you can describe your problem, perhaps it could be solved in a more mapreduce-ish way. - Sharad
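For what it's worth, a minimal sketch of that file-chaining pattern with the 0.19-era API (the paths are illustrative, and the identity mapper/reducer stand in for real ones):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class ChainedJobs {
  public static void main(String[] args) throws Exception {
    // Job 1: replace the identity classes with your real mapper/reducer.
    JobConf job1 = new JobConf(ChainedJobs.class);
    job1.setJobName("job1");
    job1.setMapperClass(IdentityMapper.class);
    job1.setReducerClass(IdentityReducer.class);
    job1.setOutputKeyClass(LongWritable.class);
    job1.setOutputValueClass(Text.class);
    FileInputFormat.setInputPaths(job1, new Path("input"));
    FileOutputFormat.setOutputPath(job1, new Path("intermediate"));
    JobClient.runJob(job1);            // blocks until job 1 completes

    // Job 2 reads job 1's output straight from HDFS.
    JobConf job2 = new JobConf(ChainedJobs.class);
    job2.setJobName("job2");
    job2.setMapperClass(IdentityMapper.class);
    job2.setReducerClass(IdentityReducer.class);
    job2.setOutputKeyClass(LongWritable.class);
    job2.setOutputValueClass(Text.class);
    FileInputFormat.setInputPaths(job2, new Path("intermediate"));
    FileOutputFormat.setOutputPath(job2, new Path("output"));
    JobClient.runJob(job2);
  }
}

Since the intermediate data lives in HDFS, job 2 reads it back as ordinary sequential I/O, which is rarely the bottleneck; using SequenceFileOutputFormat for job 1 and SequenceFileInputFormat for job 2 also avoids re-parsing text.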
Re: Java compile-time warning while using MultipleOutputs
warning: [unchecked] unchecked call to collect(K,V) as a member of the raw type org.apache.hadoop.mapred.OutputCollector

Yes, I can live with this warning, but it really makes me uneasy. Any suggestions to remove this warning?

You can suppress the warning using an annotation in your code: @SuppressWarnings("unchecked")
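A sketch of where the annotation goes (the named output "text", the key/value types, and the reducer class are illustrative; the named output is assumed to have been registered with MultipleOutputs.addNamedOutput in the driver):

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.MultipleOutputs;

public class MyReducer extends MapReduceBase
    implements Reducer<Text, IntWritable, Text, IntWritable> {

  private MultipleOutputs mos;

  public void configure(JobConf conf) {
    mos = new MultipleOutputs(conf);
  }

  // getCollector() returns a raw OutputCollector, so the collect() call
  // is what trips the unchecked warning; scoping the annotation to the
  // method keeps other genuine warnings visible.
  @SuppressWarnings("unchecked")
  public void reduce(Text key, Iterator<IntWritable> values,
      OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    mos.getCollector("text", reporter).collect(key, values.next());
  }

  public void close() throws IOException {
    mos.close();
  }
}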
Re: Reduce task attempt retry strategy
Usually, a task is killed when:
1. The user explicitly kills the task himself
2. The framework kills the task because it did not progress enough
3. The task was speculatively executed

Hence the reason for killing has, more often than not, nothing to do with the health of the node where it was running, but rather with the task (user code) itself. It is very difficult to distinguish the case where progress was not reported because the user code was faulty from the case where progress was not reported because the node was slow. Jothi

On 4/13/09 10:47 PM, Stefan Will stefan.w...@gmx.net wrote: Jothi, thanks for the explanation. One question though: why shouldn't timed-out tasks be retried on a different machine? As you pointed out, it could very well have been due to the machine having problems. To me a timeout is just like any other kind of failure. -- Stefan

From: Jothi Padmanabhan joth...@yahoo-inc.com Reply-To: core-user@hadoop.apache.org Date: Mon, 13 Apr 2009 19:00:38 +0530 To: core-user@hadoop.apache.org Subject: Re: Reduce task attempt retry strategy

Currently, only failed tasks are attempted on a node other than the one where they failed. For killed tasks, there is no such policy for retries. "Failed to report status" usually indicates that the task did not report sufficient progress. However, it is possible that the task itself was not progressing fast enough because the machine where it ran had problems.

On 4/8/09 12:33 AM, Stefan Will stefan.w...@gmx.net wrote: My cluster has 27 nodes with a total reduce task capacity of 54. The job had 31 reducers. I actually had a task today that showed the behavior you're describing: 3 tries on one machine, and then the 4th on a different one. As for the particular job I was talking about before, here are the stats (total tasks = successful + failed + killed):

Kind     Total  Successful  Failed  Killed  Start Time           Finish Time
Setup     1      1           0       0      4-Apr-2009 00:30:16  4-Apr-2009 00:30:33 (17sec)
Map      64     49          12       3      4-Apr-2009 00:30:33  4-Apr-2009 01:11:15 (40mins, 41sec)
Reduce   34     30           4       0      4-Apr-2009 00:30:44  4-Apr-2009 04:31:36 (4hrs, 52sec)
Cleanup   4      0           4       0      4-Apr-2009 04:31:36  4-Apr-2009 06:32:00 (2hrs, 24sec)

Not sure what to look for in the jobtracker log. All it shows for that particular failed task is that it assigned it to the same machine 4 times and then eventually failed. Perhaps something to note is that the 4 failures were all due to timeouts: "Task attempt_200904031942_0002_r_13_3 failed to report status for 1802 seconds. Killing!" Also, looking at the logs, there was a map task too that was retried on that particular box 4 times without going to a different one. Perhaps it had something to do with the way this machine failed: the jobtracker still considered it live, while all actual tasks assigned to it timed out. -- Stefan

From: Amar Kamat ama...@yahoo-inc.com Reply-To: core-user@hadoop.apache.org Date: Tue, 07 Apr 2009 10:05:16 +0530 To: core-user@hadoop.apache.org Subject: Re: Reduce task attempt retry strategy

Stefan Will wrote: Hi, I had a flaky machine the other day that was still accepting jobs and sending heartbeats, but caused all reduce task attempts to fail. This in turn caused the whole job to fail because the same reduce task was retried 3 times on that particular machine.

What is your cluster size? If a task fails on a machine then it is re-tried on some other machine (based on the number of good machines left in the cluster). After a certain number of failures, the machine will be blacklisted (again based on the number of machines left in the cluster).
3 different reducers might be scheduled on that machine, but that should not lead to job failure. Can you explain in detail what exactly happened? Find out where the attempts got scheduled from the jobtracker's log. Amar

Perhaps I'm confusing this with the block placement strategy in HDFS, but I always thought that the framework would retry tasks on a different machine if retries on the original machine keep failing. E.g. I would have expected it to retry once or twice on the same machine, but then switch to a different one to minimize the likelihood of getting stuck on a bad machine. What is the expected behavior in 0.19.1 (which I'm running)? Any plans for improving on this in the future? Thanks, Stefan
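For reference, a hedged sketch of the per-job knobs that bound retries and per-node failures in this API generation (the values shown are the usual defaults, not a recommendation; the driver class name is illustrative):

import org.apache.hadoop.mapred.JobConf;

JobConf conf = new JobConf(MyDriver.class);
conf.setMaxMapAttempts(4);             // attempts per map task before the job fails
conf.setMaxReduceAttempts(4);          // attempts per reduce task
conf.setMaxTaskFailuresPerTracker(4);  // failures on one node before that node is
                                       // blacklisted for this job

Lowering setMaxTaskFailuresPerTracker makes a flaky-but-alive node get blacklisted for the job sooner, which is one way to avoid the retried-four-times-on-the-same-box pattern described above.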
Does anyone use Hadoop for applications other than MapReduce?
Hi all, I am wondering whether HDFS is suitable for applications other than MapReduce - more general-purpose applications, or simply a huge amount of storage. Any feedback on that? Thanks. Lei
Re: Does anyone use Hadoop for applications other than MapReduce?
Hi Lei, is there any particular problem on your hand? Thanks On Mon, Apr 13, 2009 at 9:01 PM, Lei Xu xule...@gmail.com wrote: Hi all, I am wondering whether HDFS is suitable for applications other than MapReduce - more general-purpose applications, or simply a huge amount of storage. Any feedback on that? Thanks. Lei -- Cheers! Hadoop core
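To make the non-MapReduce case concrete, here is a minimal sketch (paths illustrative) of using HDFS purely as storage through the FileSystem API, with no job or mappers involved:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsAsStorage {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();  // picks up fs.default.name
    FileSystem fs = FileSystem.get(conf);
    Path p = new Path("/archive/blob.bin");    // illustrative path

    FSDataOutputStream out = fs.create(p);     // write a file into HDFS
    out.writeUTF("any bytes you like");
    out.close();

    FSDataInputStream in = fs.open(p);         // read it back
    System.out.println(in.readUTF());
    in.close();
  }
}

The usual caveats apply: HDFS favors large files and streaming reads, and files are write-once in this era, so it suits archival and bulk storage far better than small random-write workloads.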
Re: Interesting Hadoop/FUSE-DFS access patterns
The following very simple program will tell the VM to drop the pages being cached for a file. I tend to run this in a for loop when making large tar files, or otherwise working with large files, and the system performance really smooths out. Since it uses open(path), it will churn through the inode cache and directories. Something like this might actually significantly speed up HDFS by running over the blocks on the datanodes, for busy clusters.

#define _XOPEN_SOURCE 600
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

/* Simple program to dump buffered data for specific files from the
 * buffer cache. Copyright Jason Venner 2009, License GPL */
int main( int argc, char** argv )
{
    int failCount = 0;
    int i, rc;
    for( i = 1; i < argc; i++ ) {
        char* file = argv[i];
        int fd = open( file, O_RDONLY|O_LARGEFILE );
        if (fd == -1) {
            perror( file );
            failCount++;
            continue;
        }
        /* posix_fadvise returns the error number directly; it does not set errno */
        rc = posix_fadvise( fd, 0, 0, POSIX_FADV_DONTNEED );
        if (rc != 0) {
            fprintf( stderr, "Failed to flush cache for %s: %s\n", file, strerror( rc ) );
            failCount++;
        }
        close(fd);
    }
    exit( failCount );
}

On Mon, Apr 13, 2009 at 4:01 PM, Scott Carey sc...@richrelevance.com wrote:

On 4/12/09 9:41 PM, Brian Bockelman bbock...@cse.unl.edu wrote: Ok, here's something perhaps even more strange. I removed the seek part out of my timings, so I was only timing the read instead of the seek + read as in the first case. I also turned the read-ahead down to 1 byte (aka, off). The jump *always* occurs at 128KB, exactly.

Some random ideas: I have no idea how FUSE interoperates with the Linux block layer, but 128K happens to be the default 'readahead' value for block devices, which may just be a coincidence. For a disk 'sda', you check and set the value (in 512-byte blocks) with:

/sbin/blockdev --getra /dev/sda
/sbin/blockdev --setra [num blocks] /dev/sda

I know in my file system tests, the OS readahead is not activated until a series of sequential reads go through the block device, so truly random access is not affected by this. I've set it to 128MB and random iops does not change on an ext3 or xfs file system. If this applies to FUSE too, there may be reasons that this behavior differs. Furthermore, one would not expect it to be slower to randomly read 4k than to randomly read up to the readahead size itself even if it did. I also have no idea how much of the OS device queue and block device scheduler is involved with FUSE. If those are involved, then there's a bunch of stuff to tinker with there as well.

Lastly, an FYI if you don't already know the following: if the OS is caching pages, there is a way to flush these in Linux to evict the cache. See /proc/sys/vm/drop_caches.

I'm a bit befuddled. I know we say that HDFS is optimized for large, sequential reads, not random reads - but it seems that it's one bug-fix away from being a good general-purpose system. Heck if I can find what's causing the issues though... Brian

-- Alpha Chapters of my book on Hadoop are available http://www.apress.com/book/view/9781430219422
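For completeness, a usage sketch (assuming the source above is saved as dropcache.c; the datanode block directory shown follows the default /tmp layout and may differ on your cluster):

$ gcc -O2 -o dropcache dropcache.c
$ find /tmp/hadoop-root/dfs/data -type f -name 'blk_*' -print0 | xargs -0 ./dropcache

The program's exit status is the number of files it failed to process, so 0 means every named block's cached pages were dropped.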
Re: Hadoop and Image analysis question
If you pack your images into sequence files, as the value items, the cluster will automatically do a decent job of ensuring that the input splits made from the sequence files are local to the map task. We did this in production at a previous job and it worked very well for us. You might as well turn off sequence file compression unless you are passing raw images or have substantial amounts of compressible metadata. Do remember to drop the images from the output records passed to the reduce phase if you have to have one, or the reduce will be expensive.

On Sun, Apr 12, 2009 at 11:13 PM, Sharad Agarwal shara...@yahoo-inc.com wrote: Sameer Tilak wrote: Hi everyone, I would like to use Hadoop for analyzing tens of thousands of images. Ideally each mapper gets a few hundred images to process, and I'll have a few hundred mappers. However, I want the mapper function to run on the machine where its images are stored. How can I achieve that? With text data, creating splits and exploiting locality seems easy.

You can store the image files in HDFS. However, storing too many small files in HDFS will result in scalability and performance issues, so you can combine multiple image files into a sequence file. There are some other approaches also discussed here: http://www.cloudera.com/blog/2009/02/02/the-small-files-problem/

-- Alpha Chapters of my book on Hadoop are available http://www.apress.com/book/view/9781430219422
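A hedged sketch of the packing step (the local image directory and output path are illustrative; keys are filenames, values are raw image bytes):

import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class PackImages {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // CompressionType.NONE: JPEG/PNG bytes are already compressed,
    // per the advice above.
    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, new Path("images.seq"),
        Text.class, BytesWritable.class,
        SequenceFile.CompressionType.NONE);
    try {
      for (File img : new File("/local/images").listFiles()) {
        byte[] data = new byte[(int) img.length()];
        DataInputStream in = new DataInputStream(new FileInputStream(img));
        in.readFully(data);  // slurp the whole image
        in.close();
        writer.append(new Text(img.getName()), new BytesWritable(data));
      }
    } finally {
      writer.close();
    }
  }
}

A map task then receives (filename, bytes) records via SequenceFileInputFormat, and the framework schedules each task close to its split's blocks.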
Re: Extending ClusterMapReduceTestCase
I have a nice variant of this in the ch7 examples section of my book, including a standalone wrapper around the virtual cluster that allows multiple test instances to share the virtual cluster - and allows an easier time poking around with the input and output datasets. It even works decently under Windows - my editor insists on a version of Word too recent for CrossOver.

On Mon, Apr 13, 2009 at 9:16 AM, czero brian.sta...@gmail.com wrote: Sorry, I forgot to include the not-IntelliJ-console output :)

09/04/13 12:07:14 ERROR mapred.MiniMRCluster: Job tracker crashed
java.lang.NullPointerException
    at java.io.File.<init>(File.java:222)
    at org.apache.hadoop.mapred.JobHistory.init(JobHistory.java:143)
    at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:1110)
    at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:143)
    at org.apache.hadoop.mapred.MiniMRCluster$JobTrackerRunner.run(MiniMRCluster.java:96)
    at java.lang.Thread.run(Thread.java:637)

I managed to pick up the chapter in the Hadoop book that Jason mentions that deals with unit testing (great chapter btw) and it looks like everything is in order. He points out that this error is typically caused by a bad hadoop.log.dir or a missing log4j.properties, but I verified that my dir is OK and my hadoop-0.19.1-core.jar has the log4j.properties in it. I also tried running the same test with hadoop-core/test 0.19.0 - same thing. Thanks again, bc

czero wrote: Hey all, I'm also extending ClusterMapReduceTestCase and having a bit of trouble as well. Currently I'm getting:

Starting DataNode 0 with dfs.data.dir: build/test/data/dfs/data/data1,build/test/data/dfs/data/data2
Starting DataNode 1 with dfs.data.dir: build/test/data/dfs/data/data3,build/test/data/dfs/data/data4
Generating rack names for tasktrackers
Generating host names for tasktrackers

And then nothing... it just spins on that forever. Any ideas? I have all the jetty and jetty-ext libs in the classpath and I set the hadoop.log.dir and the SAX parser correctly. This is all I have for my test class so far, I'm not even doing anything yet:

public class TestDoop extends ClusterMapReduceTestCase {

    @Test
    public void testDoop() throws Exception {
        System.setProperty("hadoop.log.dir", "~/test-logs");
        System.setProperty("javax.xml.parsers.SAXParserFactory",
            "com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl");
        setUp();
        System.out.println("done.");
    }
}

Thanks! bc

-- Alpha Chapters of my book on Hadoop are available http://www.apress.com/book/view/9781430219422
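A hedged sketch of the usual fix, setting the properties in an overridden setUp() before the mini clusters boot (the log directory value is illustrative; note that Java does not expand '~' in paths):

import org.apache.hadoop.mapred.ClusterMapReduceTestCase;

public class TestDoop extends ClusterMapReduceTestCase {

    @Override
    protected void setUp() throws Exception {
        // Both properties must be in place before super.setUp() starts the
        // mini DFS/MR clusters; a null hadoop.log.dir is a common cause of
        // the NullPointerException out of JobHistory.init shown above.
        System.setProperty("hadoop.log.dir", "build/test/logs");
        System.setProperty("javax.xml.parsers.SAXParserFactory",
            "com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl");
        super.setUp();
    }

    public void testDoop() throws Exception {
        // Job-driving code goes here; createJobConf() and getFileSystem()
        // from the base class point at the mini clusters.
    }
}

Note that ClusterMapReduceTestCase is a JUnit 3 TestCase, so test methods are discovered by their test* names; the JUnit 4 @Test annotation is not needed and calling setUp() by hand from inside the test should be avoided.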