Re: Namenode failed to start with "FSNamesystem initialization failed" error
Hi Raghu, The thread you posted is my original post written when this problem first happened on my cluster. I can file a JIRA but I wouldn't be able to provide information other than what I already posted and I don't have the logs from that time. Should I still file ? Thanks, Tamir On Tue, May 5, 2009 at 9:14 PM, Raghu Angadi wrote: > Tamir, > > Please file a jira on the problem you are seeing with 'saveLeases'. In the > past there have been multiple fixes in this area (HADOOP-3418, HADOOP-3724, > and more mentioned in HADOOP-3724). > > Also refer the thread you started > http://www.mail-archive.com/core-user@hadoop.apache.org/msg09397.html > > I think another user reported the same problem recently. > > These are indeed very serious and very annoying bugs. > > Raghu. > > > Tamir Kamara wrote: > >> I didn't have a space problem which led to it (I think). The corruption >> started after I bounced the cluster. >> At the time, I tried to investigate what led to the corruption but didn't >> find anything useful in the logs besides this line: >> saveLeases found path >> >> /tmp/temp623789763/tmp659456056/_temporary_attempt_200904211331_0010_r_02_0/part-2 >> but no matching entry in namespace >> >> I also tried to recover from the secondary name node files but the >> corruption my too wide-spread and I had to format. >> >> Tamir >> >> On Mon, May 4, 2009 at 4:48 PM, Stas Oskin wrote: >> >> Hi. >>> >>> Same conditions - where the space has run out and the fs got corrupted? >>> >>> Or it got corrupted by itself (which is even more worrying)? >>> >>> Regards. >>> >>> 2009/5/4 Tamir Kamara >>> >>> I had the same problem a couple of weeks ago with 0.19.1. Had to reformat the cluster too... On Mon, May 4, 2009 at 3:50 PM, Stas Oskin wrote: Hi. > > After rebooting the NameNode server, I found out the NameNode doesn't > start > anymore. > > The logs contained this error: > "FSNamesystem initialization failed" > > > I suspected filesystem corruption, so I tried to recover from > SecondaryNameNode. Problem is, it was completely empty! > > I had an issue that might have caused this - the root mount has run out > of > space. But, both the NameNode and the SecondaryNameNode directories > were >>> on > another mount point with plenty of space there - so it's very strange > that > they were impacted in any way. > > Perhaps the logs, which were located on root mount and as a result, > could >>> not be written, have caused this? > > > To get back HDFS running, i had to format the HDFS (including manually > erasing the files from DataNodes). While this reasonable in test > environment > - production-wise it would be very bad. > > Any idea why it happened, and what can be done to prevent it in the > future? > I'm using the stable 0.18.3 version of Hadoop. > > Thanks in advance! > > >> >
PIG and Hive
Are they competing technologies for providing a higher-level language for Map/Reduce programming? Or are they complementary? Is there any comparison between them? Rgds, Ricky
Import new content to Hbase?
Hi all! I have an MR job that imports content to HBase. Before importing, I have to determine which content is new (the row key in HBase is the URI), and then import only that new content. Assume I already have a lot of content in HBase (> 1,000,000,000 URIs) and 1,000,000 URIs to import (a mix of new URIs and URIs that already exist in HBase). How do I find the new content (URIs) to import? My current solution is to check whether each URI already exists in HBase, something like: RowResult row = hTable.getRow(uri); if (row.isEmpty()) { // collect the new content (URI) } With this solution, when the number of URIs is large, the time spent on connections to HBase is large :( Please suggest a better solution. :) Thanks! Best regards, Nguyen.
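For illustration, the existence check above looks roughly like the sketch below when collected into a helper (a minimal sketch assuming the 0.19-era client API used in the snippet; the class name, method name and table name "content" are placeholders):

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.io.RowResult;

    // Sketch only: filters a batch of URIs down to the ones with no row yet,
    // using the same getRow()/isEmpty() calls as the snippet above.
    public class NewUriFilter {
      public static List<String> filterNew(HTable table, List<String> uris) throws IOException {
        List<String> fresh = new ArrayList<String>();
        for (String uri : uris) {
          RowResult row = table.getRow(uri);   // one round trip per URI -- the slow part
          if (row.isEmpty()) {
            fresh.add(uri);                    // no such row key yet, so this content is new
          }
        }
        return fresh;
      }

      public static void main(String[] args) throws IOException {
        HTable table = new HTable(new HBaseConfiguration(), "content"); // hypothetical table name
        List<String> sample = new ArrayList<String>();
        sample.add("http://example.org/page-1");
        System.out.println(filterNew(table, sample));
      }
    }

The design point worth noting is that each getRow() is a separate RPC, so the cost grows linearly with the number of URIs checked, which is exactly the slowness described above.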
Re: What User Accounts Do People Use For Team Dev?
Best is to use one user for map/reduce and another for hdfs. Neither of them should be root or "real" users. With the setuid patch (HADOOP-4490), it is possible to run the jobs as the submitted user. Note that if you do that, you no doubt want to block certain system uids (bin, mysql, etc.) -- Owen
Free Training at 2009 Hadoop Summit
Just wanted to follow up on this and let everyone know that Cloudera and Y! are teaming up to offer two day-long training sessions for free on the day after the summit (June 11th). We'll cover Hadoop basics, Pig, Hive and some new tools Cloudera is releasing for importing data to Hadoop from existing databases. http://hadoopsummit09-training.eventbrite.com Each of these sessions normally runs about $1000 but we're taking advantage of having so much of the Hadoop community in one place and offering this for free at the 2009 Hadoop Summit. Basic training is appropriate for people just getting started with Hadoop, and the advanced training will focus on augmenting your existing infrastructure with Hadoop and taking advantage of Hadoop's advanced features and related projects. Space is limited, so sign up before time runs out. Hope to see you there! Christophe and the Cloudera Team On Wed, May 6, 2009 at 6:10 AM, Ajay Anand wrote: > This year’s Hadoop Summit > (http://developer.yahoo.com/events/hadoopsummit09/) is confirmed for June > 10th at the Santa Clara Marriott, and is now open for registration. > > > > We have a packed agenda, with three tracks – for developers, administrators, > and one focused on new and innovative applications using Hadoop. The > presentations include talks from Amazon, IBM, Sun, Cloudera, Facebook, HP, > Microsoft, and the Yahoo! team, as well as leading universities including UC > Berkeley, CMU, Cornell, U of Maryland, U of Nebraska and SUNY. > > > > From our experience last year with the rush for seats, I would encourage > people to register early at http://hadoopsummit09.eventbrite.com/ > > > > Looking forward to seeing you at the summit! > > > > Ajay -- get hadoop: cloudera.com/hadoop online training: cloudera.com/hadoop-training blog: cloudera.com/blog twitter: twitter.com/cloudera
Re: specific block size for a file
Trade-off between HDFS efficiency and data locality. On Tue, May 5, 2009 at 9:37 AM, Arun C Murthy wrote: > > On May 5, 2009, at 4:47 AM, Christian Ulrik Søttrup wrote: > > Hi all, >> >> I have a job that creates very big local files so i need to split it to as >> many mappers as possible. Now the DFS block size I'm >> using means that this job is only split to 3 mappers. I don't want to >> change the hdfs wide block size because it works for my other jobs. >> >> > I would rather keep the big files on HDFS and use -Dmapred.min.split.size > to get more maps to process your data > > http://hadoop.apache.org/core/docs/r0.20.0/mapred_tutorial.html#Job+Input > > Arun > > -- Alpha Chapters of my book on Hadoop are available http://www.apress.com/book/view/9781430219422 www.prohadoopbook.com a community for Hadoop Professionals
Re: multi-line records and file splits
I think your SDFInputFormat should implement the MultiFileInputFormat instead of the TextInputFormat, which will not split the file into chunks. 2009/5/6 Rajarshi Guha > Hi, I have implemented a subclass of RecordReader to handle a plain text > file format where a record is multi-line and of variable length. > Schematically each record is of the form > > some_title > foo > bar > > another_title > foo > foo > bar > > > where is the marker for the end of the record. My code is at > http://blog.rguha.net/?p=293 and it seems to work fine on my input data. > > However, I realized that when I run the program, Hadoop will 'chunk' the > input file. As a result, the SDFRecordReader might get a chunk of input > text, such that the last record is actually incomplete (a missing ). Is > this correct? > > If so, how would the RecordReader implementation recover from this > situation? Or is there a way to indicate to Hadoop that the input file > should be chunked keeping in mind end of record delimiters? > > Thanks > > --- > Rajarshi Guha > GPG Fingerprint: D070 5427 CC5B 7938 929C DD13 66A1 922C 51E7 9E84 > --- > Q: What's polite and works for the phone company? > A: A deferential operator. > > > -- http://daily.appspot.com/food/
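As a hedged aside (not necessarily what MultiFileInputFormat does internally): another common way to get the "never split this file" behaviour with the old mapred API is to subclass TextInputFormat and switch splitting off, so each file goes to a single mapper and a multi-line record can never be cut in half by a chunk boundary. The class name below is made up.

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.TextInputFormat;

    // Sketch: an input format that refuses to split its files, so every mapper
    // sees whole records even when they span many lines.
    public class WholeFileTextInputFormat extends TextInputFormat {
      @Override
      protected boolean isSplitable(FileSystem fs, Path file) {
        return false;   // one split per file, regardless of dfs.block.size
      }
    }

The trade-off is that a very large unsplittable file is processed by a single mapper, so some parallelism is lost.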
move tasks to another machine on the fly
I get this error when running Reduce tasks on a machine: java.lang.OutOfMemoryError: unable to create new native thread at java.lang.Thread.start0(Native Method) at java.lang.Thread.start(Thread.java:597) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.(DFSClient.java:2591) at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:454) at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:190) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:487) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:387) at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:117) at org.apache.hadoop.mapred.lib.MultipleTextOutputFormat.getBaseRecordWriter(MultipleTextOutputFormat.java:44) at org.apache.hadoop.mapred.lib.MultipleOutputFormat$1.write(MultipleOutputFormat.java:99) at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:410) is it possible to move a reduce task to other machine in the cluster on the fly? -- ./david
large files vs many files
hi there, working through a concept at the moment and was attempting to write lots of data to few files as opposed to writing lots of data to lots of little files. what are the thoughts on this? When I try and implement outputStream = hdfs.append(path); there doesn't seem to be any locking mechanism in place ... or there is and it doesn't work well enough for many writes per second? i have read and seen that the property "dfs.support.append" is not meant for production use. still, if millions of little files are as good as or better than --- or no different from --- a few massive files, then i suppose append isn't something i really need. I do see a lot of stack traces with messages like: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file /foo/bar/aaa.bbb.ccc.ddd.xxx for DFSClient_-1821265528 on client 127.0.0.1 because current leaseholder is trying to recreate file. i hope this makes sense. still a little bit confused. thanks in advance -sd -- Sasha Dolgy sasha.do...@gmail.com
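For reference, the append pattern being discussed is roughly the sketch below (the path and class name are made up; dfs.support.append has to be enabled on the cluster as well, and as noted it was not considered production-ready at the time, so this is illustration rather than a recommendation):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Sketch: append a record to an existing HDFS file. HDFS allows only one
    // writer (lease holder) per file at a time, which lines up with the
    // AlreadyBeingCreatedException quoted above when two clients race.
    public class AppendSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.setBoolean("dfs.support.append", true);   // client-side switch; not production-ready then
        FileSystem hdfs = FileSystem.get(conf);
        Path path = new Path("/foo/bar/events.log");   // illustrative path
        FSDataOutputStream out = hdfs.append(path);
        out.writeBytes("one event record\n");
        out.close();                                   // releases the lease for the next writer
      }
    }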
Re: NoSuchMethodException when running Map Task
Hey, I assume you've already figured it out, but for people like me, stumbling through the same error, it appears to be caused by forgetting to make your inner classes (the ones implementing the Mapper and Reducer interfaces), static. -stu Dan Benjamin wrote: > > Sorry, I should have mentioned I'm using hadoop version 0.18.1 and java > 1.6. > > > Dan Benjamin wrote: >> >> I've got a simple hadoop job running on an EC2 cluster using the scripts >> under src/contrib/ec2. The map tasks all fail with the following error: >> >> 08/10/07 15:11:00 INFO mapred.JobClient: Task Id : >> attempt_200810071501_0001_m_31_0, Status : FAILED >> java.lang.RuntimeException: java.lang.NoSuchMethodException: >> ManifestRetriever$Map.() >> at >> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:80) >> at >> org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:33) >> at >> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58) >> at >> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82) >> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:223) >> at >> org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207) >> Caused by: java.lang.NoSuchMethodException: >> com.amazon.ec2.ebs.billing.ManifestRetriever$Map.() >> at java.lang.Class.getConstructor0(Class.java:2706) >> at java.lang.Class.getDeclaredConstructor(Class.java:1985) >> at >> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:74) >> >> I tried adding an explicit (public) no-arg constructor to the >> ManifestRetriever.Map class but this gives me the same error. Has anyone >> encountered this problem before? >> >> > > -- View this message in context: http://www.nabble.com/NoSuchMethodException-when-running-Map-Task-tp19865280p23396833.html Sent from the Hadoop core-user mailing list archive at Nabble.com.
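To make the fix concrete, a minimal sketch of the shape Hadoop expects is below (the outer class name matches the one in the stack trace; the key/value types and the map body are illustrative assumptions). The point is that a static nested class has an implicit no-arg constructor that ReflectionUtils can call, whereas a non-static inner class does not.

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class ManifestRetriever {
      // The nested Mapper (and Reducer) classes must be declared static.
      public static class Map extends MapReduceBase
          implements Mapper<LongWritable, Text, Text, NullWritable> {
        public void map(LongWritable key, Text value,
                        OutputCollector<Text, NullWritable> output,
                        Reporter reporter) throws IOException {
          output.collect(value, NullWritable.get());   // pass each input line through
        }
      }
    }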
multi-line records and file splits
Hi, I have implemented a subclass of RecordReader to handle a plain text file format where a record is multi-line and of variable length. Schematically each record is of the form some_title foo bar another_title foo foo bar where is the marker for the end of the record. My code is at http://blog.rguha.net/?p=293 and it seems to work fine on my input data. However, I realized that when I run the program, Hadoop will 'chunk' the input file. As a result, the SDFRecordReader might get a chunk of input text, such that the last record is actually incomplete (a missing ). Is this correct? If so, how would the RecordReader implementation recover from this situation? Or is there a way to indicate to Hadoop that the input file should be chunked keeping in mind end of record delimiters? Thanks --- Rajarshi Guha GPG Fingerprint: D070 5427 CC5B 7938 929C DD13 66A1 922C 51E7 9E84 --- Q: What's polite and works for the phone company? A: A deferential operator.
Hadoop Summit 2009 - Open for registration
This year's Hadoop Summit (http://developer.yahoo.com/events/hadoopsummit09/) is confirmed for June 10th at the Santa Clara Marriott, and is now open for registration. We have a packed agenda, with three tracks - for developers, administrators, and one focused on new and innovative applications using Hadoop. The presentations include talks from Amazon, IBM, Sun, Cloudera, Facebook, HP, Microsoft, and the Yahoo! team, as well as leading universities including UC Berkeley, CMU, Cornell, U of Maryland, U of Nebraska and SUNY. From our experience last year with the rush for seats, I would encourage people to register early at http://hadoopsummit09.eventbrite.com/ Looking forward to seeing you at the summit! Ajay
Re: Namenode failed to start with "FSNamesystem initialization failed" error
The image is stored in two files: fsimage and edits (under namenode-directory/current/). Stas Oskin wrote: Well, it definitely caused the SecondaryNameNode to crash, and also seems to have triggered some strange issues today as well. By the way, how the image file is named?
Jobtracker showing jobs which seem dead
Hi all, I've got an installation of Hadoop up working with a Nutch crawler, and it looks like recently the jobs are all halting in the middle of the reduce phase. This is on Hadoop 0.19.1 Here's what I'm seeing in the datanode logs: (there were a few in the logs, but the last error was almost a day ago) 2009-05-04 17:02:24,889 ERROR datanode.DataNode - DatanodeRegistration(10.9.17.206:50010, storageID=DS-1024739802-10.9.17.206-50010-1238445482034, infoPort=50075, ipcPort=50020):DataXceiver java.net.SocketTimeoutException: 48 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.9.17.206:50010 remote=/10.9.17.206:50537] at org .apache .hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:185) at org .apache .hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java: 159) at org .apache .hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java: 198) at org .apache .hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java: 313) at org .apache .hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400) at org .apache .hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180) at org .apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95) at java.lang.Thread.run(Thread.java:619) I searched for the error message and it turned up a few potential bugs with HBase, but I don't think that's in play here as I can't find any mention of it in the configuration files for our setup. Or, if it's possible I need to change hbase configurations, would that involve creating an hbase-site.xml config file in the hadoop conf directory or does that go directly in hadoop-site.xml? Otherwise, I can't seem to track down what might be causing this. All of the status information about the job that I can find seems to report it's fine and normal, but it hasn't progressed in almost a day's time now. (Should be a 3-5 hour job if all goes well, and it used to...) Ideas? Can I provide more info? Thanks, Mark
DFS # of blocks
howdy all, im doing some hadoop testing (so im still new to it), and im running into an error. ( DataStreamer Exception: java.io.IOException: Unable to create new block.) My DFS is not large (971 files and directories, 3906 blocks = 4877 total. Heap Size is 13.9 MB / 966.69 MB (1%) ), and definitely not full ;), and hadoop is writing to a separate file for each reducer output key (roughly 4K keys), but after like 900 it just dies. I am using a child of MultipleTextOutputFormat for my output format that creates a file for the key and puts the list of values in it in the same dfs folder. Below is the last 20 lines of logs for the node that it failed on. Any thoughts? eTask: Read 122146089 bytes from map-output for attempt_200905051459_0001_m_00_0 2009-05-05 15:09:59,672 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1 from attempt_200905051459_0001_m_00_0 -> (19, 2771) from hadoop2 2009-05-05 15:10:00,652 INFO org.apache.hadoop.mapred.ReduceTask: GetMapEventsThread exiting 2009-05-05 15:10:00,652 INFO org.apache.hadoop.mapred.ReduceTask: getMapsEventsThread joined. 2009-05-05 15:10:00,652 INFO org.apache.hadoop.mapred.ReduceTask: Closed ram manager 2009-05-05 15:10:00,652 INFO org.apache.hadoop.mapred.ReduceTask: Interleaved on-disk merge complete: 0 files left. 2009-05-05 15:10:00,652 INFO org.apache.hadoop.mapred.ReduceTask: In- memory merge complete: 2 files left. 2009-05-05 15:10:00,780 INFO org.apache.hadoop.mapred.Merger: Merging 2 sorted segments 2009-05-05 15:10:00,780 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 245292507 bytes 2009-05-05 15:10:06,276 INFO org.apache.hadoop.mapred.ReduceTask: Merged 2 segments, 245292507 bytes to disk to satisfy reduce memory limit 2009-05-05 15:10:06,280 INFO org.apache.hadoop.mapred.ReduceTask: Merging 1 files, 245292509 bytes from disk 2009-05-05 15:10:06,284 INFO org.apache.hadoop.mapred.ReduceTask: Merging 0 segments, 0 bytes from memory into reduce 2009-05-05 15:10:06,284 INFO org.apache.hadoop.mapred.Merger: Merging 1 sorted segments 2009-05-05 15:10:06,312 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 245292505 bytes 2009-05-05 15:10:35,346 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException 2009-05-05 15:10:35,374 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-2874778794594289753_8565 2009-05-05 15:10:41,402 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException 2009-05-05 15:10:41,402 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_243746846946054460_8565 2009-05-05 15:10:47,411 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException 2009-05-05 15:10:47,411 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-4508716893999737242_8565 2009-05-05 15:10:53,419 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException 2009-05-05 15:10:53,419 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-2679897353336358687_8565 2009-05-05 15:10:59,423 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block. 
at org.apache.hadoop.hdfs.DFSClient $DFSOutputStream.nextBlockOutputStream(DFSClient.java:2781) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access $2000(DFSClient.java:2046) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream $DataStreamer.run(DFSClient.java:2232) 2009-05-05 15:10:59,423 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-2679897353336358687_8565 bad datanode[0] nodes == null 2009-05-05 15:10:59,423 WARN org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Source file "/testing/output/medium_output/ _temporary/_attempt_200905051459_0001_r_00_3/sometestingkey" - Aborting... 2009-05-05 15:12:24,960 WARN org.apache.hadoop.mapred.TaskTracker: Error running child java.io.EOFException at java.io.DataInputStream.readByte(Unknown Source) at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298) at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319) at org.apache.hadoop.io.Text.readString(Text.java:400) at org.apache.hadoop.hdfs.DFSClient $DFSOutputStream.createBlockOutputStream(DFSClient.java:2837) at org.apache.hadoop.hdfs.DFSClient $DFSOutputStream.nextBlockOutputStream(DFSClient.java:2762) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access $2000(DFSClient.java:2046) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream $DataStreamer.run(DFSClient.java:2232) 2009-05-05 15:12:24,972 INFO org.apache.hadoop.mapred.TaskRunner: Runnning cleanup for the task
Re: Namenode failed to start with "FSNamesystem initialization failed" error
Hi. 2009/5/5 Raghu Angadi > Stas Oskin wrote: > >> Actually, we discovered today an annoying bug in our test-app, which might >> have moved some of the HDFS files to the cluster, including the metadata >> files. >> > > oops! presumably it could have removed the image file itself. > > I presume it could be the possible reason for such behavior? :) >> > > certainly. It could lead to many different failures. If you had stack trace > of the exception, it would be more clear what the error was this time. > > Raghu. > Well, it definitely caused the SecondaryNameNode to crash, and also seems to have triggered some strange issues today as well. By the way, how the image file is named? Regards.
Re: Namenode failed to start with "FSNamesystem initialization failed" error
Stas Oskin wrote: Actually, we discovered today an annoying bug in our test-app, which might have moved some of the HDFS files to the cluster, including the metadata files. oops! presumably it could have removed the image file itself. I presume it could be the possible reason for such behavior? :) certainly. It could lead to many different failures. If you had stack trace of the exception, it would be more clear what the error was this time. Raghu. 2009/5/5 Stas Oskin Hi Raghu. The only lead I have, is that my root mount has filled-up completely. This in itself should not have caused the metadata corruption, as it has been stored on another mount point, which had plenty of space. But perhaps the fact that NameNode/SecNameNode didn't have enough space for logs has caused this? Unfortunately I was pressed in time to get the cluster up and running, and didn't preserve the logs or the image. If this happens again - I will surely do so. Regards. 2009/5/5 Raghu Angadi Stas, This is indeed a serious issue. Did you happen to store the the corrupt image? Can this be reproduced using the image? Usually you can recover manually from a corrupt or truncated image. But more importantly we want to find how it got in to this state. Raghu. Stas Oskin wrote: Hi. This quite worry-some issue. Can anyone advice on this? I'm really concerned it could appear in production, and cause a huge data loss. Is there any way to recover from this? Regards. 2009/5/5 Tamir Kamara I didn't have a space problem which led to it (I think). The corruption started after I bounced the cluster. At the time, I tried to investigate what led to the corruption but didn't find anything useful in the logs besides this line: saveLeases found path /tmp/temp623789763/tmp659456056/_temporary_attempt_200904211331_0010_r_02_0/part-2 but no matching entry in namespace I also tried to recover from the secondary name node files but the corruption my too wide-spread and I had to format. Tamir On Mon, May 4, 2009 at 4:48 PM, Stas Oskin wrote: Hi. Same conditions - where the space has run out and the fs got corrupted? Or it got corrupted by itself (which is even more worrying)? Regards. 2009/5/4 Tamir Kamara I had the same problem a couple of weeks ago with 0.19.1. Had to reformat the cluster too... On Mon, May 4, 2009 at 3:50 PM, Stas Oskin wrote: Hi. After rebooting the NameNode server, I found out the NameNode doesn't start anymore. The logs contained this error: "FSNamesystem initialization failed" I suspected filesystem corruption, so I tried to recover from SecondaryNameNode. Problem is, it was completely empty! I had an issue that might have caused this - the root mount has run out of space. But, both the NameNode and the SecondaryNameNode directories were on another mount point with plenty of space there - so it's very strange that they were impacted in any way. Perhaps the logs, which were located on root mount and as a result, could not be written, have caused this? To get back HDFS running, i had to format the HDFS (including manually erasing the files from DataNodes). While this reasonable in test environment - production-wise it would be very bad. Any idea why it happened, and what can be done to prevent it in the future? I'm using the stable 0.18.3 version of Hadoop. Thanks in advance!
Re: Namenode failed to start with "FSNamesystem initialization failed" error
Actually, we discovered today an annoying bug in our test-app, which might have moved some of the HDFS files to the cluster, including the metadata files. I presume it could be the possible reason for such behavior? :) 2009/5/5 Stas Oskin > Hi Raghu. > > The only lead I have, is that my root mount has filled-up completely. > > This in itself should not have caused the metadata corruption, as it has > been stored on another mount point, which had plenty of space. > > But perhaps the fact that NameNode/SecNameNode didn't have enough space for > logs has caused this? > > Unfortunately I was pressed in time to get the cluster up and running, and > didn't preserve the logs or the image. > If this happens again - I will surely do so. > > Regards. > > 2009/5/5 Raghu Angadi > > >> Stas, >> >> This is indeed a serious issue. >> >> Did you happen to store the the corrupt image? Can this be reproduced >> using the image? >> >> Usually you can recover manually from a corrupt or truncated image. But >> more importantly we want to find how it got in to this state. >> >> Raghu. >> >> >> Stas Oskin wrote: >> >>> Hi. >>> >>> This quite worry-some issue. >>> >>> Can anyone advice on this? I'm really concerned it could appear in >>> production, and cause a huge data loss. >>> >>> Is there any way to recover from this? >>> >>> Regards. >>> >>> 2009/5/5 Tamir Kamara >>> >>> I didn't have a space problem which led to it (I think). The corruption started after I bounced the cluster. At the time, I tried to investigate what led to the corruption but didn't find anything useful in the logs besides this line: saveLeases found path /tmp/temp623789763/tmp659456056/_temporary_attempt_200904211331_0010_r_02_0/part-2 but no matching entry in namespace I also tried to recover from the secondary name node files but the corruption my too wide-spread and I had to format. Tamir On Mon, May 4, 2009 at 4:48 PM, Stas Oskin wrote: Hi. > > Same conditions - where the space has run out and the fs got corrupted? > > Or it got corrupted by itself (which is even more worrying)? > > Regards. > > 2009/5/4 Tamir Kamara > > I had the same problem a couple of weeks ago with 0.19.1. Had to >> > reformat > the cluster too... >> >> On Mon, May 4, 2009 at 3:50 PM, Stas Oskin >> > wrote: > Hi. >>> >>> After rebooting the NameNode server, I found out the NameNode doesn't >>> >> start >> >>> anymore. >>> >>> The logs contained this error: >>> "FSNamesystem initialization failed" >>> >>> >>> I suspected filesystem corruption, so I tried to recover from >>> SecondaryNameNode. Problem is, it was completely empty! >>> >>> I had an issue that might have caused this - the root mount has run >>> >> out > of >> >>> space. But, both the NameNode and the SecondaryNameNode directories >>> >> were > >> on >> >>> another mount point with plenty of space there - so it's very strange >>> >> that >> >>> they were impacted in any way. >>> >>> Perhaps the logs, which were located on root mount and as a result, >>> >> could > >> not be written, have caused this? >>> >>> >>> To get back HDFS running, i had to format the HDFS (including >>> >> manually > erasing the files from DataNodes). While this reasonable in test >>> environment >>> - production-wise it would be very bad. >>> >>> Any idea why it happened, and what can be done to prevent it in the >>> >> future? >> >>> I'm using the stable 0.18.3 version of Hadoop. >>> >>> Thanks in advance! >>> >>> >>> >>
Re: Namenode failed to start with "FSNamesystem initialization failed" error
Hi Raghu. The only lead I have, is that my root mount has filled-up completely. This in itself should not have caused the metadata corruption, as it has been stored on another mount point, which had plenty of space. But perhaps the fact that NameNode/SecNameNode didn't have enough space for logs has caused this? Unfortunately I was pressed in time to get the cluster up and running, and didn't preserve the logs or the image. If this happens again - I will surely do so. Regards. 2009/5/5 Raghu Angadi > > Stas, > > This is indeed a serious issue. > > Did you happen to store the the corrupt image? Can this be reproduced using > the image? > > Usually you can recover manually from a corrupt or truncated image. But > more importantly we want to find how it got in to this state. > > Raghu. > > > Stas Oskin wrote: > >> Hi. >> >> This quite worry-some issue. >> >> Can anyone advice on this? I'm really concerned it could appear in >> production, and cause a huge data loss. >> >> Is there any way to recover from this? >> >> Regards. >> >> 2009/5/5 Tamir Kamara >> >> I didn't have a space problem which led to it (I think). The corruption >>> started after I bounced the cluster. >>> At the time, I tried to investigate what led to the corruption but didn't >>> find anything useful in the logs besides this line: >>> saveLeases found path >>> >>> >>> /tmp/temp623789763/tmp659456056/_temporary_attempt_200904211331_0010_r_02_0/part-2 >>> but no matching entry in namespace >>> >>> I also tried to recover from the secondary name node files but the >>> corruption my too wide-spread and I had to format. >>> >>> Tamir >>> >>> On Mon, May 4, 2009 at 4:48 PM, Stas Oskin wrote: >>> >>> Hi. Same conditions - where the space has run out and the fs got corrupted? Or it got corrupted by itself (which is even more worrying)? Regards. 2009/5/4 Tamir Kamara I had the same problem a couple of weeks ago with 0.19.1. Had to > reformat >>> the cluster too... > > On Mon, May 4, 2009 at 3:50 PM, Stas Oskin > wrote: >>> Hi. >> >> After rebooting the NameNode server, I found out the NameNode doesn't >> > start > >> anymore. >> >> The logs contained this error: >> "FSNamesystem initialization failed" >> >> >> I suspected filesystem corruption, so I tried to recover from >> SecondaryNameNode. Problem is, it was completely empty! >> >> I had an issue that might have caused this - the root mount has run >> > out >>> of > >> space. But, both the NameNode and the SecondaryNameNode directories >> > were > on > >> another mount point with plenty of space there - so it's very strange >> > that > >> they were impacted in any way. >> >> Perhaps the logs, which were located on root mount and as a result, >> > could > not be written, have caused this? >> >> >> To get back HDFS running, i had to format the HDFS (including >> > manually >>> erasing the files from DataNodes). While this reasonable in test >> environment >> - production-wise it would be very bad. >> >> Any idea why it happened, and what can be done to prevent it in the >> > future? > >> I'm using the stable 0.18.3 version of Hadoop. >> >> Thanks in advance! >> >> >> >
Re: java.io.EOFException: while trying to read 65557 bytes
This can happen for example when a client is killed when it has some files open for write. In that case it is an expected error (the log should really be at WARN or INFO level). Raghu. Albert Sunwoo wrote: Hello Everyone, I know there's been some chatter about this before but I am seeing the errors below on just about every one of our nodes. Is there a definitive reason on why these are occuring, is there something that we can do to prevent these? 2009-05-04 21:35:11,764 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.102.0.105:50010, storageID=DS-991582569-127.0.0.1-50010-1240886381606, infoPort=50075, ipcPort=50020):DataXceiver java.io.EOFException: while trying to read 65557 bytes at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:264) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:308) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:372) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:524) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103) at java.lang.Thread.run(Thread.java:619) Followed by: 2009-05-04 21:35:20,891 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder blk_-7056150840276493498_10885 1 Exception java.io.InterruptedIOException: Interruped while waiting for IO on channel java.nio.channels.Socke tChannel[connected local=/10.102.0.105:37293 remote=/10.102.0.106:50010]. 59756 millis timeout left. at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:277) at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:155) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:150) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:123) at java.io.DataInputStream.readFully(DataInputStream.java:178) at java.io.DataInputStream.readLong(DataInputStream.java:399) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:853) at java.lang.Thread.run(Thread.java:619) Thanks, Albert
Re: Namenode failed to start with "FSNamesystem initialization failed" error
Tamir, Please file a jira on the problem you are seeing with 'saveLeases'. In the past there have been multiple fixes in this area (HADOOP-3418, HADOOP-3724, and more mentioned in HADOOP-3724). Also refer the thread you started http://www.mail-archive.com/core-user@hadoop.apache.org/msg09397.html I think another user reported the same problem recently. These are indeed very serious and very annoying bugs. Raghu. Tamir Kamara wrote: I didn't have a space problem which led to it (I think). The corruption started after I bounced the cluster. At the time, I tried to investigate what led to the corruption but didn't find anything useful in the logs besides this line: saveLeases found path /tmp/temp623789763/tmp659456056/_temporary_attempt_200904211331_0010_r_02_0/part-2 but no matching entry in namespace I also tried to recover from the secondary name node files but the corruption my too wide-spread and I had to format. Tamir On Mon, May 4, 2009 at 4:48 PM, Stas Oskin wrote: Hi. Same conditions - where the space has run out and the fs got corrupted? Or it got corrupted by itself (which is even more worrying)? Regards. 2009/5/4 Tamir Kamara I had the same problem a couple of weeks ago with 0.19.1. Had to reformat the cluster too... On Mon, May 4, 2009 at 3:50 PM, Stas Oskin wrote: Hi. After rebooting the NameNode server, I found out the NameNode doesn't start anymore. The logs contained this error: "FSNamesystem initialization failed" I suspected filesystem corruption, so I tried to recover from SecondaryNameNode. Problem is, it was completely empty! I had an issue that might have caused this - the root mount has run out of space. But, both the NameNode and the SecondaryNameNode directories were on another mount point with plenty of space there - so it's very strange that they were impacted in any way. Perhaps the logs, which were located on root mount and as a result, could not be written, have caused this? To get back HDFS running, i had to format the HDFS (including manually erasing the files from DataNodes). While this reasonable in test environment - production-wise it would be very bad. Any idea why it happened, and what can be done to prevent it in the future? I'm using the stable 0.18.3 version of Hadoop. Thanks in advance!
Re: Namenode failed to start with "FSNamesystem initialization failed" error
Stas, This is indeed a serious issue. Did you happen to store the the corrupt image? Can this be reproduced using the image? Usually you can recover manually from a corrupt or truncated image. But more importantly we want to find how it got in to this state. Raghu. Stas Oskin wrote: Hi. This quite worry-some issue. Can anyone advice on this? I'm really concerned it could appear in production, and cause a huge data loss. Is there any way to recover from this? Regards. 2009/5/5 Tamir Kamara I didn't have a space problem which led to it (I think). The corruption started after I bounced the cluster. At the time, I tried to investigate what led to the corruption but didn't find anything useful in the logs besides this line: saveLeases found path /tmp/temp623789763/tmp659456056/_temporary_attempt_200904211331_0010_r_02_0/part-2 but no matching entry in namespace I also tried to recover from the secondary name node files but the corruption my too wide-spread and I had to format. Tamir On Mon, May 4, 2009 at 4:48 PM, Stas Oskin wrote: Hi. Same conditions - where the space has run out and the fs got corrupted? Or it got corrupted by itself (which is even more worrying)? Regards. 2009/5/4 Tamir Kamara I had the same problem a couple of weeks ago with 0.19.1. Had to reformat the cluster too... On Mon, May 4, 2009 at 3:50 PM, Stas Oskin wrote: Hi. After rebooting the NameNode server, I found out the NameNode doesn't start anymore. The logs contained this error: "FSNamesystem initialization failed" I suspected filesystem corruption, so I tried to recover from SecondaryNameNode. Problem is, it was completely empty! I had an issue that might have caused this - the root mount has run out of space. But, both the NameNode and the SecondaryNameNode directories were on another mount point with plenty of space there - so it's very strange that they were impacted in any way. Perhaps the logs, which were located on root mount and as a result, could not be written, have caused this? To get back HDFS running, i had to format the HDFS (including manually erasing the files from DataNodes). While this reasonable in test environment - production-wise it would be very bad. Any idea why it happened, and what can be done to prevent it in the future? I'm using the stable 0.18.3 version of Hadoop. Thanks in advance!
Re: What do we call Hadoop+HBase+Lucene+Zookeeper+etc....
'Cloud computing' is a hot term. According to the definition provided by Wikipedia (http://en.wikipedia.org/wiki/Cloud_computing), Hadoop+HBase+Lucene+Zookeeper fits some of the criteria, but not well. Hadoop is scalable, and with HOD it is dynamically scalable. I do not think (Hadoop+HBase+Lucene+Zookeeper) can be used for 'utility computing', as managing the stack and getting started is quite a complex process. Also, this stack runs best on a LAN with high-speed interlinks. Historically the "Cloud" is composed of WAN links. An implication of cloud computing is that different services would be running in different geographical locations, which is not how Hadoop is normally deployed. I believe 'Apache Grid Stack' would be a more fitting name. http://en.wikipedia.org/wiki/Grid_computing Grid computing (or the use of computational grids) is the application of several computers to a single problem at the same time — usually to a scientific or technical problem that requires a great number of computer processing cycles or access to large amounts of data. The Wikipedia definition of grid computing describes exactly what Hadoop does. Without Amazon S3 and EC2, Hadoop does not fit well into 'cloud computing', IMHO.
RE: What do we call Hadoop+HBase+Lucene+Zookeeper+etc....
The slide deck talks about possible bundling of various existing Apache technologies in distributed systems as well as some Java API to access Amazon cloud services. What hasn't been discussed is the difference between a "traditional distributed architecture" and "the cloud". They are "close" but not close enough to be treated the "same". In my opinion, some of the distributed technology in Apache need to be enhanced in order to fit into the cloud more effectively. Let me focus in some cloud characteristics that our existing Apache distributed technologies hasn't been paying attention to: Extreme elasticity, Trust boundary, and cost awareness. Extreme elasticity === Most distributed technologies treat machine shutdown/startup a relatively infrequent operation and hasn't tried hard to minimize the cost of handling this situations. Look at Hadoop as an example, although it can handle machine crashes gracefully, it doesn't handle cloud bursting scenario well (ie: when a lot of machines is added to Hadoop cluster). You need to run a data redistribution task in the background and slow down your existing job. Another example is that many scripts in Hadoop relies on config file that specify each cluster member's IP address. In a cloud environment, IP address is unstable so we need to have a discovery mechanism and also rework the scripts. Trust boundary === Most distributed technologies are assuming a homogeneous environment (every member has the same degree of trust), which is not the case in the cloud environment. Additional processing (cryptographic operation for data transfer and storage) may be necessary when dealing with machines running in the cloud. Cost awareness === Same reason as they are assuming a homogeneous environment, the scheduler is not aware of the involved cost when they move data across the cloud boundary (especially bandwidth cost is relatively high). The Hadoop MapReduce scheduler need to be more sophisticated when scheduling where to start the Mapper and Reducer. Similarly, when making the replica placement decision, HDFS needs to be aware of which machine is located in which cloud. That said, I am not discounting the existing Apache technology. In fact, we have already made a good step. We just need to go further. Rgds, Ricky -Original Message- From: Bradford Stephens [mailto:bradfordsteph...@gmail.com] Sent: Tuesday, May 05, 2009 9:53 AM To: core-user@hadoop.apache.org Subject: Re: What do we call Hadoop+HBase+Lucene+Zookeeper+etc I read through the deck and sent it around the company. Good stuff! It's going to be a big help for trying to get the .NET Enterprise people wrapping their heads around web-scale data. I must admit "Apache Cloud Computing Edition" is sort of unwieldy to say verbally, and frankly "Java Enterprise Edition" is a taboo phrase at a lot of projects I've had. Guilt by association. I think I'll call it "Apache Cloud Stack", and reference "Apache Cloud Computing Edition" in my deck. When I think "Stack", I think of a suite of software that provides all the pieces I need to solve my problem :) On Tue, May 5, 2009 at 7:00 AM, Steve Loughran wrote: > Bradford Stephens wrote: >> >> Hey all, >> >> I'm going to be speaking at OSCON about my company's experiences with >> Hadoop and Friends, but I'm having a hard time coming up with a name >> for the entire software ecosystem. I'm thinking of calling it the >> "Apache CloudStack". Does this sound legit to you all? :) Is there >> something more 'official'? 
> > We've been using "Apache Cloud Computing Edition" for this, to emphasise > this is the successor to Java Enterprise Edition, and that it is cross > language and being built at apache. If you use the same term, even if you > put a different stack outline than us, it gives the idea more legitimacy. > > The slides that Andrew linked to are all in SVN under > http://svn.apache.org/repos/asf/labs/clouds/ > > we have a space in the apache labs for "apache clouds", where we want to do > more work integrating things, and bringing the idea of deploy and test on > someone else's infrastructure mainstream across all the apache products. We > would welcome your involvement -and if you send a draft of your slides out, > will happily review them > > -steve >
Re: What do we call Hadoop+HBase+Lucene+Zookeeper+etc....
I read through the deck and sent it around the company. Good stuff! It's going to be a big help for trying to get the .NET Enterprise people wrapping their heads around web-scale data. I must admit "Apache Cloud Computing Edition" is sort of unwieldy to say verbally, and frankly "Java Enterprise Edition" is a taboo phrase at a lot of projects I've had. Guilt by association. I think I'll call it "Apache Cloud Stack", and reference "Apache Cloud Computing Edition" in my deck. When I think "Stack", I think of a suite of software that provides all the pieces I need to solve my problem :) On Tue, May 5, 2009 at 7:00 AM, Steve Loughran wrote: > Bradford Stephens wrote: >> >> Hey all, >> >> I'm going to be speaking at OSCON about my company's experiences with >> Hadoop and Friends, but I'm having a hard time coming up with a name >> for the entire software ecosystem. I'm thinking of calling it the >> "Apache CloudStack". Does this sound legit to you all? :) Is there >> something more 'official'? > > We've been using "Apache Cloud Computing Edition" for this, to emphasise > this is the successor to Java Enterprise Edition, and that it is cross > language and being built at apache. If you use the same term, even if you > put a different stack outline than us, it gives the idea more legitimacy. > > The slides that Andrew linked to are all in SVN under > http://svn.apache.org/repos/asf/labs/clouds/ > > we have a space in the apache labs for "apache clouds", where we want to do > more work integrating things, and bringing the idea of deploy and test on > someone else's infrastructure mainstream across all the apache products. We > would welcome your involvement -and if you send a draft of your slides out, > will happily review them > > -steve >
Re: What User Accounts Do People Use For Team Dev?
On Tue, May 5, 2009 at 10:44 AM, Dan Milstein wrote: > Best-practices-type question: when a single cluster is being used by a team > of folks to run jobs, how do people on this list handle user accounts? > > Many of the examples seem to show everything being run as root on the > master, which is hard to imagine is a great idea. > > Do you: > > - Create a distinct account for every developer who will need to run jobs? > > - Create a single hadoop-dev or hadoop-jobs account, have everyone use > that? > > - Just stick with running it all as root? > > Thanks, > -Dan Milstein > This is an interesting issue. First remember that the user that starts hadoop is considered the hadoop 'superuser'. You probably do not want to run hadoop as root, or someone could make an 'rm -rf /' map/reduce application and execute it across your cluster. We run hadoop as hadoop user. We use LDAP public key over ssh authentication. Every user has their own account and their own home directory /usr/ and /user/. (hadoop) Now the fun comes, user1 runs a process that creates files owned by 'user1'. No surprise. What happens when 'user2' needs to modify this file? This issue is not a hadoop issue, we have this same scenario with people trying to share any file system in unix. On the unix side sticky bit and umask help. What I do is give each user the ability to login as themselves and the team user. passwd hadoop:hadoop user1:user1 user2:user2 team1:team1 group groups team1:user1:user2 In this way the burden falls on the user to make sure the file ownership is correct. If user1 intends for user2 to see the work they have two options. 1) set liberal HDFS file permissions 2) start the process as team1 not 'user1' This is suitable for a development style cluster. Some people follow the policy that a production environment should not allow user access. In that case only one user would exist. passwd hadoop:hadoop mysoftware:mysoftware Code that runs on that type of cluster would be deployed and run by an automated process or configuration management system. Individual users could not directly log in.
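As a hedged illustration of option 1 above ("set liberal HDFS file permissions"): the owner of a job's output can widen group access so the rest of the team account's group can work with it. The path below is made up, and the same thing can be done from the shell with hadoop fs -chmod and hadoop fs -chgrp.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.permission.FsPermission;

    // Sketch: user1 opens a result directory to the team1 group.
    public class ShareWithTeam {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path out = new Path("/user/team1/job-output");        // illustrative path
        // group gets read/write/execute, others read/execute
        fs.setPermission(out, new FsPermission((short) 0775));
        // keep the owner (null) but hand the group to team1, so user2 is covered too
        fs.setOwner(out, null, "team1");
      }
    }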
Re: specific block size for a file
On May 5, 2009, at 4:47 AM, Christian Ulrik Søttrup wrote: Hi all, I have a job that creates very big local files so i need to split it to as many mappers as possible. Now the DFS block size I'm using means that this job is only split to 3 mappers. I don't want to change the hdfs wide block size because it works for my other jobs. I would rather keep the big files on HDFS and use -Dmapred.min.split.size to get more maps to process your data http://hadoop.apache.org/core/docs/r0.20.0/mapred_tutorial.html#Job+Input Arun
Re: Sorting on several columns using KeyFieldSeparator and Paritioner
You must only have 3 fields in your keys. Try this - it is my best guess based on your code. Appendix A of my book has a detailed discussion of these fields and the gotchas, and the example code has test classes that allow you to try different keys with different input to see how the parts are actually broken out of the keys. jobConf.set("mapred.text.key.partitioner.options","-k2.2 -k1.1 -k3.3"); jobConf.set("mapred.text.key.comparator.options","-k2.2r -k1.1rn -k3.3n"); On Tue, May 5, 2009 at 2:05 AM, Min Zhou wrote: > I came across the same failure. Anyone can solve this problem? > > On Sun, Jan 18, 2009 at 9:06 AM, Saptarshi Guha >wrote: > > > Hello, > > I have a file with n columns, some which are text and some numeric. > > Given a sequence of indices, i would like to sort on those indices i.e > > first on Index1, then within Index2 and so on. > > In the example code below, i have 3 columns, numeric, text, numeric, > > space separated. > > Sort on 2(reverse), then 1(reverse,numeric) and lastly 3 > > > > Though my code runs (and gives wrong results,col 2 is sorted in > > reverse, and within that col3 which is treated as tex and then col1 ) > > on the local, when distributed I get a merge error - my guess is > > fixing the latter fixes the former. > > > > This is the error: > > java.io.IOException: Final merge failed > >at > > > org.apache.hadoop.mapred.ReduceTask$ReduceCopier.createKVIterator(ReduceTask.java:2093) > >at > > > org.apache.hadoop.mapred.ReduceTask$ReduceCopier.access$400(ReduceTask.java:457) > >at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:380) > >at org.apache.hadoop.mapred.Child.main(Child.java:155) > > Caused by: java.lang.ArrayIndexOutOfBoundsException: 562 > >at > > > org.apache.hadoop.io.WritableComparator.compareBytes(WritableComparator.java:128) > >at > > > org.apache.hadoop.mapred.lib.KeyFieldBasedComparator.compareByteSequence(KeyFieldBasedComparator.java:109) > >at > > > org.apache.hadoop.mapred.lib.KeyFieldBasedComparator.compare(KeyFieldBasedComparator.java:85) > >at > > org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:308) > >at > > org.apache.hadoop.util.PriorityQueue.downHeap(PriorityQueue.java:144) > >at > > org.apache.hadoop.util.PriorityQueue.adjustTop(PriorityQueue.java:103) > >at > > > org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:270) > >at > org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:285) > >at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:108) > >at > > > org.apache.hadoop.mapred.ReduceTask$ReduceCopier.createKVIterator(ReduceTask.java:2087) > >... 
3 more > > > > > > Thanks for your time > > > > And the code (not too big) is > > ==CODE== > > > > public class RMRSort extends Configured implements Tool { > > > > static class RMRSortMap extends MapReduceBase implements > > Mapper { > > > >public void map(LongWritable key, Text value,OutputCollector > Text> output, Reporter reporter) throws IOException { > >output.collect(value,value); > >} > > } > > > >static class RMRSortReduce extends MapReduceBase implements > > Reducer { > > > >public void reduce(Text key, Iterator > > values,OutputCollector output, Reporter reporter) > > throws IOException { > >NullWritable n = NullWritable.get(); > >while(values.hasNext()) > >output.collect(n,values.next() ); > >} > >} > > > > > >static JobConf createConf(String rserveport,String uid,String > > infolder, String outfolder) > >Configuration defaults = new Configuration(); > >JobConf jobConf = new JobConf(defaults, RMRSort.class); > >jobConf.setJobName("Sorter: "+uid); > >jobConf.addResource(new > > Path(System.getenv("HADOOP_CONF_DIR")+"/hadoop-site.xml")); > > // jobConf.set("mapred.job.tracker", "local"); > >jobConf.setMapperClass(RMRSortMap.class); > >jobConf.setReducerClass(RMRSortReduce.class); > >jobConf.set("map.output.key.field.separator",fsep); > >jobConf.setPartitionerClass(KeyFieldBasedPartitioner.class); > >jobConf.set("mapred.text.key.partitioner.options","-k2,2 -k1,1 > > -k3,3"); > > > jobConf.setOutputKeyComparatorClass(KeyFieldBasedComparator.class); > >jobConf.set("mapred.text.key.comparator.options","-k2r,2r > -k1rn,1rn > > -k3n,3n"); > > //infolder, outfolder information removed > >jobConf.setMapOutputKeyClass(Text.class); > >jobConf.setMapOutputValueClass(Text.class); > >jobConf.setOutputKeyClass(NullWritable.class); > >return(jobConf); > >} > >public int run(String[] args) throws Exception { > >return(1); > >} > > > > } > > > > > > > > > > -- > > Saptarshi Guha - saptarshi.g...@gmail.com > > > > > > -- > My research interests are distributed systems, parallel computing and > byteco
What User Accounts Do People Use For Team Dev?
Best-practices-type question: when a single cluster is being used by a team of folks to run jobs, how do people on this list handle user accounts? Many of the examples seem to show everything being run as root on the master, which is hard to imagine is a great idea. Do you: - Create a distinct account for every developer who will need to run jobs? - Create a single hadoop-dev or hadoop-jobs account, have everyone use that? - Just stick with running it all as root? Thanks, -Dan Milstein
Re: specific block size for a file
Cheers, that worked. jason hadoop wrote: Please try -D dfs.block.size=4096000 The specification must be in bytes. On Tue, May 5, 2009 at 4:47 AM, Christian Ulrik Søttrup wrote: Hi all, I have a job that creates very big local files so i need to split it to as many mappers as possible. Now the DFS block size I'm using means that this job is only split to 3 mappers. I don't want to change the hdfs wide block size because it works for my other jobs. Is there a way to give a specific file a different block size. The documentation says it is, but does not explain how. I've tried: hadoop dfs -D dfs.block.size=4M -put file /dest/ But that does not work. any help would be apreciated. Cheers, Chrulle
Re: What do we call Hadoop+HBase+Lucene+Zookeeper+etc....
Bradford Stephens wrote: Hey all, I'm going to be speaking at OSCON about my company's experiences with Hadoop and Friends, but I'm having a hard time coming up with a name for the entire software ecosystem. I'm thinking of calling it the "Apache CloudStack". Does this sound legit to you all? :) Is there something more 'official'? We've been using "Apache Cloud Computing Edition" for this, to emphasise this is the successor to Java Enterprise Edition, and that it is cross language and being built at apache. If you use the same term, even if you put a different stack outline than us, it gives the idea more legitimacy. The slides that Andrew linked to are all in SVN under http://svn.apache.org/repos/asf/labs/clouds/ we have a space in the apache labs for "apache clouds", where we want to do more work integrating things, and bringing the idea of deploy and test on someone else's infrastructure mainstream across all the apache products. We would welcome your involvement -and if you send a draft of your slides out, will happily review them -steve
Re: specific block size for a file
Please try -D dfs.block.size=4096000 The specification must be in bytes. On Tue, May 5, 2009 at 4:47 AM, Christian Ulrik Søttrup wrote: > Hi all, > > I have a job that creates very big local files so i need to split it to as > many mappers as possible. Now the DFS block size I'm > using means that this job is only split to 3 mappers. I don't want to > change the hdfs wide block size because it works for my other jobs. > > Is there a way to give a specific file a different block size. The > documentation says it is, but does not explain how. > I've tried: > hadoop dfs -D dfs.block.size=4M -put file /dest/ > > But that does not work. > > any help would be apreciated. > > Cheers, > Chrulle > -- Alpha Chapters of my book on Hadoop are available http://www.apress.com/book/view/9781430219422 www.prohadoopbook.com a community for Hadoop Professionals
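When the file is written from Java instead of the shell, the same per-file block size can be passed directly to FileSystem.create() (a minimal sketch; the path, buffer size and replication handling are illustrative assumptions):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Sketch: create one file with a 4 MB block size without touching the
    // cluster-wide dfs.block.size setting.
    public class SmallBlockWriter {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        long blockSize = 4L * 1024 * 1024;                         // must be given in bytes
        int bufferSize = conf.getInt("io.file.buffer.size", 4096);
        FSDataOutputStream out = fs.create(new Path("/dest/file"), // illustrative path
            true, bufferSize, fs.getDefaultReplication(), blockSize);
        out.writeBytes("data written with a 4 MB block size\n");
        out.close();
      }
    }

This matches the -D dfs.block.size route above: in both cases the value is specified in bytes, not with suffixes like "4M".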
Re: Namenode failed to start with "FSNamesystem initialization failed" error
Hi. This is quite a worrisome issue. Can anyone advise on this? I'm really concerned it could appear in production and cause a huge data loss. Is there any way to recover from this? Regards. 2009/5/5 Tamir Kamara > I didn't have a space problem which led to it (I think). The corruption started after I bounced the cluster. At the time, I tried to investigate what led to the corruption but didn't find anything useful in the logs besides this line: saveLeases found path /tmp/temp623789763/tmp659456056/_temporary_attempt_200904211331_0010_r_02_0/part-2 but no matching entry in namespace. > I also tried to recover from the secondary name node files but the corruption was too wide-spread and I had to format. > Tamir > On Mon, May 4, 2009 at 4:48 PM, Stas Oskin wrote: > > Hi. Same conditions - where the space has run out and the fs got corrupted? Or did it get corrupted by itself (which is even more worrying)? Regards. > > 2009/5/4 Tamir Kamara > > > I had the same problem a couple of weeks ago with 0.19.1. Had to reformat the cluster too... > > > On Mon, May 4, 2009 at 3:50 PM, Stas Oskin wrote: > > > > Hi. After rebooting the NameNode server, I found out the NameNode doesn't start anymore. The logs contained this error: "FSNamesystem initialization failed" > > > > I suspected filesystem corruption, so I tried to recover from the SecondaryNameNode. Problem is, it was completely empty! > > > > I had an issue that might have caused this - the root mount had run out of space. But both the NameNode and the SecondaryNameNode directories were on another mount point with plenty of space there - so it's very strange that they were impacted in any way. Perhaps the logs, which were located on the root mount and as a result could not be written, have caused this? > > > > To get back HDFS running, I had to format the HDFS (including manually erasing the files from DataNodes). While this is reasonable in a test environment, production-wise it would be very bad. > > > > Any idea why it happened, and what can be done to prevent it in the future? I'm using the stable 0.18.3 version of Hadoop. > > > > Thanks in advance!
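While no confirmed root cause emerges in this thread, a commonly recommended precaution against losing NameNode metadata is to write the image and edit log to more than one directory, preferably on separate mounts (ideally one off-machine, e.g. over NFS), and to keep the secondary's checkpoint directory off any volume that can fill up. A hadoop-site.xml fragment as a sketch - the property names are the standard ones for the 0.18.x line, but the paths are purely illustrative assumptions:

<property>
  <name>dfs.name.dir</name>
  <!-- comma-separated list; the image and edit log are written to every directory -->
  <value>/data1/hadoop/name,/data2/hadoop/name,/mnt/nfs/hadoop/name</value>
</property>
<property>
  <name>fs.checkpoint.dir</name>
  <!-- where the SecondaryNameNode stores its checkpoints -->
  <value>/data2/hadoop/namesecondary</value>
</property>

With an intact copy in a surviving dfs.name.dir entry or in fs.checkpoint.dir, the namespace can usually be restored by copying the files back into place before starting the NameNode (or via the -importCheckpoint startup option, where the release supports it), rather than reformatting.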
specific block size for a file
Hi all, I have a job that creates very big local files, so I need to split it to as many mappers as possible. Now the DFS block size I'm using means that this job is only split to 3 mappers. I don't want to change the HDFS-wide block size because it works for my other jobs. Is there a way to give a specific file a different block size? The documentation says there is, but does not explain how. I've tried: hadoop dfs -D dfs.block.size=4M -put file /dest/ But that does not work. Any help would be appreciated. Cheers, Chrulle
Re: Sorting on several columns using KeyFieldSeparator and Partitioner
I came across the same failure. Can anyone solve this problem?

On Sun, Jan 18, 2009 at 9:06 AM, Saptarshi Guha wrote:
> Hello,
> I have a file with n columns, some of which are text and some numeric.
> Given a sequence of indices, I would like to sort on those indices, i.e.
> first on Index1, then within that on Index2, and so on.
> In the example code below, I have 3 columns: numeric, text, numeric,
> space separated.
> Sort on 2 (reverse), then 1 (reverse, numeric) and lastly 3.
>
> Though my code runs locally (and gives wrong results: col 2 is sorted in
> reverse, and within that col 3, which is treated as text, and then col 1),
> when distributed I get a merge error - my guess is that fixing the latter
> fixes the former.
>
> This is the error:
> java.io.IOException: Final merge failed
>    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.createKVIterator(ReduceTask.java:2093)
>    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.access$400(ReduceTask.java:457)
>    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:380)
>    at org.apache.hadoop.mapred.Child.main(Child.java:155)
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 562
>    at org.apache.hadoop.io.WritableComparator.compareBytes(WritableComparator.java:128)
>    at org.apache.hadoop.mapred.lib.KeyFieldBasedComparator.compareByteSequence(KeyFieldBasedComparator.java:109)
>    at org.apache.hadoop.mapred.lib.KeyFieldBasedComparator.compare(KeyFieldBasedComparator.java:85)
>    at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:308)
>    at org.apache.hadoop.util.PriorityQueue.downHeap(PriorityQueue.java:144)
>    at org.apache.hadoop.util.PriorityQueue.adjustTop(PriorityQueue.java:103)
>    at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:270)
>    at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:285)
>    at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:108)
>    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.createKVIterator(ReduceTask.java:2087)
>    ... 3 more
>
> Thanks for your time
>
> And the code (not too big) is
> ==CODE==
>
> public class RMRSort extends Configured implements Tool {
>
>    static class RMRSortMap extends MapReduceBase implements
>            Mapper<LongWritable, Text, Text, Text> {
>
>        public void map(LongWritable key, Text value,
>                OutputCollector<Text, Text> output, Reporter reporter)
>                throws IOException {
>            output.collect(value, value);
>        }
>    }
>
>    static class RMRSortReduce extends MapReduceBase implements
>            Reducer<Text, Text, NullWritable, Text> {
>
>        public void reduce(Text key, Iterator<Text> values,
>                OutputCollector<NullWritable, Text> output, Reporter reporter)
>                throws IOException {
>            NullWritable n = NullWritable.get();
>            while (values.hasNext())
>                output.collect(n, values.next());
>        }
>    }
>
>    static JobConf createConf(String rserveport, String uid, String infolder, String outfolder) {
>        Configuration defaults = new Configuration();
>        JobConf jobConf = new JobConf(defaults, RMRSort.class);
>        jobConf.setJobName("Sorter: " + uid);
>        jobConf.addResource(new Path(System.getenv("HADOOP_CONF_DIR") + "/hadoop-site.xml"));
>        // jobConf.set("mapred.job.tracker", "local");
>        jobConf.setMapperClass(RMRSortMap.class);
>        jobConf.setReducerClass(RMRSortReduce.class);
>        jobConf.set("map.output.key.field.separator", fsep);
>        jobConf.setPartitionerClass(KeyFieldBasedPartitioner.class);
>        jobConf.set("mapred.text.key.partitioner.options", "-k2,2 -k1,1 -k3,3");
>        jobConf.setOutputKeyComparatorClass(KeyFieldBasedComparator.class);
>        jobConf.set("mapred.text.key.comparator.options", "-k2r,2r -k1rn,1rn -k3n,3n");
>        // infolder, outfolder information removed
>        jobConf.setMapOutputKeyClass(Text.class);
>        jobConf.setMapOutputValueClass(Text.class);
>        jobConf.setOutputKeyClass(NullWritable.class);
>        return (jobConf);
>    }
>
>    public int run(String[] args) throws Exception {
>        return (1);
>    }
>
> }
>
> --
> Saptarshi Guha - saptarshi.g...@gmail.com

--
My research interests are distributed systems, parallel computing and bytecode based virtual machine. My profile: http://www.linkedin.com/in/coderplay My blog: http://coderplay.javaeye.com
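For comparison, here is a minimal sketch of how the key-field options are usually written - not a confirmed fix for the ArrayIndexOutOfBoundsException above, which may well be a comparator bug in the release being used. It follows Unix sort -k conventions, with the n/r modifiers appended after the field range, and sets the same separator for the partitioner and the comparator. The tab separator and the field numbers are assumptions for illustration only.

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.KeyFieldBasedComparator;
import org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner;

public class KeyFieldSortSketch {

    // Configures a JobConf to sort text keys of the form "num<TAB>text<TAB>num"
    // on field 2 (reverse), then field 1 (reverse numeric), then field 3 (numeric).
    static void configureSort(JobConf jobConf) {
        jobConf.setMapOutputKeyClass(Text.class);
        jobConf.setMapOutputValueClass(Text.class);

        // One separator, used by both the partitioner and the comparator.
        jobConf.set("map.output.key.field.separator", "\t");

        // Partition on field 2 only, so rows that should sort next to each
        // other end up in the same reduce partition.
        jobConf.setPartitionerClass(KeyFieldBasedPartitioner.class);
        jobConf.set("mapred.text.key.partitioner.options", "-k2,2");

        // Modifiers go after the field range, as with Unix sort -k:
        // field 2 reversed (text), field 1 numeric reversed, field 3 numeric.
        jobConf.setOutputKeyComparatorClass(KeyFieldBasedComparator.class);
        jobConf.set("mapred.text.key.comparator.options", "-k2,2r -k1,1nr -k3,3n");
    }
}

If the exception persists with well-formed option strings, it is worth searching the Hadoop JIRA for KeyFieldBasedComparator fixes in later releases, since the failure occurs inside the comparator rather than in user code.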
Re: What do we call Hadoop+HBase+Lucene+Zookeeper+etc....
Hi Bradford, Your mail reminds me of something I recently came across: http://svn.apache.org/repos/asf/labs/clouds/apache_cloud_computing_edition.pdf Perhaps, if you have slides accompanying your talk, you might consider making them publicly available. I for one would love to see them. Best regards, - Andy > From: Bradford Stephens > Subject: What do we call Hadoop+HBase+Lucene+Zookeeper+etc > Date: Monday, May 4, 2009, 7:44 PM > Hey all, > I'm going to be speaking at OSCON about my company's experiences with Hadoop and Friends, but I'm having a hard time coming up with a name for the entire software ecosystem. I'm thinking of calling it the "Apache CloudStack". Does this sound legit to you all? :) Is there something more 'official'? > Cheers, > Bradford