Re: Jobs stalling forever
This is due to HADOOP-5233. It was fixed in the 0.19.2 release. -Amareshwari

Nathan Marz wrote: Every now and then, I have jobs that stall forever with one map task remaining. The last remaining map task says it is at "100%" and, in the logs, that it is in the process of committing. However, the task never times out, and the job just sits there forever. Has anyone else seen this? Is there a JIRA ticket open for it already?
Re: How to increase replication factor
Thank you very much. On Tue, Mar 10, 2009 at 6:58 PM, Tamir Kamara wrote: > You can use the setrep option to (re)set the replication of specific files > and directories. More details can be found here: > http://hadoop.apache.org/core/docs/current/hdfs_shell.html#setrep > > > On Tue, Mar 10, 2009 at 12:28 PM, Edwin Chu wrote: > > > Hi > > I am adding some new nodes to an Hadoop cluster and try to increase the > > replication factor. I changed the replication factor value in > > hadoop-site.xml and then restarted the cluster using the stop-all.sh and > > start-all.sh script. Then, I run hadoop fsck. It reports that the fs is > > healthy, but I found that the Average block replication value is less > than > > the configured replication factor. I guess the existing blocks are not > > re-replicated after changing the replication factor. Can I force the > > existing blocks to be replicated according to the new replication factor? > > > > Regards > > Edwin > > >
Why is large number of [(heavy) keys , (light) value] faster than (light)key , (heavy) value
I have a large number of key,value pairs, and I don't actually care whether the data goes in the key or the value. To be more exact: after the combiner there are about 1 million (k,v) pairs, with roughly 1 KB of data per pair, which I can put in either the key or the value. I have experimented with both options, (heavy key, light value) vs. (light key, heavy value). It turns out that the heavy-key/light-value option is much, much faster than light-key/heavy-value. Has anyone else noticed this? Is there a way to make the light-key/heavy-value option faster, since some applications will need that as well? Remember that in both cases we are talking about at least a dozen or so million pairs. The time difference shows up in the shuffle phase, which is odd because the amount of data transferred is the same. -gyanit
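(For concreteness, a minimal sketch of the heavy-key/light-value layout described above, using the old mapred API; the class name and the idea of reading one record per input line are assumptions, not details from the post. The light-key/heavy-value variant would instead emit a short id as the key and the ~1 KB payload as the value.)

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    // Heavy key, light value: the whole ~1 KB record travels in the key,
    // so the key is what gets sorted and compared during the shuffle.
    public class HeavyKeyMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, NullWritable> {
      public void map(LongWritable offset, Text record,
                      OutputCollector<Text, NullWritable> out,
                      Reporter reporter) throws IOException {
        out.collect(record, NullWritable.get());
        // The light-key/heavy-value layout would instead look like:
        //   out.collect(shortId, record);
      }
    }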
Error while putting data onto hdfs
I was trying to put a 1 GB file onto HDFS and I got the following error:

09/03/10 18:23:16 WARN hdfs.DFSClient: DataStreamer Exception: java.net.SocketTimeoutException: 5000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/171.69.102.53:34414 remote=/171.69.102.51:50010]
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:162)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
        at java.io.BufferedOutputStream.write(Unknown Source)
        at java.io.DataOutputStream.write(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2209)
09/03/10 18:23:16 WARN hdfs.DFSClient: Error Recovery for block blk_2971879428934911606_36678 bad datanode[0] 171.69.102.51:50010
put: All datanodes 171.69.102.51:50010 are bad. Aborting...
Exception closing file /user/amkhuran/221rawdata/1g
java.io.IOException: Filesystem closed
        at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:198)
        at org.apache.hadoop.hdfs.DFSClient.access$600(DFSClient.java:65)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3084)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3053)
        at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.close(DFSClient.java:942)
        at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:210)
        at org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:243)
        at org.apache.hadoop.fs.FsShell.close(FsShell.java:1842)
        at org.apache.hadoop.fs.FsShell.main(FsShell.java:1856)

What's going wrong?

Amandeep

Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz
Re: question about released version id
Thanks a lot, Owen and Rasit. Is there any rule for deciding whether incompatible API changes go into trunk or into a branch? Is the criterion time, the importance of the changes, the number of changes, or something else? Chu 2009/3/10 Owen O'Malley > On Mar 2, 2009, at 11:46 PM, 鞠適存 wrote: >> I wonder how the Hadoop version number is decided. >> > > Each of 0.18, 0.19 and 0.20 has its own branch. The first release on each > branch is 0.X.0, and then 0.X.1 and so on. New features are only put into > trunk and only important bug fixes are put into the branches. So there will > be no new functionality going from 0.X.1 to 0.X.2, but there will be going > from a release of 0.X to 0.X+1. > > -- Owen
Re: Extending ClusterMapReduceTestCase
The other goofy thing is that the XML parser that is commonly first on the classpath validates XML in a way that is the opposite of what Jetty wants. This line in the preamble, before the ClusterMapReduceTestCase setUp, takes care of the XML errors:

System.setProperty("javax.xml.parsers.SAXParserFactory", "org.apache.xerces.jaxp.SAXParserFactoryImpl");

On Tue, Mar 10, 2009 at 2:28 PM, jason hadoop wrote:
> There are a couple of failures that happen in tests derived from ClusterMapReduceTestCase that are run outside of the Hadoop unit test framework.
>
> The basic issue is that the unit test doesn't have the benefit of the runtime environment set up by the bin/hadoop script.
>
> The classpath is usually missing the lib/jetty-ext/*.jar files, and the test doesn't get conf/hadoop-default.xml and conf/hadoop-site.xml.
> The *standard* properties are also unset: hadoop.log.dir, hadoop.log.file, hadoop.home.dir, hadoop.id.str, hadoop.root.logger.
>
> I find that I can get away with just defining hadoop.log.dir.
>
> You can read about this in detail in the chapter on unit testing map/reduce jobs in my book, out real soon now :)
>
> On Tue, Mar 10, 2009 at 12:08 PM, Brian Forney wrote:
>> Hi all,
>>
>> I'm trying to write a JUnit test case that extends ClusterMapReduceTestCase to test some code I've written to ease job submission and monitoring between some existing code. Unfortunately, I see the following problem and cannot find the jetty 5.1.4 code anywhere online. Any ideas about why this is happening?
>>
>> [junit] Testsuite: com.integral7.batch.hadoop.test.TestJobController
>> [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 1.384 sec
>> [junit] 2009-03-10 12:52:26,303 [main] ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:290) - FSNamesystem initialization failed.
>> [junit] java.io.IOException: Problem starting http server
>> [junit]     at org.apache.hadoop.http.HttpServer.start(HttpServer.java:343)
>> [junit]     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:379)
>> ...
>> [junit] Caused by: org.mortbay.util.MultiException[org.xml.sax.SAXParseException: The processing instruction target matching "[xX][mM][lL]" is not allowed., org.xml.sax.SAXParseException: The processing instruction target matching "[xX][mM][lL]" is not allowed.]
>> [junit]     at org.mortbay.http.HttpServer.doStart(HttpServer.java:731)
>> [junit]     at org.mortbay.util.Container.start(Container.java:72)
>> [junit]     at org.apache.hadoop.http.HttpServer.start(HttpServer.java:321)
>> [junit]     ... 23 more
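A minimal sketch of a test that applies both workarounds from this thread before the mini-cluster starts; the test class name and the log directory value are placeholders, while the property names and the Xerces factory class are the ones given above:

    import org.apache.hadoop.mapred.ClusterMapReduceTestCase;

    public class MyClusterTest extends ClusterMapReduceTestCase {
      @Override
      protected void setUp() throws Exception {
        // Force the Xerces SAX parser so Jetty can parse its web descriptors.
        System.setProperty("javax.xml.parsers.SAXParserFactory",
            "org.apache.xerces.jaxp.SAXParserFactoryImpl");
        // bin/hadoop normally sets this; the mini-cluster needs somewhere to log.
        System.setProperty("hadoop.log.dir", "build/test/logs");
        super.setUp();   // starts the MiniDFSCluster and MiniMRCluster
      }

      public void testSomething() throws Exception {
        // job submission / monitoring code under test goes here
      }
    }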
Re: Extending ClusterMapReduceTestCase
There are a couple of failures that happen in tests derived from ClusterMapReduceTestCase that are run outside of the Hadoop unit test framework.

The basic issue is that the unit test doesn't have the benefit of the runtime environment set up by the bin/hadoop script. The classpath is usually missing the lib/jetty-ext/*.jar files, and the test doesn't get conf/hadoop-default.xml and conf/hadoop-site.xml. The *standard* properties are also unset: hadoop.log.dir, hadoop.log.file, hadoop.home.dir, hadoop.id.str, hadoop.root.logger. I find that I can get away with just defining hadoop.log.dir.

You can read about this in detail in the chapter on unit testing map/reduce jobs in my book, out real soon now :)

On Tue, Mar 10, 2009 at 12:08 PM, Brian Forney wrote:
> Hi all,
>
> I'm trying to write a JUnit test case that extends ClusterMapReduceTestCase to test some code I've written to ease job submission and monitoring between some existing code. Unfortunately, I see the following problem and cannot find the jetty 5.1.4 code anywhere online. Any ideas about why this is happening?
>
> [junit] Testsuite: com.integral7.batch.hadoop.test.TestJobController
> [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 1.384 sec
> [junit] 2009-03-10 12:52:26,303 [main] ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:290) - FSNamesystem initialization failed.
> [junit] java.io.IOException: Problem starting http server
> [junit]     at org.apache.hadoop.http.HttpServer.start(HttpServer.java:343)
> [junit]     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:379)
> ...
> [junit] Caused by: org.mortbay.util.MultiException[org.xml.sax.SAXParseException: The processing instruction target matching "[xX][mM][lL]" is not allowed., org.xml.sax.SAXParseException: The processing instruction target matching "[xX][mM][lL]" is not allowed.]
> [junit]     at org.mortbay.http.HttpServer.doStart(HttpServer.java:731)
> [junit]     at org.mortbay.util.Container.start(Container.java:72)
> [junit]     at org.apache.hadoop.http.HttpServer.start(HttpServer.java:321)
> [junit]     ... 23 more
> [junit] Testcase: testJobSubmission(com.integral7.batch.hadoop.test.TestJobController): Caused an ERROR
> [junit] Problem starting http server
> [junit] java.io.IOException: Problem starting http server
> ...
Jobs stalling forever
Every now and then, I have jobs that stall forever with one map task remaining. The last map task remaining says it is at "100%" and in the logs, it says it is in the process of committing. However, the task never times out, and the job just sits there forever. Has anyone else seen this? Is there a JIRA ticket open for it already?
Re: HDFS is corrupt, need to salvage the data.
Raghu Angadi wrote: The block files usually don't disappear easily. Check on the datanode whether you find any files starting with "blk". Also check the datanode log to see what happened there... maybe you started it on a different directory or something like that. Raghu.

There are indeed blk files:

find -name 'blk*' | wc -l
158

I didn't see anything out of the ordinary in the datanode log. At this point is there anything I can do to recover the files? Or do I need to reformat the data node and load the data in again? thanks
Re: HDFS is corrupt, need to salvage the data.
Mayuran Yogarajah wrote: lohit wrote: How many datanodes do you have? From the output it looks like at the point when you ran fsck, you had only one datanode connected to your NameNode. Did you have others? Also, I see that your default replication is set to 1. Can you check if your datanodes are up and running? Lohit

There is only one data node at the moment. Does this mean the data is not recoverable? The HD on the machine seems fine, so I'm a little confused as to what caused the HDFS to become corrupted.

The block files usually don't disappear easily. Check on the datanode whether you find any files starting with "blk". Also check the datanode log to see what happened there... maybe you started it on a different directory or something like that. Raghu.
Why LineRecordReader has 3 constructors?
org.apache.hadoop.mapred.LineRecordReader has 3 constructors. The one below is the one normally used:

public LineRecordReader(Configuration job, FileSplit split) throws IOException

But when are the following ones used? I commented them out and re-compiled the code without errors, so they are probably not used directly in the Hadoop core code. Why are they there, then?

public LineRecordReader(InputStream in, long offset, long endOffset, int maxLineLength)
public LineRecordReader(InputStream in, long offset, long endOffset, Configuration job)

Thanks, Abdul Qadeer
Extending ClusterMapReduceTestCase
Hi all,

I'm trying to write a JUnit test case that extends ClusterMapReduceTestCase to test some code I've written to ease job submission and monitoring between some existing code. Unfortunately, I see the following problem and cannot find the jetty 5.1.4 code anywhere online. Any ideas about why this is happening?

[junit] Testsuite: com.integral7.batch.hadoop.test.TestJobController
[junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 1.384 sec
[junit]
[junit] - Standard Output ---
[junit] 2009-03-10 12:52:26,303 [main] ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:290) - FSNamesystem initialization failed.
[junit] java.io.IOException: Problem starting http server
[junit]     at org.apache.hadoop.http.HttpServer.start(HttpServer.java:343)
[junit]     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:379)
[junit]     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:288)
[junit]     at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:163)
[junit]     at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:208)
[junit]     at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:194)
[junit]     at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:859)
[junit]     at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:275)
[junit]     at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:119)
[junit]     at org.apache.hadoop.mapred.ClusterMapReduceTestCase.startCluster(ClusterMapReduceTestCase.java:81)
[junit]     at org.apache.hadoop.mapred.ClusterMapReduceTestCase.setUp(ClusterMapReduceTestCase.java:56)
[junit]     at com.integral7.batch.hadoop.test.TestJobController.setUp(TestJobController.java:49)
[junit]     at junit.framework.TestCase.runBare(TestCase.java:132)
[junit]     at junit.framework.TestResult$1.protect(TestResult.java:110)
[junit]     at junit.framework.TestResult.runProtected(TestResult.java:128)
[junit]     at junit.framework.TestResult.run(TestResult.java:113)
[junit]     at junit.framework.TestCase.run(TestCase.java:124)
[junit]     at junit.framework.TestSuite.runTest(TestSuite.java:232)
[junit]     at junit.framework.TestSuite.run(TestSuite.java:227)
[junit]     at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:81)
[junit]     at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:36)
[junit]     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:421)
[junit]     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:912)
[junit]     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:766)
[junit] Caused by: org.mortbay.util.MultiException[org.xml.sax.SAXParseException: The processing instruction target matching "[xX][mM][lL]" is not allowed., org.xml.sax.SAXParseException: The processing instruction target matching "[xX][mM][lL]" is not allowed.]
[junit]     at org.mortbay.http.HttpServer.doStart(HttpServer.java:731)
[junit]     at org.mortbay.util.Container.start(Container.java:72)
[junit]     at org.apache.hadoop.http.HttpServer.start(HttpServer.java:321)
[junit]     ... 23 more
[junit] - ---
[junit] Testcase: testJobSubmission(com.integral7.batch.hadoop.test.TestJobController): Caused an ERROR
[junit] Problem starting http server
[junit] java.io.IOException: Problem starting http server
[junit]     at org.apache.hadoop.http.HttpServer.start(HttpServer.java:343)
[junit]     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:379)
[junit]     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:288)
[junit]     at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:163)
[junit]     at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:208)
[junit]     at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:194)
[junit]     at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:859)
[junit]     at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:275)
[junit]     at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:119)
[junit]     at org.apache.hadoop.mapred.ClusterMapReduceTestCase.startCluster(ClusterMapReduceTestCase.java:81)
[junit]     at org.apache.hadoop.mapred.ClusterMapReduceTestCase.setUp(ClusterMapReduceTestCase.java:56)
[junit]     at com.integral7.batch.hadoop.test.TestJobController.setUp(TestJobController.java:49)
[junit] Caused by: org.mortbay.util.MultiException[org.xml.sax.SAXParseExcepti
streaming inputformat: class not found
Hello, I'm trying to run a MapReduce job on a data file in which the keys and values alternate rows, e.g.

key1
value1
key2
...

I've written my own InputFormat by extending FileInputFormat (the code for this class is below). The problem is that when I run Hadoop streaming with the command

bin/hadoop jar contrib/streaming/hadoop-0.18.3-streaming.jar -mapper mapper.pl -reducer org.apache.hadoop.mapred.lib.IdentityReducer -input test.data -output test-output -file -inputformat MyFormatter

I get the error

-inputformat : class not found : MyFormatter
java.lang.RuntimeException: -inputformat : class not found : MyFormatter
        at org.apache.hadoop.streaming.StreamJob.fail(StreamJob.java:550)
        ...

I have tried putting the .java, .class, and .jar file of MyFormatter in the job jar using the -file parameter. I have also tried putting them on HDFS using -copyFromLocal, but I still get the same error. Can anyone give me some hints as to what the problem might be? Also, I tried to hack together my formatter based on the Hadoop examples, so does it seem like it should properly process the input files I described above?

Trevis

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.LineRecordReader;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

public final class MyFormatter extends FileInputFormat<Text, Text> {

    @Override
    public RecordReader<Text, Text> getRecordReader(InputSplit split, JobConf job,
            Reporter reporter) throws IOException {
        return new MyRecordReader(job, (FileSplit) split);
    }

    static class MyRecordReader implements RecordReader<Text, Text> {
        private LineRecordReader _in = null;
        private LongWritable _junk = null;

        public MyRecordReader(JobConf job, FileSplit split) throws IOException {
            _junk = new LongWritable();
            _in = new LineRecordReader(job, split);
        }

        @Override
        public void close() throws IOException {
            _in.close();
        }

        @Override
        public Text createKey() {
            return new Text();
        }

        @Override
        public Text createValue() {
            return new Text();
        }

        @Override
        public long getPos() throws IOException {
            return _in.getPos();
        }

        @Override
        public float getProgress() throws IOException {
            return _in.getProgress();
        }

        // Read two consecutive lines: the first becomes the key, the second the value.
        @Override
        public boolean next(Text key, Text value) throws IOException {
            if (_in.next(_junk, key) && _in.next(_junk, value)) {
                return true;
            }
            key.clear();
            value.clear();
            return false;
        }
    }
}
Re: HDFS is corrupt, need to salvage the data.
lohit wrote: How many datanodes do you have? From the output it looks like at the point when you ran fsck, you had only one datanode connected to your NameNode. Did you have others? Also, I see that your default replication is set to 1. Can you check if your datanodes are up and running? Lohit

There is only one data node at the moment. Does this mean the data is not recoverable? The HD on the machine seems fine, so I'm a little confused as to what caused the HDFS to become corrupted. M
Re: Support for zipped input files
Thanks very much, Tom. You saved me a lot of time by confirming that it isn't available yet. I'll go vote for HADOOP-1824. On Tue, Mar 10, 2009 at 3:23 AM, Tom White wrote: > Hi Ken, > > Unfortunately, Hadoop doesn't yet support MapReduce on zipped files > (see https://issues.apache.org/jira/browse/HADOOP-1824), so you'll > need to write a program to unzip them and write them into HDFS first. > > Cheers, > Tom > > On Tue, Mar 10, 2009 at 4:11 AM, jason hadoop > wrote: > > Hadoop has support for S3, the compression support is handled at another > > level and should also work. > > > > > > On Mon, Mar 9, 2009 at 9:05 PM, Ken Weiner wrote: > > > >> I have a lot of large zipped (not gzipped) files sitting in an Amazon S3 > >> bucket that I want to process. What is the easiest way to process them > >> with > >> a Hadoop map-reduce job? Do I need to write code to transfer them out > of > >> S3, unzip them, and then move them to HDFS before running my job, or > does > >> Hadoop have support for processing zipped input files directly from S3? > >> > > > > > > > > -- > > Alpha Chapters of my book on Hadoop are available > > http://www.apress.com/book/view/9781430219422 > > >
Re: Native Libraries
I've been able to see this kind of output in the JobTracker web interface. Open a job and drill down to one of the task logs and select 'All'. It should be somewhere near the top of the output. On Tue, Mar 10, 2009 at 2:52 PM, Tamir Kamara wrote: > Hi, > > I'm using hadoop 0.18.3 and I wish to see the status of the Hadoop native > libraries. According to > http://hadoop.apache.org/core/docs/r0.18.3/native_libraries.html I should be > seeing something like: > INFO util.NativeCodeLoader - Loaded the native-hadoop library > or: > INFO util.NativeCodeLoader - Unable to load native-hadoop library for your > platform... using builtin-java classes where applicable > > I've scanned all the files in the log directory on several nodes but I can't > locate anything about this. > > Why do you think I can't see anything in the logs? > Do I need to do anything specific in order to use the libraries? > > Thanks, > Tamir >
Native Libraries
Hi,

I'm using hadoop 0.18.3 and I wish to see the status of the Hadoop native libraries. According to http://hadoop.apache.org/core/docs/r0.18.3/native_libraries.html I should be seeing something like:

INFO util.NativeCodeLoader - Loaded the native-hadoop library

or:

INFO util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

I've scanned all the files in the log directory on several nodes but I can't locate anything about this.

Why do you think I can't see anything in the logs? Do I need to do anything specific in order to use the libraries?

Thanks, Tamir
Re: Support for zipped input files
There is LZO support with a patch: http://blog.oskarsson.nu/2009/03/hadoop-feat-lzo-save-disk-space-and.html Cheers Tim On Tue, Mar 10, 2009 at 12:23 PM, Tom White wrote: > Hi Ken, > > Unfortunately, Hadoop doesn't yet support MapReduce on zipped files > (see https://issues.apache.org/jira/browse/HADOOP-1824), so you'll > need to write a program to unzip them and write them into HDFS first. > > Cheers, > Tom > > On Tue, Mar 10, 2009 at 4:11 AM, jason hadoop wrote: >> Hadoop has support for S3, the compression support is handled at another >> level and should also work. >> >> >> On Mon, Mar 9, 2009 at 9:05 PM, Ken Weiner wrote: >> >>> I have a lot of large zipped (not gzipped) files sitting in an Amazon S3 >>> bucket that I want to process. What is the easiest way to process them >>> with >>> a Hadoop map-reduce job? Do I need to write code to transfer them out of >>> S3, unzip them, and then move them to HDFS before running my job, or does >>> Hadoop have support for processing zipped input files directly from S3? >>> >> >> >> >> -- >> Alpha Chapters of my book on Hadoop are available >> http://www.apress.com/book/view/9781430219422 >> >
Re: Support for zipped input files
Hi Ken, Unfortunately, Hadoop doesn't yet support MapReduce on zipped files (see https://issues.apache.org/jira/browse/HADOOP-1824), so you'll need to write a program to unzip them and write them into HDFS first. Cheers, Tom On Tue, Mar 10, 2009 at 4:11 AM, jason hadoop wrote: > Hadoop has support for S3, the compression support is handled at another > level and should also work. > > > On Mon, Mar 9, 2009 at 9:05 PM, Ken Weiner wrote: > >> I have a lot of large zipped (not gzipped) files sitting in an Amazon S3 >> bucket that I want to process. What is the easiest way to process them >> with >> a Hadoop map-reduce job? Do I need to write code to transfer them out of >> S3, unzip them, and then move them to HDFS before running my job, or does >> Hadoop have support for processing zipped input files directly from S3? >> > > > > -- > Alpha Chapters of my book on Hadoop are available > http://www.apress.com/book/view/9781430219422 >
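A rough sketch of the kind of pre-processing program Tom describes, i.e. unzipping locally fetched files and writing each entry into HDFS; the target directory is a placeholder and error handling is omitted:

    import java.io.FileInputStream;
    import java.io.OutputStream;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipInputStream;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class UnzipIntoHdfs {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // args[0] is a local zip file, e.g. one already pulled down from S3.
        ZipInputStream zip = new ZipInputStream(new FileInputStream(args[0]));
        byte[] buf = new byte[64 * 1024];
        ZipEntry entry;
        while ((entry = zip.getNextEntry()) != null) {
          if (entry.isDirectory()) continue;
          // Write each uncompressed entry as a plain file under a target directory.
          OutputStream out = fs.create(new Path("/user/ken/input", entry.getName()));
          int n;
          while ((n = zip.read(buf)) != -1) {
            out.write(buf, 0, n);
          }
          out.close();
        }
        zip.close();
        fs.close();
      }
    }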
Re: How to increase replication factor
You can use the setrep option to (re)set the replication of specific files and directories. More details can be found here: http://hadoop.apache.org/core/docs/current/hdfs_shell.html#setrep On Tue, Mar 10, 2009 at 12:28 PM, Edwin Chu wrote: > Hi > I am adding some new nodes to an Hadoop cluster and try to increase the > replication factor. I changed the replication factor value in > hadoop-site.xml and then restarted the cluster using the stop-all.sh and > start-all.sh script. Then, I run hadoop fsck. It reports that the fs is > healthy, but I found that the Average block replication value is less than > the configured replication factor. I guess the existing blocks are not > re-replicated after changing the replication factor. Can I force the > existing blocks to be replicated according to the new replication factor? > > Regards > Edwin >
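The same thing can also be done from Java through the FileSystem API if that is more convenient; a minimal sketch (the path and replication factor below are placeholders):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SetReplication {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // picks up hadoop-site.xml from the classpath
        FileSystem fs = FileSystem.get(conf);
        setRepRecursive(fs, new Path("/user/edwin"), (short) 3);
        fs.close();
      }

      // Walk the directory tree and ask the NameNode to re-replicate each existing file.
      private static void setRepRecursive(FileSystem fs, Path p, short rep) throws Exception {
        for (FileStatus stat : fs.listStatus(p)) {
          if (stat.isDir()) {
            setRepRecursive(fs, stat.getPath(), rep);
          } else {
            fs.setReplication(stat.getPath(), rep);   // only affects files that already exist
          }
        }
      }
    }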
How to increase replication factor
Hi, I am adding some new nodes to a Hadoop cluster and trying to increase the replication factor. I changed the replication factor value in hadoop-site.xml and then restarted the cluster using the stop-all.sh and start-all.sh scripts. Then I ran hadoop fsck. It reports that the fs is healthy, but I found that the Average block replication value is less than the configured replication factor. I guess the existing blocks are not re-replicated after changing the replication factor. Can I force the existing blocks to be replicated according to the new replication factor? Regards Edwin
Re: Batch processing map reduce jobs
Check out Cascading, it worked great for me. http://www.cascading.org/ Jimmy Wan On Thu, Mar 5, 2009 at 17:53, Richa Khandelwal wrote: > Hi All, > Does anyone know how to run map reduce jobs using pipes or batch process map > reduce jobs?
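For what it's worth, jobs can also be batch-processed with the stock API by submitting them one after another: JobClient.runJob blocks until each job finishes, so the second job can consume the first one's output. A minimal sketch with placeholder paths, using the default identity mapper and reducer:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class BatchDriver {
      public static void main(String[] args) throws Exception {
        JobConf first = new JobConf(BatchDriver.class);
        first.setJobName("step-1");
        FileInputFormat.setInputPaths(first, new Path("input"));
        FileOutputFormat.setOutputPath(first, new Path("intermediate"));
        JobClient.runJob(first);              // blocks until step 1 completes

        JobConf second = new JobConf(BatchDriver.class);
        second.setJobName("step-2");
        FileInputFormat.setInputPaths(second, new Path("intermediate"));
        FileOutputFormat.setOutputPath(second, new Path("output"));
        JobClient.runJob(second);             // step 2 reads step 1's output
      }
    }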