How to get LZO loaded?
Hi,

At the very beginning, I ran:

    hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'

successfully, but when I ran:

    nutch crawl url -dir crawl -depth 3

I got errors:

    10/08/07 22:53:30 INFO crawl.Crawl: crawl started in: crawl
    ...
    10/08/07 22:53:30 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    Exception in thread "main" java.lang.RuntimeException: Error in configuring object
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
        ...
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:124)
    Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        ...
        ... 9 more
    Caused by: java.lang.IllegalArgumentException: Compression codec org.apache.hadoop.io.compress.GzipCodec not found.
        at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:96)
        ...
        ... 14 more
    Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.io.compress.GzipCodec
        ...
        at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:89)
        ... 16 more

So GzipCodec did not get loaded successfully here. Or maybe it is not loaded by default; I don't know, but I think it should be. I then followed this link: http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ to install LZO and ran:

    nutch crawl url -dir crawl -depth 3

again, and got errors:

    10/08/07 22:40:41 INFO crawl.Crawl: crawl started in: crawl
    ...
    10/08/07 22:40:42 INFO crawl.Injector: Injector: Converting injected urls to crawl db entries.
    10/08/07 22:40:42 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    Exception in thread "main" java.lang.RuntimeException: Error in configuring object
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
        ...
        at org.apache.nutch.crawl.Injector.inject(Injector.java:211)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:124)
    Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        ...
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
        ... 9 more
    Caused by: java.lang.IllegalArgumentException: Compression codec org.apache.hadoop.io.compress.GzipCodec not found.
        ...
        at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:41)
        ... 14 more
    Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.io.compress.GzipCodec
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        ...
        at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:89)
        ... 16 more

Then I ran:

    hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'

and also got errors:

    java.lang.RuntimeException: Error in configuring object
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
        at
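For what it's worth, GzipCodec ships in the stock Hadoop core jar, so a ClassNotFoundException for it often means the codec list in the configuration was overridden without the built-in codecs, rather than a missing jar. A sketch of what the codec registration in core-site.xml typically looks like after installing LZO per the hadoop-gpl-compression FAQ (the exact values below are illustrative, not a known-good config for this cluster):

```xml
<!-- core-site.xml: register compression codecs.
     Note: setting io.compression.codecs REPLACES the default list,
     so the built-in codecs (Gzip, Default, BZip2) must be listed
     explicitly alongside the LZO classes. -->
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
</property>
<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
```

If the LZO classes are listed here but the hadoop-lzo jar is not on the classpath of the process running the job (Nutch launches with its own classpath), codec loading fails in the same way.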
Debugging hadoop core in distributed settings
Hi All,

I have followed the steps below for debugging the Hadoop core in a single-node setup. Since I want to check my system in a distributed setting, could anyone please help me debug the code when running the Hadoop jar in distributed settings?

thanks,
--Pramod

On Wed, Jul 14, 2010 at 12:07 AM, Pramy Bhats pramybh...@googlemail.com wrote:

Hi,

I am trying to debug the newly built hadoop-core-dev.jar in Eclipse. To simplify the debug process, I first set up Hadoop in single-node mode on my localhost.

a) Configure a debug launch in Eclipse:

   Under tab "main":
     project: hadoop-all
     main-class: org.apache.hadoop.util.RunJar

   Under tab "arguments":
     program arguments: <absolute path for wordcount jar file>/wordcount.jar org.wordcount.WordCount input-text-file-already-in-hdfs (text) desired-output-file (output)
     VM arguments: -Xmx256M

   Under tab "classpath":
     user entries: add external jar (hadoop-0.20.3-core-dev.jar), so that I can debug my newly built Hadoop core jar.

   Under tab "source":
     I add the source folder for the wordcount example (so sources can be looked up during debugging).

   I apply this configuration and start the debug process.

b) The debugging works fine, and I can perform all debug operations. However, I get the following problem:

    2010-07-14 00:02:15,816 WARN conf.Configuration (Configuration.java:clinit(176)) - DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively
    2010-07-14 00:02:16,535 INFO jvm.JvmMetrics (JvmMetrics.java:init(71)) - Initializing JVM Metrics with processName=JobTracker, sessionId=
    Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/home/hadoop/code/hadoop-0.20.2/text
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
        at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
        at org.selfadjust.wordcount.WordCount.run(WordCount.java:32)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.selfadjust.wordcount.WordCount.main(WordCount.java:43)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

However, the file named "text" is already stored in HDFS. Could you please help me with the debugging process here? Any pointers on the debugging environment would be very helpful.

thanks,
--PB
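A note on that exception: the path in the error is resolved as file:/home/hadoop/code/hadoop-0.20.2/text, i.e. against the local filesystem, not HDFS. That usually means the Hadoop conf directory (containing core-site.xml) is not on the Eclipse launch classpath, so fs.default.name falls back to file:///. A sketch of the relevant core-site.xml entry, assuming a single-node HDFS whose NameNode listens on localhost:9000 (adjust host and port to your setup):

```xml
<!-- core-site.xml (0.20.x): make HDFS the default FileSystem so that
     bare paths like "text" resolve inside HDFS instead of file:/ .
     The conf directory holding this file must be on the launch classpath. -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
```

Alternatively, a fully qualified input URI in the program arguments (e.g. hdfs://localhost:9000/user/hadoop/text) sidesteps the default-filesystem lookup entirely.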
Re: Enabling LZO compression of map outputs in Cloudera Hadoop 0.20.1
On Sat, Aug 7, 2010 at 9:18 PM, Alex Luya alexander.l...@gmail.com wrote:
> Does it (hadoop-lzo) only work for hadoop 0.20, not for 0.21 or 0.22?

I don't know that anyone has tested it against 0.21 or trunk, but I don't see any reason it won't work just fine; the APIs are pretty stable between 0.20 and above.

-Todd

On Friday, August 06, 2010 09:05:47 am Todd Lipcon wrote:

On Thu, Aug 5, 2010 at 4:52 PM, Bobby Dennett bdenn...@gmail.com wrote:
> Hi Josh,
> No real pain points... just trying to investigate/research the best way
> to create the necessary libraries and jar files to support LZO
> compression in Hadoop. In particular, there are two repositories to
> build from, and I am trying to find out whether one should be used over
> the other. For instance, in your previous posting you refer to
> hadoop-gpl-compression, while the Twitter blog post from last year
> mentions the Hadoop-LZO project. Briefly looking, it seems Hadoop-LZO
> is preferable, but we're curious if there are any caveats/gotchas we
> should be aware of.

Yes, definitely use the hadoop-lzo project from github, either from my repo or from kevinweil's (the two are kept in sync). The repo on Google Code has a number of known bugs, which is why we forked it over to github last year.

-Todd

On Thu, Aug 5, 2010 at 15:59, Josh Patterson j...@cloudera.com wrote:
> Bobby,
> We're working hard to make compression easier. The biggest hurdle
> currently is the licensing issue around the LZO codec libs (GPL, which
> is not compatible with the ASF BSD-style license). Outside of making
> the changes to the mapred-site.xml file, what do you view as the
> biggest pain point with your setup?
> Josh Patterson
> Cloudera

On Thu, Aug 5, 2010 at 6:52 PM, Bobby Dennett bdennett+softw...@gmail.com wrote:
> We are looking to enable LZO compression of the map outputs on our
> Cloudera 0.20.1 cluster. It seems there are various sets of
> instructions available, and I am curious what your thoughts are
> regarding which one would be best for our Hadoop distribution and OS
> (Ubuntu 8.04 64-bit). In particular, hadoop-gpl-compression
> (http://code.google.com/p/hadoop-gpl-compression) vs. hadoop-lzo
> (http://github.com/kevinweil/hadoop-lzo).
> Some of what appear to be the better instructions/guides out there:
> * Josh Patterson's reply on June 25th to the "Newbie to HDFS
>   compression" thread --
>   http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201006.mbox/%3caanlktileo-q8useip8y3na9pdyhlyufippr0in0lk...@mail.gmail.com%3e
> * hadoop-gpl-compression FAQ --
>   http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ
> * "Hadoop at Twitter (part 1): Splittable LZO Compression" blog post --
>   http://www.cloudera.com/blog/2009/11/hadoop-at-twitter-part-1-splittable-lzo-compression/
> Thanks in advance,
> -Bobby

--
Todd Lipcon
Software Engineer, Cloudera
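For reference, once the hadoop-lzo jar and native libraries are installed on every node, enabling LZO compression of map outputs on a 0.20-era cluster generally comes down to a couple of mapred-site.xml properties. The snippet below is a sketch based on the guides linked in this thread, not a tested configuration for this particular cluster:

```xml
<!-- mapred-site.xml (0.20.x property names): compress intermediate
     map output with LZO. Assumes the hadoop-lzo jar is on the
     classpath and the native LZO libraries are installed on all
     nodes; jobs fail at task launch otherwise. -->
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
<property>
  <name>mapred.map.output.compression.codec</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
```

The LzoCodec class must also appear in the io.compression.codecs list in core-site.xml for the codec factory to find it.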