how to get lzo loaded?

2010-08-08 Thread Alex Luya
Hi,
 At the very beginning, I ran:

    hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'

successfully, but when I ran:

    nutch crawl url -dir crawl -depth 3

I got these errors:
------------------------------------------------------------
10/08/07 22:53:30 INFO crawl.Crawl: crawl started in: crawl
.
10/08/07 22:53:30 WARN mapred.JobClient: Use GenericOptionsParser for parsing 
the arguments. Applications should implement Tool for the same.
Exception in thread "main" java.lang.RuntimeException: Error in configuring 
object
at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
at 
org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
.
at org.apache.nutch.crawl.Crawl.main(Crawl.java:124)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
.
... 9 more
Caused by: java.lang.IllegalArgumentException: Compression codec 
org.apache.hadoop.io.compress.GzipCodec not found.
at 
org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:96)
.
... 14 more
Caused by: java.lang.ClassNotFoundException: 
org.apache.hadoop.io.compress.GzipCodec
.
at 
org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:89)
... 16 more

------------------------------------------------------------
So GzipCodec didn't get loaded successfully here, or maybe it isn't loaded by 
default; I don't know, but I think it should be. I then followed this link to 
install LZO:

    http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ

and ran:

    nutch crawl url -dir crawl -depth 3

again, and got these errors:
------------------------------------------------------------
10/08/07 22:40:41 INFO crawl.Crawl: crawl started in: crawl
.
10/08/07 22:40:42 INFO crawl.Injector: Injector: Converting injected urls to 
crawl db entries.
10/08/07 22:40:42 WARN mapred.JobClient: Use GenericOptionsParser for parsing 
the arguments. Applications should implement Tool for the same.
Exception in thread "main" java.lang.RuntimeException: Error in configuring 
object
at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
.
at org.apache.nutch.crawl.Injector.inject(Injector.java:211)
at org.apache.nutch.crawl.Crawl.main(Crawl.java:124)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
.
at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
... 9 more
Caused by: java.lang.IllegalArgumentException: Compression codec 
org.apache.hadoop.io.compress.GzipCodec not found.
.
at 
org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:41)
... 14 more
Caused by: java.lang.ClassNotFoundException: 
org.apache.hadoop.io.compress.GzipCodec
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
.
at 
org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:89)
... 16 more

------------------------------------------------------------
I then ran:

    hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'

and got these errors:
------------------------------------------------------------
java.lang.RuntimeException: Error in configuring object
at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
at 
org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
at ...
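
GzipCodec ships with stock Hadoop, so a ClassNotFoundException for it usually 
means the io.compression.codecs property in core-site.xml was edited (for 
example to add the LZO codecs) and the new value either dropped the built-in 
codecs or picked up stray whitespace or line breaks, which older releases do 
not trim around the class names. A minimal core-site.xml sketch, assuming the 
GPL compression classes live under com.hadoop.compression.lzo as in the FAQ 
linked above (the exact codec list may differ on your install); the value must 
stay on one line:

  <property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
  </property>
  <property>
    <name>io.compression.codec.lzo.class</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
  </property>

The native LZO libraries and the hadoop-lzo (or hadoop-gpl-compression) jar 
also need to be visible to the job, e.g. under lib/native and lib/ of the 
Hadoop install on every node, for the LZO entries to load.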
Debugging hadoop core in distributed settings

2010-08-08 Thread Pramy Bhats
Hi All,

I have followed the following steps for debugging the Hadoop core in a
single-node setup. Since I want to check my system in a distributed setting,
could anyone please help me debug the code when the Hadoop jar is run in a
distributed setting?

thanks,
--Pramod
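
One common way to debug Hadoop in a distributed setting is remote debugging: 
make the JVMs of interest listen for a debugger and attach Eclipse to them over 
the network with a Remote Java Application configuration. A minimal sketch, 
assuming a 0.20-style setup; the ports are illustrative, and with suspend=y 
each task JVM blocks until a debugger attaches, so it is easiest to limit the 
node to one task slot while debugging:

  <!-- mapred-site.xml: have each child (task) JVM listen for a debugger -->
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx256m -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8000</value>
  </property>

  # hadoop-env.sh: the same JDWP options can be added for a daemon
  # (JobTracker, TaskTracker, NameNode, DataNode) via its *_OPTS variable,
  # for example:
  export HADOOP_JOBTRACKER_OPTS="$HADOOP_JOBTRACKER_OPTS -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=8001"

In Eclipse, a Remote Java Application entry under Debug Configurations, pointed 
at the host and port above, then attaches to the waiting JVM with the same 
source lookup settings as the single-node configuration.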

On Wed, Jul 14, 2010 at 12:07 AM, Pramy Bhats pramybh...@googlemail.com wrote:

 Hi,

 I am trying to debug the newly built hadoop-core-dev.jar in Eclipse. To
 simplify the debug process, I first set up Hadoop in single-node mode
 on my localhost.


 a) Configure a debug configuration in Eclipse:

 under tab Main:
   project: hadoop-all
   main class: org.apache.hadoop.util.RunJar

 under tab Arguments:
   program arguments: <absolute path for wordcount jar file>/wordcount.jar
     org.wordcount.WordCount <input-text-file-already-in-hdfs> (text)
     <desired-output-file> (output)
   VM arguments: -Xmx256M

 under tab Classpath:
   user entries: add external jar (hadoop-0.20.3-core-dev.jar) -- so that I
 can debug my newly built Hadoop core jar.

 under tab Source:
   I add the source folder for the wordcount example (so that sources can be
 looked up during the debug process).

 I apply this configuration and start the debug process.
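
For orientation, the launch configuration above amounts to roughly the 
following command line (a sketch only; the paths are placeholders and the 
class and file names are taken from the arguments described above):

  java -Xmx256M \
    -cp hadoop-0.20.3-core-dev.jar:<hadoop conf dir>:<hadoop lib jars> \
    org.apache.hadoop.util.RunJar <path to>/wordcount.jar \
    org.wordcount.WordCount text output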


 b) The debugging works fine, and I can perform all debug operations.
 However, I get the following problem:

 2010-07-14 00:02:15,816 WARN  conf.Configuration
 (Configuration.java:<clinit>(176)) - DEPRECATED: hadoop-site.xml found in
 the classpath. Usage of hadoop-site.xml is deprecated. Instead use
 core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of
 core-default.xml, mapred-default.xml and hdfs-default.xml respectively
 2010-07-14 00:02:16,535 INFO  jvm.JvmMetrics (JvmMetrics.java:<init>(71)) -
 Initializing JVM Metrics with processName=JobTracker, sessionId=
 Exception in thread "main"
 org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path
 does not exist: file:/home/hadoop/code/hadoop-0.20.2/text
  at
 org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
 at
 org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
  at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)
 at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
  at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
 at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
  at org.selfadjust.wordcount.WordCount.run(WordCount.java:32)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
 at org.selfadjust.wordcount.WordCount.main(WordCount.java:43)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
  at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
  at org.apache.hadoop.util.RunJar.main(RunJar.java:156)


 However, the file named text is already stored in HDFS.


 Could you please help me with the debugging process here? Any pointers on
 the debugging environment would be very helpful.


 thanks,

 --PB
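
The InvalidInputException above resolves the input against file:/, i.e. the 
local filesystem, which suggests the launch in Eclipse is not picking up the 
cluster configuration, so relative paths like text fall back to the local 
default filesystem instead of HDFS. A minimal sketch of the core-site.xml 
entry that has to be visible on the launch classpath (the host and port are 
illustrative and must match the actual NameNode):

  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>

Adding the Hadoop conf directory (core-site.xml, hdfs-site.xml, 
mapred-site.xml) as a classpath entry in the same debug configuration is 
usually enough for the job to find text in HDFS.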





Re: Enabling LZO compression of map outputs in Cloudera Hadoop 0.20.1

2010-08-08 Thread Todd Lipcon
On Sat, Aug 7, 2010 at 9:18 PM, Alex Luya alexander.l...@gmail.com wrote:

 Does it (hadoop-lzo) only work for Hadoop 0.20, and not for 0.21 or 0.22?


I don't know that anyone has tested it against 0.21 or trunk, but I don't
see any reason it won't work just fine -- the APIs are pretty stable
from 0.20 onward.

-Todd


 On Friday, August 06, 2010 09:05:47 am Todd Lipcon wrote:
  On Thu, Aug 5, 2010 at 4:52 PM, Bobby Dennett bdenn...@gmail.com
 wrote:
   Hi Josh,
  
   No real pain points... just trying to investigate/research the best
   way to create the necessary libraries and jar files to support LZO
   compression in Hadoop. In particular, there are the 2 repositories
   to build from and I am trying to find out if one should be used over
   the other. For instance, in your previous posting, you refer to
   hadoop-gpl-compression while the Twitter blog post from last year
   mentions the Hadoop-LZO project. Briefly looking, it seems Hadoop-LZO
   is preferable but we're curious if there are any caveats/gotchas we
   should be aware of.
 
  Yes, definitely use the hadoop-lzo project from github -- either from my
  repo or from kevinweil's (the two are kept in sync)
 
  The repo on Google Code has a number of known bugs, which is why we
 forked
  it over to github last year.
 
  -Todd
 
  On Thu, Aug 5, 2010 at 15:59, Josh Patterson j...@cloudera.com wrote:
Bobby,
   
We're working hard to make compression easier; the biggest hurdle
currently is the licensing issue around the LZO codec libs (GPL,
which is not compatible with the ASF's BSD-style license).
   
Outside of making the changes to the mapred-site.xml file, what would you
view as the biggest pain point with your setup?
   
Josh Patterson
Cloudera
   
On Thu, Aug 5, 2010 at 6:52 PM, Bobby Dennett
   
bdennett+softw...@gmail.com wrote:
We are looking to enable LZO compression of the map outputs on our
Cloudera 0.20.1 cluster. It seems there are various sets of
instructions available and I am curious what your thoughts are
regarding which one would be best for our Hadoop distribution and OS
(Ubuntu 8.04 64-bit). In particular, hadoop-gpl-compression
(http://code.google.com/p/hadoop-gpl-compression) vs. hadoop-lzo
(http://github.com/kevinweil/hadoop-lzo).
   
Some of what appear to be the better instructions/guides out there:
* Josh Patterson's reply on June 25th to the Newbie to HDFS
compression thread --
  
  
 http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201006.mbox/%3caanlktileo-q8useip8y3na9pdyhlyufippr0in0lk...@mail.gmail.com%3e
  
* hadoop-gpl-compression FAQ --
http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ
* Hadoop at Twitter (part 1): Splittable LZO Compression blog post
--
  
  
 http://www.cloudera.com/blog/2009/11/hadoop-at-twitter-part-1-splittable-lzo-compression/
  
Thanks in advance,
-Bobby




-- 
Todd Lipcon
Software Engineer, Cloudera
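
For reference, the mapred-site.xml changes mentioned in this thread usually 
come down to two properties (plus registering the codecs in core-site.xml as 
discussed in the other thread). A minimal sketch, assuming the hadoop-lzo jar 
and its native libraries are already installed on every node; the property 
names are the 0.20-era ones:

  <!-- mapred-site.xml: compress intermediate map output with LZO -->
  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
  </property>
  <property>
    <name>mapred.map.output.compression.codec</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
  </property>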