Re: Can't construct instance of class org.apache.hadoop.conf.Configuration
Hi,

I would try this:

export CLASSPATH=$(hadoop classpath)

Brock

On Mon, Apr 30, 2012 at 10:15 AM, Ryan Cole r...@rycole.com wrote:

Hello,

I'm trying to run an application, written in C++, that uses libhdfs. I have compiled the code and get an error when I attempt to run the application. The error that I am getting is as follows: Can't construct instance of class org.apache.hadoop.conf.Configuration.

Initially, I was receiving an error saying that CLASSPATH was not set. That was easy, so I set CLASSPATH to include the following three directories, in this order:

1. $HADOOP_HOME
2. $HADOOP_HOME/lib
3. $HADOOP_HOME/conf

The CLASSPATH not set error went away, and now I receive the error about the Configuration class. I'm assuming that I do not have something on the path that I need, but everything I have read says to simply include these three directories. Does anybody have any idea what I might be missing? Full exception pasted below.

Thanks,
Ryan

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
Can't construct instance of class org.apache.hadoop.conf.Configuration
node: /home/ryan/.node-gyp/0.7.8/src/node_object_wrap.h:61: void node::ObjectWrap::Wrap(v8::Handle<v8::Object>): Assertion `handle_.IsEmpty()' failed.
Aborted (core dumped)

--
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
[ANNOUNCE] Apache MRUnit 0.8.0-incubating released
The Apache MRUnit team is pleased to announce the release of MRUnit 0.8.0-incubating from the Apache Incubator. This is the second release of Apache MRUnit, a Java library that helps developers unit test Apache Hadoop MapReduce jobs.

The release is available here: http://www.apache.org/dyn/closer.cgi/incubator/mrunit/

The full change log is available here: https://issues.apache.org/jira/browse/MRUNIT/fixforversion/12316359

We welcome your help and feedback. For more information on how to report problems, and to get involved, visit the project website at http://incubator.apache.org/mrunit/

The Apache MRUnit Team
Re: race condition in hadoop 0.20.2 (cdh3u1)
Hi,

tl;dr: DUMMY should not be static.

On Tue, Jan 17, 2012 at 3:21 PM, Stan Rosenberg srosenb...@proclivitysystems.com wrote:

class MyKey<T> implements WritableComparable<T> {
    private String ip; // first part of the key
    private final static Text DUMMY = new Text();
    ...
    public void write(DataOutput out) throws IOException {
        // serialize the first part of the key
        DUMMY.set(ip);
        DUMMY.write(out);
        ...
    }
    public void readFields(DataInput in) throws IOException {
        // de-serialize the first part of the key
        DUMMY.readFields(in);
        ip = DUMMY.toString();
    }
}

This class is invalid. A single thread will be executing your mapper or reducer, but there will be multiple threads (background threads such as the SpillThread) creating MyKey instances, which is exactly what you are seeing. This is by design.

Brock
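The failure mode above can be shown without any Hadoop classes. A minimal plain-Java sketch (the class and field names are hypothetical stand-ins): two keys share one static buffer, and an interleaving like the one the SpillThread produces makes the first key emit the second key's data.

```java
// Hypothetical stand-in for the Writable in question: every instance
// serializes through one shared static buffer -- that is the bug.
class SharedBufferKey {
    static final StringBuilder DUMMY = new StringBuilder(); // shared mutable state
    final String ip;
    SharedBufferKey(String ip) { this.ip = ip; }

    // Step 1 of write(): load this key's data into the shared buffer.
    void loadBuffer() { DUMMY.setLength(0); DUMMY.append(ip); }
    // Step 2 of write(): emit whatever the shared buffer now holds.
    String emitBuffer() { return DUMMY.toString(); }
}

public class RaceDemo {
    public static void main(String[] args) {
        SharedBufferKey a = new SharedBufferKey("10.0.0.1");
        SharedBufferKey b = new SharedBufferKey("10.0.0.2");
        // Simulate the thread interleaving deterministically:
        a.loadBuffer();                  // thread 1 starts writing key a
        b.loadBuffer();                  // thread 2 writes key b in between
        String emitted = a.emitBuffer(); // thread 1 finishes -- wrong data
        System.out.println(emitted);     // prints 10.0.0.2, not 10.0.0.1
        // With a per-instance (non-static) buffer the two writes cannot interfere.
    }
}
```

With `DUMMY` made a plain instance field, each key owns its buffer and the interleaving above becomes harmless.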
Re: desperate question about NameNode startup sequence
Hi,

Since you're using CDH2, I am moving this to CDH-USER. You can subscribe here: http://groups.google.com/a/cloudera.org/group/cdh-user (BCC'd common-user)

On Sat, Dec 17, 2011 at 2:01 AM, Meng Mao meng...@gmail.com wrote:

Maybe this is a bad sign -- the edits.new was created before the master node crashed, and is huge:

-bash-3.2$ ls -lh /hadoop/hadoop-metadata/cache/dfs/name/current
total 41G
-rw-r--r-- 1 hadoop hadoop 3.8K Jan 27 2011 edits
-rw-r--r-- 1 hadoop hadoop  39G Dec 17 00:44 edits.new
-rw-r--r-- 1 hadoop hadoop 2.5G Jan 27 2011 fsimage
-rw-r--r-- 1 hadoop hadoop    8 Jan 27 2011 fstime
-rw-r--r-- 1 hadoop hadoop  101 Jan 27 2011 VERSION

could this mean something was up with our SecondaryNameNode and rolling the edits file?

Yes, it looks like a checkpoint never completed. It's a good idea to monitor the mtime on fsimage to ensure it never gets too old. Has a checkpoint completed since you restarted?

Brock
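The mtime check suggested above can be sketched in a few lines of plain Java. The two-hour threshold and the temp-file stand-in for fsimage are illustrative assumptions, not from this thread:

```java
import java.io.File;
import java.util.concurrent.TimeUnit;

// Minimal sketch of the suggested monitoring: alert when fsimage's mtime
// exceeds a threshold, meaning no checkpoint has completed recently.
public class FsImageAgeCheck {
    static boolean isStale(File fsimage, long maxAgeMillis, long nowMillis) {
        return nowMillis - fsimage.lastModified() > maxAgeMillis;
    }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("fsimage", null); // stand-in for the real file
        f.deleteOnExit();
        long maxAge = TimeUnit.HOURS.toMillis(2);      // e.g. expect hourly checkpoints
        long now = System.currentTimeMillis();
        System.out.println(isStale(f, maxAge, now));                 // false: just written
        System.out.println(isStale(f, maxAge, now + maxAge + 1000)); // true: too old
    }
}
```

In practice the same check would point at the real path (e.g. the dfs.name.dir's current/fsimage) and feed a monitoring alert rather than stdout.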
Re: ArrayWritable usage
Hi,

ArrayWritable is a touch hard to use. Say you have an array of IntWritable[]. The get() method of ArrayWritable, after serialization/deserialization, does in fact return an array of type Writable[]. As such you cannot cast it directly to IntWritable[]. Individual elements are of type IntWritable and can be cast as such.

Will not work:

IntWritable[] array = (IntWritable[]) writable.get();

Will work:

for (Writable element : writable.get()) {
    IntWritable intWritable = (IntWritable) element;
}

Brock

On Sat, Dec 10, 2011 at 3:58 PM, zanurag zanu...@live.com wrote:

Hi Dhruv,

Is this working well for you? Are you able to do IntWritable[] abc = array.get();
I am trying a similar thing for IntTwoDArrayWritable. The array.set works but array.get returns Writable[][] and I am not able to cast it to IntWritable[][].

--
View this message in context: http://lucene.472066.n3.nabble.com/ArrayWritable-usage-tp3138520p3576386.html
Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
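This is ordinary Java array behavior rather than anything specific to ArrayWritable, so it can be sketched without Hadoop on the classpath: an Object[] whose elements all happen to be Integer still cannot be cast to Integer[], while casting element by element succeeds.

```java
public class ArrayCastDemo {
    public static void main(String[] args) {
        // Same shape as ArrayWritable.get(): the array's runtime type is
        // Object[] (there, Writable[]), even though every element is an Integer.
        Object[] values = { Integer.valueOf(1), Integer.valueOf(2) };

        boolean wholeArrayCastFailed = false;
        try {
            Integer[] nums = (Integer[]) values; // fails: an Object[] is not an Integer[]
        } catch (ClassCastException e) {
            wholeArrayCastFailed = true;
        }
        System.out.println(wholeArrayCastFailed); // true

        // Casting element by element works, exactly as with Writable/IntWritable.
        int sum = 0;
        for (Object element : values) {
            sum += (Integer) element;
        }
        System.out.println(sum); // 3
    }
}
```

The usual Hadoop-side workaround is to subclass ArrayWritable with the element class fixed in the constructor, which at least documents the element type even though get() still returns Writable[].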
Re: Question on Hadoop Streaming
Does your job end with an error? I am guessing what you want is:

-mapper bowtiestreaming.sh -file '/root/bowtiestreaming.sh'

The first option says to use your script as a mapper, and the second says to ship your script as part of the job.

Brock

On Tue, Dec 6, 2011 at 4:59 PM, Romeo Kienzler ro...@ormium.de wrote:

Hi,

I've got the following setup for NGS read alignment:

A script accepting data from stdin/out:

cat /root/bowtiestreaming.sh
cd /home/streamsadmin/crossbow-1.1.2/bin/linux32/
/home/streamsadmin/crossbow-1.1.2/bin/linux32/bowtie -m 1 -q e_coli --12 - 2> /root/bowtie.log

A file copied to HDFS:

hadoop fs -put SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.1 SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.1

A streaming job invoked with only the mapper:

hadoop jar hadoop-0.21.0/mapred/contrib/streaming/hadoop-0.21.0-streaming.jar -input SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.1 -output SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.1.aligned -mapper '/root/bowtiestreaming.sh' -jobconf mapred.reduce.tasks=0

The file cannot be found even though it is displayed:

hadoop fs -cat /user/root/SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.1.aligned
11/12/06 09:07:47 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=30
11/12/06 09:07:48 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
cat: File does not exist: /user/root/SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.1.aligned

The file looks like this (tab separated):

head SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.1
@SRR014475.1 :1:1:108:111 length=36 GAGACGTCGTCCTCAGTACATATA I3I+I(%BH43%III7I(5III*II+
@SRR014475.2 :1:1:112:26 length=36 GNNTTCCCCAACTTCCAAATCACCTAAC I!!II=I@II5II)/$;%+*/%%##
@SRR014475.3 :1:1:101:937 length=36 GAAGATCCGGTACAACCCTGATGTAAATGGTA IAIIAII%I0G
@SRR014475.4 :1:1:124:64 length=36 GAACACATAGAACAACAGGATTCGCCAGAACACCTG IIICI+@5+)'(-';%$;+;
@SRR014475.5 :1:1:108:897 length=36 GGAAGAGATGAAGTGGGTCGTTGTGGTGTGTTTGTT I0I:I'+IG3II46II0C@=III()+:+2$
@SRR014475.6 :1:1:106:14 length=36 GNNNTNTAGCATTAAGTAATTGGT I!!!I!I6I*+III:%IB0+I.%?
@SRR014475.7 :1:1:118:934 length=36 GGTTACTACTCTGCGACTCCTCGCAGAAGAGACGCT III0%%)%I.II;III.(I@E2*'+1;;#;'
@SRR014475.8 :1:1:123:8 length=36 GNNNTTNN I!!!$(!!
@SRR014475.9 :1:1:118:88 length=36 GGAAACTGGCGCGCTACCAGGTAACGCGCCAC IIIGIAA4;1+16*;*+)'$%#$%
@SRR014475.10 :1:1:92:122 length=36 ATTTGCTGCCAATGGCGAGATTACGAATAATA IICII;CGIDI?%$I:%6)C*;#;

and the result like this:

cat SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.1 | ./bowtiestreaming.sh | head
@SRR014475.3 :1:1:101:937 length=36 + gi|110640213|ref|NC_008253.1| 3393863 GAAGATCCGGTACAACCCTGATGTAAATGGTA IAIIAII%I0G 0 7:T>C,27:G>T
@SRR014475.4 :1:1:124:64 length=36 + gi|110640213|ref|NC_008253.1| 2288633 GAACACATAGAACAACAGGATTCGCCAGAACACCTG IIICI+@5+)'(-';%$;+; 0 30:T>C
@SRR014475.5 :1:1:108:897 length=36 + gi|110640213|ref|NC_008253.1| 4389356 GGAAGAGATGAAGTGGGTCGTTGTGGTGTGTTTGTT I0I:I'+IG3II46II0C@=III()+:+2$ 0 5:C>A,28:G>T,29:C>G,30:A>T,34:C>T
@SRR014475.9 :1:1:118:88 length=36 - gi|110640213|ref|NC_008253.1| 3598410 GTGGCGCGTTACCTGGTAGCGCGCCAGTTTCC %$#%$')+*;*61+1;4AAIGIII 0
@SRR014475.15 :1:1:87:967 length=36 + gi|110640213|ref|NC_008253.1| 4474247 GACTACACGATCGCCTGCCTTAATATTCTTTACACC A27II7CIII*I5I+F?II' 0 6:G>A,26:G>T
@SRR014475.20 :1:1:108:121 length=36 - gi|110640213|ref|NC_008253.1| 37761 AATGCATATTGAGAGTGTGATTATTAGC ID4II'2IIIC/;B?FII 0 12:C>T
@SRR014475.23 :1:1:75:54 length=36 + gi|110640213|ref|NC_008253.1| 2465453 GGTTTCTTTCTGCGCAGATGCCAGACGGTCTTTATA CI;';29=9I.4%EE2)*' 0
@SRR014475.24 :1:1:89:904 length=36 - gi|110640213|ref|NC_008253.1| 3216193 ATTAGTGTTAAGATTTCTATATTGTTGAGGCC #%);%;$EI-;$%8%I%I/+III 0 18:C>T,21:G>T,30:C>T,31:T>G,34:A>T
@SRR014475.27 :1:1:74:887 length=36 - gi|110640213|ref|NC_008253.1| 540567
Re: hadoop-fuse unable to find java
Hi,

This specific issue is probably more appropriate on the CDH-USER list. (BCC common-user)

It looks like the JRE detection mechanism recently added to BIGTOP would have this same issue: https://issues.apache.org/jira/browse/BIGTOP-25

To resolve the immediate issue I would set an environment variable in /etc/default/hadoop-0.20 or hadoop-env.sh. You could set it statically to a particular version, or perhaps use:

export JAVA_HOME=$(readlink -f /usr/java/latest)

Ultimately I think this will be fixed in BigTop, but it may also need to be fixed in CDH3. As such I have filed a JIRA for you: https://issues.cloudera.org/browse/DISTRO-349 If you are interested in seeing how the issue progresses, you can Watch the issue and receive email updates.

Cheers,
Brock

On Tue, Nov 29, 2011 at 1:11 PM, John Bond john.r.b...@gmail.com wrote:

Still getting this using Hadoop 0.20.2-cdh3u2

On 5 September 2011 16:08, John Bond john.r.b...@gmail.com wrote:

I have recently rebuilt a server with CentOS 6.0 and it seems that something caused hadoop-fuse to get confused; it is no longer able to find libjvm.so. The error I get is:

find: `/usr/lib/jvm/java-1.6.0-sun-1.6.0.14/jre//jre/lib': No such file or directory
/usr/lib/hadoop-0.20/bin/fuse_dfs: error while loading shared libraries: libjvm.so: cannot open shared object file: No such file or directory

A dirty look around suggests /usr/lib/hadoop-0.20/bin/hadoop-config.sh is setting JAVA_HOME to `/usr/lib/jvm/java-1.6.0-sun-1.6.0.14/jre/`. /usr/bin/hadoop-fuse-dfs has the following, which adds an extra /jre/ to the path:

for f in `find ${JAVA_HOME}/jre/lib -name client -prune -o -name libjvm.so -exec dirname {} \;`; do

Is there a need to specify the subfolder? I think it would make things simpler to just change the above to:

for f in `find ${JAVA_HOME} -name client -prune -o -name libjvm.so -exec dirname {} \;`; do

The other option is to change /usr/lib/hadoop-0.20/bin/hadoop-config.sh so it sets the path without jre: either remove `/usr/lib/jvm/java-1.6.0-sun-1.6.0.*/jre/ \`, or reorder the search list so /usr/lib/jvm/java-1.6.0-sun-1.6.0.*/ \ is preferred.

regards
John

hadoop-fuse-dfs
@@ -14,7 +14,7 @@
 if [ "${LD_LIBRARY_PATH}" = "" ]; then
   export LD_LIBRARY_PATH=/usr/lib
-  for f in `find ${JAVA_HOME} -name client -prune -o -name libjvm.so -exec dirname {} \;`; do
+  for f in `find ${JAVA_HOME}/jre/lib -name client -prune -o -name libjvm.so -exec dirname {} \;`; do
     export LD_LIBRARY_PATH=$f:${LD_LIBRARY_PATH}
   done
 fi

hadoop-config.sh
@@ -68,8 +68,8 @@
 if [ -z $JAVA_HOME ]; then
   for candidate in \
     /usr/lib/jvm/java-6-sun \
-    /usr/lib/jvm/java-1.6.0-sun-1.6.0.* \
     /usr/lib/jvm/java-1.6.0-sun-1.6.0.*/jre/ \
+    /usr/lib/jvm/java-1.6.0-sun-1.6.0.* \
     /usr/lib/j2sdk1.6-sun \
     /usr/java/jdk1.6* \
     /usr/java/jre1.6* \
Re: Hadoop Serialization: Avro
Hi,

Depending on the response you get here, you might also post the question separately on avro-user.

On Sat, Nov 26, 2011 at 1:46 PM, Leonardo Urbina lurb...@mit.edu wrote:

Hey everyone,

First time posting to the list. I'm currently writing a hadoop job that will run daily and whose output will be part of the next day's input. Also, the output will potentially be read by other programs for later analysis.

Since my program's output is used as part of the next day's input, it would be nice if it was stored in some binary format that is easy to read the next time around. But this format also needs to be readable by other outside programs, not necessarily written in Java. After searching for a while it seems that Avro is what I want to be using.

In any case, I have been looking around for a while and I can't seem to find a single example of how to use Avro within a Hadoop job. It seems that in order to use Avro I need to change the io.serializations value, however I don't know which value should be specified. Furthermore, I found that there are classes Avro{Input,Output}Format, but these use a series of other Avro classes which, as far as I understand, seem to need the use of other Avro classes such as AvroWrapper, AvroKey, AvroValue, and as far as I am concerned Avro* (with * replaced with pretty much any Hadoop class name). It seems however that these are used so that the Avro format is used throughout the Hadoop process to pass objects around. I just want to use Avro to save my output and read it again as input next time around.

So far I have been using SequenceFile{Input,Output}Format, and have implemented the Writable interface in the relevant classes, however this is not portable to other languages. Is there a way to use Avro without a substantial rewrite (using Avro* classes) of my Hadoop job?

Thanks in advance,

Best,
-Leo

--
Leo Urbina
Massachusetts Institute of Technology
Department of Electrical Engineering and Computer Science
Department of Mathematics
lurb...@mit.edu
Re: HDFS DataNode daily log growing really high and fast
Hi,

On Mon, Oct 31, 2011 at 12:59 AM, Ronen Itkin ro...@taykey.com wrote:

For instance, yesterday's daily log /var/log/hadoop/hadoop-hadoop-datanode-ip-10-10-10-4.log on the problematic Node03 was 1.1 GB in size, while on the other nodes the same log was 87 MB. Again, nothing is being run specifically on Node03. I have 3 nodes with replication of 3 -- meaning all the data is saved on every node. All nodes are connected to the same switch (and are on the same subnet), so Node03 has no advantage in any job. I am suspicious of HBase...

Does that server's regionserver have more regions assigned to it? Check the HBase GUI. Also, you can turn that message off with:

log4j.logger.org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace=WARN

Brock
Re: Using KeyValueInputFormat as a Input format
Hi,

On Sun, Oct 23, 2011 at 10:40 AM, Varun Thacker varunthacker1...@gmail.com wrote:

I am having trouble using KeyValueInputFormat as an input format. I used both Hadoop 0.20.1 and 0.21.0 and get an error while using it. This seems to be because of this issue -- https://issues.apache.org/jira/browse/MAPREDUCE-655 -- which was resolved. I'm not sure why I am still getting an error. This is what my code looks like: http://pastebin.com/fiBSygvP. The error is on line 12.

It would probably be helpful to include the actual error message. With that said, you are probably mixing the mapred and mapreduce packages. Make sure your imports are either mapred or mapreduce and never both:

org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat

Brock
Re: implementing comparable
Hi,

Inline..

On Sun, Oct 16, 2011 at 9:40 PM, Keith Thompson kthom...@binghamton.edu wrote:

Thanks. I went back and changed to WritableComparable instead of just Comparable. So, I added the readFields and write methods. I also took care of the typo in the constructor. :P Now I am getting this error:

11/10/16 21:34:08 INFO mapred.JobClient: Task Id : attempt_201110162105_0002_m_01_1, Status : FAILED
java.lang.RuntimeException: java.lang.NoSuchMethodException: edu.bing.vfi5.KeyList.<init>()
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115)
    at org.apache.hadoop.io.WritableComparator.newKey(WritableComparator.java:84)
    at org.apache.hadoop.io.WritableComparator.<init>(WritableComparator.java:70)
    at org.apache.hadoop.io.WritableComparator.get(WritableComparator.java:44)
    at org.apache.hadoop.mapred.JobConf.getOutputKeyComparator(JobConf.java:599)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:791)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.NoSuchMethodException: edu.bing.vfi5.KeyList.<init>()
    at java.lang.Class.getConstructor0(Class.java:2706)
    at java.lang.Class.getDeclaredConstructor(Class.java:1985)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:109)

Is it saying it can't find the constructor?

Writables, and by extension WritableComparables, need a default constructor. This makes logical sense: if Hadoop is going to call the readFields() method, it needs a previously constructed object.

Brock
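The reflection failure above can be reproduced with nothing but the JDK. A small sketch (hypothetical class names): instantiating a class through its no-arg constructor, the way the framework builds a key before calling readFields(), throws NoSuchMethodException when only a parameterized constructor is declared.

```java
// Hadoop-free sketch of why Writables need a no-arg constructor: the
// framework creates keys reflectively, and reflection only finds
// constructors that actually exist.
public class DefaultCtorDemo {
    static class WithDefault {
        WithDefault() {}                    // explicit no-arg constructor
    }
    static class WithoutDefault {
        WithoutDefault(int i, int j, int k) {} // declaring this removes the implicit no-arg one
    }

    public static void main(String[] args) throws Exception {
        // Works: a no-arg constructor is declared.
        Object ok = WithDefault.class.getDeclaredConstructor().newInstance();
        System.out.println(ok != null); // true

        // Fails the same way Hadoop's ReflectionUtils does.
        try {
            WithoutDefault.class.getDeclaredConstructor().newInstance();
            System.out.println("constructed");
        } catch (NoSuchMethodException e) {
            System.out.println("NoSuchMethodException"); // what the task log shows
        }
    }
}
```

Adding `public KeyList() {}` alongside the three-argument constructor is the fix the stack trace is asking for.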
Re: implementing comparable
Hi,

Discussion below.

On Sat, Oct 15, 2011 at 4:26 PM, Keith Thompson kthom...@binghamton.edu wrote:

Hello,

I am trying to write my very first MapReduce code. When I try to run the jar, I get this error:

11/10/15 17:17:30 INFO mapred.JobClient: Task Id : attempt_201110151636_0003_m_01_2, Status : FAILED
java.lang.ClassCastException: class edu.bing.vfi5.KeyList
    at java.lang.Class.asSubclass(Class.java:3018)
    at org.apache.hadoop.mapred.JobConf.getOutputKeyComparator(JobConf.java:599)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:791)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

I assume this means that it has something to do with my implementation of comparable. KeyList is a class for a 3-tuple key. The code is listed below. Any hints would be greatly appreciated as I am trying to understand how comparable is supposed to work. Also, do I need to implement Writable as well? If so, should this be code for how the output is written to a file in HDFS?

Thanks,
Keith

package edu.bing.vfi5;

public class KeyList implements Comparable<KeyList> {

Keys need to be WritableComparable.

    private int[] keys;

    public KeyList(int i, int j, int k) {
        keys = new int[3];
        keys[0] = i;
        keys[0] = j;
        keys[0] = k;
    }

    @Override
    public int compareTo(KeyList k) {
        // TODO Auto-generated method stub
        if(this.keys[0] == k.keys[0] && this.keys[1] == k.keys[1] && this.keys[2] == k.keys[2])
            return 0;
        else if((this.keys[0] > k.keys[0])
                || (this.keys[0] == k.keys[0] && this.keys[1] > k.keys[1])
                || (this.keys[0] == k.keys[0] && this.keys[1] == k.keys[1] && this.keys[2] > k.keys[2]))
            return 1;
        else
            return -1;
    }
}
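For reference, a Hadoop-free sketch of a correct three-int comparison (class name hypothetical): compare field by field and only fall through to the next field on a tie, which keeps the Comparable contract that equal keys return 0 and sign(a.compareTo(b)) == -sign(b.compareTo(a)). A real Hadoop key would additionally implement WritableComparable with write()/readFields() and a no-arg constructor, as noted in the replies.

```java
public class KeyTriple implements Comparable<KeyTriple> {
    final int[] keys;

    KeyTriple(int i, int j, int k) {
        keys = new int[] { i, j, k }; // note: all three slots, unlike the typo above
    }

    @Override
    public int compareTo(KeyTriple other) {
        // Lexicographic comparison: first differing field decides.
        for (int n = 0; n < 3; n++) {
            int c = Integer.compare(keys[n], other.keys[n]);
            if (c != 0) return c;
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(new KeyTriple(1, 2, 3).compareTo(new KeyTriple(1, 2, 3)));     // 0
        System.out.println(new KeyTriple(1, 2, 4).compareTo(new KeyTriple(1, 3, 0)) < 0); // true
        System.out.println(new KeyTriple(2, 0, 0).compareTo(new KeyTriple(1, 9, 9)) > 0); // true
    }
}
```

The loop form also scales to keys with more fields without the combinatorial explosion of hand-written && chains.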
Re: problem while running wordcount on lion x
Hi,

On Wed, Oct 5, 2011 at 7:13 PM, Jignesh Patel jign...@websoft.com wrote:

I also found another problem: if I directly export from Eclipse as a jar file, then while trying javac -jar or hadoop -jar it doesn't recognize that jar. However, the same jar works well with Windows.

Can you please share the error message? Note, the structure of the hadoop command is:

hadoop jar file.jar class.name

Note, no - in front of jar like `java -jar'.

Brock
Re: problem while running wordcount on lion x
Hi,

Answers, inline.

On Wed, Oct 5, 2011 at 7:31 PM, Jignesh Patel jign...@websoft.com wrote:

I have used Eclipse to export the file and then got the following error:

hadoop-user$ bin/hadoop jar wordcountsmp/wordcount.jarorg.apache.hadoop.examples.WordCount input output
Exception in thread main java.io.IOException: Error opening job jar: wordcountsmp/wordcount.jarorg.apache.hadoop.examples.WordCount
    at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
Caused by: java.util.zip.ZipException: error in opening zip file
    at java.util.zip.ZipFile.open(Native Method)
    at java.util.zip.ZipFile.<init>(ZipFile.java:127)
    at java.util.jar.JarFile.<init>(JarFile.java:135)
    at java.util.jar.JarFile.<init>(JarFile.java:72)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:88)

OK, the problem above is that you are missing a space. It should be:

hadoop-user$ bin/hadoop jar wordcountsmp/wordcount.jar org.apache.hadoop.examples.WordCount input output

with a space between the jar and the class name.

I tried the following:

java -jar xf wordcountsmp/wordcount.jar

That's not how you extract a jar. It should be:

jar tf wordcountsmp/wordcount.jar

to get a listing of the jar, and:

jar xf wordcountsmp/wordcount.jar

to extract it.

and got the error Unable to access jar file xf. My jar file size is 5kb. I am feeling somehow the Eclipse export in macOS is not creating an appropriate jar.

On Oct 5, 2011, at 8:16 PM, Brock Noland wrote:

Hi,

On Wed, Oct 5, 2011 at 7:13 PM, Jignesh Patel jign...@websoft.com wrote: I also found another problem: if I directly export from Eclipse as a jar file, then while trying javac -jar or hadoop -jar it doesn't recognize that jar. However, the same jar works well with Windows.

Can you please share the error message? Note, the structure of the hadoop command is: hadoop jar file.jar class.name Note, no - in front of jar like `java -jar'

Brock
Re: Outputformat and RecordWriter in Hadoop Pipes
Hi,

On Tue, Sep 13, 2011 at 12:27 PM, Vivek K hadoop.v...@gmail.com wrote:

Hi all,

I am trying to build a Hadoop/MR application in C++ using hadoop-pipes. I have been able to successfully work with my own mappers and reducers, but now I need to generate output (from the reducer) in a format different from the default TextOutputFormat. I have a few questions:

(1) Similar to Hadoop streaming, is there an option to set OutputFormat in Hadoop Pipes (in order to use, say, org.apache.hadoop.io.SequenceFile.Writer)? I am using Hadoop version 0.20.2.

(2) For a simple test on how to use a built-in non-default writer, I tried the following:

hadoop pipes -D hadoop.pipes.java.recordreader=true -D hadoop.pipes.java.recordwriter=false -input input.seq -output output -inputformat org.apache.hadoop.mapred.SequenceFileInputFormat -writer org.apache.hadoop.io.SequenceFile.Writer -program my_test_program

-writer wants an OutputFormat:

if (results.hasOption("writer")) {
    setIsJavaRecordWriter(job, true);
    job.setOutputFormat(getClass(results, "writer", job, OutputFormat.class));
}

As such I think you want:

-writer org.apache.hadoop.mapred.SequenceFileOutputFormat

SequenceFile.Writer simply writes sequence files and has nothing to do with MapReduce. This is also wrong: hadoop.pipes.java.recordwriter=false

Brock
Re: old problem: mapper output as sequence file
Hi,

On Mon, Sep 19, 2011 at 3:19 PM, Shi Yu sh...@uchicago.edu wrote:

I am stuck again on a probably very simple problem. I couldn't generate the map output in sequence file format. I always get this error:

java.io.IOException: wrong key class: org.apache.hadoop.io.Text is not class org.apache.hadoop.io.LongWritable

No worries.

job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);

You are running a map-only job, so I think you want:

job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);

But I also recommend adding @Override on your map method, because it's easy to accidentally not override your superclass method:

@Override
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {

Brock
Re: Hadoop Streaming job Fails - Permission Denied error
Hi,

This probably belongs on mapreduce-user as opposed to common-user. I have BCC'ed the common-user group.

Generally it's a best practice to ship the scripts with the job. Like so:

hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u0.jar -input /userdata/bejoy/apps/wc/input -output /userdata/bejoy/apps/wc/output -mapper WcStreamMap.py -reducer WcStreamReduce.py -file /home/cloudera/bejoy/apps/inputs/wc/WcStreamMap.py -file /home/cloudera/bejoy/apps/inputs/wc/WcStreamReduce.py

Brock

On Mon, Sep 12, 2011 at 4:18 AM, Bejoy KS bejoy.had...@gmail.com wrote:

Hi

I wanted to try out hadoop streaming and got the sample python code for the mapper and reducer. I copied both into my lfs and tried running the streaming job as mentioned in the documentation. Here is the command I used to run the job:

hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u0.jar -input /userdata/bejoy/apps/wc/input -output /userdata/bejoy/apps/wc/output -mapper /home/cloudera/bejoy/apps/inputs/wc/WcStreamMap.py -reducer /home/cloudera/bejoy/apps/inputs/wc/WcStreamReduce.py

Here, other than input and output, the rest are all lfs locations. However, the job is failing. The error log from the jobtracker url is as follows:

java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:386)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:324)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
    ... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
    ... 14 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
    ... 17 more
Caused by: java.lang.RuntimeException: configuration exception
    at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:230)
    at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
    ... 22 more
Caused by: java.io.IOException: Cannot run program /home/cloudera/bejoy/apps/inputs/wc/WcStreamMap.py: java.io.IOException: error=13, Permission denied
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
    at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:214)
    ... 23 more
Caused by: java.io.IOException: java.io.IOException: error=13, Permission denied
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
    at java.lang.ProcessImpl.start(ProcessImpl.java:65)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)
    ... 24 more

On the error I checked the permissions of the mapper and reducer. Issued a chmod 777 command as well. Still no luck. The permissions of the files are as follows:

cloudera@cloudera-vm:~$ ls -l /home/cloudera/bejoy/apps/inputs/wc/
-rwxrwxrwx 1 cloudera cloudera  707 2011-09-11 23:42 WcStreamMap.py
-rwxrwxrwx 1 cloudera cloudera 1077 2011-09-11 23:42 WcStreamReduce.py

I'm testing the same on the Cloudera Demo VM, so the hadoop setup would be in pseudo-distributed mode. Any help would be highly appreciated.

Thank You

Regards
Bejoy.K.S
Re: Is it possible to access the HDFS via Java OUTSIDE the Cluster?
Hi,

On Tue, Sep 6, 2011 at 9:29 AM, Ralf Heyde ralf.he...@gmx.de wrote:

Hello,

I have found an HDFSClient which shows me how to access my HDFS from inside the cluster (i.e. running on a node). My idea is that different processes may write 64M chunks to HDFS from external sources/clients. Is that possible?

Yes, the same HDFSClient code you have above should work outside the cluster; you just need core-site.xml and hdfs-site.xml in your classpath so the client knows where the namenode is and what the block size should be.

Brock
Re: Creating a hive table for a custom log
Hi,

On Thu, Sep 1, 2011 at 9:08 AM, Raimon Bosch raimon.bo...@gmail.com wrote:

Hi,

I'm trying to create a table similar to apache_log, but I'm trying to avoid writing my own map-reduce task because I don't want to have my HDFS files twice. So if you're working with log lines like this:

186.92.134.151 [31/Aug/2011:00:10:41 +0000] "GET /client/action1/?transaction_id=8002&user_id=87179311248&ts=1314749223525&item1=271&item2=6045&environment=2 HTTP/1.1"
112.201.65.238 [31/Aug/2011:00:10:41 +0000] "GET /client/action1/?transaction_id=9002&ts=1314749223525&user_id=9048871793100&item2=6045&item1=271&environment=2 HTTP/1.1"
90.45.198.251 [31/Aug/2011:00:10:41 +0000] "GET /client/action2/?transaction_id=9022&ts=1314749223525&user_id=9048871793100&item2=6045&item1=271&environment=2 HTTP/1.1"

and having in mind that the parameters could be in different orders, which would be the best strategy to create this table? Write my own org.apache.hadoop.hive.contrib.serde2? Is there any resource already implemented that I could use to perform this task?

I would use the regex serde to parse them:

CREATE EXTERNAL TABLE access_log (ip STRING, dt STRING, request STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES ("input.regex" = "([\\d.]+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] \"(.+?)\"")
LOCATION '/path/to/file';

That will parse the three fields out and could be modified to separate out the action. Then I think you will need to parse the query string in Hive itself.

In the end the objective is to convert all the parameters into fields, with the action as the type. With this big table I will be able to perform my queries, my joins or my views. Any ideas?

Thanks in advance,

Raimon Bosch.

--
View this message in context: http://old.nabble.com/Creating-a-hive-table-for-a-custom-log-tp32379849p32379849.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
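As a quick sanity check, the serde's pattern can be exercised with plain java.util.regex outside Hive. The sample line below follows the shape the pattern expects (the quotes around the request and the zone offset are assumed, since the archive garbled those characters):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogRegexDemo {
    public static void main(String[] args) {
        // Same expression as the "input.regex" serde property, written as a
        // plain Java regex (Hive needs the backslashes doubled once more).
        Pattern p = Pattern.compile("([\\d.]+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] \"(.+?)\"");

        String line = "186.92.134.151 [31/Aug/2011:00:10:41 +0000] "
                + "\"GET /client/action1/?transaction_id=8002&user_id=87179311248 HTTP/1.1\"";

        Matcher m = p.matcher(line);
        if (m.find()) {
            System.out.println(m.group(1)); // ip:      186.92.134.151
            System.out.println(m.group(2)); // dt:      31/Aug/2011:00:10:41 +0000
            System.out.println(m.group(3)); // request: GET ... HTTP/1.1 (unquoted)
        }
    }
}
```

Testing the pattern this way before baking it into the table definition makes it much easier to debug, since a non-matching RegexSerDe row simply yields NULL columns in Hive.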