Re: Can't construct instance of class org.apache.hadoop.conf.Configuration

2012-04-30 Thread Brock Noland
Hi,

I would try this:

export CLASSPATH=$(hadoop classpath)

Brock

On Mon, Apr 30, 2012 at 10:15 AM, Ryan Cole r...@rycole.com wrote:
 Hello,

 I'm trying to run an application, written in C++, that uses libhdfs. I have
 compiled the code and get an error when I attempt to run the application.
 The error that I am getting is as follows: Can't construct instance of
 class org.apache.hadoop.conf.Configuration.

 Initially, I was receiving an error saying that CLASSPATH was not set. That
 was easy, so I set CLASSPATH to include the following three directories, in
 this order:


   1. $HADOOP_HOME
   2. $HADOOP_HOME/lib
   3. $HADOOP_HOME/conf

 The CLASSPATH not set error went away, and now I receive the error about
 the Configuration class. I'm assuming that I do not have something on the
 path that I need to, but everything I have read says to simply include
 these three directories.

 Does anybody have any idea what I might be missing? Full exception pasted
 below.

 Thanks,
 Ryan

 Exception in thread "main" java.lang.NoClassDefFoundError:
 org/apache/hadoop/conf/Configuration
 Caused by: java.lang.ClassNotFoundException:
 org.apache.hadoop.conf.Configuration
         at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
         at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
         at java.security.AccessController.doPrivileged(Native Method)
         at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
         at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
         at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
 Can't construct instance of class org.apache.hadoop.conf.Configuration
 node: /home/ryan/.node-gyp/0.7.8/src/node_object_wrap.h:61: void
 node::ObjectWrap::Wrap(v8::Handle<v8::Object>): Assertion
 `handle_.IsEmpty()' failed.
 Aborted (core dumped)



-- 
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/


[ANNOUNCE] Apache MRUnit 0.8.0-incubating released

2012-02-02 Thread Brock Noland
The Apache MRUnit team is pleased to announce the release of MRUnit
0.8.0-incubating from the Apache Incubator.

This is the second release of Apache MRUnit, a Java library that helps
developers unit test Apache Hadoop map reduce jobs.

The release is available here:
http://www.apache.org/dyn/closer.cgi/incubator/mrunit/

The full change log is available here:
https://issues.apache.org/jira/browse/MRUNIT/fixforversion/12316359

We welcome your help and feedback. For more information on how to
report problems, and to get involved, visit the project website at

http://incubator.apache.org/mrunit/

The Apache MRUnit Team


Re: race condition in hadoop 0.20.2 (cdh3u1)

2012-01-17 Thread Brock Noland
Hi,

tl;dr DUMMY should not be static.

On Tue, Jan 17, 2012 at 3:21 PM, Stan Rosenberg
srosenb...@proclivitysystems.com wrote:


 class MyKey<T> implements WritableComparable<T> {
  private String ip; // first part of the key
  private final static Text DUMMY = new Text();
  ...

  public void write(DataOutput out) throws IOException {
     // serialize the first part of the key
     DUMMY.set(ip);
     DUMMY.write(out);
     ...
  }

  public void readFields(DataInput in) throws IOException {
    // de-serialize the first part of the key
    DUMMY.readFields(in); ip = DUMMY.toString();
    
  }
 }

This class is invalid. A single thread will be executing your mapper
or reducer, but there will be multiple threads (background threads such
as the SpillThread) creating MyKey instances, which is exactly what you
are seeing. This is by design.
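
For illustration, here is a minimal sketch of the thread-safe pattern,
assuming the rest of the key is serialized the same way; the per-instance
Text replaces the shared static one:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;

// Sketch only: each MyKey owns its own Text buffer, so the background
// threads that instantiate and deserialize keys cannot interfere.
class MyKey implements WritableComparable<MyKey> {
  private String ip;                      // first part of the key
  private final Text buffer = new Text(); // per-instance, not static

  public void write(DataOutput out) throws IOException {
    // serialize the first part of the key
    buffer.set(ip);
    buffer.write(out);
    // ...
  }

  public void readFields(DataInput in) throws IOException {
    // de-serialize the first part of the key
    buffer.readFields(in);
    ip = buffer.toString();
    // ...
  }

  public int compareTo(MyKey other) {
    return ip.compareTo(other.ip); // sketch: compare on the first part only
  }
}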

Brock


Re: desperate question about NameNode startup sequence

2011-12-17 Thread Brock Noland
Hi,

Since you're using CDH2, I am moving this to CDH-USER. You can subscribe here:

http://groups.google.com/a/cloudera.org/group/cdh-user

BCC'd common-user

On Sat, Dec 17, 2011 at 2:01 AM, Meng Mao meng...@gmail.com wrote:
 Maybe this is a bad sign -- the edits.new was created before the master
 node crashed, and is huge:

 -bash-3.2$ ls -lh /hadoop/hadoop-metadata/cache/dfs/name/current
 total 41G
 -rw-r--r-- 1 hadoop hadoop 3.8K Jan 27  2011 edits
 -rw-r--r-- 1 hadoop hadoop  39G Dec 17 00:44 edits.new
 -rw-r--r-- 1 hadoop hadoop 2.5G Jan 27  2011 fsimage
 -rw-r--r-- 1 hadoop hadoop    8 Jan 27  2011 fstime
 -rw-r--r-- 1 hadoop hadoop  101 Jan 27  2011 VERSION

 could this mean something was up with our SecondaryNameNode and rolling the
 edits file?

Yes it looks like a checkpoint never completed. It's a good idea to
monitor the mtime on fsimage to ensure it never gets too old.

Has a checkpoint completed since you restarted?

Brock


Re: ArrayWritable usage

2011-12-13 Thread Brock Noland
Hi,

ArrayWritable is a touch hard to use. Say you have an array of type
IntWritable[]. The get() method of ArrayWritable, after
serialization/deserialization, returns an array of type
Writable[]. As such you cannot cast it directly to IntWritable[]. Individual
elements are of type IntWritable and can be cast as such.

Will not work:

IntWritable[] array = (IntWritable[]) writable.get();

Will work:

for(Writable element : writable.get()) {
  IntWritable intWritable = (IntWritable)element;
}
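
Another common idiom, sketched here for illustration (the IntArrayWritable
name is just an example), is to subclass ArrayWritable so the element type
is fixed up front:

import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Writable;

// Sketch: the no-arg constructor tells Hadoop to recreate elements as
// IntWritable on deserialization; get() still returns Writable[], so
// individual elements are cast as shown above.
public class IntArrayWritable extends ArrayWritable {

  public IntArrayWritable() {
    super(IntWritable.class);
  }

  public IntArrayWritable(IntWritable[] values) {
    super(IntWritable.class, values);
  }

  public int[] toIntArray() {
    Writable[] elements = get();
    int[] result = new int[elements.length];
    for (int i = 0; i < elements.length; i++) {
      result[i] = ((IntWritable) elements[i]).get();
    }
    return result;
  }
}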

Brock

On Sat, Dec 10, 2011 at 3:58 PM, zanurag zanu...@live.com wrote:

 Hi Dhruv,
 Is this working well for you ?? Are you able to do IntWritable [] abc =
 array.get();

 I am trying similar thing for IntTwoDArrayWritable.
 The array.set works but array.get returns Writable[][] and I am not able
 to cast it to IntWritable[][].

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/ArrayWritable-usage-tp3138520p3576386.html
 Sent from the Hadoop lucene-users mailing list archive at Nabble.com.



Re: Question on Hadoop Streaming

2011-12-06 Thread Brock Noland
Does your job end with an error?

I am guessing what you want is:

-mapper bowtiestreaming.sh -file '/root/bowtiestreaming.sh'

The first option says to use your script as the mapper and the second says
to ship your script as part of the job.

Brock

On Tue, Dec 6, 2011 at 4:59 PM, Romeo Kienzler ro...@ormium.de wrote:
 Hi,

 I've got the following setup for NGS read alignment:


 A script accepting data from stdin/out:
 
 cat /root/bowtiestreaming.sh
 cd /home/streamsadmin/crossbow-1.1.2/bin/linux32/
 /home/streamsadmin/crossbow-1.1.2/bin/linux32/bowtie -m 1 -q e_coli --12 -
 2> /root/bowtie.log



 A file copied to HDFS:
 
 hadoop fs -put
 SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.1
 SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.1

 A streaming job invoked with only the mapper:
 
 hadoop jar
 hadoop-0.21.0/mapred/contrib/streaming/hadoop-0.21.0-streaming.jar -input
 SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.1
 -output
 SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.1.aligned
 -mapper '/root/bowtiestreaming.sh' -jobconf mapred.reduce.tasks=0

 The file cannot be found even though it is displayed:
 
 hadoop fs -cat
 /user/root/SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.1.aligned
 11/12/06 09:07:47 INFO security.Groups: Group mapping
 impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping;
 cacheTimeout=30
 11/12/06 09:07:48 WARN conf.Configuration: mapred.task.id is deprecated.
 Instead, use mapreduce.task.attempt.id
 cat: File does not exist:
 /user/root/SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.1.aligned


 The file looks like this (tab separated):
 head
 SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.1
 @SRR014475.1 :1:1:108:111 length=36     GAGACGTCGTCCTCAGTACATATA
    I3I+I(%BH43%III7I(5III*II+
 @SRR014475.2 :1:1:112:26 length=36      GNNTTCCCCAACTTCCAAATCACCTAAC
    I!!II=I@II5II)/$;%+*/%%##
 @SRR014475.3 :1:1:101:937 length=36     GAAGATCCGGTACAACCCTGATGTAAATGGTA
    IAIIAII%I0G
 @SRR014475.4 :1:1:124:64 length=36      GAACACATAGAACAACAGGATTCGCCAGAACACCTG
    IIICI+@5+)'(-';%$;+;
 @SRR014475.5 :1:1:108:897 length=36     GGAAGAGATGAAGTGGGTCGTTGTGGTGTGTTTGTT
    I0I:I'+IG3II46II0C@=III()+:+2$
 @SRR014475.6 :1:1:106:14 length=36      GNNNTNTAGCATTAAGTAATTGGT
    I!!!I!I6I*+III:%IB0+I.%?
 @SRR014475.7 :1:1:118:934 length=36     GGTTACTACTCTGCGACTCCTCGCAGAAGAGACGCT
    III0%%)%I.II;III.(I@E2*'+1;;#;'
 @SRR014475.8 :1:1:123:8 length=36       GNNNTTNN
    I!!!$(!!
 @SRR014475.9 :1:1:118:88 length=36      GGAAACTGGCGCGCTACCAGGTAACGCGCCAC
    IIIGIAA4;1+16*;*+)'$%#$%
 @SRR014475.10 :1:1:92:122 length=36     ATTTGCTGCCAATGGCGAGATTACGAATAATA
    IICII;CGIDI?%$I:%6)C*;#;


 and the result like this:

 cat
 SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.1
 |./bowtiestreaming.sh |head
 @SRR014475.3 :1:1:101:937 length=36     +
 gi|110640213|ref|NC_008253.1|   3393863 GAAGATCCGGTACAACCCTGATGTAAATGGTA
    IAIIAII%I0G  0       7:TC,27:GT
 @SRR014475.4 :1:1:124:64 length=36      +
 gi|110640213|ref|NC_008253.1|   2288633 GAACACATAGAACAACAGGATTCGCCAGAACACCTG
    IIICI+@5+)'(-';%$;+;  0       30:TC
 @SRR014475.5 :1:1:108:897 length=36     +
 gi|110640213|ref|NC_008253.1|   4389356 GGAAGAGATGAAGTGGGTCGTTGTGGTGTGTTTGTT
    I0I:I'+IG3II46II0C@=III()+:+2$  0
 5:CA,28:GT,29:CG,30:AT,34:CT
 @SRR014475.9 :1:1:118:88 length=36      -
 gi|110640213|ref|NC_008253.1|   3598410 GTGGCGCGTTACCTGGTAGCGCGCCAGTTTCC
    %$#%$')+*;*61+1;4AAIGIII  0
 @SRR014475.15 :1:1:87:967 length=36     +
 gi|110640213|ref|NC_008253.1|   4474247 GACTACACGATCGCCTGCCTTAATATTCTTTACACC
    A27II7CIII*I5I+F?II'  0       6:GA,26:GT
 @SRR014475.20 :1:1:108:121 length=36    -
 gi|110640213|ref|NC_008253.1|   37761   AATGCATATTGAGAGTGTGATTATTAGC
    ID4II'2IIIC/;B?FII  0       12:CT
 @SRR014475.23 :1:1:75:54 length=36      +
 gi|110640213|ref|NC_008253.1|   2465453 GGTTTCTTTCTGCGCAGATGCCAGACGGTCTTTATA
    CI;';29=9I.4%EE2)*'  0
 @SRR014475.24 :1:1:89:904 length=36     -
 gi|110640213|ref|NC_008253.1|   3216193 ATTAGTGTTAAGATTTCTATATTGTTGAGGCC
    #%);%;$EI-;$%8%I%I/+III  0
 18:CT,21:GT,30:CT,31:TG,34:AT
 @SRR014475.27 :1:1:74:887 length=36     -
 gi|110640213|ref|NC_008253.1|   540567  

Re: hadoop-fuse unable to find java

2011-11-29 Thread Brock Noland
Hi,

This specific issue is probably more appropriate on the CDH-USER list.
(BCC common-user) It looks like the JRE detection mechanism recently
added to BIGTOP would have this same issue:
https://issues.apache.org/jira/browse/BIGTOP-25

To resolve the immediate issue I would set an environment variable in
/etc/default/hadoop-0.20 or hadoop-env.sh. You could set it statically to a
particular version or perhaps use:
export JAVA_HOME=$(readlink -f /usr/java/latest)

Ultimately I think this will be fixed in BigTop but also may need to
be fixed in CDH3. As such I have filed a JIRA for you:

https://issues.cloudera.org/browse/DISTRO-349

If you are interested in seeing how the issue progresses you can
Watch the issue and receive email updates.

Cheers,
Brock

On Tue, Nov 29, 2011 at 1:11 PM, John Bond john.r.b...@gmail.com wrote:
 Still getting this using

 Hadoop 0.20.2-cdh3u2



 On 5 September 2011 16:08, John Bond john.r.b...@gmail.com wrote:
 I have recently rebuilt a server with centos 6.0 and it seems that
 something caused hadoop-fuse to get confused and it is no longer able
 to find libjvm.so.  The error i get is

 find: `/usr/lib/jvm/java-1.6.0-sun-1.6.0.14/jre//jre/lib': No such
 file or directory
 /usr/lib/hadoop-0.20/bin/fuse_dfs: error while loading shared
 libraries: libjvm.so: cannot open shared object file: No such file or
 directory

 A dirty look around suggests /usr/lib/hadoop-0.20/bin/hadoop-config.sh
 is setting  JAVA_HOME to `/usr/lib/jvm/java-1.6.0-sun-1.6.0.14/jre/`

 /usr/bin/hadoop-fuse-dfs has the following which adds an extra /jre/
 to the path

  for f in `find ${JAVA_HOME}/jre/lib -name client -prune -o -name
 libjvm.so -exec dirname {} \;`; do

 is there a need to specify the subfolder.  I think it would make
 things simpler to just change the above to

  for f in `find ${JAVA_HOME} -name client -prune -o -name libjvm.so
 -exec dirname {} \;`; do


 The other option is to change
 /usr/lib/hadoop-0.20/bin/hadoop-config.sh so it sets the path without
 jre either remove ` /usr/lib/jvm/java-1.6.0-sun-1.6.0.*/jre/ \`.  Or
 reorder the search list so     /usr/lib/jvm/java-1.6.0-sun-1.6.0.*/ \
 is preferred

 regards
 John

 hadoop-fuse-dfs
 @@ -14,7 +14,7 @@

  if [ "${LD_LIBRARY_PATH}" = "" ]; then
   export LD_LIBRARY_PATH=/usr/lib
 -  for f in `find ${JAVA_HOME} -name client -prune -o -name libjvm.so
 -exec dirname {} \;`; do
 +  for f in `find ${JAVA_HOME}/jre/lib -name client -prune -o -name
 libjvm.so -exec dirname {} \;`; do
     export LD_LIBRARY_PATH=$f:${LD_LIBRARY_PATH}
   done
  fi

 hadoop-config.sh
 @@ -68,8 +68,8 @@
  if [ -z "$JAVA_HOME" ]; then
   for candidate in \
     /usr/lib/jvm/java-6-sun \
 -    /usr/lib/jvm/java-1.6.0-sun-1.6.0.* \
     /usr/lib/jvm/java-1.6.0-sun-1.6.0.*/jre/ \
 +    /usr/lib/jvm/java-1.6.0-sun-1.6.0.* \
     /usr/lib/j2sdk1.6-sun \
     /usr/java/jdk1.6* \
     /usr/java/jre1.6* \



Re: Hadoop Serialization: Avro

2011-11-26 Thread Brock Noland
Hi,

Depending on the response you get here, you might also post the
question separately on avro-user.

On Sat, Nov 26, 2011 at 1:46 PM, Leonardo Urbina lurb...@mit.edu wrote:
 Hey everyone,

 First time posting to the list. I'm currently writing a hadoop job that
 will run daily and whose output will be part of the part of the next day's
 input. Also, the output will potentially be read by other programs for
 later analysis.

 Since my program's output is used as part of the next day's input, it would
 be nice if it was stored in some binary format that is easy to read the
 next time around. But this format also needs to be readable by other
 outside programs, not necessarily written in Java. After searching for a
 while it seems that Avro is what I want to be using. In any case, I have
 been looking around for a while and I can't seem to find a single example
 of how to use Avro within a Hadoop job.

 It seems that in order to use Avro I need to change the io.serializations
 value, however I don't know which value should be specified. Furthermore, I
 found that there are classes Avro{Input,Output}Format but these use a
 series of other Avro classes which, as far as I understand, seem need the
 use of other Avro classes such as AvroWrapper, AvroKey, AvroValue, and as
 far as I am concerned Avro* (with * replaced with pretty much any Hadoop
 class name). It seems however that these are used so that the Avro format
 is used throughout the Hadoop process to pass objects around.

 I just want to use Avro to save my output and read it again as input next
 time around. So far I have been using SequenceFile{Input,Output}Format, and
 have implemented the Writable interface in the relevant classes, however
 this is not portable to other languages. Is there a way to use Avro without
 a substantial rewrite (using Avro* classes) of my Hadoop job? Thanks in
 advance,

 Best,
 -Leo

 --
 Leo Urbina
 Massachusetts Institute of Technology
 Department of Electrical Engineering and Computer Science
 Department of Mathematics
 lurb...@mit.edu



Re: HDFS DataNode daily log growing really high and fast

2011-10-31 Thread Brock Noland
Hi,

On Mon, Oct 31, 2011 at 12:59 AM, Ronen Itkin ro...@taykey.com wrote:
 For instance, yesterday's daily log:
 /var/log/hadoop/hadoop-hadoop-datanode-ip-10-10-10-4.log
 on the problematic Node03 was in the size of 1.1 GB while on other Nodes
 the same log was in the size of 87 MB.

 Again, nothing is being run specifically on Node03, I have 3 nodes, with
 replication of 3 - means that all the data is being saved on every node,
 All nodes are connected to the same switch (and on the same subnet) - so no
 advantages to Node03 in any Job.

 I am being suspicious regarding HBase...


Does that server's regionserver have more regions assigned to it?
Check the HBase GUI.

Also, you can turn that message off with:

log4j.logger.org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace=WARN


Brock


Re: Using KeyValueInputFormat as a Input format

2011-10-25 Thread Brock Noland
Hi,

On Sun, Oct 23, 2011 at 10:40 AM, Varun Thacker
varunthacker1...@gmail.com wrote:

 I am having trouble using KeyValueInputFormat as a Input format. I used both
 hadoop 0.20.1 and 0.21.0 and get a error while using it. This seems to be
 because of this issue -
 https://issues.apache.org/jira/browse/MAPREDUCE-655 which was resolved.
 I'm not sure why I am still getting an error. This is how my
 code looks: http://pastebin.com/fiBSygvP. The error is on line 12.

It would probably be helpful to include the actual error message. With
that said, you are probably mixing mapred and mapreduce packages. Make
sure your imports are either mapred or mapreduce and never both.

org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat
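
For illustration, here is a minimal new-API driver sketch (the class name
and path handling are hypothetical) that keeps everything in the mapreduce
package; it assumes a Hadoop version, such as 0.21.0, where the mapreduce
flavour of KeyValueTextInputFormat exists:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class KeyValueDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "key-value example");
    job.setJarByClass(KeyValueDriver.class);

    // Every class here comes from org.apache.hadoop.mapreduce.*,
    // never from the older org.apache.hadoop.mapred.* packages.
    job.setInputFormatClass(KeyValueTextInputFormat.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    // job.setMapperClass(...) / job.setReducerClass(...) would go here.

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}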

Brock


Re: implementing comparable

2011-10-16 Thread Brock Noland
Hi,

Inline..

On Sun, Oct 16, 2011 at 9:40 PM, Keith Thompson kthom...@binghamton.edu wrote:

 Thanks.  I went back and changed to WritableComparable instead of just
 Comparable.  So, I added the readFields and write methods.   I also took
 care of the typo in the constructor. :P

 Now I am getting this error:

 11/10/16 21:34:08 INFO mapred.JobClient: Task Id :
 attempt_201110162105_0002_m_01_1, Status : FAILED
 java.lang.RuntimeException: java.lang.NoSuchMethodException:
 edu.bing.vfi5.KeyList.<init>()
 at

 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115)
 at
 org.apache.hadoop.io.WritableComparator.newKey(WritableComparator.java:84)
 at
 org.apache.hadoop.io.WritableComparator.<init>(WritableComparator.java:70)
 at org.apache.hadoop.io.WritableComparator.get(WritableComparator.java:44)
 at
 org.apache.hadoop.mapred.JobConf.getOutputKeyComparator(JobConf.java:599)
 at
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:791)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
 at org.apache.hadoop.mapred.Child.main(Child.java:170)
 Caused by: java.lang.NoSuchMethodException: edu.bing.vfi5.KeyList.<init>()
 at java.lang.Class.getConstructor0(Class.java:2706)
 at java.lang.Class.getDeclaredConstructor(Class.java:1985)
 at

 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:109)

 Is it saying it can't find the constructor?


Writables, and by extension WritableComparables, need a default constructor.
This makes logical sense: if Hadoop is going to call the readFields()
method, it needs a previously constructed object.
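
For illustration, a minimal sketch of what KeyList needs: a no-argument
constructor plus write()/readFields() that cover every field (the compareTo
logic is condensed here):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

// Sketch only: a 3-int composite key Hadoop can instantiate reflectively
// and then populate through readFields().
public class KeyList implements WritableComparable<KeyList> {
  private int[] keys = new int[3];

  public KeyList() {}                  // required for reflective creation

  public KeyList(int i, int j, int k) {
    keys[0] = i;
    keys[1] = j;
    keys[2] = k;
  }

  public void write(DataOutput out) throws IOException {
    for (int key : keys) {
      out.writeInt(key);
    }
  }

  public void readFields(DataInput in) throws IOException {
    for (int i = 0; i < keys.length; i++) {
      keys[i] = in.readInt();
    }
  }

  public int compareTo(KeyList other) {
    for (int i = 0; i < keys.length; i++) {
      if (keys[i] != other.keys[i]) {
        return keys[i] < other.keys[i] ? -1 : 1;
      }
    }
    return 0;
  }
}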

Brock


Re: implementing comparable

2011-10-15 Thread Brock Noland
Hi,

Discussion, below.

On Sat, Oct 15, 2011 at 4:26 PM, Keith Thompson kthom...@binghamton.edu wrote:

 Hello,
 I am trying to write my very first MapReduce code.  When I try to run the
 jar, I get this error:

 11/10/15 17:17:30 INFO mapred.JobClient: Task Id :
 attempt_201110151636_0003_m_01_2, Status : FAILED
 java.lang.ClassCastException: class edu.bing.vfi5.KeyList
 at java.lang.Class.asSubclass(Class.java:3018)
 at
 org.apache.hadoop.mapred.JobConf.getOutputKeyComparator(JobConf.java:599)
 at
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:791)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
 at org.apache.hadoop.mapred.Child.main(Child.java:170)

 I assume this means that it has something to do with my implementation of
 comparable.  KeyList is a class for a 3-tuple key.  The code is listed
 below.  Any hints would be greatly appreciated as I am trying to understand
 how comparable is supposed to work.  Also, do I need to implement Writable
 as well?  If so, should this be code for how the output is written to a
 file
 in HDFS?

 Thanks,
 Keith

 package edu.bing.vfi5;

 public class KeyList implements Comparable<KeyList> {


Keys need to be WritableComparable.



 private int[] keys;
  public KeyList(int i, int j, int k) {
 keys = new int[3];
 keys[0] = i;
 keys[0] = j;
 keys[0] = k;
 }

 @Override
 public int compareTo(KeyList k) {
 // TODO Auto-generated method stub
 if(this.keys[0] == k.keys[0] && this.keys[1] == k.keys[1] && this.keys[2]
 == k.keys[2])
 return 0;
 else if((this.keys[0] > k.keys[0])
 ||(this.keys[0]==k.keys[0] && this.keys[1] > k.keys[1])
 ||(this.keys[0]==k.keys[0] && this.keys[1]==k.keys[1] && this.keys[2] > k.keys[2]))
 return 1;
 else
 return -1;
 }
 }



Re: problem while running wordcount on lion x

2011-10-05 Thread Brock Noland
Hi,

On Wed, Oct 5, 2011 at 7:13 PM, Jignesh Patel jign...@websoft.com wrote:


 I also found another problem if I directly export from eclipse as a jar
 file then while trying javac -jar or hadoop -jar doesn't recognize that jar.
 However same jar works well with windows.



Can you please share the error message? Note, the structure of the hadoop
command is:

hadoop jar file.jar class.name

Note, no - in front of jar like `java -jar'

Brock


Re: problem while running wordcount on lion x

2011-10-05 Thread Brock Noland
Hi,

Answers, inline.

On Wed, Oct 5, 2011 at 7:31 PM, Jignesh Patel jign...@websoft.com wrote:

  I have used eclipse to export the file and then got the following error:

 hadoop-user$ bin/hadoop jar
 wordcountsmp/wordcount.jarorg.apache.hadoop.examples.WordCount input output


 Exception in thread "main" java.io.IOException: Error opening job jar:
 wordcountsmp/wordcount.jarorg.apache.hadoop.examples.WordCount
at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
 Caused by: java.util.zip.ZipException: error in opening zip file
at java.util.zip.ZipFile.open(Native Method)
at java.util.zip.ZipFile.<init>(ZipFile.java:127)
at java.util.jar.JarFile.<init>(JarFile.java:135)
at java.util.jar.JarFile.<init>(JarFile.java:72)
at org.apache.hadoop.util.RunJar.main(RunJar.java:88)


OK, the problem above is that you are missing a space, it should be:

hadoop-user$ bin/hadoop jar wordcountsmp/wordcount.jar
org.apache.hadoop.examples.WordCount input output

with a space between the jar and the class name.


 I tried following
 java -jar xf wordcountsmp/wordcount.jar


That's not how you extract a jar. It should be:

jar tf wordcountsmp/wordcount.jar

to get a listing of the jar and:

jar xf wordcountsmp/wordcount.jar

To extract it.


 and got the error

 Unable to access jar file xf

 my jar file size is 5kb. I feel that somehow the eclipse export on macOS
 is not creating an appropriate jar.




 On Oct 5, 2011, at 8:16 PM, Brock Noland wrote:

  Hi,
 
  On Wed, Oct 5, 2011 at 7:13 PM, Jignesh Patel jign...@websoft.com
 wrote:
 
 
  I also found another problem if I directly export from eclipse as a jar
  file then while trying javac -jar or hadoop -jar doesn't recognize that
 jar.
  However same jar works well with windows.
 
 
 
  Can you please share the error message? Note, the structure of the hadoop
  command is:
 
  hadoop jar file.jar class.name
 
  Note, no - in front of jar like `java -jar'
 
  Brock




Re: Outputformat and RecordWriter in Hadoop Pipes

2011-09-20 Thread Brock Noland
Hi,

On Tue, Sep 13, 2011 at 12:27 PM, Vivek K hadoop.v...@gmail.com wrote:
 Hi all,

 I am trying to build a Hadoop/MR application in c++ using hadoop-pipes. I
 have been able to successfully work with my own mappers and reducers, but
 now I need to generate output (from reducer) in a format different from the
 default TextOutputFormat. I have a few questions:

 (1) Similar to Hadoop streaming, is there an option to set OutputFormat in
 HadoopPipes (in order to use say org.apache.hadoop.io.SequenceFile.Writer) ?
 I am using Hadoop version 0.20.2.

 (2) For a simple test on how to use an in-built non-default writer, I tried
 the following:

     hadoop pipes -D hadoop.pipes.java.recordreader=true -D
 hadoop.pipes.java.recordwriter=false -input input.seq -output output
 -inputformat org.apache.hadoop.mapred.SequenceFileInputFormat -writer
 org.apache.hadoop.io.SequenceFile.Writer -program my_test_program


-writer wants an OutputFormat:

  if (results.hasOption("writer")) {
    setIsJavaRecordWriter(job, true);
    job.setOutputFormat(getClass(results, "writer", job,
      OutputFormat.class));



As such I think you want:

-writer org.apache.hadoop.mapred.SequenceFileOutputFormat

SequenceFile.Writer simply writes sequence files and has nothing to do with
MapReduce.

This is also wrong:

hadoop.pipes.java.recordwriter=false

Brock


Re: old problem: mapper output as sequence file

2011-09-19 Thread Brock Noland
Hi,

On Mon, Sep 19, 2011 at 3:19 PM, Shi Yu sh...@uchicago.edu wrote:

 I am stuck again in a probably very simple problem.  I couldn't generate the
 map output in sequence file format.  I always get this error:
 java.io.IOException: wrong key class: org.apache.hadoop.io.Text is not class 
 org.apache.hadoop.io.LongWritable


No worries.

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);

You are running a map only job, so I think you want:

   job.setOutputKeyClass(Text.class);
   job.setOutputValueClass(Text.class);

But I also recommend adding @Override on your map method because it's
easy to accidentally not override your superclass method.

 @Override
  public void map(LongWritable key, Text value, Context context)
 throws IOException, InterruptedException{


Brock


Re: Hadoop Streaming job Fails - Permission Denied error

2011-09-14 Thread Brock Noland
Hi,

This probably belongs on mapreduce-user as opposed to common-user. I
have BCC'ed the common-user group.

Generally it's a best practice to ship the scripts with the job. Like so:

hadoop  jar
/usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u0.jar
-input /userdata/bejoy/apps/wc/input -output /userdata/bejoy/apps/wc/output
-mapper WcStreamMap.py  -reducer WcStreamReduce.py
-file /home/cloudera/bejoy/apps/inputs/wc/WcStreamMap.py
-file /home/cloudera/bejoy/apps/inputs/wc/WcStreamReduce.py

Brock

On Mon, Sep 12, 2011 at 4:18 AM, Bejoy KS bejoy.had...@gmail.com wrote:
 Hi
      I wanted to try out hadoop streaming and got the sample python code for
 mapper and reducer. I copied both into my lfs and tried running the streaming
 job as mentioned in the documentation.
 Here the command i used to run the job

 hadoop  jar
 /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u0.jar
 -input /userdata/bejoy/apps/wc/input -output /userdata/bejoy/apps/wc/output
 -mapper /home/cloudera/bejoy/apps/inputs/wc/WcStreamMap.py  -reducer
 /home/cloudera/bejoy/apps/inputs/wc/WcStreamReduce.py

 Here other than input and output the rest all are on lfs locations. However
 the job is failing. The error log from the jobtracker url is as follows:

 java.lang.RuntimeException: Error in configuring object
    at
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
    at
 org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:386)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:324)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
 Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
    ... 9 more
 Caused by: java.lang.RuntimeException: Error in configuring object
    at
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
    at
 org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
    ... 14 more
 Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
    ... 17 more
 Caused by: java.lang.RuntimeException: configuration exception
    at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:230)
    at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
    ... 22 more
 Caused by: java.io.IOException: Cannot run program
 /home/cloudera/bejoy/apps/inputs/wc/WcStreamMap.py: java.io.IOException:
 error=13, Permission denied
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
    at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:214)
    ... 23 more
 Caused by: java.io.IOException: java.io.IOException: error=13, Permission
 denied
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
    at java.lang.ProcessImpl.start(ProcessImpl.java:65)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)
    ... 24 more

 On the error I checked the permissions of mapper and reducer. Issued a chmod
 777 command as well. Still no luck.

 The permission of the files are as follows
 cloudera@cloudera-vm:~$ ls -l /home/cloudera/bejoy/apps/inputs/wc/
 -rwxrwxrwx 1 cloudera cloudera  707 2011-09-11 23:42 WcStreamMap.py
 -rwxrwxrwx 1 cloudera cloudera 1077 2011-09-11 23:42 WcStreamReduce.py

 I'm testing the same on Cloudera Demo VM. So the hadoop setup would be on
 pseudo distributed mode. Any help would be highly appreciated.

 Thank You

 Regards
 Bejoy.K.S



Re: Is it possible to access the HDFS via Java OUTSIDE the Cluster?

2011-09-05 Thread Brock Noland
Hi,

On Tue, Sep 6, 2011 at 9:29 AM, Ralf Heyde ralf.he...@gmx.de wrote:
 Hello,



 I have found a HDFSClient which shows me how to access my HDFS from inside
 the cluster (i.e. running on a node).



 My idea is that different processes may write 64M chunks to HDFS from
 external sources/clients.

 Is that possible?

Yes, the same HDFSClient code you have above should work outside the
cluster; you just need core-site.xml and hdfs-site.xml on your
classpath so the client knows where the namenode is and what the block
size should be.
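
For illustration, a minimal sketch of such an external writer (the target
path is hypothetical), assuming core-site.xml and hdfs-site.xml are on the
classpath so the Configuration picks them up automatically:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ExternalHdfsWriter {
  public static void main(String[] args) throws Exception {
    // Picks up the namenode address and block size settings from
    // core-site.xml / hdfs-site.xml found on the classpath.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    Path target = new Path("/data/chunks/chunk-0001");  // hypothetical path
    FSDataOutputStream out = fs.create(target);
    try {
      out.write("example payload".getBytes("UTF-8"));   // placeholder data
    } finally {
      out.close();
    }
    fs.close();
  }
}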

Brock


Re: Creating a hive table for a custom log

2011-09-01 Thread Brock Noland
Hi,

On Thu, Sep 1, 2011 at 9:08 AM, Raimon Bosch raimon.bo...@gmail.com wrote:

 Hi,

 I'm trying to create a table similar to apache_log but I'm trying to avoid
 writing my own map-reduce task because I don't want to store my HDFS files
 twice.

 So if you're working with log lines like this:

 186.92.134.151 [31/Aug/2011:00:10:41 +] GET
 /client/action1/?transaction_id=8002&user_id=87179311248&ts=1314749223525&item1=271&item2=6045&environment=2
 HTTP/1.1

 112.201.65.238 [31/Aug/2011:00:10:41 +] GET
 /client/action1/?transaction_id=9002&ts=1314749223525&user_id=9048871793100&item2=6045&item1=271&environment=2
 HTTP/1.1

 90.45.198.251 [31/Aug/2011:00:10:41 +] GET
 /client/action2/?transaction_id=9022&ts=1314749223525&user_id=9048871793100&item2=6045&item1=271&environment=2
 HTTP/1.1

 And having in mind that the parameters could be in different orders, which
 would be the best strategy to create this table? Write my own
 org.apache.hadoop.hive.contrib.serde2? Is there any resource already
 implemented that I could use to perform this task?

I would use the regex serde to parse them:

CREATE EXTERNAL
TABLE access_log
(ip STRING,
dt STRING,
request STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES ("input.regex" = "([\\d.]+)
\\[([\\w:/]+\\s[+\\-]\\d{4})\\] \"(.+?)\"")
LOCATION '/path/to/file';

That will parse the three fields out and could be modified to separate
out the action. Then I think you will need to parse the query string
in Hive itself.
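
For illustration, the same pattern exercised in plain Java against a
constructed sample line (not taken verbatim from the post above; the double
backslashes in the serde property play the same role as the Java string
escaping here):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AccessLogRegexDemo {
  public static void main(String[] args) {
    // Same regex as in the serde property above.
    Pattern p = Pattern.compile(
        "([\\d.]+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] \"(.+?)\"");
    // Constructed example line in common log style.
    String line = "186.92.134.151 [31/Aug/2011:00:10:41 +0000] "
        + "\"GET /client/action1/?transaction_id=8002&item1=271 HTTP/1.1\"";
    Matcher m = p.matcher(line);
    if (m.matches()) {
      System.out.println("ip      = " + m.group(1));  // 186.92.134.151
      System.out.println("dt      = " + m.group(2));  // 31/Aug/2011:00:10:41 +0000
      System.out.println("request = " + m.group(3));  // GET /client/... HTTP/1.1
    }
  }
}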


 In the end the objective is to convert all the parameters into fields and
 use the action as the type. With this big table I will be able to perform
 my queries, my joins or my views.

 Any ideas?

 Thanks in Advance,
 Raimon Bosch.
 --
 View this message in context: 
 http://old.nabble.com/Creating-a-hive-table-for-a-custom-log-tp32379849p32379849.html
 Sent from the Hadoop core-user mailing list archive at Nabble.com.