setInt getInt

2011-10-04 Thread Ratner, Alan S (IS)
I have no problem with Hadoop.mapred: using JobConf to setInt integers and passing 
them to my maps for getInt works, as shown in the first program below.  However, 
when I use Hadoop.mapreduce with Configuration to setInt, these values are 
invisible to my map's getInt.  Please tell me what I am doing wrong.  Thanks.

Both programs expect to see a file with a line or two of text in a directory 
named testIn.

Alan Ratner


This program uses JobConf and setInt/getInt and works fine.  It outputs:
number = 12345 (from map)


package cbTest;

import java.io.*;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class ConfTest extends Configured implements Tool {

@SuppressWarnings("deprecation")
public static class MapClass extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text> {
public static int number;

public void configure(JobConf job) {
number = job.getInt("number", -999);
}

public void map(LongWritable key, Text t, OutputCollector<Text, Text> output,
Reporter reporter) throws IOException {
System.out.println("number = " + number);
}
}

@SuppressWarnings("deprecation")
public int run(String[] args) throws Exception {
Path InputDirectory = new Path("testIn");
Path OutputDirectory = new Path("testOut");
System.out.println("Running ConfTest Program");
JobConf conf = new JobConf(getConf(), ConfTest.class);
conf.setInputFormat(TextInputFormat.class);
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(Text.class);
conf.setMapperClass(MapClass.class);
conf.setInt("number", 12345);
FileInputFormat.setInputPaths(conf, InputDirectory);
FileOutputFormat.setOutputPath(conf, OutputDirectory);
FileSystem fs = OutputDirectory.getFileSystem(conf);
fs.delete(OutputDirectory, true); // remove output of prior run
JobClient.runJob(conf);
return 0;
}

public static void main(String[] args) throws Exception {
int res = ToolRunner.run(new Configuration(), new ConfTest(), args);
System.exit(res);
}
}


This program uses Configuration and setInt/getInt, but getInt works in neither map 
nor configure.  It outputs:
"Passing integer 12345 in configuration"  (from run)
"map numbers: -999 -1"  (from map, as intMapConf and intConfConf)


package cbTest;

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class Conf2Test extends Configured implements Tool
{
public static class MapClass extends Mapper<LongWritable, Text, Text, Text>
{
public int intConfConf = -1;

public void configure(Configuration job) {
intConfConf = job.getInt("number", -2);
}

public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
{
int intMapConf = context.getConfiguration().getInt("number", -999);
System.out.println("map numbers: " + intMapConf + " " + intConfConf);
}
}

public static void main(String[] args) throws Exception
{
int res = ToolRunner.run(new Configuration(), new Conf2Test(), args);
System.exit(res);

}

public int run(String[] arg0) throws Exception {
Path Input_Directory = new Path("testIn");
Path Output_Directory = new Path("testOut");

Configuration conf = new Configuration();
Job job = new Job(conf, Conf2Test.class.getSimpleName());
job.setJarByClass(Conf2Test.class);
job.setMapperClass(MapClass.class);
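
A minimal sketch of the usual new-API wiring (an assumed fix for reference, not code from this post): set the value on the same Configuration the Job is constructed from, before constructing the Job, and read it back in setup(Context) - the new mapreduce API never calls a configure() method, which is why intConfConf stays at -1 above.

  // Sketch only; "number", 12345 and the -999 default follow the post above.
  public static class MapClass extends Mapper<LongWritable, Text, Text, Text> {
    private int number;

    @Override
    protected void setup(Context context) {
      // Reads the value that was set on the job's Configuration before submission.
      number = context.getConfiguration().getInt("number", -999);
    }

    @Override
    public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      System.out.println("number = " + number);
    }
  }

  public int run(String[] arg0) throws Exception {
    Configuration conf = getConf();
    conf.setInt("number", 12345);          // set BEFORE new Job(conf, ...)
    Job job = new Job(conf, "Conf2Test");  // Job copies the conf here; setInt calls
                                           // made on conf after this point never
                                           // reach the tasks
    job.setJarByClass(Conf2Test.class);
    job.setMapperClass(MapClass.class);
    // ... input/output formats and paths as in the programs above ...
    return job.waitForCompletion(true) ? 0 : 1;
  }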

Version Mismatch

2011-08-18 Thread Ratner, Alan S (IS)
We have a version mismatch problem which may be Hadoop related but may be due 
to a third party product we are using that requires us to run Zookeeper and 
Hadoop.  This product is rumored to soon be an Apache incubator project.  As I 
am not sure what I can reveal about this third party program prior to its 
release to Apache I will refer to it as XXX.

We are running Hadoop 0.20.203.0.  We have no problems running Hadoop at all.  
It runs our Hadoop programs and our Hadoop fs commands without any version 
mismatch complaints.  Localhost:50030 and 50070 both report we are running 
0.20.203.0, r1099333.

But when we try to initialize XXX we get:
org.apache.hadoop.ipc.RPC$VersionMismatch: Protocol org.apache.hadoop.hdfs.protocol.ClientProtocol version mismatch. (client = 60, server = 61)
The developers of XXX tell me that this error is coming from 
HDFS and is unrelated to their program.  (XXX does not include any Hadoop or 
Zookeeper jar files - as HBase does - but simply grabs these from HADOOP_HOME 
which points to our 0.20.203.0 installation and ZOOKEEPER_HOME.)
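
The "client = 60" / "server = 61" numbers are ClientProtocol version IDs (the protocol's versionID constant), not Hadoop release numbers, so a mismatch normally means the XXX client is loading a different hadoop jar than the one the NameNode is running.  A quick way to check which Hadoop build a client classpath actually resolves to (a diagnostic sketch added for reference, using the stock org.apache.hadoop.util.VersionInfo class) is:

  // Run with the same classpath XXX uses; if this prints something other than
  // 0.20.203.0, a stale Hadoop jar is being picked up ahead of the one in HADOOP_HOME.
  import org.apache.hadoop.util.VersionInfo;

  public class WhichHadoop {
    public static void main(String[] args) {
      System.out.println("version  = " + VersionInfo.getVersion());
      System.out.println("revision = " + VersionInfo.getRevision());
      System.out.println("compiled = " + VersionInfo.getDate() + " by " + VersionInfo.getUser());
    }
  }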


1. What exactly does "client = 60" mean?  Which Hadoop version is this referring to?

2. What exactly does "server = 61" mean?  Which Hadoop version is this referring to?

3. Any ideas on whether this is a problem with my Hadoop configuration or whether this is a problem with XXX?


17 15:20:56,564 [security.Groups] INFO : Group mapping 
impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=30
17 15:20:56,704 [conf.Configuration] WARN : mapred.task.id is deprecated. 
Instead, use mapreduce.task.attempt.id
17 15:20:56,771 [util.Initialize] FATAL: 
org.apache.hadoop.ipc.RPC$VersionMismatch: Protocol 
org.apache.hadoop.hdfs.protocol.ClientProtocol version mismatch. (client = 60, 
server = 61)
org.apache.hadoop.ipc.RPC$VersionMismatch: Protocol 
org.apache.hadoop.hdfs.protocol.ClientProtocol version mismatch. (client = 60, 
server = 61)
 at 
org.apache.hadoop.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:231)
 at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:224)
 at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:156)
 at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:255)
 at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:222)
 at 
org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:94)
 at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1734)
 at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:74)
 at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1768)
 at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1750)
 at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:234)
 at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:131)

Alan



RE: Problem running a Hadoop program with external libraries

2011-03-08 Thread Ratner, Alan S (IS)
Thanks to all who suggested solutions to our problem of running a Java MR job 
using both external Java and C++ libraries.

We got it to work by moving all our .so files into an archive 
(http://hadoop.apache.org/mapreduce/docs/r0.21.0/hadoop_archives.html) 
and publishing it to our MR app with a single statement:
DistributedCache.createSymlink(conf).  
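
For concreteness, a sketch of that wiring (the archive name, format and HDFS path below are placeholders - the message above only spells out the createSymlink call):

  import java.net.URI;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.filecache.DistributedCache;

  public class ShipNativeLibs {
    public static void addNativeLibs(Configuration conf) throws Exception {
      // Archive of .so files previously copied into HDFS (placeholder path/name).
      DistributedCache.addCacheArchive(new URI("/user/ngc/nativelibs.zip#nativelibs"), conf);
      // Have the framework symlink cached entries into each task's working
      // directory under the "#nativelibs" fragment name.
      DistributedCache.createSymlink(conf);
    }
  }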

We found that we had to use Eclipse to generate a runnable jar file in 
"extract" mode; running an ordinary jar did not work.  (We tried putting our 
external jars in the archive file but a plain jar still did not work - perhaps 
I haven't assembled the complete set of jars into the archive.)

We had tried putting all the libraries directly in HDFS with a pointer in 
mapred-site.xml:
<property><name>mapred.child.env</name><value>LD_LIBRARY_PATH=/user/ngc/lib</value></property>
as described in https://issues.apache.org/jira/browse/HADOOP-2838 but this did 
not work for us.

The bottom line of all this is that we managed to write a Hadoop job in Java 
that invokes the OpenCV (Open Computer Vision) C++ libraries 
(http://opencv.willowgarage.com/wiki/) using the JavaCV Java wrapper 
(http://code.google.com/p/javacv/).  OpenCV includes over 500 image processing 
algorithms.




-Original Message-
From: Ratner, Alan S (IS) [mailto:alan.rat...@ngc.com] 
Sent: Friday, March 04, 2011 3:53 PM
To: common-user@hadoop.apache.org
Subject: EXT :Problem running a Hadoop program with external libraries

We are having difficulties running a Hadoop program making calls to external 
libraries - but this occurs only when we run the program on our cluster and not 
from within Eclipse where we are apparently running in Hadoop's standalone 
mode.  This program invokes the Open Computer Vision libraries (OpenCV and 
JavaCV).  (I don't think there is a problem with our cluster - we've run many 
Hadoop jobs on it without difficulty.)

1.  I normally use Eclipse to create jar files for our Hadoop programs but 
I inadvertently hit the "run as Java application" button and the program ran 
fine, reading the input file from the eclipse workspace rather than HDFS and 
writing the output file to the same place.  Hadoop's output appears below.  
(This occurred on the master Hadoop server.)

2.  I then exported from Eclipse a runnable jar which extracted 
required libraries into the generated jar - presumably producing a jar file 
that incorporated all the required library functions. (The plain jar file for 
this program is 17 kB while the runnable jar is 30MB.)  When I try to run this 
on my Hadoop cluster (including my master and slave servers) the program 
reports that it is unable to locate "libopencv_highgui.so.2.2: cannot open 
shared object file: No such file or directory".  Now, in addition to this 
library being incorporated inside the runnable jar file it is also present on 
each of my servers at hadoop-0.21.0/lib/native/Linux-amd64-64/ where we have 
loaded the same libraries (to give Hadoop 2 shots at finding them).  These 
include:

  ...
  libopencv_highgui_pch_dephelp.a
  libopencv_highgui.so
  libopencv_highgui.so.2.2
  libopencv_highgui.so.2.2.0
  ...

  When I poke around inside the runnable jar I find javacv_linux-x86_64.jar 
which contains:
  com/googlecode/javacv/cpp/linux-x86_64/libjniopencv_highgui.so

3.  I then tried adding the following to mapred-site.xml as suggested in 
Patch 2838 that's supposed to be included in hadoop 0.21 
https://issues.apache.org/jira/browse/HADOOP-2838
  <property>
    <name>mapred.child.env</name>
    <value>LD_LIBRARY_PATH=/home/ngc/hadoop-0.21.0/lib/native/Linux-amd64-64</value>
  </property>
  The log is included at the bottom of this email with Hadoop now 
complaining about a different missing library with an out-of-memory error.

Does anyone have any ideas as to what is going wrong here?  Any help would be 
appreciated.  Thanks.

Alan


BTW: Each of our servers has 4 hard drives and many of the errors below refer 
to the 3 drives (/media/hd2 or hd3 or hd4) reserved exclusively for HDFS and 
thus perhaps not a good place for Hadoop to be looking for a library file.  My 
slaves have 24 GB RAM, the jar file is 30 MB, and the sequence file being read 
is 400 KB - so I hope I am not running out of memory.


1.  RUNNING DIRECTLY FROM ECLIPSE IN HADOOP'S STANDALONE MODE - SUCCESS

 Running Face Program
11/03/04 12:44:10 INFO security.Groups: Group mapping 
impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=30
11/03/04 12:44:10 INFO jvm.JvmMetrics: Initializing JVM Metrics with 
processName=JobTracker, sessionId=
11/03/04 12:44:10 WARN mapreduce.JobSubmitter: Use GenericOptionsParser for 
parsing the arguments. Applications should implement Tool for the same.
11/03/04 12:44:10 WARN mapreduce.JobSubmitter: No job jar file set.  User 
classes may not be found. See Job or Job#setJar(String).
11/03/04 12:44:10 INFO mapred.FileInputFormat: Total input paths to process : 1

RE: Problem running a Hadoop program with external libraries

2011-03-08 Thread Ratner, Alan S (IS)
One other thing: We were getting out-of-memory errors with these external 
libraries and we had to reduce the value of <name>mapred.child.java.opts</name> 
found in mapred-site.xml.  We had 
originally been using 2 GB (our servers have 24-48 GB RAM) and eliminated the 
out-of-memory errors by reducing this value to 1.28 GB.

Alan


-Original Message-
From: Ratner, Alan S (IS) [mailto:alan.rat...@ngc.com] 
Sent: Tuesday, March 08, 2011 4:22 PM
To: common-user@hadoop.apache.org
Cc: Gerlach, Hannah L (IS); Andrew Levine
Subject: EXT :RE: Problem running a Hadoop program with external libraries

Thanks to all who suggested solutions to our problem of running a Java MR job 
using both external Java and C++ libraries.

We got it to work by moving all our .so files into an archive 
(http://hadoop.apache.org/mapreduce/docs/r0.21.0/hadoop_archives.html) 
and publishing it to our MR app with a single statement:
DistributedCache.createSymlink(conf).  

We found that we had to use Eclipse to generate a runnable jar file in 
"extract" mode; running an ordinary jar did not work.  (We tried putting our 
external jars in the archive file but a plain jar still did not work - perhaps 
I haven't assembled the complete set of jars into the archive.)

We had tried putting all the libraries directly in HDFS with a pointer in 
mapred-site.xml:
<property><name>mapred.child.env</name><value>LD_LIBRARY_PATH=/user/ngc/lib</value></property>
as described in https://issues.apache.org/jira/browse/HADOOP-2838 but this did 
not work for us.

The bottom line of all this is that we managed to write a Hadoop job in Java 
that invokes the OpenCV (Open Computer Vision) C++ libraries 
(http://opencv.willowgarage.com/wiki/) using the JavaCV Java wrapper 
(http://code.google.com/p/javacv/).  OpenCV includes over 500 image processing 
algorithms.




-Original Message-
From: Ratner, Alan S (IS) [mailto:alan.rat...@ngc.com] 
Sent: Friday, March 04, 2011 3:53 PM
To: common-user@hadoop.apache.org
Subject: EXT :Problem running a Hadoop program with external libraries

We are having difficulties running a Hadoop program making calls to external 
libraries - but this occurs only when we run the program on our cluster and not 
from within Eclipse where we are apparently running in Hadoop's standalone 
mode.  This program invokes the Open Computer Vision libraries (OpenCV and 
JavaCV).  (I don't think there is a problem with our cluster - we've run many 
Hadoop jobs on it without difficulty.)

1.  I normally use Eclipse to create jar files for our Hadoop programs but 
I inadvertently hit the "run as Java application" button and the program ran 
fine, reading the input file from the eclipse workspace rather than HDFS and 
writing the output file to the same place.  Hadoop's output appears below.  
(This occurred on the master Hadoop server.)

2.  I then exported from Eclipse a runnable jar which extracted 
required libraries into the generated jar - presumably producing a jar file 
that incorporated all the required library functions. (The plain jar file for 
this program is 17 kB while the runnable jar is 30MB.)  When I try to run this 
on my Hadoop cluster (including my master and slave servers) the program 
reports that it is unable to locate "libopencv_highgui.so.2.2: cannot open 
shared object file: No such file or directory".  Now, in addition to this 
library being incorporated inside the runnable jar file it is also present on 
each of my servers at hadoop-0.21.0/lib/native/Linux-amd64-64/ where we have 
loaded the same libraries (to give Hadoop 2 shots at finding them).  These 
include:


  ...
  libopencv_highgui_pch_dephelp.a
  libopencv_highgui.so
  libopencv_highgui.so.2.2
  libopencv_highgui.so.2.2.0
  ...

  When I poke around inside the runnable jar I find javacv_linux-x86_64.jar 
which contains:
  com/googlecode/javacv/cpp/linux-x86_64/libjniopencv_highgui.so

3.  I then tried adding the following to mapred-site.xml as suggested in 
Patch 2838 that's supposed to be included in hadoop 0.21 
https://issues.apache.org/jira/browse/HADOOP-2838
  <property>
    <name>mapred.child.env</name>
    <value>LD_LIBRARY_PATH=/home/ngc/hadoop-0.21.0/lib/native/Linux-amd64-64</value>
  </property>
  The log is included at the bottom of this email with Hadoop now 
complaining about a different missing library with an out-of-memory error.

Does anyone have any ideas as to what is going wrong here?  Any help would be 
appreciated.  Thanks.

Alan


BTW: Each of our servers has 4 hard drives and many of the errors below refer 
to the 3 drives (/media/hd2 or hd3 or hd4) reserved exclusively for HDFS and 
thus perhaps not a good place for Hadoop to be looking for a library file.  My 
slaves have 24 GB RAM, the jar file is 30 MB, and the sequence file being read 
is 400 KB - so I hope I am not running out of memory.


1.  RUNNING DIRECTLY FROM ECLIPSE IN HADOOP'S STANDALONE MODE - SUCCESS

 Running Face

Problem running a Hadoop program with external libraries

2011-03-04 Thread Ratner, Alan S (IS)
We are having difficulties running a Hadoop program making calls to external 
libraries - but this occurs only when we run the program on our cluster and not 
from within Eclipse where we are apparently running in Hadoop's standalone 
mode.  This program invokes the Open Computer Vision libraries (OpenCV and 
JavaCV).  (I don't think there is a problem with our cluster - we've run many 
Hadoop jobs on it without difficulty.)

1.  I normally use Eclipse to create jar files for our Hadoop programs but 
I inadvertently hit the "run as Java application" button and the program ran 
fine, reading the input file from the eclipse workspace rather than HDFS and 
writing the output file to the same place.  Hadoop's output appears below.  
(This occurred on the master Hadoop server.)

2.  I then exported from Eclipse a runnable jar which extracted 
required libraries into the generated jar - presumably producing a jar file 
that incorporated all the required library functions. (The plain jar file for 
this program is 17 kB while the runnable jar is 30MB.)  When I try to run this 
on my Hadoop cluster (including my master and slave servers) the program 
reports that it is unable to locate "libopencv_highgui.so.2.2: cannot open 
shared object file: No such file or directory".  Now, in addition to this 
library being incorporated inside the runnable jar file it is also present on 
each of my servers at hadoop-0.21.0/lib/native/Linux-amd64-64/ where we have 
loaded the same libraries (to give Hadoop 2 shots at finding them).  These 
include:
  ...
  libopencv_highgui_pch_dephelp.a
  libopencv_highgui.so
  libopencv_highgui.so.2.2
  libopencv_highgui.so.2.2.0
  ...

  When I poke around inside the runnable jar I find javacv_linux-x86_64.jar 
which contains:
  com/googlecode/javacv/cpp/linux-x86_64/libjniopencv_highgui.so

3.  I then tried adding the following to mapred-site.xml as suggested in 
Patch 2838 that's supposed to be included in hadoop 0.21 
https://issues.apache.org/jira/browse/HADOOP-2838
  <property>
    <name>mapred.child.env</name>
    <value>LD_LIBRARY_PATH=/home/ngc/hadoop-0.21.0/lib/native/Linux-amd64-64</value>
  </property>
  The log is included at the bottom of this email with Hadoop now 
complaining about a different missing library with an out-of-memory error.

Does anyone have any ideas as to what is going wrong here?  Any help would be 
appreciated.  Thanks.

Alan


BTW: Each of our servers has 4 hard drives and many of the errors below refer 
to the 3 drives (/media/hd2 or hd3 or hd4) reserved exclusively for HDFS and 
thus perhaps not a good place for Hadoop to be looking for a library file.  My 
slaves have 24 GB RAM, the jar file is 30 MB, and the sequence file being read 
is 400 KB - so I hope I am not running out of memory.


1.  RUNNING DIRECTLY FROM ECLIPSE IN HADOOP'S STANDALONE MODE - SUCCESS

 Running Face Program
11/03/04 12:44:10 INFO security.Groups: Group mapping 
impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=30
11/03/04 12:44:10 INFO jvm.JvmMetrics: Initializing JVM Metrics with 
processName=JobTracker, sessionId=
11/03/04 12:44:10 WARN mapreduce.JobSubmitter: Use GenericOptionsParser for 
parsing the arguments. Applications should implement Tool for the same.
11/03/04 12:44:10 WARN mapreduce.JobSubmitter: No job jar file set.  User 
classes may not be found. See Job or Job#setJar(String).
11/03/04 12:44:10 INFO mapred.FileInputFormat: Total input paths to process : 1
11/03/04 12:44:10 WARN conf.Configuration: mapred.map.tasks is deprecated. 
Instead, use mapreduce.job.maps
11/03/04 12:44:10 INFO mapreduce.JobSubmitter: number of splits:1
11/03/04 12:44:10 INFO mapreduce.JobSubmitter: adding the following namenodes' 
delegation tokens:null
11/03/04 12:44:10 WARN security.TokenCache: Overwriting existing token storage 
with # keys=0
11/03/04 12:44:10 INFO mapreduce.Job: Running job: job_local_0001
11/03/04 12:44:10 INFO mapred.LocalJobRunner: Waiting for map tasks
11/03/04 12:44:10 INFO mapred.LocalJobRunner: Starting task: 
attempt_local_0001_m_00_0
11/03/04 12:44:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
11/03/04 12:44:10 INFO compress.CodecPool: Got brand-new decompressor
11/03/04 12:44:10 INFO mapred.MapTask: numReduceTasks: 1
11/03/04 12:44:10 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
11/03/04 12:44:10 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
11/03/04 12:44:10 INFO mapred.MapTask: soft limit at 83886080
11/03/04 12:44:10 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
11/03/04 12:44:10 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
11/03/04 12:44:11 INFO mapreduce.Job:  map 0% reduce 0%
11/03/04 12:44:16 INFO mapred.LocalJobRunner: 
file:/home/ngc/eclipse_workspace/HadoopPrograms/Images2/JPGSequenceFile.001:0+411569
  map
11/03/04 12:44:17 INFO 

RE: EXT :Re: Problem running a Hadoop program with external libraries

2011-03-04 Thread Ratner, Alan S (IS)
Aaron,

   Thanks for the rapid responses.


* "ulimit -u unlimited" is in .bashrc.


* HADOOP_HEAPSIZE is set to 4000 MB in hadoop-env.sh


* mapred.child.ulimit is set to 2048000 in mapred-site.xml


* mapred.child.java.opts is set to -Xmx1536m in mapred-site.xml

   I take it you are suggesting that I change the java.opts setting to:

mapred.child.java.opts is <value>-Xmx1536m -Djava.library.path=/path/to/native/libs</value>


Alan Ratner
Northrop Grumman Information Systems
Manager of Large-Scale Computing
9020 Junction Drive
Annapolis Junction, MD 20701
(410) 707-8605 (cell)

From: Aaron Kimball [mailto:akimbal...@gmail.com]
Sent: Friday, March 04, 2011 4:30 PM
To: common-user@hadoop.apache.org
Cc: Ratner, Alan S (IS)
Subject: EXT :Re: Problem running a Hadoop program with external libraries

Actually, I just misread your email and missed the difference between your 2nd 
and 3rd attempts.

Are you enforcing min/max JVM heap sizes on your tasks? Are you enforcing a 
ulimit (either through your shell configuration, or through Hadoop itself)? I 
don't know where these "cannot allocate memory" errors are coming from. If 
they're from the OS, could it be because it needs to fork() and momentarily 
exceed the ulimit before loading the native libs?

- Aaron

On Fri, Mar 4, 2011 at 1:26 PM, Aaron Kimball 
akimbal...@gmail.com wrote:
I don't know if putting native-code .so files inside a jar works. A native-code 
.so is not classloaded in the same way .class files are.

So the correct .so files probably need to exist in some physical directory on 
the worker machines. You may want to doublecheck that the correct directory on 
the worker machines is identified in the JVM property 'java.library.path' 
(instead of / in addition to $LD_LIBRARY_PATH). This can be manipulated in the 
Hadoop configuration setting mapred.child.java.opts (include 
'-Djava.library.path=/path/to/native/libs' in the string there.)
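
For example (a sketch of that suggestion; the -Xmx value is the one mentioned earlier in this thread and the path is the placeholder above):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.mapreduce.Job;

  public class NativeLibPath {
    public static Job newJob() throws Exception {
      Configuration conf = new Configuration();
      // Child task JVMs receive this string as their command-line options,
      // so java.library.path is set for every map and reduce task.
      conf.set("mapred.child.java.opts",
               "-Xmx1536m -Djava.library.path=/path/to/native/libs");
      return new Job(conf, "native-lib-example");
    }
  }

The same setting can equally go into mapred-site.xml as a <property> block, as with the mapred.child.env entries tried elsewhere in the thread.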

Also, if you added your .so files to a directory that is already used by the 
tasktracker (like hadoop-0.21.0/lib/native/Linux-amd64-64/), you may need to 
restart the tasktracker instance for it to take effect. (This is true of .jar 
files in the $HADOOP_HOME/lib directory; I don't know if it is true for native 
libs as well.)

- Aaron

On Fri, Mar 4, 2011 at 12:53 PM, Ratner, Alan S (IS) 
alan.rat...@ngc.com wrote:
We are having difficulties running a Hadoop program making calls to external 
libraries - but this occurs only when we run the program on our cluster and not 
from within Eclipse where we are apparently running in Hadoop's standalone 
mode.  This program invokes the Open Computer Vision libraries (OpenCV and 
JavaCV).  (I don't think there is a problem with our cluster - we've run many 
Hadoop jobs on it without difficulty.)

1.  I normally use Eclipse to create jar files for our Hadoop programs but 
I inadvertently hit the "run as Java application" button and the program ran 
fine, reading the input file from the eclipse workspace rather than HDFS and 
writing the output file to the same place.  Hadoop's output appears below.  
(This occurred on the master Hadoop server.)

2.  I then exported from Eclipse a runnable jar which extracted 
required libraries into the generated jar - presumably producing a jar file 
that incorporated all the required library functions. (The plain jar file for 
this program is 17 kB while the runnable jar is 30MB.)  When I try to run this 
on my Hadoop cluster (including my master and slave servers) the program 
reports that it is unable to locate "libopencv_highgui.so.2.2: cannot open 
shared object file: No such file or directory".  Now, in addition to this 
library being incorporated inside the runnable jar file it is also present on 
each of my servers at hadoop-0.21.0/lib/native/Linux-amd64-64/ where we have 
loaded the same libraries (to give Hadoop 2 shots at finding them).  These 
include:
 ...
 libopencv_highgui_pch_dephelp.a
 libopencv_highgui.so
 libopencv_highgui.so.2.2
 libopencv_highgui.so.2.2.0
 ...

 When I poke around inside the runnable jar I find javacv_linux-x86_64.jar 
which contains:
 com/googlecode/javacv/cpp/linux-x86_64/libjniopencv_highgui.so

3.  I then tried adding the following to mapred-site.xml as suggested in 
Patch 2838 that's supposed to be included in hadoop 0.21 
https://issues.apache.org/jira/browse/HADOOP-2838
 <property>
   <name>mapred.child.env</name>
   <value>LD_LIBRARY_PATH=/home/ngc/hadoop-0.21.0/lib/native/Linux-amd64-64</value>
 </property>
 The log is included at the bottom of this email with Hadoop now 
complaining about a different missing library with an out-of-memory error.

Does anyone have any ideas as to what is going wrong here?  Any help would be 
appreciated.  Thanks.

Alan


BTW: Each of our servers has 4 hard drives and many of the errors below refer 
to the 3 drives (/media/hd2

Running C++ WordCount

2010-02-26 Thread Ratner, Alan S (IS)
I am trying to run the wordcount example in c/c++ given on
http://wiki.apache.org/hadoop/C%2B%2BWordCount with Hadoop 0.18.3 but
the instructions seem to make a number of assumptions. 

When I run Ant using the specified command "ant -Dcompile.c++=yes
examples" I get a BUILD FAILED error ... Cannot run program
c:\...\hadoop-0.18.3\src\c++\pipes\configure (in directory
...\hadoop-0.18.3\build\c++-build\Windows_XP-x86-32\pipes):
CreateProcess error=193, %1 is not a valid Win32 application

Question 1: Where in the directory path do I put the wordcount code and
should it get a .cpp extension or something else?

Question 2: Where should I be when I execute the Ant command?  Ant
complains that it cannot find build.xml unless I am in the
...\hadoop-0.18.3 directory.

Question 3: The include statements in the wordcount code are of the form
#include "hadoop/xxx.hh".  These include files reside both in
...\hadoop-0.18.3\c++\Linux-i386-32\include\hadoop and in
...\hadoop-0.18.3\c++\Linux-amd64-64\include\hadoop.  Does Ant produce
both a 32-bit and a 64-bit version of my compiled code?

Question 4: Where does Ant put the compiled code?


Thanks,
Alan Ratner




Running C++ WordCount

2010-02-24 Thread Ratner, Alan S (IS)
I am trying to run the wordcount example in c/c++ given on
http://wiki.apache.org/hadoop/C%2B%2BWordCount with Hadoop 0.18.3 but
when I run Ant using the specified command "ant -Dcompile.c++=yes
examples" I get a BUILD FAILED error ... Cannot run program
c:\...\hadoop-0.18.3\src\c++\pipes\configure (in directory
...\hadoop-0.18.3\build\c++-build\Windows_XP-x86-32\pipes):
CreateProcess error=193, %1 is not a valid Win32 application

Question 1: Where in the directory path do I put the wordcount code and
should it get a .cpp extension or something else?

Question 2: Where should I be when I execute the Ant command?  Ant
complains that it cannot find build.xml unless I am in the
...\hadoop-0.18.3 directory.

Question 3: The include statements in the wordcount code are of the form
#include "hadoop/xxx.hh".  These include files reside both in
...\hadoop-0.18.3\c++\Linux-i386-32\include\hadoop and in
...\hadoop-0.18.3\c++\Linux-amd64-64\include\hadoop.  Does Ant produce
both a 32-bit and a 64-bit version of my compiled code?

Question 4: Where does Ant put the compiled code?


Thanks,
Alan Ratner