Re: Problem running a Hadoop program with external libraries

2011-03-11 Thread Allen Wittenauer

On Mar 8, 2011, at 1:21 PM, Ratner, Alan S (IS) wrote:
> We had tried putting all the libraries directly in HDFS with a pointer in 
> mapred-site.xml:
> mapred.child.env = LD_LIBRARY_PATH=/user/ngc/lib
> as described in https://issues.apache.org/jira/browse/HADOOP-2838 but this 
> did not work for us.

Correct.  This isn't expected to work.

HDFS files are not directly accessible from the shell without some sort 
of action having taken place.   In order for the above to work, anything 
reading the LD_LIBRARY_PATH environment variable would have to know that 
'/user/...' is a) inside HDFS and b) know how to access it.   The reason why 
the distributed cache method works is because it pulls files from HDFS and 
places them in the local UNIX file system.  From there, UNIX processes can now 
access them.

HADOOP-2838 is really about providing a way for applications to get to 
libraries that are already installed at the UNIX level.  (Although, in reality, 
it would likely be better if applications were linked with a better value 
provided for the runtime library search path -R/-rpath/ld.so.conf/crle/etc 
rather than using LD_LIBRARY_PATH.)
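
As a concrete sketch of the distributed-cache route described above (illustrative 
only: the namenode URI, the library name libfoo.so, and the class name are made up, 
not taken from this thread):

  import java.io.File;
  import java.net.URI;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.filecache.DistributedCache;

  public class ShipNativeLib {
    // Driver side: ship the .so from HDFS; "#libfoo.so" names the symlink
    // the framework creates in each task's local working directory.
    public static void configure(Configuration conf) throws Exception {
      DistributedCache.addCacheFile(
          new URI("hdfs://namenode:8020/user/ngc/lib/libfoo.so#libfoo.so"), conf);
      DistributedCache.createSymlink(conf);
    }

    // Task side: load the localized copy from the working directory.
    public static void loadNative() {
      System.load(new File("libfoo.so").getAbsolutePath());
    }
  }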

RE: Problem running a Hadoop program with external libraries

2011-03-08 Thread Ratner, Alan S (IS)
One other thing: We were getting out-of-memory errors with these external 
libraries and we had to reduce the value of mapred.child.java.opts found in 
mapred-site.xml.  We had originally been using 2 GB (our servers have 24-48 GB 
RAM) and eliminated the out-of-memory errors by reducing this value to 1.28 GB.
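
For reference, the same value can also be set per job from the driver; a minimal 
sketch only (the 1280m figure mirrors the ~1.28 GB above, and the right number 
depends on how much the native code allocates outside the Java heap):

  import org.apache.hadoop.conf.Configuration;

  Configuration conf = new Configuration();
  // Roughly the 1.28 GB mentioned above; leaves headroom for native allocations.
  conf.set("mapred.child.java.opts", "-Xmx1280m");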

Alan


-Original Message-
From: Ratner, Alan S (IS) [mailto:alan.rat...@ngc.com] 
Sent: Tuesday, March 08, 2011 4:22 PM
To: common-user@hadoop.apache.org
Cc: Gerlach, Hannah L (IS); Andrew Levine
Subject: EXT :RE: Problem running a Hadoop program with external libraries

Thanks to all who suggested solutions to our problem of running a Java MR job 
using both external Java and C++ libraries.

We got it to work by moving all our .so files into an archive 
(http://hadoop.apache.org/mapreduce/docs/r0.21.0/hadoop_archives.html) 
and publishing it to our MR app with a single statement:
DistributedCache.createSymlink(conf).  
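
For anyone reproducing this, the driver-side calls look roughly as follows. This 
is a sketch only: the HDFS path and symlink name are invented, and it uses a 
plain archive added via addCacheArchive rather than the Hadoop archive (.har) 
mentioned above, whose handling may differ.

  import java.net.URI;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.filecache.DistributedCache;

  Configuration conf = new Configuration();
  // An archive of the .so files previously copied into HDFS. "#native" names
  // the symlink created in each task's working directory, so the unpacked
  // libraries appear under ./native/ at run time.
  DistributedCache.addCacheArchive(
      new URI("hdfs://namenode:8020/user/ngc/native-libs.zip#native"), conf);
  DistributedCache.createSymlink(conf);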

We found that we had to use Eclipse to generate a "runnable" jar file in 
"extract" mode; running an ordinary jar did not work.  (We tried putting our 
external jars in the archive file but a plain jar still did not work - perhaps 
I haven't assembled the complete set of jars into the archive.)

We had tried putting all the libraries directly in HDFS with a pointer in 
mapred-site.xml:
mapred.child.env = LD_LIBRARY_PATH=/user/ngc/lib
as described in https://issues.apache.org/jira/browse/HADOOP-2838 but this did 
not work for us.

The bottom line of all this is that we managed to write a Hadoop job in Java 
that invokes the OpenCV (Open Computer Vision) C++ libraries 
(http://opencv.willowgarage.com/wiki/) using the JavaCV Java wrapper 
(http://code.google.com/p/javacv/).  OpenCV includes over 500 image processing 
algorithms.




-Original Message-
From: Ratner, Alan S (IS) [mailto:alan.rat...@ngc.com] 
Sent: Friday, March 04, 2011 3:53 PM
To: common-user@hadoop.apache.org
Subject: EXT :Problem running a Hadoop program with external libraries

We are having difficulties running a Hadoop program making calls to external 
libraries - but this occurs only when we run the program on our cluster and not 
from within Eclipse where we are apparently running in Hadoop's standalone 
mode.  This program invokes the Open Computer Vision libraries (OpenCV and 
JavaCV).  (I don't think there is a problem with our cluster - we've run many 
Hadoop jobs on it without difficulty.)

1.  I normally use Eclipse to create jar files for our Hadoop programs but 
I inadvertently hit the "run as Java application" button and the program ran 
fine, reading the input file from the eclipse workspace rather than HDFS and 
writing the output file to the same place.  Hadoop's output appears below.  
(This occurred on the master Hadoop server.)

2.  I then "exported" from Eclipse a "runnable jar" which "extracted 
required libraries" into the generated jar - presumably producing a jar file 
that incorporated all the required library functions. (The plain jar file for 
this program is 17 kB while the runnable jar is 30MB.)  When I try to run this 
on my Hadoop cluster (including my master and slave servers) the program 
reports that it is unable to locate "libopencv_highgui.so.2.2: cannot open 
shared object file: No such file or directory".  Now, in addition to this 
library being incorporated inside the runnable jar file it is also present on 
each of my servers at hadoop-0.21.0/lib/native/Linux-amd64-64/ where we have 
loaded the same libraries (to give Hadoop 2 shots at finding them).  These 
include:


  ...
  libopencv_highgui_pch_dephelp.a
  libopencv_highgui.so
  libopencv_highgui.so.2.2
  libopencv_highgui.so.2.2.0
  ...

  When I poke around inside the runnable jar I find javacv_linux-x86_64.jar 
which contains:
  com/googlecode/javacv/cpp/linux-x86_64/libjniopencv_highgui.so

3.  I then tried adding the following to mapred-site.xml as suggested in 
Patch 2838 that's supposed to be included in hadoop 0.21 
https://issues.apache.org/jira/browse/HADOOP-2838
  
<property>
  <name>mapred.child.env</name>
  <value>LD_LIBRARY_PATH=/home/ngc/hadoop-0.21.0/lib/native/Linux-amd64-64</value>
</property>
  The log is included at the bottom of this email with Hadoop now 
complaining about a different missing library with an out-of-memory error.

Does anyone have any ideas as to what is going wrong here?  Any help would be 
appreciated.  Thanks.

Alan


BTW: Each of our servers has 4 hard drives and many of the errors below refer 
to the 3 drives (/media/hd2 or hd3 or hd4) reserved exclusively for HDFS and 
thus perhaps not a good place for Hadoop to be looking for a library file.  My 
slaves have 24 GB RAM, the jar file is 30 MB, and the sequence file being read 
is 400 KB - so I hope I am not running out of memory.


1.  RUNNING DIRECTLY FROM ECLIPSE IN HADOOP'S STANDAL

RE: Problem running a Hadoop program with external libraries

2011-03-08 Thread Ratner, Alan S (IS)
Thanks to all who suggested solutions to our problem of running a Java MR job 
using both external Java and C++ libraries.

We got it to work by moving all our .so files into an archive 
(http://hadoop.apache.org/mapreduce/docs/r0.21.0/hadoop_archives.html) 
and publishing it to our MR app with a single statement:
DistributedCache.createSymlink(conf).  

We found that we had to use Eclipse to generate a "runnable" jar file in 
"extract" mode; running an ordinary jar did not work.  (We tried putting our 
external jars in the archive file but a plain jar still did not work - perhaps 
I haven't assembled the complete set of jars into the archive.)

We had tried putting all the libraries directly in HDFS with a pointer in 
mapred-site.xml:
mapred.child.env = LD_LIBRARY_PATH=/user/ngc/lib
as described in https://issues.apache.org/jira/browse/HADOOP-2838 but this did 
not work for us.

The bottom line of all this is that we managed to write a Hadoop job in Java 
that invokes the OpenCV (Open Computer Vision) C++ libraries 
(http://opencv.willowgarage.com/wiki/) using the JavaCV Java wrapper 
(http://code.google.com/p/javacv/).  OpenCV includes over 500 image processing 
algorithms.




-Original Message-
From: Ratner, Alan S (IS) [mailto:alan.rat...@ngc.com] 
Sent: Friday, March 04, 2011 3:53 PM
To: common-user@hadoop.apache.org
Subject: EXT :Problem running a Hadoop program with external libraries

We are having difficulties running a Hadoop program making calls to external 
libraries - but this occurs only when we run the program on our cluster and not 
from within Eclipse where we are apparently running in Hadoop's standalone 
mode.  This program invokes the Open Computer Vision libraries (OpenCV and 
JavaCV).  (I don't think there is a problem with our cluster - we've run many 
Hadoop jobs on it without difficulty.)

1.  I normally use Eclipse to create jar files for our Hadoop programs but 
I inadvertently hit the "run as Java application" button and the program ran 
fine, reading the input file from the eclipse workspace rather than HDFS and 
writing the output file to the same place.  Hadoop's output appears below.  
(This occurred on the master Hadoop server.)

2.  I then "exported" from Eclipse a "runnable jar" which "extracted 
required libraries" into the generated jar - presumably producing a jar file 
that incorporated all the required library functions. (The plain jar file for 
this program is 17 kB while the runnable jar is 30MB.)  When I try to run this 
on my Hadoop cluster (including my master and slave servers) the program 
reports that it is unable to locate "libopencv_highgui.so.2.2: cannot open 
shared object file: No such file or directory".  Now, in addition to this 
library being incorporated inside the runnable jar file it is also present on 
each of my servers at hadoop-0.21.0/lib/native/Linux-amd64-64/ where we have 
loaded the same libraries (to give Hadoop 2 shots at finding them).  These 
include:

  ...
  libopencv_highgui_pch_dephelp.a
  libopencv_highgui.so
  libopencv_highgui.so.2.2
  libopencv_highgui.so.2.2.0
  ...

  When I poke around inside the runnable jar I find javacv_linux-x86_64.jar 
which contains:
  com/googlecode/javacv/cpp/linux-x86_64/libjniopencv_highgui.so

3.  I then tried adding the following to mapred-site.xml as suggested in 
Patch 2838 that's supposed to be included in hadoop 0.21 
https://issues.apache.org/jira/browse/HADOOP-2838
  
<property>
  <name>mapred.child.env</name>
  <value>LD_LIBRARY_PATH=/home/ngc/hadoop-0.21.0/lib/native/Linux-amd64-64</value>
</property>
  The log is included at the bottom of this email with Hadoop now 
complaining about a different missing library with an out-of-memory error.

Does anyone have any ideas as to what is going wrong here?  Any help would be 
appreciated.  Thanks.

Alan


BTW: Each of our servers has 4 hard drives and many of the errors below refer 
to the 3 drives (/media/hd2 or hd3 or hd4) reserved exclusively for HDFS and 
thus perhaps not a good place for Hadoop to be looking for a library file.  My 
slaves have 24 GB RAM, the jar file is 30 MB, and the sequence file being read 
is 400 KB - so I hope I am not running out of memory.


1.  RUNNING DIRECTLY FROM ECLIPSE IN HADOOP'S STANDALONE MODE - SUCCESS

>>>> Running Face Program
11/03/04 12:44:10 INFO security.Groups: Group mapping 
impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=30
11/03/04 12:44:10 INFO jvm.JvmMetrics: Initializing JVM Metrics with 
processName=JobTracker, sessionId=
11/03/04 12:44:10 WARN mapreduce.JobSubmitter: Use GenericOptionsParser for 
parsing the arguments. Applications should implement Tool for the same.
11/03/04 12:44:10 WARN mapreduce.JobSubmitter: No job jar file set.  User 
classes may not be found. See Job or Job#setJar(String).
11/0

Re: EXT :Re: Problem running a Hadoop program with external libraries

2011-03-05 Thread Alejandro Abdelnur
Why don't you put your native libraries in HDFS and use the DistributedCache
to make them available to the tasks? For example:

Copy 'foo.so' to 'hdfs://localhost:8020/tmp/foo.so', then add it to the job's
distributed cache (addCacheFile takes a java.net.URI):

  DistributedCache.addCacheFile(
      new URI("hdfs://localhost:8020/tmp/foo.so#foo.so"), jobConf);
  DistributedCache.createSymlink(jobConf);

Note that the "#foo.so" fragment will create a symlink in the task's working
directory, and the task's working directory is on the LD_LIBRARY_PATH of your task.
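
To use that symlink from inside a task, the mapper can load it explicitly; a 
small sketch (the class name and key/value types are illustrative):

  import java.io.File;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Mapper;

  public class NativeUsingMapper extends Mapper<LongWritable, Text, Text, Text> {
    static {
      // "foo.so" is the symlink the distributed cache created in the
      // task's current working directory.
      System.load(new File("foo.so").getAbsolutePath());
    }
    // map() omitted; once loaded, the native code is reachable via its JNI wrappers.
  }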

Alejandro

On Sat, Mar 5, 2011 at 7:19 AM, Lance Norskog  wrote:

> I have never heard of putting a native code shared library in a Java jar. I
> doubt that it works. But it's a cool idea!
>
> A Unix binary program loads shared libraries from the paths given in the
> environment variable LD_LIBRARY_PATH. This has to be set to the directory
> with the OpenCV .so file when you start Java.
>
> Lance
>
> On Mar 4, 2011, at 2:13 PM, Brian Bockelman wrote:
>
> > Hi,
> >
> > Check your kernel's overcommit settings.  This will prevent the JVM from
> allocating memory even when there's free RAM.
> >
> > Brian
> >
> > On Mar 4, 2011, at 3:55 PM, Ratner, Alan S (IS) wrote:
> >
> >> Aaron,
> >>
> >>  Thanks for the rapid responses.
> >>
> >>
> >> * "ulimit -u unlimited" is in .bashrc.
> >>
> >>
> >> * HADOOP_HEAPSIZE is set to 4000 MB in hadoop-env.sh
> >>
> >>
> >> * Mapred.child.ulimit is set to 2048000 in mapred-site.xml
> >>
> >>
> >> * Mapred.child.java.opts is set to -Xmx1536m in mapred-site.xml
> >>
> >>  I take it you are suggesting that I change the java.opts command to:
> >>
> >> Mapred.child.java.opts is  -Xmx1536m
> -Djava.library.path=/path/to/native/libs 
> >>
> >>
> >> Alan Ratner
> >> Northrop Grumman Information Systems
> >> Manager of Large-Scale Computing
> >> 9020 Junction Drive
> >> Annapolis Junction, MD 20701
> >> (410) 707-8605 (cell)
> >>
> >> From: Aaron Kimball [mailto:akimbal...@gmail.com]
> >> Sent: Friday, March 04, 2011 4:30 PM
> >> To: common-user@hadoop.apache.org
> >> Cc: Ratner, Alan S (IS)
> >> Subject: EXT :Re: Problem running a Hadoop program with external
> libraries
> >>
> >> Actually, I just misread your email and missed the difference between
> your 2nd and 3rd attempts.
> >>
> >> Are you enforcing min/max JVM heap sizes on your tasks? Are you
> enforcing a ulimit (either through your shell configuration, or through
> Hadoop itself)? I don't know where these "cannot allocate memory" errors are
> coming from. If they're from the OS, could it be because it needs to fork()
> and momentarily exceed the ulimit before loading the native libs?
> >>
> >> - Aaron
> >>
> >> On Fri, Mar 4, 2011 at 1:26 PM, Aaron Kimball <akimbal...@gmail.com> wrote:
> >> I don't know if putting native-code .so files inside a jar works. A
> native-code .so is not "classloaded" in the same way .class files are.
> >>
> >> So the correct .so files probably need to exist in some physical
> directory on the worker machines. You may want to doublecheck that the
> correct directory on the worker machines is identified in the JVM property
> 'java.library.path' (instead of / in addition to $LD_LIBRARY_PATH). This can
> be manipulated in the Hadoop configuration setting mapred.child.java.opts
> (include '-Djava.library.path=/path/to/native/libs' in the string there.)
> >>
> >> Also, if you added your .so files to a directory that is already used by
> the tasktracker (like hadoop-0.21.0/lib/native/Linux-amd64-64/), you may
> need to restart the tasktracker instance for it to take effect. (This is
> true of .jar files in the $HADOOP_HOME/lib directory; I don't know if it is
> true for native libs as well.)
> >>
> >> - Aaron
> >>
> >> On Fri, Mar 4, 2011 at 12:53 PM, Ratner, Alan S (IS) <alan.rat...@ngc.com> wrote:
> >> We are having difficulties running a Hadoop program making calls to
> external libraries - but this occurs only when we run the program on our
> cluster and not from within Eclipse where we are apparently running in
> Hadoop's standalone mode.  This program invokes the Open Computer Vision
> libraries (OpenCV and JavaCV).  (I don't think there is a problem with our
> cluster - we've run many Hadoop jobs on it without d

Re: EXT :Re: Problem running a Hadoop program with external libraries

2011-03-04 Thread Lance Norskog
I have never heard of putting a native code shared library in a Java jar. I 
doubt that it works. But it's a cool idea!

A Unix binary program loads shared libraries from the paths given in the 
environment variable LD_LIBRARY_PATH. This has to be set to the directory with 
the OpenCV .so file when you start Java.

Lance

On Mar 4, 2011, at 2:13 PM, Brian Bockelman wrote:

> Hi,
> 
> Check your kernel's overcommit settings.  This will prevent the JVM from 
> allocating memory even when there's free RAM.
> 
> Brian
> 
> On Mar 4, 2011, at 3:55 PM, Ratner, Alan S (IS) wrote:
> 
>> Aaron,
>> 
>>  Thanks for the rapid responses.
>> 
>> 
>> * "ulimit -u unlimited" is in .bashrc.
>> 
>> 
>> * HADOOP_HEAPSIZE is set to 4000 MB in hadoop-env.sh
>> 
>> 
>> * Mapred.child.ulimit is set to 2048000 in mapred-site.xml
>> 
>> 
>> * Mapred.child.java.opts is set to -Xmx1536m in mapred-site.xml
>> 
>>  I take it you are suggesting that I change the java.opts command to:
>> 
>> Mapred.child.java.opts is  -Xmx1536m 
>> -Djava.library.path=/path/to/native/libs 
>> 
>> 
>> Alan Ratner
>> Northrop Grumman Information Systems
>> Manager of Large-Scale Computing
>> 9020 Junction Drive
>> Annapolis Junction, MD 20701
>> (410) 707-8605 (cell)
>> 
>> From: Aaron Kimball [mailto:akimbal...@gmail.com]
>> Sent: Friday, March 04, 2011 4:30 PM
>> To: common-user@hadoop.apache.org
>> Cc: Ratner, Alan S (IS)
>> Subject: EXT :Re: Problem running a Hadoop program with external libraries
>> 
>> Actually, I just misread your email and missed the difference between your 
>> 2nd and 3rd attempts.
>> 
>> Are you enforcing min/max JVM heap sizes on your tasks? Are you enforcing a 
>> ulimit (either through your shell configuration, or through Hadoop itself)? 
>> I don't know where these "cannot allocate memory" errors are coming from. If 
>> they're from the OS, could it be because it needs to fork() and momentarily 
>> exceed the ulimit before loading the native libs?
>> 
>> - Aaron
>> 
>> On Fri, Mar 4, 2011 at 1:26 PM, Aaron Kimball <akimbal...@gmail.com> wrote:
>> I don't know if putting native-code .so files inside a jar works. A 
>> native-code .so is not "classloaded" in the same way .class files are.
>> 
>> So the correct .so files probably need to exist in some physical directory 
>> on the worker machines. You may want to doublecheck that the correct 
>> directory on the worker machines is identified in the JVM property 
>> 'java.library.path' (instead of / in addition to $LD_LIBRARY_PATH). This can 
>> be manipulated in the Hadoop configuration setting mapred.child.java.opts 
>> (include '-Djava.library.path=/path/to/native/libs' in the string there.)
>> 
>> Also, if you added your .so files to a directory that is already used by the 
>> tasktracker (like hadoop-0.21.0/lib/native/Linux-amd64-64/), you may need to 
>> restart the tasktracker instance for it to take effect. (This is true of 
>> .jar files in the $HADOOP_HOME/lib directory; I don't know if it is true for 
>> native libs as well.)
>> 
>> - Aaron
>> 
>> On Fri, Mar 4, 2011 at 12:53 PM, Ratner, Alan S (IS) <alan.rat...@ngc.com> wrote:
>> We are having difficulties running a Hadoop program making calls to external 
>> libraries - but this occurs only when we run the program on our cluster and 
>> not from within Eclipse where we are apparently running in Hadoop's 
>> standalone mode.  This program invokes the Open Computer Vision libraries 
>> (OpenCV and JavaCV).  (I don't think there is a problem with our cluster - 
>> we've run many Hadoop jobs on it without difficulty.)
>> 
>> 1.  I normally use Eclipse to create jar files for our Hadoop programs 
>> but I inadvertently hit the "run as Java application" button and the program 
>> ran fine, reading the input file from the eclipse workspace rather than HDFS 
>> and writing the output file to the same place.  Hadoop's output appears 
>> below.  (This occurred on the master Hadoop server.)
>> 
>> 2.  I then "exported" from Eclipse a "runnable jar" which "extracted 
>> required libraries" into the generated jar - presumably producing a jar file 
>> that incorporated all the required library functions. (The plain jar file 
>> for this program is 17 kB while the runnable jar is 30M

Re: EXT :Re: Problem running a Hadoop program with external libraries

2011-03-04 Thread Brian Bockelman
Hi,

Check your kernel's overcommit settings.  This will prevent the JVM from 
allocating memory even when there's free RAM.

Brian

On Mar 4, 2011, at 3:55 PM, Ratner, Alan S (IS) wrote:

> Aaron,
> 
>   Thanks for the rapid responses.
> 
> 
> * "ulimit -u unlimited" is in .bashrc.
> 
> 
> * HADOOP_HEAPSIZE is set to 4000 MB in hadoop-env.sh
> 
> 
> * Mapred.child.ulimit is set to 2048000 in mapred-site.xml
> 
> 
> * Mapred.child.java.opts is set to -Xmx1536m in mapred-site.xml
> 
>   I take it you are suggesting that I change the java.opts command to:
> 
> Mapred.child.java.opts is  -Xmx1536m 
> -Djava.library.path=/path/to/native/libs 
> 
> 
> Alan Ratner
> Northrop Grumman Information Systems
> Manager of Large-Scale Computing
> 9020 Junction Drive
> Annapolis Junction, MD 20701
> (410) 707-8605 (cell)
> 
> From: Aaron Kimball [mailto:akimbal...@gmail.com]
> Sent: Friday, March 04, 2011 4:30 PM
> To: common-user@hadoop.apache.org
> Cc: Ratner, Alan S (IS)
> Subject: EXT :Re: Problem running a Hadoop program with external libraries
> 
> Actually, I just misread your email and missed the difference between your 
> 2nd and 3rd attempts.
> 
> Are you enforcing min/max JVM heap sizes on your tasks? Are you enforcing a 
> ulimit (either through your shell configuration, or through Hadoop itself)? I 
> don't know where these "cannot allocate memory" errors are coming from. If 
> they're from the OS, could it be because it needs to fork() and momentarily 
> exceed the ulimit before loading the native libs?
> 
> - Aaron
> 
> On Fri, Mar 4, 2011 at 1:26 PM, Aaron Kimball <akimbal...@gmail.com> wrote:
> I don't know if putting native-code .so files inside a jar works. A 
> native-code .so is not "classloaded" in the same way .class files are.
> 
> So the correct .so files probably need to exist in some physical directory on 
> the worker machines. You may want to doublecheck that the correct directory 
> on the worker machines is identified in the JVM property 'java.library.path' 
> (instead of / in addition to $LD_LIBRARY_PATH). This can be manipulated in 
> the Hadoop configuration setting mapred.child.java.opts (include 
> '-Djava.library.path=/path/to/native/libs' in the string there.)
> 
> Also, if you added your .so files to a directory that is already used by the 
> tasktracker (like hadoop-0.21.0/lib/native/Linux-amd64-64/), you may need to 
> restart the tasktracker instance for it to take effect. (This is true of .jar 
> files in the $HADOOP_HOME/lib directory; I don't know if it is true for 
> native libs as well.)
> 
> - Aaron
> 
> On Fri, Mar 4, 2011 at 12:53 PM, Ratner, Alan S (IS) <alan.rat...@ngc.com> wrote:
> We are having difficulties running a Hadoop program making calls to external 
> libraries - but this occurs only when we run the program on our cluster and 
> not from within Eclipse where we are apparently running in Hadoop's 
> standalone mode.  This program invokes the Open Computer Vision libraries 
> (OpenCV and JavaCV).  (I don't think there is a problem with our cluster - 
> we've run many Hadoop jobs on it without difficulty.)
> 
> 1.  I normally use Eclipse to create jar files for our Hadoop programs 
> but I inadvertently hit the "run as Java application" button and the program 
> ran fine, reading the input file from the eclipse workspace rather than HDFS 
> and writing the output file to the same place.  Hadoop's output appears 
> below.  (This occurred on the master Hadoop server.)
> 
> 2.  I then "exported" from Eclipse a "runnable jar" which "extracted 
> required libraries" into the generated jar - presumably producing a jar file 
> that incorporated all the required library functions. (The plain jar file for 
> this program is 17 kB while the runnable jar is 30MB.)  When I try to run 
> this on my Hadoop cluster (including my master and slave servers) the program 
> reports that it is unable to locate "libopencv_highgui.so.2.2: cannot open 
> shared object file: No such file or directory".  Now, in addition to this 
> library being incorporated inside the runnable jar file it is also present on 
> each of my servers at hadoop-0.21.0/lib/native/Linux-amd64-64/ where we have 
> loaded the same libraries (to give Hadoop 2 shots at finding them).  These 
> include:
> ...
> libopencv_highgui_pch_dephelp.a
> libopencv_highgui.so
> libopencv_highgui.so.2.2
> libopencv_highgui.so.2.2.0
> ...
> 
> When I poke around inside the runnable jar 

RE: EXT :Re: Problem running a Hadoop program with external libraries

2011-03-04 Thread Ratner, Alan S (IS)
Aaron,

   Thanks for the rapid responses.


* "ulimit -u unlimited" is in .bashrc.


* HADOOP_HEAPSIZE is set to 4000 MB in hadoop-env.sh


* mapred.child.ulimit is set to 2048000 in mapred-site.xml


* mapred.child.java.opts is set to -Xmx1536m in mapred-site.xml

   I take it you are suggesting that I change the java.opts command to:

mapred.child.java.opts = -Xmx1536m -Djava.library.path=/path/to/native/libs


Alan Ratner
Northrop Grumman Information Systems
Manager of Large-Scale Computing
9020 Junction Drive
Annapolis Junction, MD 20701
(410) 707-8605 (cell)

From: Aaron Kimball [mailto:akimbal...@gmail.com]
Sent: Friday, March 04, 2011 4:30 PM
To: common-user@hadoop.apache.org
Cc: Ratner, Alan S (IS)
Subject: EXT :Re: Problem running a Hadoop program with external libraries

Actually, I just misread your email and missed the difference between your 2nd 
and 3rd attempts.

Are you enforcing min/max JVM heap sizes on your tasks? Are you enforcing a 
ulimit (either through your shell configuration, or through Hadoop itself)? I 
don't know where these "cannot allocate memory" errors are coming from. If 
they're from the OS, could it be because it needs to fork() and momentarily 
exceed the ulimit before loading the native libs?

- Aaron

On Fri, Mar 4, 2011 at 1:26 PM, Aaron Kimball <akimbal...@gmail.com> wrote:
I don't know if putting native-code .so files inside a jar works. A native-code 
.so is not "classloaded" in the same way .class files are.

So the correct .so files probably need to exist in some physical directory on 
the worker machines. You may want to doublecheck that the correct directory on 
the worker machines is identified in the JVM property 'java.library.path' 
(instead of / in addition to $LD_LIBRARY_PATH). This can be manipulated in the 
Hadoop configuration setting mapred.child.java.opts (include 
'-Djava.library.path=/path/to/native/libs' in the string there.)

Also, if you added your .so files to a directory that is already used by the 
tasktracker (like hadoop-0.21.0/lib/native/Linux-amd64-64/), you may need to 
restart the tasktracker instance for it to take effect. (This is true of .jar 
files in the $HADOOP_HOME/lib directory; I don't know if it is true for native 
libs as well.)

- Aaron

On Fri, Mar 4, 2011 at 12:53 PM, Ratner, Alan S (IS) <alan.rat...@ngc.com> wrote:
We are having difficulties running a Hadoop program making calls to external 
libraries - but this occurs only when we run the program on our cluster and not 
from within Eclipse where we are apparently running in Hadoop's standalone 
mode.  This program invokes the Open Computer Vision libraries (OpenCV and 
JavaCV).  (I don't think there is a problem with our cluster - we've run many 
Hadoop jobs on it without difficulty.)

1.  I normally use Eclipse to create jar files for our Hadoop programs but 
I inadvertently hit the "run as Java application" button and the program ran 
fine, reading the input file from the eclipse workspace rather than HDFS and 
writing the output file to the same place.  Hadoop's output appears below.  
(This occurred on the master Hadoop server.)

2.  I then "exported" from Eclipse a "runnable jar" which "extracted 
required libraries" into the generated jar - presumably producing a jar file 
that incorporated all the required library functions. (The plain jar file for 
this program is 17 kB while the runnable jar is 30MB.)  When I try to run this 
on my Hadoop cluster (including my master and slave servers) the program 
reports that it is unable to locate "libopencv_highgui.so.2.2: cannot open 
shared object file: No such file or directory".  Now, in addition to this 
library being incorporated inside the runnable jar file it is also present on 
each of my servers at hadoop-0.21.0/lib/native/Linux-amd64-64/ where we have 
loaded the same libraries (to give Hadoop 2 shots at finding them).  These 
include:
 ...
 libopencv_highgui_pch_dephelp.a
 libopencv_highgui.so
 libopencv_highgui.so.2.2
 libopencv_highgui.so.2.2.0
 ...

 When I poke around inside the runnable jar I find javacv_linux-x86_64.jar 
which contains:
 com/googlecode/javacv/cpp/linux-x86_64/libjniopencv_highgui.so

3.  I then tried adding the following to mapred-site.xml as suggested in 
Patch 2838 that's supposed to be included in hadoop 0.21 
https://issues.apache.org/jira/browse/HADOOP-2838
 
 <property>
   <name>mapred.child.env</name>
   <value>LD_LIBRARY_PATH=/home/ngc/hadoop-0.21.0/lib/native/Linux-amd64-64</value>
 </property>
 The log is included at the bottom of this email with Hadoop now 
complaining about a different missing library with an out-of-memory error.

Does anyone have any ideas as to what is going wrong here?  Any help would be 
appreciated.  Thanks.

Alan


BTW: Each of our s

Re: Problem running a Hadoop program with external libraries

2011-03-04 Thread Aaron Kimball
Actually, I just misread your email and missed the difference between your
2nd and 3rd attempts.

Are you enforcing min/max JVM heap sizes on your tasks? Are you enforcing a
ulimit (either through your shell configuration, or through Hadoop itself)?
I don't know where these "cannot allocate memory" errors are coming from. If
they're from the OS, could it be because it needs to fork() and momentarily
exceed the ulimit before loading the native libs?

- Aaron

On Fri, Mar 4, 2011 at 1:26 PM, Aaron Kimball  wrote:

> I don't know if putting native-code .so files inside a jar works. A
> native-code .so is not "classloaded" in the same way .class files are.
>
> So the correct .so files probably need to exist in some physical directory
> on the worker machines. You may want to doublecheck that the correct
> directory on the worker machines is identified in the JVM property
> 'java.library.path' (instead of / in addition to $LD_LIBRARY_PATH). This can
> be manipulated in the Hadoop configuration setting mapred.child.java.opts
> (include '-Djava.library.path=/path/to/native/libs' in the string there.)
>
> Also, if you added your .so files to a directory that is already used by
> the tasktracker (like hadoop-0.21.0/lib/native/Linux-amd64-64/), you may
> need to restart the tasktracker instance for it to take effect. (This is
> true of .jar files in the $HADOOP_HOME/lib directory; I don't know if it is
> true for native libs as well.)
>
> - Aaron
>
>
> On Fri, Mar 4, 2011 at 12:53 PM, Ratner, Alan S (IS) 
> wrote:
>
>> We are having difficulties running a Hadoop program making calls to
>> external libraries - but this occurs only when we run the program on our
>> cluster and not from within Eclipse where we are apparently running in
>> Hadoop's standalone mode.  This program invokes the Open Computer Vision
>> libraries (OpenCV and JavaCV).  (I don't think there is a problem with our
>> cluster - we've run many Hadoop jobs on it without difficulty.)
>>
>> 1.  I normally use Eclipse to create jar files for our Hadoop programs
>> but I inadvertently hit the "run as Java application" button and the program
>> ran fine, reading the input file from the eclipse workspace rather than HDFS
>> and writing the output file to the same place.  Hadoop's output appears
>> below.  (This occurred on the master Hadoop server.)
>>
>> 2.  I then "exported" from Eclipse a "runnable jar" which "extracted
>> required libraries" into the generated jar - presumably producing a jar file
>> that incorporated all the required library functions. (The plain jar file
>> for this program is 17 kB while the runnable jar is 30MB.)  When I try to
>> run this on my Hadoop cluster (including my master and slave servers) the
>> program reports that it is unable to locate "libopencv_highgui.so.2.2:
>> cannot open shared object file: No such file or directory".  Now, in
>> addition to this library being incorporated inside the runnable jar file it
>> is also present on each of my servers at
>> hadoop-0.21.0/lib/native/Linux-amd64-64/ where we have loaded the same
>> libraries (to give Hadoop 2 shots at finding them).  These include:
>>  ...
>>  libopencv_highgui_pch_dephelp.a
>>  libopencv_highgui.so
>>  libopencv_highgui.so.2.2
>>  libopencv_highgui.so.2.2.0
>>  ...
>>
>>  When I poke around inside the runnable jar I find
>> javacv_linux-x86_64.jar which contains:
>>  com/googlecode/javacv/cpp/linux-x86_64/libjniopencv_highgui.so
>>
>> 3.  I then tried adding the following to mapred-site.xml as suggested
>> in Patch 2838 that's supposed to be included in hadoop 0.21
>> https://issues.apache.org/jira/browse/HADOOP-2838
>> <property>
>>   <name>mapred.child.env</name>
>>   <value>LD_LIBRARY_PATH=/home/ngc/hadoop-0.21.0/lib/native/Linux-amd64-64</value>
>> </property>
>>  The log is included at the bottom of this email with Hadoop now
>> complaining about a different missing library with an out-of-memory error.
>>
>> Does anyone have any ideas as to what is going wrong here?  Any help would
>> be appreciated.  Thanks.
>>
>> Alan
>>
>>
>> BTW: Each of our servers has 4 hard drives and many of the errors below
>> refer to the 3 drives (/media/hd2 or hd3 or hd4) reserved exclusively for
>> HDFS and thus perhaps not a good place for Hadoop to be looking for a
>> library file.  My slaves have 24 GB RAM, the jar file is 30 MB, and the
>> sequence file being read is 400 KB - so I hope I am not running out of
>> memory.
>>
>>
>> 1.  RUNNING DIRECTLY FROM ECLIPSE IN HADOOP'S STANDALONE MODE -
>> SUCCESS
>>
>>  Running Face Program
>> 11/03/04 12:44:10 INFO security.Groups: Group mapping
>> impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping;
>> cacheTimeout=30
>> 11/03/04 12:44:10 INFO jvm.JvmMetrics: Initializing JVM Metrics with
>> processName=JobTracker, sessionId=
>> 11/03/04 12:44:10 WARN mapreduce.JobSubmitter: Use GenericOptionsParser
>> for parsing the arguments. Applications should implement Tool for the same.
>> 11/

Re: Problem running a Hadoop program with external libraries

2011-03-04 Thread Aaron Kimball
I don't know if putting native-code .so files inside a jar works. A
native-code .so is not "classloaded" in the same way .class files are.

So the correct .so files probably need to exist in some physical directory
on the worker machines. You may want to doublecheck that the correct
directory on the worker machines is identified in the JVM property
'java.library.path' (instead of / in addition to $LD_LIBRARY_PATH). This can
be manipulated in the Hadoop configuration setting mapred.child.java.opts
(include '-Djava.library.path=/path/to/native/libs' in the string there.)
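
Concretely, that could be set per job along these lines; a sketch only, with the 
native directory taken from earlier in the thread and the heap flag kept just to 
show that existing child options should be preserved:

  import org.apache.hadoop.conf.Configuration;

  Configuration conf = new Configuration();
  // Keep the existing heap setting and append the JVM library path that
  // points at the directory holding the .so files.
  conf.set("mapred.child.java.opts",
      "-Xmx1536m -Djava.library.path=/home/ngc/hadoop-0.21.0/lib/native/Linux-amd64-64");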

Also, if you added your .so files to a directory that is already used by the
tasktracker (like hadoop-0.21.0/lib/native/Linux-amd64-64/), you may need to
restart the tasktracker instance for it to take effect. (This is true of
.jar files in the $HADOOP_HOME/lib directory; I don't know if it is true for
native libs as well.)

- Aaron


On Fri, Mar 4, 2011 at 12:53 PM, Ratner, Alan S (IS) wrote:

> We are having difficulties running a Hadoop program making calls to
> external libraries - but this occurs only when we run the program on our
> cluster and not from within Eclipse where we are apparently running in
> Hadoop's standalone mode.  This program invokes the Open Computer Vision
> libraries (OpenCV and JavaCV).  (I don't think there is a problem with our
> cluster - we've run many Hadoop jobs on it without difficulty.)
>
> 1.  I normally use Eclipse to create jar files for our Hadoop programs
> but I inadvertently hit the "run as Java application" button and the program
> ran fine, reading the input file from the eclipse workspace rather than HDFS
> and writing the output file to the same place.  Hadoop's output appears
> below.  (This occurred on the master Hadoop server.)
>
> 2.  I then "exported" from Eclipse a "runnable jar" which "extracted
> required libraries" into the generated jar - presumably producing a jar file
> that incorporated all the required library functions. (The plain jar file
> for this program is 17 kB while the runnable jar is 30MB.)  When I try to
> run this on my Hadoop cluster (including my master and slave servers) the
> program reports that it is unable to locate "libopencv_highgui.so.2.2:
> cannot open shared object file: No such file or directory".  Now, in
> addition to this library being incorporated inside the runnable jar file it
> is also present on each of my servers at
> hadoop-0.21.0/lib/native/Linux-amd64-64/ where we have loaded the same
> libraries (to give Hadoop 2 shots at finding them).  These include:
>  ...
>  libopencv_highgui_pch_dephelp.a
>  libopencv_highgui.so
>  libopencv_highgui.so.2.2
>  libopencv_highgui.so.2.2.0
>  ...
>
>  When I poke around inside the runnable jar I find
> javacv_linux-x86_64.jar which contains:
>  com/googlecode/javacv/cpp/linux-x86_64/libjniopencv_highgui.so
>
> 3.  I then tried adding the following to mapred-site.xml as suggested
> in Patch 2838 that's supposed to be included in hadoop 0.21
> https://issues.apache.org/jira/browse/HADOOP-2838
> <property>
>   <name>mapred.child.env</name>
>   <value>LD_LIBRARY_PATH=/home/ngc/hadoop-0.21.0/lib/native/Linux-amd64-64</value>
> </property>
>  The log is included at the bottom of this email with Hadoop now
> complaining about a different missing library with an out-of-memory error.
>
> Does anyone have any ideas as to what is going wrong here?  Any help would
> be appreciated.  Thanks.
>
> Alan
>
>
> BTW: Each of our servers has 4 hard drives and many of the errors below
> refer to the 3 drives (/media/hd2 or hd3 or hd4) reserved exclusively for
> HDFS and thus perhaps not a good place for Hadoop to be looking for a
> library file.  My slaves have 24 GB RAM, the jar file is 30 MB, and the
> sequence file being read is 400 KB - so I hope I am not running out of
> memory.
>
>
> 1.  RUNNING DIRECTLY FROM ECLIPSE IN HADOOP'S STANDALONE MODE - SUCCESS
>
>  Running Face Program
> 11/03/04 12:44:10 INFO security.Groups: Group mapping
> impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping;
> cacheTimeout=30
> 11/03/04 12:44:10 INFO jvm.JvmMetrics: Initializing JVM Metrics with
> processName=JobTracker, sessionId=
> 11/03/04 12:44:10 WARN mapreduce.JobSubmitter: Use GenericOptionsParser for
> parsing the arguments. Applications should implement Tool for the same.
> 11/03/04 12:44:10 WARN mapreduce.JobSubmitter: No job jar file set.  User
> classes may not be found. See Job or Job#setJar(String).
> 11/03/04 12:44:10 INFO mapred.FileInputFormat: Total input paths to process
> : 1
> 11/03/04 12:44:10 WARN conf.Configuration: mapred.map.tasks is deprecated.
> Instead, use mapreduce.job.maps
> 11/03/04 12:44:10 INFO mapreduce.JobSubmitter: number of splits:1
> 11/03/04 12:44:10 INFO mapreduce.JobSubmitter: adding the following
> namenodes' delegation tokens:null
> 11/03/04 12:44:10 WARN security.TokenCache: Overwriting existing token
> storage with # keys=0
> 11/03/04 12:44:10 INFO mapreduce.J

Re: Problem running a Hadoop program with external libraries

2011-03-04 Thread Harsh J
I'm only guessing here and might be grossly wrong about my hunch.

Are you reusing your JVMs across tasks? Could you see if this goes
away without reuse?
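
If JVM reuse is enabled, it can be switched off per job to test that theory; a 
minimal sketch, using the property name as it exists in the 0.20/0.21 line:

  import org.apache.hadoop.conf.Configuration;

  Configuration conf = new Configuration();
  // 1 means a fresh child JVM for every task, i.e. no reuse across tasks.
  conf.setInt("mapred.job.reuse.jvm.num.tasks", 1);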

It would be good if you could monitor your launched tasks
(JConsole/VisualVM/etc.) to confirm whether there's a code-based
memory leak or some other odd issue relating to loading native
libraries into the JVM.

On Sat, Mar 5, 2011 at 2:23 AM, Ratner, Alan S (IS)  wrote:
> attempt_201103021428_0069_m_69_0: Java HotSpot(TM) 64-Bit Server VM 
> warning: Exception java.lang.OutOfMemoryError occurred dispatching signal 
> SIGTERM to handler- the VM may need to be forcibly terminated

Also, do all map tasks fail? Or do some succeed (perhaps the first wave)?

-- 
Harsh J
www.harshj.com


Problem running a Hadoop program with external libraries

2011-03-04 Thread Ratner, Alan S (IS)
We are having difficulties running a Hadoop program making calls to external 
libraries - but this occurs only when we run the program on our cluster and not 
from within Eclipse where we are apparently running in Hadoop's standalone 
mode.  This program invokes the Open Computer Vision libraries (OpenCV and 
JavaCV).  (I don't think there is a problem with our cluster - we've run many 
Hadoop jobs on it without difficulty.)

1.  I normally use Eclipse to create jar files for our Hadoop programs but 
I inadvertently hit the "run as Java application" button and the program ran 
fine, reading the input file from the eclipse workspace rather than HDFS and 
writing the output file to the same place.  Hadoop's output appears below.  
(This occurred on the master Hadoop server.)

2.  I then "exported" from Eclipse a "runnable jar" which "extracted 
required libraries" into the generated jar - presumably producing a jar file 
that incorporated all the required library functions. (The plain jar file for 
this program is 17 kB while the runnable jar is 30MB.)  When I try to run this 
on my Hadoop cluster (including my master and slave servers) the program 
reports that it is unable to locate "libopencv_highgui.so.2.2: cannot open 
shared object file: No such file or directory".  Now, in addition to this 
library being incorporated inside the runnable jar file it is also present on 
each of my servers at hadoop-0.21.0/lib/native/Linux-amd64-64/ where we have 
loaded the same libraries (to give Hadoop 2 shots at finding them).  These 
include:
  ...
  libopencv_highgui_pch_dephelp.a
  libopencv_highgui.so
  libopencv_highgui.so.2.2
  libopencv_highgui.so.2.2.0
  ...

  When I poke around inside the runnable jar I find javacv_linux-x86_64.jar 
which contains:
  com/googlecode/javacv/cpp/linux-x86_64/libjniopencv_highgui.so

3.  I then tried adding the following to mapred-site.xml as suggested in 
Patch 2838 that's supposed to be included in hadoop 0.21 
https://issues.apache.org/jira/browse/HADOOP-2838
  
<property>
  <name>mapred.child.env</name>
  <value>LD_LIBRARY_PATH=/home/ngc/hadoop-0.21.0/lib/native/Linux-amd64-64</value>
</property>
  The log is included at the bottom of this email with Hadoop now 
complaining about a different missing library with an out-of-memory error.

Does anyone have any ideas as to what is going wrong here?  Any help would be 
appreciated.  Thanks.

Alan


BTW: Each of our servers has 4 hard drives and many of the errors below refer 
to the 3 drives (/media/hd2 or hd3 or hd4) reserved exclusively for HDFS and 
thus perhaps not a good place for Hadoop to be looking for a library file.  My 
slaves have 24 GB RAM, the jar file is 30 MB, and the sequence file being read 
is 400 KB - so I hope I am not running out of memory.


1.  RUNNING DIRECTLY FROM ECLIPSE IN HADOOP'S STANDALONE MODE - SUCCESS

 Running Face Program
11/03/04 12:44:10 INFO security.Groups: Group mapping 
impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=30
11/03/04 12:44:10 INFO jvm.JvmMetrics: Initializing JVM Metrics with 
processName=JobTracker, sessionId=
11/03/04 12:44:10 WARN mapreduce.JobSubmitter: Use GenericOptionsParser for 
parsing the arguments. Applications should implement Tool for the same.
11/03/04 12:44:10 WARN mapreduce.JobSubmitter: No job jar file set.  User 
classes may not be found. See Job or Job#setJar(String).
11/03/04 12:44:10 INFO mapred.FileInputFormat: Total input paths to process : 1
11/03/04 12:44:10 WARN conf.Configuration: mapred.map.tasks is deprecated. 
Instead, use mapreduce.job.maps
11/03/04 12:44:10 INFO mapreduce.JobSubmitter: number of splits:1
11/03/04 12:44:10 INFO mapreduce.JobSubmitter: adding the following namenodes' 
delegation tokens:null
11/03/04 12:44:10 WARN security.TokenCache: Overwriting existing token storage 
with # keys=0
11/03/04 12:44:10 INFO mapreduce.Job: Running job: job_local_0001
11/03/04 12:44:10 INFO mapred.LocalJobRunner: Waiting for map tasks
11/03/04 12:44:10 INFO mapred.LocalJobRunner: Starting task: 
attempt_local_0001_m_00_0
11/03/04 12:44:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
11/03/04 12:44:10 INFO compress.CodecPool: Got brand-new decompressor
11/03/04 12:44:10 INFO mapred.MapTask: numReduceTasks: 1
11/03/04 12:44:10 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
11/03/04 12:44:10 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
11/03/04 12:44:10 INFO mapred.MapTask: soft limit at 83886080
11/03/04 12:44:10 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
11/03/04 12:44:10 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
11/03/04 12:44:11 INFO mapreduce.Job:  map 0% reduce 0%
11/03/04 12:44:16 INFO mapred.LocalJobRunner: 
file:/home/ngc/eclipse_workspace/HadoopPrograms/Images2/JPGSequenceFile.001:0+411569
 > map
11/03/04 12:44:17 INFO mapreduce.Job:  map 57% reduce 0%