JNI in MapReduce

2010-02-12 Thread Utkarsh Agarwal
Can anybody point me to how to use JNI calls in a MapReduce program? My .so
files have other dependencies as well. Is there a way to set
LD_LIBRARY_PATH for the child processes? Should all the native libraries be
in HDFS?

Thanks,
Utkarsh.


Re: JNI in MapReduce

2010-02-12 Thread Alex Kozlov
All native libraries should be present on each of the cluster nodes.  You
need to set the "java.library.path" property to point to your libraries (or
just put them in the default system directories).
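
One way to check this on a node is to print the JVM's java.library.path and
look for the library file in each directory. The sketch below is a
hypothetical diagnostic, not part of Hadoop; the class name LibraryPathCheck
and the library name "mylib" are placeholders.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical diagnostic helper; not part of Hadoop. */
class LibraryPathCheck {

    /** Returns every directory on searchPath that contains the platform file name for libName. */
    static List<File> locate(String libName, String searchPath) {
        // Maps e.g. "foo" to "libfoo.so" on Linux; the result is platform dependent.
        String fileName = System.mapLibraryName(libName);
        List<File> hits = new ArrayList<>();
        if (searchPath == null || searchPath.isEmpty()) {
            return hits;
        }
        for (String dir : searchPath.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            File candidate = new File(dir, fileName);
            if (candidate.isFile()) {
                hits.add(candidate.getParentFile());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String lib = args.length > 0 ? args[0] : "mylib"; // placeholder library name
        String path = System.getProperty("java.library.path");
        System.out.println("java.library.path = " + path);
        System.out.println(System.mapLibraryName(lib) + " found in: " + locate(lib, path));
    }
}
```

Note that java.library.path only controls where the JVM looks for the library
named in System.loadLibrary. Libraries that the .so itself depends on are
resolved by the operating system's dynamic linker.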



Re: JNI in MapReduce

2010-02-12 Thread Allen Wittenauer

... or just use the distributed cache.





Re: JNI in MapReduce

2010-02-16 Thread Jason Rutherglen
How would this work?



Re: JNI in MapReduce

2010-02-18 Thread Allen Wittenauer


Like this:

http://hadoop.apache.org/common/docs/current/native_libraries.html#Loading+native+libraries+through+DistributedCache






Re: JNI in MapReduce

2010-02-18 Thread Utkarsh Agarwal
My .so file has other .so dependencies, so would I have to add them all to
the DistributedCache? I also tried setting LD_LIBRARY_PATH in
mapred-site.xml as

<property>
  <name>mapred.child.env</name>
  <value>LD_LIBRARY_PATH=/opt/libs/</value>
</property>

but that doesn't work. Setting java.library.path is not sufficient;
LD_LIBRARY_PATH has to be set.

-Utkarsh



Re: JNI in MapReduce

2010-02-18 Thread Jason Venner
We used to do this all the time at Attributor. Now, if I can remember how we
did it.

If the libraries are constant, you can just install them on your nodes to
save pushing them through the distributed cache, and then set up
LD_LIBRARY_PATH correctly.

The key issue if you push them through the distributed cache is ensuring
that the directory the library gets dropped in is actually in the runtime
java.library.path.
You can also give explicit paths to System.load.
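
The explicit-path approach can be sketched as follows. This is a minimal
illustration, assuming the libraries sit in one known directory and that you
know their dependency order; NativeLoader and the file names used with it are
placeholders, not a Hadoop API.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; the directory and file names passed in are placeholders. */
class NativeLoader {

    /** Builds absolute paths for the given file names, preserving their order. */
    static List<String> resolve(File dir, String... fileNames) {
        List<String> paths = new ArrayList<>();
        for (String name : fileNames) {
            paths.add(new File(dir, name).getAbsolutePath());
        }
        return paths;
    }

    /** Loads each library by absolute path; list dependencies before their dependents. */
    static void loadInOrder(File dir, String... fileNames) {
        for (String path : resolve(dir, fileNames)) {
            // System.load requires an absolute path and throws
            // UnsatisfiedLinkError if the file is missing or cannot be linked.
            System.load(path);
        }
    }
}
```

From a mapper's static initializer one might call
NativeLoader.loadInOrder(new File("/opt/libs"), "lib1.so", "lib.so"). On
Linux, loading a dependency first often lets the dynamic linker reuse it when
the dependent library is loaded, but that relies on the sonames matching, so
treat it as something to verify rather than a guarantee.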

Passing -Djava.library.path in the child JVM options, mapred.child.java.opts
(if I have the parameter name correct), should also work.




-- 
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals


Re: JNI in MapReduce

2010-02-19 Thread Utkarsh Agarwal
How do I set LD_LIBRARY_PATH for the child? Configuring mapred-site.xml
doesn't work. Also, setting -Djava.library.path is not good enough, since it
only locates the library I am trying to load (let's say lib.so), but that
library depends on other libraries such as lib1.so, resulting in an
UnsatisfiedLinkError. Thus, LD_LIBRARY_PATH has to be set.





Re: JNI in MapReduce

2010-02-19 Thread Allen Wittenauer

See http://issues.apache.org/jira/browse/HADOOP-2867 (and
https://issues.apache.org/jira/browse/HADOOP-5980 if you are using 0.21 or
Y! Hadoop w/LinuxTaskController). What version of Hadoop are you using?

Also, if this is custom code, what does the runtime link path look like (-R
during compile time)?  Using $ORIGIN might be useful here.





Re: JNI in MapReduce

2010-02-19 Thread Utkarsh Agarwal
I am using Hadoop 0.20.1. I added the attached patch, but the child
processes still don't get the path :(
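
The description in the attached patch says mapred.child.env takes entries
such as A=foo and B=$B:c, where $B stands for the TaskTracker's own value of
B. The standalone sketch below illustrates that expansion under the
assumption that entries are comma-separated NAME=value pairs; it is not the
patch's code, and ChildEnvSpec is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Standalone illustration; not the code from the attached patch. */
class ChildEnvSpec {

    /**
     * Parses a spec such as "A=foo,B=$B:c" into a map. Within each entry,
     * "$NAME" (where NAME is the variable being set) is replaced by that
     * variable's value in parentEnv, or by the empty string if it is unset.
     */
    static Map<String, String> parse(String spec, Map<String, String> parentEnv) {
        Map<String, String> result = new LinkedHashMap<>();
        if (spec == null || spec.trim().isEmpty()) {
            return result;
        }
        for (String entry : spec.split(",")) {
            int eq = entry.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected NAME=value but got: " + entry);
            }
            String name = entry.substring(0, eq).trim();
            String value = entry.substring(eq + 1).trim();
            String inherited = parentEnv.getOrDefault(name, "");
            // String.replace performs literal (non-regex) substitution.
            result.put(name, value.replace("$" + name, inherited));
        }
        return result;
    }

    public static void main(String[] args) {
        // Shows what a child would receive if it inherited this JVM's environment.
        System.out.println(parse("LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/libs", System.getenv()));
    }
}
```

With this form, a value like LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/libs
appends to the inherited path instead of replacing it.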

Index: src/mapred/mapred-default.xml
===
--- src/mapred/mapred-default.xml	(revision 781689)
+++ src/mapred/mapred-default.xml	(working copy)
@@ -406,6 +406,16 @@
 </property>
 
 <property>
+  <name>mapred.child.env</name>
+  <value></value>
+  <description>User added environment variables for the task tracker child 
+  processes. Example :
+  1) A=foo  This will set the env variable A to foo
+  2) B=$B:c This is inherit tasktracker's B env variable.  
+  </description>
+</property>
+
+<property>
   <name>mapred.child.ulimit</name>
   <value></value>
   <description>The maximum virtual memory, in KB, of a process launched by the 
Index: src/mapred/org/apache/hadoop/mapred/TaskRunner.java
===
--- src/mapred/org/apache/hadoop/mapred/TaskRunner.java	(revision 781689)
+++ src/mapred/org/apache/hadoop/mapred/TaskRunner.java	(working copy)
@@ -399,6 +399,25 @@
 ldLibraryPath.append(oldLdLibraryPath);
   }
   env.put("LD_LIBRARY_PATH", ldLibraryPath.toString());
+  
+  // add the env variables passed by the user
+  String mapredChildEnv = conf.get("mapred.child.env");
+  if (mapredChildEnv != null && mapredChildEnv.length() > 0) {
+String childEnvs[] = mapredChildEnv.split(",");
+for (String cEnv : childEnvs) {