Jarod,

On Jul 10, 2011, at 3:13 PM, Donghan (Jarod) Wang wrote:
> Hey Arun,
>
> Thank you for the reply. The way you mentioned requires setting up
> native libraries somewhere on HDFS before starting the job, which
> is what I am trying to avoid. What I want is to bundle the libraries
> within the job JAR; in other words, the libraries are shipped with the
> JAR and need not be pre-installed on the system. Once the job gets
> running, it extracts the lib from the job JAR and System.load()s it. I
> wonder if that is possible.
>

It's possible, but very tedious. Currently (0.20.xxx) the framework unjars the job.jar for you, but that is going away in 0.23 (and even 0.22, I guess). Even then, you'd have to figure out the path manually, load it, etc.

OTOH, using the DistributedCache is supported. Even better, the native .so will be shared across jobs - it is downloaded into the cache only once and re-used. I'd highly recommend that.

hth,
Arun

> Thanks,
> Jarod
>
> On Sat, Jul 9, 2011 at 3:20 PM, Arun C Murthy <acmur...@apache.org> wrote:
>> Jarod,
>>
>> On Jul 9, 2011, at 12:08 PM, Donghan (Jarod) Wang wrote:
>>
>>> Hey all,
>>>
>>> I'm working on a project that uses a native C library. Although I can
>>> use DistributedCache as a way to distribute the C library, I'd like to
>>> use the jar to do the job. What I mean is packing the C library into
>>> the job jar, and writing code in a way that the job can find the
>>> library once it gets submitted. I wonder if this is possible. If so,
>>> how can I obtain the path in the code?
>>
>>
>> Just add it as a cache-file in the distributed cache, enable the
>> symlink, and System.load the filename (of the symlink).
>>
>> More details:
>> http://hadoop.apache.org/common/docs/r0.20.0/mapred_tutorial.html#DistributedCache
>>
>> hth,
>> Arun
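
A minimal sketch of the "bundle it in the job jar" route Jarod describes: pack the .so into the jar as a classpath resource, copy it out to a local temp file at task start, and System.load it. The resource path, library name, and class name below are illustrative, not from the thread, and this is the tedious, unsupported path Arun cautions against.

    // Sketch: load a native library that was packed inside the job jar.
    // "/native/libfoo.so" and the class name are illustrative assumptions.
    import java.io.File;
    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.StandardCopyOption;

    public class BundledNativeLoader {
      public static void loadFromJar() throws Exception {
        // Read the .so that was packed into the jar as a classpath resource.
        try (InputStream in =
                 BundledNativeLoader.class.getResourceAsStream("/native/libfoo.so")) {
          if (in == null) {
            throw new IllegalStateException("libfoo.so not found inside the job jar");
          }
          // Copy it to a local temp file, since System.load needs a real file path.
          File tmp = File.createTempFile("libfoo", ".so");
          tmp.deleteOnExit();
          Files.copy(in, tmp.toPath(), StandardCopyOption.REPLACE_EXISTING);
          System.load(tmp.getAbsolutePath());
        }
      }
    }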
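
A minimal sketch of the DistributedCache-plus-symlink approach Arun recommends, using the old org.apache.hadoop.filecache API from the 0.20 line that the linked tutorial covers; the HDFS path and symlink name are illustrative, and the .so is assumed to have been copied to HDFS beforehand.

    // Sketch: ship a native .so via the DistributedCache and load it in the task.
    // "hdfs:///libs/libfoo.so" and "libfoo.so" are illustrative assumptions.
    import java.io.File;
    import java.net.URI;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.mapred.JobConf;

    public class CachedNativeLibExample {
      // Driver side: register the cached file and enable symlink creation.
      public static void setup(JobConf conf) throws Exception {
        // The "#libfoo.so" fragment names the symlink created in each task's
        // working directory.
        DistributedCache.addCacheFile(
            new URI("hdfs:///libs/libfoo.so#libfoo.so"), conf);
        DistributedCache.createSymlink(conf);
      }

      // Task side (e.g. in the mapper's configure()): load via the symlink.
      public static void loadNativeLib() {
        System.load(new File("libfoo.so").getAbsolutePath());
      }
    }

Because the symlink lands in the task's working directory, System.load only needs the local file name resolved to an absolute path, and the cached .so is downloaded to each node once and re-used across jobs, as Arun notes.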